MINEPI algorithm

Example

For readability, episodes like $<A, B>$ are written as $AB$.

Yesterday, you monitored 4 activities that you have done for 7 hours.

The 4 events are the following:

Your day looked like the following:

Minepi Example

With the MINEPI algorithm, try to identify the different episodes with the following constraints:

$max_{span} = 3$ means that the maximum duration of an episode is 3 hours (3 hours means potentially 4 events).
$min_{gap} = 1$ means that 2 events must be separated by at least 1 hour. Simultaneous events are not allowed.
$min_{freq} = 2$ means that an episode must be present at least 2 times in the dataset.

The first step is to identify frequent episodes of size 1. In our example, we have 4 events, so we have 4 episodes of size 1.

The candidate episodes of size 1 are:

$C_1 = \{M, E, S, C\}$

Therefore, the frequent episodes of size 1 are: $F_1 = \{M, E, C\}$

Starting here, we don’t have to look anymore at the sequence, we can just compute each frequency by looking at the size 1 results.

The candidates are, all the combinations of $M$, $E$ and $C$.

$C_2 = \{MM, ME, MC, EM, EC, CM, CE, CC\}$

$MM$ is present 2 times: $id(MM) = \{(1, 3), (3, 5)\}$
$ME$ is present 2 times: $id(ME) = \{(1, 3), (5, 7)\}$ (not $(3, 3)$ because $min_{gap} = 1$)
$MC$ is present 3 times: $id(MC) = \{(1, 2), (3, 5), (5, 6)\}$
$EM$ is present 1 time: $id(EM) = \{(3, 5)\}$
$EC$ is present 1 time: $id(EC) = \{(3, 5)\}$
$CM$ is present 1 time: $id(CM) = \{(2, 3)\}$
$CE$ is present 2 times: $id(CE) = \{(2, 3), (6, 7)\}$
$CC$ is present 2 times: $id(CC) = \{(2, 5), (5, 6)\}$

The frequent episodes of size 2 are:

$F_2 = \{MM, ME, MC, CE, CC\}$

The candidate episodes of size 3 are:

$C_3 = \{MME, MMC, MCE, MCC, CCE\}$

Note that if $ME$ was not in $F_2$, having $MC$ and $CE$ would not have been sufficient to have $MCE$ in $C_3$.

$MME$ is present 1 time: $id(MME) = \{(3,7)\}$
$MMC$ is present 1 time: $id(MMC) = \{(3,6)\}$
$MCE$ is present 2 times:$id(MCE) = \{(1,3), (5, 7)\}$
$MCC$ is present 1 time: $id(MCC) = \{(3, 6)\}$
$CCE$ is present 1 time: $id(CCE) = \{(5, 7)\}$

The last frequent episode of size 3 is:

$F_3 = \{MCE\}$

They are no candidates for episodes of size 4.

The frequent episodes are:

$$\bigcup_{i=1}^3 F_i = \{M, E, C, MM, ME, MC, CE, CC, MCE\}$$

Mannila, Heikki et al. “Discovery of Frequent Episodes in Event Sequences.” Data Mining and Knowledge Discovery 1 (2004): 259-289.