MINEPI algorithm - Example

This article will present the application of the MINEPI algorithm on a simple example.

Example

For readability, episodes like $<A, B>$ are written as $AB$.

Yesterday, you monitored 4 activities that you have done for 7 hours.

The 4 events are the following:

  • “M” : Listening to music
  • “E” : Eating
  • “S” : Sleeping
  • “C” : Coding

Your day looked like the following:

Minepi Example

With the MINEPI algorithm, try to identify the different episodes with the following constraints:

  • $max_{span} = 3$ means that the maximum duration of an episode is 3 hours (3 hours means potentially 4 events).
  • $min_{gap} = 1$ means that 2 events must be separated by at least 1 hour. Simultaneous events are not allowed.
  • $min_{freq} = 2$ means that an episode must be present at least 2 times in the dataset.

Solution

Episodes of size 1

The first step is to identify frequent episodes of size 1. In our example, we have 4 events, so we have 4 episodes of size 1.

The candidate episodes of size 1 are:

$C_1 = \{M, E, S, C\}$

  • $M$ is present 3 times: $id(M) = \{1, 3, 5\}$
  • $E$ is present 2 times: $id(E) = \{3, 7\}$
  • $S$ is present 1 time: $id(S) = \{4\}$
  • $C$ is present 3 times: $id(C) = \{2, 5, 6\}$

Therefore, the frequent episodes of size 1 are: $F_1 = \{M, E, C\}$

Episodes of size 2

Starting here, we don’t have to look anymore at the sequence, we can just compute each frequency by looking at the size 1 results.

The candidates are, all the combinations of $M$, $E$ and $C$.

$C_2 = \{MM, ME, MC, EM, EC, CM, CE, CC\}$

  • $MM$ is present 2 times: $id(MM) = \{(1, 3), (3, 5)\}$
  • $ME$ is present 2 times: $id(ME) = \{(1, 3), (5, 7)\}$ (not $(3, 3)$ because $min_{gap} = 1$)
  • $MC$ is present 3 times: $id(MC) = \{(1, 2), (3, 5), (5, 6)\}$
  • $EM$ is present 1 time: $id(EM) = \{(3, 5)\}$
  • $EC$ is present 1 time: $id(EC) = \{(3, 5)\}$
  • $CM$ is present 1 time: $id(CM) = \{(2, 3)\}$
  • $CE$ is present 2 times: $id(CE) = \{(2, 3), (6, 7)\}$
  • $CC$ is present 2 times: $id(CC) = \{(2, 5), (5, 6)\}$

The frequent episodes of size 2 are:

$F_2 = \{MM, ME, MC, CE, CC\}$

Episodes of size 3

The candidate episodes of size 3 are:

$C_3 = \{MME, MMC, MCE, MCC, CCE\}$

Note that if $ME$ was not in $F_2$, having $MC$ and $CE$ would not have been sufficient to have $MCE$ in $C_3$.

  • $MME$ is present 1 time: $id(MME) = \{(3,7)\}$
  • $MMC$ is present 1 time: $id(MMC) = \{(3,6)\}$
  • $MCE$ is present 2 times:$id(MCE) = \{(1,3), (5, 7)\}$
  • $MCC$ is present 1 time: $id(MCC) = \{(3, 6)\}$
  • $CCE$ is present 1 time: $id(CCE) = \{(5, 7)\}$

The last frequent episode of size 3 is:

$F_3 = \{MCE\}$

Episodes of size 4

They are no candidates for episodes of size 4.

Conclusion

The frequent episodes are:

$$\bigcup_{i=1}^3 F_i = \{M, E, C, MM, ME, MC, CE, CC, MCE\}$$

Source:

Mannila, Heikki et al. “Discovery of Frequent Episodes in Event Sequences.” Data Mining and Knowledge Discovery 1 (2004): 259-289.