Pattern mining under Simon’s congruence
Sungmin Kim and Yo-Sub Han.
Presented at DLT'25.
Presented at DLT'25.
Definitions
Simon’s congruence: For an integer $k$,
two strings $w_1$ and $w_2$ are $k$-congruent ($w_1\sim_k w_2$)
if they have the same set of subsequences
of at most $k$.
A substring of a text $\texttt{T}$ matches a pattern $\texttt{P}$
if it is $k$-congruent to $\texttt{P}$.
Complete pattern mining problem under Simon’s congruence:
compute patterns $\texttt{P}_k$
for each $k\le |\texttt{T}$
that maximizes the number of matches with respect to each $k$.
Summary of results
- A $O(|\Sigma||\texttt{T}|^2\log^2|\texttt{T}|)$ time algorithm for the complete pattern mining problem under Simon's congruence
- A tree data structure constructable in the same time bound that captures the maximum $k$ allowing congruence between pairs of substrings of a text
Keywords
Simon’s congruence, pattern mining, string algorithm, data structure, lowest common ancestor