Pattern mining under Simon’s congruence

Sungmin Kim and Yo-Sub Han.
Presented at DLT'25.

Definitions

Simon’s congruence: For an integer $k$, two strings $w_1$ and $w_2$ are $k$-congruent ($w_1\sim_k w_2$) if they have the same set of subsequences of at most $k$. A substring of a text $\texttt{T}$ matches a pattern $\texttt{P}$ if it is $k$-congruent to $\texttt{P}$.
Complete pattern mining problem under Simon’s congruence: compute patterns $\texttt{P}_k$ for each $k\le |\texttt{T}$ that maximizes the number of matches with respect to each $k$.

Summary of results

  • A $O(|\Sigma||\texttt{T}|^2\log^2|\texttt{T}|)$ time algorithm for the complete pattern mining problem under Simon's congruence
  • A tree data structure constructable in the same time bound that captures the maximum $k$ allowing congruence between pairs of substrings of a text


Keywords Simon’s congruence, pattern mining, string algorithm, data structure, lowest common ancestor