Simon’s congruence pattern matching

Sungmin Kim, Sang-Ki Ko and Yo-Sub Han.
Presented at ISAAC'22.

Definitions

Simon’s congruence: For an integer $k$, two strings $w_1$ and $w_2$ are $k$-congruent ($w_1\sim_k w_2$) if they have the same set of subsequences of at most $k$.
Pattern matching under Simon’s congruence: find all substrings of a text $\texttt{T}$ that matches a pattern $\texttt{P}$ (i.e., $k$-congruent to $\texttt{P}$).

Summary of results

  • A $O(|\texttt{T}||\Sigma|(|\Sigma|^2+k))$ time algorithm for the pattern matching problem under Simon's congruence
  • Data structures that allow efficiently solving the pattern matching problem
  • Efficient solutions for the following variant problems:
    • finding longest / shortest matches of the text
    • finding the shortest subsequence of the text that is $k$-congruent to the pattern


Keywords Simon’s congruence, pattern matching, string algorithm, data structure

link to journal version