Simon’s congruence pattern matching

Sungmin Kim, Sang-Ki Ko and Yo-Sub Han.
Published in Theoretical Computer Science, 2024.

Definitions

Simon’s congruence: For an integer $k$, two strings $w_1$ and $w_2$ are $k$-congruent ($w_1\sim_k w_2$) if they have the same set of subsequences of at most $k$.
Pattern matching under Simon’s congruence: find all substrings of a text $\texttt{T}$ that matches a pattern $\texttt{P}$ (i.e., $k$-congruent to $\texttt{P}$).

Summary of results

  • A $O(|\texttt{T}||\Sigma|(|\Sigma|^2+k))$ time algorithm for the pattern matching problem under Simon's congruence
  • Data structures that allow efficiently solving the pattern matching problem
  • Efficient solutions for the following variant problems:
    • finding longest / shortest matches of the text
    • finding the shortest subsequence of the text that is $k$-congruent to the pattern
    • finding the most frequent string among the matches of the pattern
    • answering queries for the string with the $g^\text{th}$ largest multiplicity among the matches of the pattern


Keywords Simon’s congruence, state complexity, finite automata, shortlex normal forms

link to conference version