Simon’s congruence pattern matching
Sungmin Kim, Sang-Ki Ko and Yo-Sub Han.
Published in Theoretical Computer Science, 2024.
Published in Theoretical Computer Science, 2024.
Definitions
Simon’s congruence: For an integer $k$,
two strings $w_1$ and $w_2$ are $k$-congruent ($w_1\sim_k w_2$)
if they have the same set of subsequences
of at most $k$.
Pattern matching under Simon’s congruence:
find all substrings of a text $\texttt{T}$ that matches a pattern $\texttt{P}$
(i.e., $k$-congruent to $\texttt{P}$).
Summary of results
- A $O(|\texttt{T}||\Sigma|(|\Sigma|^2+k))$ time algorithm for the pattern matching problem under Simon's congruence
- Data structures that allow efficiently solving the pattern matching problem
- Efficient solutions for the following variant problems:
- finding longest / shortest matches of the text
- finding the shortest subsequence of the text that is $k$-congruent to the pattern
- finding the most frequent string among the matches of the pattern
- answering queries for the string with the $g^\text{th}$ largest multiplicity among the matches of the pattern
Keywords
Simon’s congruence, state complexity, finite automata, shortlex normal forms