| dc.contributor.author | Cormode Graham | |
| dc.contributor.author | Muthukrishnan S. | |
| dc.date.accessioned | 2018-02-05T14:33:52Z | |
| dc.date.available | 2018-02-05T14:33:52Z | |
| dc.date.issued | 2007 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/7120 | |
| dc.description.abstract | The edit distance between two strings S and R is defined to be the minimum number of character inserts, deletes, and changes needed to convert R to S. Given a text string t of length n, and a pattern string p of length m, informally, the string edit distance matching problem is to compute the smallest edit distance between p and substrings of t. We relax the problem so that: (a) we allow an additional operation, namely, substring moves; and (b) we allow approximation of this string edit distance. Our result is a near-linear time deterministic algorithm to produce a factor of O(log n log * n) approximation to the string edit distance with moves. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings into L 1 vector space using a simplified parsing technique, which we call edit-sensitive parsing (ESP). | |
| dc.format | application/pdf | |
| dc.language.iso | English | |
| dc.publisher | Association for Computing Machinery (ACM) | |
| dc.subject | Approximate pattern matching, data streams, edit distance, embedding, similarity search, string matching | |
| dc.title | The string edit distance matching problem with moves | |
| dc.type | journal-article | |
| dc.identifer.doi | 10.1145/1186810.1186812 | |
| dc.source.volume | 3 | |
| dc.source.issue | 1 |