Support Levenshtein substring matching #12
The test cases seem to be from here: https://gist.github.com/tomstuart/9e4fd5cd96527debf7a685d0b5399635 :) From what I am seeing, there are other more sophisticated algorithms for fuzzy substring matching.
Yes, I stumbled upon the same test cases; however, I didn't bother to read the Ruby implementation. I figured it would be better to implement it by slightly modifying your algorithm implementation. Thanks for the article; it looks like I'm using the method described there:
This is actually for a real-world use case. I am attempting to match large human-typed lists of products against a smaller list of brand names. I figured I would contribute this variant back to the project instead of forking. As for other more sophisticated algorithms, I haven't yet looked into anything like "filter-verification, hashing, Locality-sensitive hashing (LSH), Tries or other greedy and approximation algorithms". So far your slightly modified library seems to work great for my use case.
Thanks for the explanation. Feel free to send a change with tests. I think you would need to refactor out the core logic to a helper function. Please also run the benchmarks and check if there is any regression for the function call overhead.
Approximate substring matching could be very useful to some users. It reports the smallest edit distance between `needle` and any substring of `haystack`. Here are some examples:

All that is necessary is to initialize the row with all zeroes, run the main algorithm, and then return the smallest value in the row. I have developed some code and was wondering what you think about including it (or some more optimized variant) in this package:
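For illustration only (this is not the code proposed in this issue, nor this package's API), the approach described above can be sketched in Python. The function name `substring_distance` and the single-row layout are assumptions made for the example. The row is indexed by positions in `haystack` and starts at all zeroes, so a match may begin anywhere at no cost; the answer is the smallest value left in the row.

```python
def substring_distance(needle: str, haystack: str) -> int:
    """Smallest Levenshtein distance between needle and any substring of haystack.

    Hypothetical helper for illustration; not part of any existing library.
    """
    # row[j] is the best cost of matching the needle prefix processed so far
    # against some substring of haystack that ends at position j.
    # All zeroes: the match may start at any position in haystack for free.
    row = [0] * (len(haystack) + 1)

    for i, nc in enumerate(needle, start=1):
        prev_diag = row[0]  # previous row's value at column j - 1
        row[0] = i          # needle[:i] against an empty substring costs i
        for j, hc in enumerate(haystack, start=1):
            prev_up = row[j]  # previous row's value at column j
            row[j] = min(
                prev_up + 1,             # drop the current needle character
                row[j - 1] + 1,          # absorb an extra haystack character
                prev_diag + (nc != hc),  # match (0) or substitute (1)
            )
            prev_diag = prev_up

    # The match may also end anywhere, so take the minimum over the last row.
    return min(row)


if __name__ == "__main__":
    print(substring_distance("abc", "xxabcxx"))  # exact substring present
    print(substring_distance("abd", "xxabcxx"))  # one substitution away
```

Compared with the ordinary full-string distance, the only differences are the zero-initialized first row and the final `min(row)` in place of `row[-1]`; the inner loop is unchanged, which is why a shared helper for the core logic fits naturally.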