I have a CSV database which contains names, addresses, etc.
In my Android app I want to search for something, say the address, and then display the other fields of the matching record - name, phone...
The problem is that some entries in the CSV have missing characters, with white space in their place - for example "G rmany Dresden" (with a white space instead of the "e").
Unfortunately, the database is updated frequently and I cannot correct it manually every time.
How can I match "Germany Dresden", "G rmany Dresden", "Germa y Dresden" etc. when I search for "Germany"?
I suppose there has to be a limit on the number of mismatched characters, so let's assume there are no more than two - at least I have never seen more than that.
To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [, or else it stands for just itself. The character "." (period) is a metacharacter (it sometimes has a special meaning).
You can use this regex /^[ A-Za-z0-9_@./#&+-]*$/.
^ means "Match the start of the string" (the position before the first character). $ means "Match the end of the string" (the position after the last character in the string). Both are called anchors and ensure that the entire string is matched instead of just a substring.
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9. (a-z0-9) -- Explicit capture of the literal text a-z0-9.
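The pieces described above (character class, negated class, anchors, capturing group) can be tried out with java.util.regex. This is only a small sketch; the class name RegexBasics and its helper methods are made up for illustration and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexBasics {
    // Anchored character class: the whole string may contain only these characters.
    static final Pattern ALLOWED = Pattern.compile("^[ A-Za-z0-9_@./#&+-]*$");

    // Negated class: one character that is NOT in a-z or 0-9.
    static final Pattern NOT_LOWER_ALNUM = Pattern.compile("[^a-z0-9]");

    // Capturing group wrapped around a character class (one or more characters).
    static final Pattern CAPTURE = Pattern.compile("([a-z0-9]+)");

    // True if every character of s is in the allowed class.
    static boolean onlyAllowed(String s) {
        return ALLOWED.matcher(s).find();
    }

    // True if s contains at least one character outside a-z / 0-9.
    static boolean hasOtherChar(String s) {
        return NOT_LOWER_ALNUM.matcher(s).find();
    }

    // Returns the first run of a-z / 0-9 characters, or null if there is none.
    static String firstRun(String s) {
        Matcher m = CAPTURE.matcher(s);
        return m.find() ? m.group(1) : null;
    }
}
```

Note that this class accepts spaces but says nothing about which letter a space replaced, so on its own it does not solve the missing-character search.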
The first thing that comes to mind is Levenshtein distance: you're looking for strings within a distance of 1 of "Germany" (or 2, given your limit), counting only substitutions and not insertions or deletions. You can't express that directly with a regex, but you could generate the regex programmatically.
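As a rough sketch of what "generate the regex programmatically" could look like in Java: build one alternative per choice of damaged positions, putting "." (any character) at those positions. The class FuzzyRegex and its methods are hypothetical names for this example, and the approach assumes substitutions only (no inserted or deleted characters) and a short search word, because the number of alternatives grows quickly with the allowed mismatches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class FuzzyRegex {
    // Builds a pattern matching `word` where up to `maxMismatches`
    // characters may each be replaced by any single character.
    static Pattern build(String word, int maxMismatches) {
        List<String> alternatives = new ArrayList<>();
        collect(word, 0, maxMismatches, new StringBuilder(), alternatives);
        return Pattern.compile("(?:" + String.join("|", alternatives) + ")");
    }

    // Recursively emits every variant of `word` with at most `left`
    // further positions turned into the wildcard ".".
    private static void collect(String word, int pos, int left,
                                StringBuilder prefix, List<String> out) {
        if (pos == word.length()) {
            out.add(prefix.toString());
            return;
        }
        int mark = prefix.length();

        // Option 1: keep the original character (quoted, so it is literal).
        prefix.append(Pattern.quote(String.valueOf(word.charAt(pos))));
        collect(word, pos + 1, left, prefix, out);
        prefix.setLength(mark);

        // Option 2: allow any character here, if mismatches remain.
        if (left > 0) {
            prefix.append('.');
            collect(word, pos + 1, left - 1, prefix, out);
            prefix.setLength(mark);
        }
    }

    // Convenience helper: does `text` contain a near-match of `word`?
    static boolean contains(String text, String word, int maxMismatches) {
        return build(word, maxMismatches).matcher(text).find();
    }
}
```

For example, FuzzyRegex.contains("G rmany Dresden", "Germany", 2) should find a match, while an unrelated entry such as "France Paris" should not. If the damage is always a space, the "." could be replaced by a literal space to make the pattern stricter.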
There's another answer here that might be of use: Levenshtein distance in regular expression
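If a regex is not a hard requirement, the same substitution-only check can also be written as a plain loop, which avoids building a large pattern. This is a different technique from the regex suggestion above (a sliding-window mismatch count, not full Levenshtein distance), and the class and method names are invented for this sketch.

```java
final class FuzzyContains {
    // Returns true if some window of `text` with the same length as `word`
    // differs from `word` in at most `maxMismatches` positions.
    // Assumption: characters are only replaced, never inserted or deleted.
    static boolean containsWithMismatches(String text, String word, int maxMismatches) {
        for (int start = 0; start + word.length() <= text.length(); start++) {
            int mismatches = 0;
            // Stop comparing this window as soon as the limit is exceeded.
            for (int i = 0; i < word.length() && mismatches <= maxMismatches; i++) {
                if (text.charAt(start + i) != word.charAt(i)) {
                    mismatches++;
                }
            }
            if (mismatches <= maxMismatches) {
                return true;
            }
        }
        return false;
    }
}
```

Each CSV row's address field could be passed through this check with a limit of 2, and the rows that pass would be the ones to display.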