 

Case-insensitive string comparison in Julia

Tags:

string

julia

I'm sure this has a simple answer, but how does one compare two strings while ignoring case in Julia? I've hacked together a rather inelegant solution:

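# Julia 1.x syntax (the old f{S}(...) form no longer parses); accepts any two AbstractStrings
function case_insensitive_match(a::AbstractString, b::AbstractString)
    lowercase(a) == lowercase(b)  # lowercase both strings, then compare
end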

There must be a better way!

asked Sep 08 '16 at 19:09 by ahwillia





1 Answer

Efficiency Issues

The method you have chosen will work well in most settings, and if you are looking for something substantially more efficient, you are unlikely to find it. Uppercase and lowercase letters are simply different characters with different encodings, so there is no separate "capitalization field" on a character that a comparison could ignore; some case conversion is needed either way. Fortunately, for ASCII letters the two cases differ by only a single bit, so the conversion itself is simple and cheap. See this SO post for background:

How do uppercase and lowercase letters differ by only one bit?
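The one-bit difference can be checked directly, and it also suggests where a little efficiency can still be gained. The case conversion is unavoidable, but the two intermediate strings that lowercase(a) == lowercase(b) allocates are not. The helper lowercase_equal below is a hypothetical name (not part of Base); it lowercases and compares one Char at a time and should agree with the question's version, since Julia's string lowercase works character by character. Treat it as a sketch: whether it is measurably faster depends on string length and should be benchmarked.

```julia
# ASCII 'A' (0x41) and 'a' (0x61) differ only in the 0x20 bit.
@assert UInt32('a') - UInt32('A') == 0x20
@assert Char(UInt32('A') | 0x20) == 'a'

# Hypothetical helper: same idea as lowercase(a) == lowercase(b),
# but compares one Char at a time instead of building two new strings.
function lowercase_equal(a::AbstractString, b::AbstractString)
    ia, ib = iterate(a), iterate(b)
    while ia !== nothing && ib !== nothing
        ca, sa = ia
        cb, sb = ib
        lowercase(ca) == lowercase(cb) || return false  # case-convert each Char
        ia, ib = iterate(a, sa), iterate(b, sb)
    end
    # Equal only if both strings ran out at the same point
    return ia === nothing && ib === nothing
end

lowercase_equal("Julia", "JULIA")   # true
lowercase_equal("Julia", "Juliet")  # false
```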

Accuracy Issues

In most settings your method will also be accurate. It can fail, however, for characters whose case mapping is not a simple one-to-one pair, such as the Greek final sigma: 'ς' is already lowercase, while 'Σ' lowercases to 'σ', so lowercase() treats them as different letters. For such cases you are better off using the normalize function (see the docs for details) with the casefold option:

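using Unicode  # since Julia 0.7, normalize lives in the Unicode standard library
Unicode.normalize(a, casefold=true) == Unicode.normalize(b, casefold=true)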
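A minimal sketch of the casefold approach, assuming Julia 1.x, where normalize is called as Unicode.normalize from the Unicode standard library. The helper name casefold_equal is hypothetical. It contrasts plain lowercasing with case folding on the Greek final sigma:

```julia
using Unicode  # standard library; provides Unicode.normalize

# Hypothetical helper: case-insensitive comparison via Unicode case folding.
casefold_equal(a::AbstractString, b::AbstractString) =
    Unicode.normalize(a, casefold=true) == Unicode.normalize(b, casefold=true)

# 'ς' (final sigma) is already lowercase, while 'Σ' lowercases to 'σ',
# so plain lowercasing treats them as different letters...
lowercase("ς") == lowercase("Σ")   # false
# ...but case folding maps both to 'σ'.
casefold_equal("ς", "Σ")           # true
```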

This SO post, written in the context of Python, covers the relevant issues in detail, so they need not be repeated here:

How do I do a case-insensitive string comparison?

Because it deals with Unicode case mapping rather than anything Python-specific, it applies equally to Julia.

See also this Julia GitHub discussion for more background and specific examples of where lowercase() can fail:

https://github.com/JuliaLang/julia/issues/7848
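On Julia 1.8 and later, the Unicode standard library also offers Unicode.isequal_normalized, which compares two strings under Unicode normalization and, with casefold=true, ignores case without the caller having to build normalized copies first. The sketch below (casefold_isequal is a hypothetical name) uses it when it is available and otherwise falls back to comparing Unicode.normalize results, so it should run on any Julia 1.x version.

```julia
using Unicode

# Hypothetical helper: prefer isequal_normalized (Julia 1.8+),
# fall back to comparing case-folded normalized strings on older versions.
function casefold_isequal(a::AbstractString, b::AbstractString)
    if isdefined(Unicode, :isequal_normalized)
        return Unicode.isequal_normalized(a, b; casefold=true)
    else
        return Unicode.normalize(a, casefold=true) == Unicode.normalize(b, casefold=true)
    end
end

casefold_isequal("ς", "Σ")        # true
casefold_isequal("Julia", "JULIA") # true
```

Both branches also treat canonically equivalent forms (for example a precomposed "é" versus "e" followed by a combining accent) as equal, which lowercase() alone does not.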

answered Nov 15 '22 at 09:11 by Michael Ohlrogge
