
C#: Issues using a dictionary with languages other than English

Tags:

c#

dictionary

OK, so I'm basically trying to load the contents of a .txt file that contains one word per line into a dictionary.

I had no problems doing so when the words in that file were in English, but after switching to a file in a language with accents, I started having problems.

I had to change the encoding when creating the StreamReader, and also pass the culture to the ToLower method when adding the word to the dictionary.

Basically I now have something similar to this:

if (!dict.ContainsKey(word.ToLower(culture)))
    dict.Add(word.ToLower(culture), true);

The problem is that words like "esta" and "está" are being considered the same. So, is there any way to make the ContainsKey method use a specific language, or do we need to implement something along the lines of a comparer? Either way, I'm kind of new to C#, so I would appreciate an example please.

Another issue emerged with the new file: after about a hundred words it stops adding the rest of the file, leaving a word incomplete. I can't see any special characters in that word that would end the execution of the method. Any ideas about this problem?

Many thanks.

EDIT: 1st problem solved using Jon Skeet's suggestion.

Regarding the 2nd problem: OK, I changed the file format to UTF-8 and removed the encoding from the StreamReader, since it now recognizes the accents correctly. Testing some things regarding the 2nd issue now.

2nd problem also solved, it was a bug on my part... the shame...

Thanks for the quick answers everyone, and especially Jon Skeet.

brokencoding asked Jan 22 '23 19:01


2 Answers

I assume you're trying to get case insensitivity for the dictionary. Instead of calling ToLower, use the constructor of Dictionary which takes an equality comparer - and use StringComparer.Create(culture, true) to construct a suitable comparer.
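As a rough sketch of that suggestion: the example below builds the dictionary with a culture-aware, case-insensitive comparer instead of lower-casing the keys. The culture name "pt-PT", the class name and the BuildDictionary helper are assumptions made for illustration; use whatever culture matches the word list.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class WordDictionaryDemo
{
    // Hypothetical helper: fills a dictionary whose keys are compared
    // case-insensitively using the rules of the given culture.
    public static Dictionary<string, bool> BuildDictionary(
        IEnumerable<string> words, CultureInfo culture)
    {
        var comparer = StringComparer.Create(culture, ignoreCase: true);
        var dict = new Dictionary<string, bool>(comparer);

        foreach (var word in words)
        {
            // No ToLower call needed: the comparer handles casing.
            if (!dict.ContainsKey(word))
                dict.Add(word, true);
        }
        return dict;
    }

    public static void Main()
    {
        // Assumption: Portuguese (Portugal); substitute the file's language.
        var culture = new CultureInfo("pt-PT");

        // "\u00E1" is 'á' and "\u00C1" is 'Á', escaped so the source file's
        // own encoding cannot interfere with the example.
        var dict = BuildDictionary(
            new[] { "esta", "Esta", "est\u00E1", "EST\u00C1" }, culture);

        // "esta"/"Esta" collapse into one key and "está"/"ESTÁ" into another,
        // so two entries are expected.
        Console.WriteLine(dict.Count);
        Console.WriteLine(dict.ContainsKey("ESTA"));
    }
}
```

Since the values are always true, a HashSet<string> constructed with the same comparer would probably serve equally well.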

I don't know what your second problem is about - we'd need more detail to diagnose it, including the code you're using, ideally.

EDIT: UTF-7 is almost certainly not the correct encoding. Don't just guess at the encoding; find out what it's really meant to be. Where did this text file come from? What can you open it successfully in?

I suspect that at least some of your problems are due to using UTF-7.
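For illustration only, here is a minimal sketch of reading a one-word-per-line file with an explicitly named encoding. It assumes the file really is UTF-8 (which still needs to be verified for the actual file); the temporary file name and the ReadWords helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class WordFileDemo
{
    // Hypothetical helper: reads one word per line using the given encoding.
    public static List<string> ReadWords(string path, Encoding encoding)
    {
        var words = new List<string>();
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                words.Add(line);
        }
        return words;
    }

    public static void Main()
    {
        // Assumption: a throwaway sample file, written as UTF-8 without a BOM.
        string path = Path.Combine(Path.GetTempPath(), "words-sample.txt");
        File.WriteAllLines(
            path,
            new[] { "esta", "est\u00E1", "cora\u00E7\u00E3o" },
            new UTF8Encoding(false));

        var words = ReadWords(path, Encoding.UTF8);

        Console.WriteLine(words.Count);
        // True when the accented word survived the round trip intact.
        Console.WriteLine(words[1] == "est\u00E1");

        File.Delete(path);
    }
}
```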

Jon Skeet answered Jan 27 '23 03:01


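The problem is with the encoding you are using when opening the file to read. It looks like you may be using ASCIIEncoding.

.NET handles strings internally as UTF-16, so accented characters are represented correctly once they are in a string; this kind of issue comes from decoding the file's bytes with the wrong encoding.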
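A small sketch of why the decoding step matters, assuming the bytes on disk are UTF-8: decoding the same bytes as ASCII loses the accented character, while decoding them as UTF-8 restores the original string. The class and method names are made up for the example.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Hypothetical helper: encodes text as UTF-8 bytes, then decodes those
    // bytes with the given encoding, as a reader with that encoding would.
    public static string RoundTrip(string text, Encoding decodeWith)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return decodeWith.GetString(bytes);
    }

    public static void Main()
    {
        string word = "est\u00E1"; // "está"

        string asAscii = RoundTrip(word, Encoding.ASCII);
        string asUtf8 = RoundTrip(word, Encoding.UTF8);

        // ASCII cannot represent 'á', so the decoded text no longer matches.
        Console.WriteLine(asAscii == word); // False
        // Decoding with the encoding the bytes were written in is lossless.
        Console.WriteLine(asUtf8 == word);  // True
    }
}
```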

Oded answered Jan 27 '23 01:01