
sort: string comparison failed: Invalid or incomplete multibyte or wide character

I'm trying to use the following command on a text file:

$ sort <m.txt | uniq -c | sort -nr >m.dict 

However I get the following error message:

sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.

I'm using Cygwin on Windows 7, and earlier I had trouble editing m.txt to put each word in the file on its own line. Please see:

Using AWK to place each word in a text file on a new line

I'm not sure if I'm getting these errors because of this, or because m.txt contains characters from the Welsh alphabet (when I was working with Welsh text in Python, I was required to change the encoding to 'Latin-1').

I tried following the error message's advice and setting LC_ALL='C', but this has not helped. Can anyone elaborate on the errors I'm receiving and advise how I might go about solving this problem?

UPDATE:

When I tried dos2unix, it reported invalid characters at certain lines. These turned out not to be Welsh characters but other strange characters (arrows etc.). I went through my text file removing them until dos2unix ran without error. However, after running dos2unix all the text appeared concatenated (no spaces or newlines at all, whereas each word in the file should have been on a separate line). I then ran unix2dos and the text file was back to normal. How can I put each word on its own line and use the sort command without it giving me errors about '\r' characters?

hjalpmig asked Mar 29 '16 18:03


2 Answers

I know it's an old question, but running export LC_ALL='C' before the pipeline does the trick, as the message from sort itself suggests: Set LC_ALL='C' to work around the problem.
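One possible reason a bare LC_ALL='C' on its own line did not help: in a POSIX shell that only sets a shell variable, and unless the variable is already exported, sort never sees it. Below is a minimal sketch of two ways to make the setting reach sort. The file names sample.txt, sample.dict and sample_exported.dict are made up for illustration and stand in for m.txt and m.dict.

```shell
# Hypothetical sample standing in for m.txt: one word per line, CRLF endings.
printf 'mwy\r\nenwedig\r\nmwy\r\n' > sample.txt

# Option 1: prefix each sort so only that command runs in the C locale.
LC_ALL=C sort < sample.txt | uniq -c | LC_ALL=C sort -nr > sample.dict

# Option 2: export once so every later command in this shell inherits it.
export LC_ALL=C
sort < sample.txt | uniq -c | sort -nr > sample_exported.dict
```

The prefix form leaves the rest of the session's locale alone, which may matter if other tools should still treat the text as something other than raw bytes. Note that the C locale sorts by byte value, so the order of accented Welsh characters will not follow dictionary order, and the '\r' characters are still present in the output.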

Philip Rollins answered Nov 13 '22 12:11


Looks like a Windows line-ending related problem (\r\n versus \n). You can convert m.txt to Unix line-endings with

dos2unix m.txt

and then rerun your command.
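If dos2unix is unavailable or you would rather not modify the file, a rough alternative is to strip the carriage returns inside the pipeline with tr. This is only a sketch; sample_crlf.txt and sample_lf.dict are made-up names standing in for m.txt and m.dict.

```shell
# Hypothetical sample standing in for m.txt: one word per line, CRLF endings.
printf 'mwy\r\nenwedig\r\nmwy\r\n' > sample_crlf.txt

# Inspect the raw bytes; carriage returns show up as \r in the od output.
od -c sample_crlf.txt | head -n 2

# Delete every carriage return on the way into the pipeline.
# The input file itself is not modified.
tr -d '\r' < sample_crlf.txt | sort | uniq -c | sort -nr > sample_lf.dict

# Count lines to confirm the words are still one per line.
tr -d '\r' < sample_crlf.txt | wc -l
```

If the text looks concatenated after conversion, it may be worth checking with wc -l or od -c before converting back: some Windows editors only recognise \r\n as a line break, so a correctly converted file can appear as one long line even though the newlines are there. If sort still complains about multibyte characters once the '\r' is gone, this can be combined with LC_ALL=C on each sort.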

Jens answered Nov 13 '22 12:11