Free program to grep unicode text files in Windows? [closed]

I have a collection of Unicode text files (exported from regedit) and I'd like to pull out all the lines that contain a certain text.

I've tried Grep for Windows and findstr, but neither seems to handle the Unicode encoding. My results are empty, and when I use the -v option (show non-matching lines), the output shows a NUL between each character.

Are there any free options to perform a simple grep on Unicode files in Windows?

jacobsee asked Jul 28 '09 21:07


3 Answers

While findstr can't read Unicode (UTF-16) files directly, type can, and findstr handles the text that type pipes to it without problems.

So all you need to do is:

type myfile.txt | findstr /c:"I'm searching for this"

Here is a sample session that shows the difference:

> type uc-test.txt
Unicode test. äöüß
Another line
Something else
> findstr "Something" uc-test.txt

> findstr /v "Something" uc-test.txt
 ■U n i c o d e   t e s t .   õ ÷ ³ ▀
 A n o t h e r   l i n e
 S o m e t h i n g   e l s e
> type uc-test.txt | findstr "Another"
Another line
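
The session above can also be reproduced outside of cmd. The following is only a minimal Python sketch, assuming the file is UTF-16 with a byte order mark (which is what regedit exports normally are). The helper name grep_utf16 and the temporary sample file are made up for illustration.

```python
import os
import tempfile


def grep_utf16(path, needle):
    """Return the lines of a UTF-16 text file that contain `needle`."""
    # encoding="utf-16" reads the byte order mark and picks the endianness.
    with open(path, encoding="utf-16") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]


# Build a small sample file like uc-test.txt: BOM + UTF-16LE text.
sample = "Unicode test. äöüß\nAnother line\nSomething else\n"
path = os.path.join(tempfile.mkdtemp(), "uc-test.txt")
with open(path, "wb") as handle:
    handle.write(b"\xff\xfe" + sample.encode("utf-16-le"))

with open(path, "rb") as handle:
    raw = handle.read()

# A byte-oriented search misses the text, because every ASCII
# character in the file is followed by a NUL byte.
print(b"Another" in raw)                     # False
print("Another".encode("utf-16-le") in raw)  # True

# Decoding first, as type does before piping, makes the search work.
print(grep_utf16(path, "Another"))           # ['Another line']
```

The first print shows why findstr finds nothing when it reads the file directly, and the last one mirrors the type-then-findstr pipeline.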

Joey answered Oct 24 '22 10:10


I just ran across grepWin, which works perfectly for what I want here. I wish I had found it earlier!

jacobsee answered Oct 24 '22 10:10


Definitely go with Cygwin (using its X server); the latest version supports UTF-8. At my last gig, I was doing a lot of work with CJK characters. Using Cygwin's X server, you can search for any characters and display any characters for which you have a fixed-width font. Also check out od and xxd, which make it easy to enter your searches as hex bytes, e.g.:

$ echo '?' | grep $(echo '3f' | xxd -p -r)
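
The xxd -p -r trick turns hex digits into raw bytes that are then used as the search pattern. The same idea can be sketched in Python with bytes.fromhex. The helper name pattern_from_hex is hypothetical, and the sketch assumes the hex digits describe UTF-8 bytes.

```python
def pattern_from_hex(hex_digits, encoding="utf-8"):
    """Turn a string of hex digits into a text search pattern."""
    # bytes.fromhex("3f") gives b"?", like: echo '3f' | xxd -p -r
    return bytes.fromhex(hex_digits).decode(encoding)


print(pattern_from_hex("3f"))             # ?
print(pattern_from_hex("e4b8ad"))         # 中
print(pattern_from_hex("3f") in "what?")  # True
```

This is handy when the character you want to search for is hard to type but its byte sequence is known.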

andersonbd1 answered Oct 24 '22 11:10