I have a collection of Unicode text files (exported from regedit) and I'd like to pull out all the lines that contain a certain text.
I've tried Grep for Windows and findstr, but neither seems to handle the Unicode encoding. My results are empty, but when I use the -v option (show non-matching lines), the output shows a NUL between each character.
Are there any free options for performing a simple grep on Unicode files in Windows?
Well, while findstr can't handle Unicode files directly, type can, and findstr actually handles Unicode input piped to it without problems.
So all you need to do is:
type myfile.txt | findstr /c:"I'm searching for this"
For example:

> type uc-test.txt
Unicode test.
äöüß
Another line
Something else

> findstr "Something" uc-test.txt

> findstr /v "Something" uc-test.txt
■U n i c o d e   t e s t .
õ ÷ ³ ▀
A n o t h e r   l i n e
S o m e t h i n g   e l s e

> type uc-test.txt | findstr "Another"
Another line
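The NULs come from the encoding: regedit exports are UTF-16, where every ASCII character is stored as its byte followed by 0x00, so a byte-oriented tool never sees "Something" as one contiguous string. If you'd rather not depend on type's conversion, the same decode-then-search idea can be sketched in Python (my choice of language, not part of the original answer). The function name grep_utf16 is hypothetical, and the sketch assumes each file starts with a byte-order mark, as regedit exports normally do.

```python
import sys
from pathlib import Path


def grep_utf16(path, needle):
    """Yield the lines of a UTF-16 text file that contain `needle`.

    Hypothetical helper. Assumes the file begins with a byte-order mark:
    the "utf-16" codec uses the BOM to choose the byte order and drops it
    from the decoded text.
    """
    # Decoding first turns b"U\x00n\x00..." back into "Un...", so a plain
    # substring test works where byte-based grep/findstr fail.
    text = Path(path).read_text(encoding="utf-16")
    for line in text.splitlines():  # handles both \r\n and \n endings
        if needle in line:
            yield line


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python grep_utf16.py "search text" file1.reg [file2.reg ...]
    pattern, *files = sys.argv[1:]
    for name in files:
        for line in grep_utf16(name, pattern):
            print(f"{name}: {line}")
```

Decoding the whole file at once keeps the sketch short; for very large exports you could open the file with open(path, encoding="utf-16") and iterate over it line by line instead.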
Just ran across grepWin, which works perfectly for what I want here. I wish I had found it earlier!
Definitely go with Cygwin (using its X server); the latest version supports UTF-8. At my last gig I was doing a lot of work with CJK characters. Using Cygwin's X server, you can search for any characters and display any characters that you have a fixed-width font for.
Also check out od and xxd, which make it easy to enter your searches as hex bytes, e.g.:
$ echo '?' | grep "$(echo '3f' | xxd -p -r)"