An encoding-savvy grep replacement?

Question

I am frustrated that grep fails to find a word like "hello" in my UTF-16 documents.

Can anyone recommend a version of grep that attempts to guess the file encoding and then properly handle it?

popcnt · Accepted Answer

ack as perl-based grep replacement?

You'll definitely want to check out ack.

It supports Unicode encodings, and is basically grep, but better.

try a matching Unicode locale with grep

If you are on Linux, Unix, etc., you may want to set your LANG environment variable to a locale whose encoding matches your documents.

Check your locale first. Here is what mine is set to by default on my MacBook Pro:

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

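To change it for a single command, say, under bash (replace "foo" with the name of a locale that matches your files):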

$ LANG="foo" grep 'gotta be found now' file.name

For something a little more permanent (be careful with this, since it affects every later command in the same shell session):

$ export LANG="foo"
$ grep 'bar' mitz.vah
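The "foo" above is only a placeholder, so here is a minimal sketch of how one might find a real locale name and try it for a single grep. The helper names list_locales and grep_in_locale are made up for illustration. The sketch sets LC_ALL rather than LANG because LC_ALL takes precedence over LANG and the individual LC_* variables.

```shell
# Print the locale names installed on this system, so that a real
# one can be used in place of the "foo" placeholder.
list_locales() {
    locale -a
}

# Run grep once under the given locale without touching the shell's
# own settings. (Hypothetical helper, not a standard command.)
# Usage: grep_in_locale LOCALE PATTERN FILE
grep_in_locale() {
    LC_ALL="$1" grep -- "$2" "$3"
}
```

For example, after picking a name from list_locales, something like grep_in_locale en_US.UTF-8 'hello' file.name would run one search under that locale. Check the listing before relying on this for UTF-16 files, since a locale with that encoding may well not be installed.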

Mecki · Answer

Perl has a much more powerful regex syntax than grep, and it supports UTF-8 and UTF-16. I'm not sure how good it is at guessing the encoding, but if you tell it which encoding to use, it can read these files without any issues and run regexes over them. You'll have to write yourself a tiny Perl program for that (your own micro-grep implementation in Perl, so to speak), but that isn't too hard. Perl is available for all major operating systems.

Tags: grep, character-encoding, fish
