Some lines of a file do not seem to match \t in a regex. Does anyone have an idea why?
Let's take the example file that you can download from http://download.geonames.org/export/dump/countryInfo.txt.
$ wget http://download.geonames.org/export/dump/countryInfo.txt
--2011-02-03 16:24:08-- http://download.geonames.org/export/dump/countryInfo.txt
Resolving download.geonames.org... 178.63.52.141
Connecting to download.geonames.org|178.63.52.141|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31204 (30K) [text/plain]
Saving to: `countryInfo.txt'
100%[===================================================================================================================================================================================================>] 31,204 75.0K/s in 0.4s
2011-02-03 16:24:10 (75.0 KB/s) - `countryInfo.txt' saved [31204/31204]
$ cat countryInfo.txt | grep -E 'AD.AND'
AD AND 200 AN Andorra Andorra la Vella 468 84000 EU .ad EUR Euro 376 AD### ^(?:AD)*(\d{3})$ ca 3041565 ES,FR
sdalouche@samxps:/tmp$ cat countryInfo.txt | grep -E 'AD\tAND'
(no result)
Output of vi with :set list (tabs are shown as ^I):
AD^IAND^I200^IAN^IAndorra^IAndorra la Vella^I468^I84000^IEU^I.ad^IEUR^IEuro^I376^IAD###^I^(?:AD)*(\d{3})$^Ica^I3041565^IES,FR^I$
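The same check can be made outside vi. The following is a minimal sketch, assuming a POSIX shell with `od` available; the `sample` variable is a hypothetical stand-in for the first fields of a line from countryInfo.txt, not the real file.

```shell
# Hypothetical sample standing in for the start of a countryInfo.txt line.
sample=$(printf 'AD\tAND\t200')

# od -c prints the input byte by byte; a tab is printed as \t,
# while an ordinary space is printed as a blank.
dump=$(printf '%s\n' "$sample" | od -c)
printf '%s\n' "$dump"
```

If the separators really are tabs, the dump shows `\t` between the fields, which confirms that the data is fine and the problem lies in how the pattern is interpreted.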
Use \t to match a tab character (ASCII 0x09), \r for carriage return (0x0D) and \n for line feed (0x0A).
The regex \B-\B matches the - in "color - coded", where the hyphen has non-word characters (spaces) on both sides. \b-\b, on the other hand, matches the - in nine-digit and pass-key.
In regex, an uppercase metacharacter denotes the inverse of its lowercase counterpart: for example, \w matches a word character and \W a non-word character; \d matches a digit and \D a non-digit.
Example: the regex "aa\n" tries to match two consecutive "a"s at the end of a line, including the newline character itself. Example: "a\+" matches the literal text "a+", not a series of one or more "a"s. The caret ^ is the anchor for the start of the string or, inside a bracket expression, the negation symbol.
Try using the -P option instead of -E:
cat countryInfo.txt | grep -P 'AD\tAND'
This will use Perl-style regular expressions, which will catch the \t.
$ echo -e '-\t-' | grep -E '\t'
(no result)
$ echo -e '-\t-' | grep -P '\t'
- -
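Where -P is not available (it is a GNU grep extension and may not be built in everywhere), a literal tab can be placed in the pattern instead, so grep never has to interpret \t itself. This is a minimal sketch, assuming a POSIX shell; the `sample`, `tab`, and `hits` names are hypothetical and the two-line sample is made up for illustration.

```shell
# Hypothetical sample: one tab-separated line, one space-separated line.
sample=$(printf 'AD\tAND\nAD AND')

# Let printf produce a real tab character and splice it into the pattern.
# (In bash, ksh, or zsh, the ANSI-C quoted form $'AD\tAND' would do the same.)
tab=$(printf '\t')
hits=$(printf '%s\n' "$sample" | grep -E "AD${tab}AND")

printf '%s\n' "$hits"
```

Only the tab-separated line is selected, because the pattern now contains an actual tab byte rather than the two characters \ and t, whose meaning POSIX leaves undefined in an extended regular expression.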