
How to efficiently search/replace on a large txt file?

I have a relatively large CSV/text data file (33 MB) in which I need to globally replace the delimiting character. (The reason is that there doesn't seem to be a way to get SQL Server to escape or otherwise handle double quotes in the data during a table export, but that's another story...)

I successfully did a search and replace in TextMate on a smaller file, but it chokes on this larger one.

It seems like command-line grep may be the answer, but I can't quite grasp the syntax, along the lines of:

grep -rl OLDSTRING . | xargs perl -pi~ -e 's/OLDSTRING/NEWSTRING/'

So in my case I'm searching for the '^' (caret) character and replacing with '"' (double-quote).

grep -rl " grep_test.txt | xargs perl -pi~ -e 's/"/^'

That doesn't work, and I'm assuming it has to do with escaping the double quote or something similar, but I'm pretty lost. Can anyone help?

(I suppose if anyone knows how to get SQL Server 2005 to handle double quotes in a text column during export to CSV, that would really solve the core issue.)

Robert Travis Pierce asked Nov 26 '25 18:11


2 Answers

Your Perl substitution is missing its closing delimiter and has the search and replacement swapped. Try:

grep -rl '\^' . | xargs perl -pi~ -e 's/\^/"/g'

Explanation:

grep : command to find matches
-r : search recursively
-l : print only the names of files in which a match is found
'\^' : the pattern to search for; the caret is escaped because an unescaped ^ is the regex start-of-line anchor, and it is quoted so the shell passes the backslash through unchanged
. : search in the current working directory
perl : used here to do the in-place replacement
-i~ : edit the file in place and keep a backup copy with the extension ~
-p : print each line after the replacement
-e : the one-line program to run
s/\^/"/g : replace every caret with a double quote; the caret is escaped for the same reason as in the grep pattern
codaddict answered Nov 28 '25 16:11


sed -i.bak 's/\^/"/g' mylargefile.csv
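
A cautious way to use this is to rehearse it on a small copy and check the result before touching the 33 MB file. The sketch below assumes a made-up file, sample.csv, with invented contents; only the sed invocation itself comes from the command above.

```shell
# Hypothetical sample file standing in for mylargefile.csv.
work_dir=$(mktemp -d)
csv="$work_dir/sample.csv"
printf '^id^,^note^\n^1^,^hello^\n' > "$csv"

# In-place edit; the untouched original is kept as sample.csv.bak.
sed -i.bak 's/\^/"/g' "$csv"

# Count the lines that still contain a caret (0 means none were missed).
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(grep -c '\^' "$csv" || true)
echo "lines still containing ^: $remaining"

# Show what changed relative to the backup (diff exits 1 when files differ).
diff "$csv.bak" "$csv" || true
```

If the result looks wrong, the .bak file can simply be copied back over the edited one.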

Update: you can also use Perl, as rein has suggested:

perl -i.bak -pe 's/\^/"/g' mylargefile.csv

But on big files, sed may run faster than Perl. In my test on a file of about 6 million lines, sed took roughly 14 seconds and Perl roughly 24:

$ tail -4 file
this is a line with ^
this is a line with ^
this is a line with ^

$ wc -l<file
6136650

$ time sed 's/\^/"/g' file  >/dev/null

real    0m14.210s
user    0m12.986s
sys     0m0.323s
$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.993s
user    0m22.608s
sys     0m0.630s
$ time sed 's/\^/"/g' file  >/dev/null

real    0m13.598s
user    0m12.680s
sys     0m0.362s

$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.690s
user    0m22.502s
sys     0m0.393s
ghostdog74 answered Nov 28 '25 15:11