replace characters in a big log file

How can I change very large log files under Windows from this:

   3334-444-(4)  anything   anything2
   4444-444-(4)  anything   anything2
   4744-454-(4)  anything   anything2
   48444 44-(4)  anything   anything2
   8444-444-(4)  anything   anything2
   4464-(444)-2  anything   anything2

to this:

33344444         anything   anything2
44444444         anything   anything2
47444544         anything   anything2
48444444         anything   anything2
84444444         anything   anything2
44644442         anything   anything2

In other words: in each line, remove everything except digits up to position 18, while keeping the second column at the same position?

Edit: The problem is that positions 1 to 17 can also contain spaces between digits. This is the logic that I suppose might work:
1. In positions 1 to 17, replace '(', ')' and '-' with ' ' [space]
2. In positions 1 to 17, replace ' ' [space] with '' [nothing] and count the changes
3. In positions 1 to 17, add one space after the digits for each change counted in the previous step
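The three steps above map quite directly onto awk, whose gsub() returns the number of substitutions it made, which gives the count needed for step 3. A minimal sketch, assuming positions 1 to 17 hold the number and the second column starts at position 18; the function name normalize_ids is hypothetical:

```shell
# Hypothetical helper: normalize_ids [FILE...]  (reads stdin if no file is given)
normalize_ids() {
  awk '{
    head = substr($0, 1, 17)   # positions 1-17
    tail = substr($0, 18)      # position 18 to end of line
    gsub(/[-()]/, " ", head)   # step 1: ( ) and - become spaces
    n = gsub(/ /, "", head)    # step 2: drop all spaces, n = how many were dropped
    for (i = 0; i < n; i++)    # step 3: add n spaces back after the digits
      head = head " "
    print head tail
  }' "$@"
}
```

Because every removed character is replaced by exactly one trailing space, the first field stays 17 characters wide and the second column keeps its position.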

Driver asked Jan 30 '26 20:01



1 Answer

Well, if you install Cygwin, you can use the power of the command-line tools:

$ sed 's/[-)(]//g' input
   33344444  anything   anything2
   44444444  anything   anything2
   47444544  anything   anything2
   48444 444  anything   anything2
   84444444  anything   anything2
   44644442  anything   anything2

update

Sometimes it is easier to divide a complex task into smaller parts.

Assume the input looks like this (with a ruler added):

         1         2         3         4         5
12345678901234567890123456789012345678901234567890
   3334-444-(4)  anything   anything2
   4444-444-(4)  anything   anything2
   4744-454-(4)  anything   anything2
   48444 44-(4)  anything   anything2
   8444-444-(4)  anything   anything2
   4464-(444)-2  anything   anything2

Step 1 is to use cut to, well, cut out the first 17 characters, remove the unwanted ones, and store the result in a temporary file.

Step 2 is to cut out characters 18 to end-of-line and store them in another temporary file.

Step 3 is to combine the temporary files line by line into one file.

Something like this:

$ cut -c1-17 input | sed 's/[-)( ]*//g' > c1

$ cut -c18- input > c2

$ paste c1 c2
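By default paste joins the two pieces with a tab, so the second column ends up after a tab rather than at position 18. A minimal sketch of the same three steps that pads the cleaned number back to 17 characters and joins without a delimiter, assuming a POSIX paste (where \0 in the -d list means the empty string) and mktemp; the function name fixed_width_join is hypothetical:

```shell
# Hypothetical helper: fixed_width_join FILE
fixed_width_join() {
  c1=$(mktemp) c2=$(mktemp)
  # Step 1: columns 1-17, strip - ) ( and spaces, then pad back to 17 characters
  cut -c1-17 "$1" | sed 's/[-)( ]*//g' | awk '{ printf "%-17s\n", $0 }' > "$c1"
  # Step 2: column 18 to end of line, unchanged
  cut -c18- "$1" > "$c2"
  # Step 3: join line by line; in the -d list, \0 stands for "no delimiter"
  paste -d '\0' "$c1" "$c2"
  rm -f "$c1" "$c2"
}
```

For example, `fixed_width_join input > output`. Using mktemp instead of fixed names like c1 and c2 avoids overwriting files in the current directory.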

If this doesn't appeal to your aesthetic senses, you can do everything in one go using awk. Put the following lines in a file called "col.awk" (or choose a better name if you feel like it):

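{
  # Positions 1-17: the number field, which may contain - ( ) and spaces
  x = substr($0, 1, 17)
  # Position 18 to end of line: the remaining columns, left as they are
  y = substr($0, 18)
  # Remove hyphens, parentheses and spaces, leaving the digits
  gsub(/[-)( ]+/, "", x)
  # Pad back to 17 characters so the second column starts at position 18 again
  printf "%-17s%s\n", x, y
}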

Then call it like this:

$ awk -f col.awk input

Output (again with a ruler):

         1         2         3         4         5
12345678901234567890123456789012345678901234567890
33344444         anything   anything2
44444444         anything   anything2
47444544         anything   anything2
48444444         anything   anything2
84444444         anything   anything2
44644442         anything   anything2

Note that Cygwin likes everything to have Unix-style line endings (LF), so you might need to convert your input from Windows-style (CRLF) to Unix-style first. Tools that can help here are dos2unix or fromdos (Google is your friend here).
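If neither dos2unix nor fromdos is installed, the standard tr utility can strip the carriage returns. A minimal sketch, assuming the log contains no carriage returns other than the ones at line ends (tr removes every one of them); the function name strip_cr is hypothetical:

```shell
# Hypothetical helper: strip_cr FILE
# Deletes every carriage return, turning CRLF line endings into LF.
strip_cr() {
  tr -d '\r' < "$1"
}
```

For example, `strip_cr input > input.unix`, then run the commands above on input.unix.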

Fredrik Pihl answered Feb 01 '26 12:02



