How can I change very large log files under Windows from this:
3334-444-(4)     anything anything2
4444-444-(4)     anything anything2
4744-454-(4)     anything anything2
48444 44-(4)     anything anything2
8444-444-(4)     anything anything2
4464-(444)-2     anything anything2
to this:
33344444         anything anything2
44444444         anything anything2
47444544         anything anything2
48444444         anything anything2
84444444         anything anything2
44644442         anything anything2
That is, how can I remove everything except digits before position 18 on each line, while keeping the second column at its original position?
Edit: The problem is that positions 1 to 17 may also contain spaces between the digits. This is the logic that I think might work:
1. In positions 1 to 17, replace '(', ')' and '-' with ' ' [space]
2. In positions 1 to 17, replace ' ' [space] with '' [nothing] and count the replacements
3. After the digits, add one space for each replacement counted in the previous step
Well, if you install Cygwin, you can use the power of the command-line tools:
$ sed 's/[-)(]//g' input
33344444 anything anything2
44444444 anything anything2
47444544 anything anything2
48444444 anything anything2
84444444 anything anything2
44644442 anything anything2
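Two caveats with this one-liner are worth illustrating: the substitution applies to the whole line, not only to the first 17 characters, and it leaves spaces between digits untouched. The sketch below uses made-up sample lines, and `strip_punct` is only a hypothetical wrapper name around the same sed command.

```shell
# Hypothetical wrapper around the one-liner above; reads from stdin.
strip_punct() {
    sed 's/[-)(]//g'
}

# A hyphen in a later column is removed as well:
printf '%s\n' '3334-444-(4)     any-thing' | strip_punct

# A space between the digits is left in place:
printf '%s\n' '48444 44-(4)     anything' | strip_punct
```

The first call prints `33344444     anything` and the second still contains `48444 444`, so a line-wide substitution is only safe when the later columns contain none of these characters and the digit groups contain no spaces.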
update
Sometimes it is easier to divide a complex task into smaller parts.
Assume the input looks like this (adding a ruler)
         1         2         3         4         5
12345678901234567890123456789012345678901234567890
3334-444-(4)     anything anything2
4444-444-(4)     anything anything2
4744-454-(4)     anything anything2
48444 44-(4)     anything anything2
8444-444-(4)     anything anything2
4464-(444)-2     anything anything2
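Step 1 is to use cut to, well, cut out the first 17 characters, remove the unwanted ones, pad the result back to 17 characters and store it in a temporary file.
Step 2 is to cut characters 18 to end-of-line and store them in a second temporary file.
Step 3 is to combine the two temporary files into one, joining them without a delimiter (paste inserts a tab by default, which would move the second column).
Something like this:
$ cut -c1-17 input | sed 's/[-)( ]//g' | awk '{ printf "%-17s\n", $0 }' > c1
$ cut -c18- input > c2
$ paste -d '\0' c1 c2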
If this doesn't appeal to your aesthetic sense, you can do everything in one go using awk. Put the following lines in a file called "col.awk", or choose a better name if you feel like it:
{
    # first 17 characters: strip dashes, parentheses and spaces
    x = substr($0, 1, 17)
    gsub(/[-)( ]/, "", x)
    # rest of the line, starting at position 18
    y = substr($0, 18)
    # pad the cleaned prefix back to 17 characters
    printf "%-17s%s\n", x, y
}
then call it like this:
$ awk -f col.awk input
output (again with ruler):
         1         2         3         4         5
12345678901234567890123456789012345678901234567890
33344444         anything anything2
44444444         anything anything2
47444544         anything anything2
48444444         anything anything2
84444444         anything anything2
44644442         anything anything2
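Since the question asks to drop everything except digits, a slightly more general variant might look like the sketch below. It assumes the second column always starts at position 18; the function name `clean_prefix` and the variable names are illustrative only.

```shell
# Keep only digits in columns 1-17, pad back to 17 characters,
# then append the rest of the line unchanged.
clean_prefix() {
    awk '{
        prefix = substr($0, 1, 17)
        rest   = substr($0, 18)
        gsub(/[^0-9]/, "", prefix)   # drop anything that is not a digit
        printf "%-17s%s\n", prefix, rest
    }' "$@"
}

# Example with one of the sample lines:
printf '%s\n' '48444 44-(4)     anything anything2' | clean_prefix
```

Called as `clean_prefix input > output`, it reads the file line by line in a single pass, which should suit very large logs.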
Note that Cygwin likes everything to have Unix-style line endings, so you might need to convert your input from Windows-style to Unix-style first. Tools that might help you here are dos2unix and fromdos (Google is your friend here).
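If neither tool is installed, deleting the carriage returns with `tr` is a common fallback. This is a minimal sketch: `to_unix` is a hypothetical helper name, the file names in the usage comment are placeholders, and it assumes the file contains no carriage returns other than those in the line endings.

```shell
# Remove every carriage return (\r), turning CRLF line endings into LF.
to_unix() {
    tr -d '\r'
}

# Usage: to_unix < input > input.unix
printf 'a b\r\nc d\r\n' | to_unix
```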