Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using awk (or sed) to remove newlines based on first character of next line

Tags:

bash

shell

sed

awk

here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull all the relevant information based on regexp's, but each "piece" of information I pulled is on a separate line, I'd like for each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:

92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755

Ideally, I would want this output to look like:

92831,499,000 ,0644321
79217,999,000 ,5417178 ,PK91622
79217,999,000 ,5417178 ,PK90755

This may be harder to do, so I would settle for the output of that last "record" to only appear once with the additional "PK..." to be the 4th "field" of that line.
In the end, the simplest way I could think of doing is if the line starts with a comma ( ^, ) the newline before it should be removed... I'm not too familiar with awk though so if you could give me a start on this it would really be appreciated! Thanks!

like image 616
Mike Avatar asked Feb 05 '10 15:02

Mike


People also ask

Which is faster awk or sed?

I find awk much faster than sed . You can speed up grep if you don't need real regular expressions but only simple fixed strings (option -F). If you want to use grep, sed, awk together in pipes, then I would place the grep command first if possible.


2 Answers

$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755

Translation: Read in bulk without line separation, swap out each comma following a newline with just a comma.

Shortest code here!

like image 177
Demosthenex Avatar answered Sep 27 '22 22:09

Demosthenex


Well, guess I should have taken a closer look at using Records in awk when I was trying to figure this out last night... 10 minutes after looking at them I got it working. For anyone interested here's how I did this: In my original sed script I put an extra newline infront of the beginning of each record so there's now a blank line seperating each one. I then use the following awk command:

awk 'BEGIN {RS = ""; FS = "\n"}
{
if (NF >= 3)
for (i = 3; i <= NF; i++)
print $1,$2,$i
}'

and it works like a charm outputting exactly the way I wanted!

like image 43
Mike Avatar answered Sep 27 '22 20:09

Mike