Using awk (or sed) to remove newlines based on first character of next line

Tags:

bash

shell

sed

awk

Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to extract all the relevant information based on regexps, but each "piece" of information I pulled ends up on a separate line. I'd like each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:

92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755

Ideally, I would want this output to look like:

92831,499,000 ,0644321
79217,999,000 ,5417178 ,PK91622
79217,999,000 ,5417178 ,PK90755

This may be harder to do, so I would settle for that last "record" appearing only once, with the additional "PK..." value as the 4th "field" of that line.
In the end, the simplest approach I could think of is: if a line starts with a comma (^,), remove the newline before it. I'm not too familiar with awk, though, so if you could give me a start on this it would be really appreciated! Thanks!

asked Feb 05 '10 15:02

Mike
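A minimal awk sketch of the idea at the end of the question (if a line starts with a comma, remove the newline before it). It assumes every record begins with a line that does not start with a comma, as in the sample above; the wrapper name `join_comma_lines` is purely illustrative and not part of the original post.

```shell
# Hypothetical wrapper: join_comma_lines reads the raw data on stdin and glues
# every line that starts with a comma onto the line before it.
join_comma_lines() {
    awk '
        /^,/   { buf = buf $0; next }    # comma-led line: append it to the current record
        NR > 1 { print buf }             # any other line starts a new record, so flush the old one
        { buf = $0 }                     # and remember the start of the new record
        END    { if (NR > 0) print buf } # flush the last record
    '
}

# Usage with the sample data from the question
printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755' | join_comma_lines
```

With the sample data this prints `92831,499,000,0644321` and `79217,999,000,5417178,PK91622,PK90755`, the "settle for" version where the extra PK value simply stays on the end of the last record's line.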

2 Answers

$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755

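Translation: read the input in bulk (with no digits, -0 makes the NUL character the record separator, so an ordinary text file arrives as a single record), then replace every newline that is immediately followed by a comma with just a comma, which joins each comma-led line onto the line before it.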

Shortest code here!

answered Sep 27 '22 22:09

Demosthenex
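Since the title also asks about sed, here is a sketch of the same join in sed. It uses the classic `$!N` / `P` / `D` one-liner idiom rather than anything from the answer above: instead of slurping the whole file, it keeps at most one partial record in the pattern space. The wrapper name `join_comma_lines_sed` is hypothetical, and the script is split into separate `-e` expressions on the assumption that this keeps it portable between GNU and BSD sed.

```shell
# Hypothetical wrapper around the classic sed "join continuation lines" idiom.
join_comma_lines_sed() {
    # :a         label marking the top of the loop
    # $!N        unless this is the last line, append the next input line
    # s/\n,/,/   if the appended line starts with a comma, remove the newline
    # ta         if that substitution happened, loop back for another line
    # P          print the finished record (up to the first newline)
    # D          delete the printed part and start the next cycle with whatever remains
    sed -e ':a' -e '$!N' -e 's/\n,/,/' -e 'ta' -e 'P' -e 'D'
}

# Usage with the sample data from the question
printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755' | join_comma_lines_sed
```

On the sample data this produces the same two lines as the perl one-liner.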

Well, I guess I should have taken a closer look at awk's records when I was trying to figure this out last night... 10 minutes after looking at them I got it working. For anyone interested, here's how I did it: in my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:

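awk 'BEGIN { RS = ""; FS = "\n" }  # paragraph mode: blank lines separate records, each line is a field
{
    if (NF < 3)                    # record without any PK lines: print it on one line
        print $1, $2
    else                           # otherwise one output line per PK line (fields 3..NF)
        for (i = 3; i <= NF; i++)
            print $1, $2, $i
}'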

and it works like a charm, producing exactly the output I wanted!

answered Sep 27 '22 20:09

Mike
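If adding blank lines in the upstream sed script is not convenient, a single awk pass can also produce the ideal output straight from the original sample. This is a sketch that assumes the data is shaped like the question's sample: a line without a leading comma starts a record, the first comma-led line is its second piece, and every further comma-led line is a "PK..." value that gets its own output line. The wrapper name `expand_records` and the helper `print_pending` are hypothetical.

```shell
# Hypothetical wrapper: expand_records reads the raw one-piece-per-line data on
# stdin and writes one line per "PK..." value (or one line for records without any).
expand_records() {
    awk '
        # Print the pending record if none of its lines has been printed yet.
        function print_pending() {
            if (!have || n >= 2) return
            if (n == 1) print head, second
            else print head
        }
        !/^,/ {                      # no leading comma: a new record starts
            print_pending()
            head = $0; n = 0; have = 1
            next
        }
        { n++ }                      # count comma-led lines in the current record
        n == 1 { second = $0; next } # first comma-led line: the second piece
        { print head, second, $0 }   # each further piece gets its own output line
        END { print_pending() }      # the last record may still be pending
    '
}

# Usage with the sample data from the question
printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755' | expand_records
```

On the sample this prints the three lines of the question's ideal output. The spaces between pieces come from awk's default output field separator (OFS); running awk with `-v OFS=` instead would join the pieces with no separator, which may be easier to import into a DB.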
