Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

linux sed command - adding string on every end of the line of csv

Tags:

linux

unix

csv

sed

Currently I'm having a problem on the ff CSV data.

COLUMN1,COLUMN2,COLUMN3,COLUMN4
apple1,apple2,apple3,apple4
banana1,banana2,banana3,
caimito1,"caimito21
caimito22","caimito31
caimito32",caimito4

That would look like this:

╔══════════╦═══════════╦═══════════╦══════════╗
║ COLUMN1  ║  COLUMN2  ║  COLUMN3  ║ COLUMN4  ║
╠══════════╬═══════════╬═══════════╬══════════╬
║ apple1   ║ apple2    ║ apple3    ║ apple4   ║
║ banana1  ║ banana2   ║ banana3   ║          ║
║ caimito1 ║ caimito21 ║ caimito31 ║ caimito4 ║
║          ║ caimito22 ║ caimito32 ║          ║
╚══════════╩═══════════╩═══════════╩══════════╝

So my plan is to add COLUMN5 and every row of it will have a value "FRUIT".

Command used:

sed "1 s/$/,COLUMN5/g" FILE.csv | sed "2,$ s/$/,FRUIT/g" > OUTPUT.csv

Output:

╔══════════╦════════════════╦════════════════╦══════════╦═════════╗
║ COLUMN1  ║  COLUMN2       ║  COLUMN3       ║ COLUMN4  ║ COLUMN5 ║
╠══════════╬════════════════╬════════════════╬══════════╬═════════╣
║ apple1   ║ apple2         ║ apple3         ║ apple4   ║ FRUIT   ║
║ banana1  ║ banana2        ║ banana3        ║          ║ FRUIT   ║
║ caimito1 ║ caimito21FRUIT ║ caimito31FRUIT ║ caimito4 ║ FRUIT   ║
║          ║ caimito22      ║ caimito32      ║          ║         ║
╚══════════╩════════════════╩════════════════╩══════════╩═════════╝

Is there any way to add the "FRUIT" without affecting "caimito" row?

I also tried the ff. command but it didn't work out. Added "," before "$".

sed "1 s/$/,COLUMN5/g" FILE.csv | sed "2,$ s/,$/,FRUIT/g" > OUTPUT.csv
like image 619
banana Avatar asked Feb 12 '15 09:02

banana


People also ask

How do you add a line at the end of the file using sed?

There are different ways to insert a new line in a file using sed, such as using the “a” command, the “i” command, or the substitution command, “s“. sed's “a” command and “i” command are pretty similar.


2 Answers

Sed is probably not the right tool to handle csv files since the rules are way more complicated than how it may look (it's probably possible, but such scripts are in general rather error-prone, etc.). You can however use csvtools to handle this:

file="FILE.csv"
nr=$(csvtool height $file)
ot=$(perl -e "print \"COLUMN5\\n\";for\$i(2..$nr){print \"FRUIT\\n\";}")
echo "$ot" | csvtool paste "$file" -

The script works as follows:

  1. First we calculate the number of rows with csvtool height,
  2. Next we generate the additional column by printing COLUMN5 followed by n-1 times FRUIT.
  3. Finally we paste that content to the right of the file.
like image 199
Willem Van Onsem Avatar answered Sep 25 '22 00:09

Willem Van Onsem


EDIT: I only just saw the csvtool solution; it is of course much more practical. I leave this solution up mainly because it would be a pity to hide it and its Lovecraftian beauty.

Well, here goes. This is a way to do it in sed:

sed ':a $!{ N; ba }; s/"[^"]*"/{&}/g; :b s/\({"[^"]*\)\n\([^"]*"}\)/\1~"~\2/g; tb; s/\n\|$/,FRUIT&/g; s/,FRUIT\(\n\|$\)/,COLUMN5\1/; :c s/\({"[^"]\)*~"~/\1\n/g; tc; s/{"\|"}/"/g' filename

This is going to be a bit of a ride. It works as follows:

:a $!{ N; ba }                         # assemble the whole file in the
                                       # hold buffer

s/"[^"]*"/{&}/g                        # encase all "-enclosed fields in
                                       # {"..."} to make matching the beginning
                                       # and end separately possible.

:b                                     # jump mark for looping
s/\({"[^"]*\)\n\([^"]*"}\)/\1~"~\2/g   # replace the first newline in all
                                       # {"..."} fields with ~"~
tb                                     # loop until all were replaced

s/\n\|$/,FRUIT&/g                      # Put FRUIT at the end of all lines
s/,FRUIT\(\n\|$\)/,COLUMN5\1/          # Replace the first ,FRUIT with ,COLUMN5
                                       # The \(\n\|$\) bit is so that this
                                       # works with empty files (that only
                                       # have a header line)

:c                                     # Jump mark for looping
s/\({"[^"]\)*~"~/\1\n/g                # replace the first ~"~ in all {"..."}
                                       # fields with a newline
tc                                     # loop until all were replaced

s/{"\|"}/"/g                           # replace all {", "} markers with "
                                       # again.
like image 27
Wintermute Avatar answered Sep 25 '22 00:09

Wintermute