Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

printing contents of variable to a specified line in outputfile with sed/awk

Tags:

bash

csv

sed

awk

I have been working on a script to concatenate multiple csv files into a single, large csv. The csv's contain names of folders and their respective sizes, in a 2-column setup with the format "Size, Projectname"

Example of a single csv file:

49747851728,ODIN
32872934580,_WORK
9721820722,LIBRARY
4855839655,BASELIGHT
1035732096,ARCHIVE
907756578,USERS
123685100,ENV
3682821,SHOTGUN
1879186,SALT
361558,SOFTWARE
486,VFX
128,DNA

For my current test I have 25 similar files, with different numbers in the first column.

I am trying to get this script to do the following:

  • Read each csv file
  • For each Project it sees, scan the outputfile if that Project was already printed to the file. If not, print the Projectname
  • For each file, for each Project, if the Project was found, print the Size to the output csv.

However, I need the Projects to all be on textline 1, comma separated, so I can use this outputfile as input for a javascript graph. The Sizes should be added in the column below their projectname.

My current script:

csv_folder=$(echo "$1" | sed 's/^[ \t]*//;s/\/[ \t]*$//')
csv_allfiles="$csv_folder/*.csv"
csv_outputfile=$csv_folder.csv
echo -n "" > $csv_outputfile

for csv_inputfile in $csv_allfiles; do
  while read line && [[ $line != "" ]]; do
    projectname=$(echo $line | sed 's/^\([^,]*\),//')
    projectfound1=$(cat $csv_outputfile | grep -w $projectname)
if [[ ! $projectfound1 ]]; then
  textline=1
  sed "${textline}s/$/${projectname}, /" >> $csv_outputfile
    for csv_foundfile in $csv_allfiles; do
    textline=$(echo $textline + 1 | bc )
    projectfound2=$(cat $csv_foundfile | grep -w $projectname)
    projectdata=$(echo $projectfound2 | sed 's/\,.*$//')
        if [[ $projectfound2 ]]; then
          sed "${textline}s/$/$projectdata, /" >> $csv_outputfile
        fi
      done
    fi
  done < $csv_inputfile
done

My current script finds the right information (projectname, projectdata) and if I just 'echo' those variables, it prints the correct data to a file. However, with echo it only prints in a long list per project. I want it to 'jump back' to line 1 and print the new project at the end of the current line, then run the loop to print data at the end of each next line.

I was thinking this should be possible with sed or awk. sed should have a way of inserting text to a specific line with

sed '{n}s/search/replace/'

where {n} is the line to insert to

awk should be able to do the same thing with something like

awk -v l2="$textline" -v d="$projectdata" 'NR == l2 {print d} {print}' >> $csv_outputfile

However, while replacing the sed commands in the script with

echo $projectname 
echo $projectdata 

spit out the correct information (so I know my variables are filled correctly) the sed and awk commands tend to spit out the entire contents of their current inputcsv; not just the line that I want them to.

Pastebin outputs per variant of writing to file

  • https://pastebin.com/XwxiAqvT - sed output
  • https://pastebin.com/xfLU6wri - echo, plain output (single column)
  • https://pastebin.com/wP3BhgY8 - echo, detailed output per variable
  • https://pastebin.com/5wiuq53n - desired output

As you see, the sed output tends to paste the whole contents of inputcsv, making the loop stop after one iteration. (since it finds the other Projects after one loop)

So my question is one of these;

  • How do I make sed / awk behave the way I want it to; i.e. print only the info in my var to the current textline, instead of the whole input csv. Is sed capable of this, printing just one line of variable? Or
  • Should I output the variables through 'echo' into a temp file, then loop over the temp file to make sed sort the lines the way I want them to? (Bear in mind that more .csv files will be added in the future, I can't just make it loop x times to sort the info)
  • Is there a way to echo/print text to a specific text line without using sed or awk? Is there a printf option I'm missing? Other thoughts?

Any help would be very much appreciated.

like image 960
JasonD Avatar asked Nov 08 '22 09:11

JasonD


1 Answers

A way to accomplish this transposition is to save the data to an associative array.

In the following example, we use a two dimensional array to keep track of our data. Because ordering seems to be important, we create a col array and create a new increment whenever we see a new projectname -- this col array ends up being our first index into our data. We also create a row array which we increment whenever we see a new data for the current column. The row number is our second index into data. At the end, we print out all the records.

#! /usr/bin/awk -f
BEGIN {
    FS  = ","
    OFS = ", "
    rows=0
    cols=0
    head=""
    split("", data)
    split("", row)
    split("", col)
}
!($2 in col) { # new project
    if (head == "")
        head = $2
    else
        head = head OFS $2
    i = col[$2] = cols++
    row[i] = 0
}
{
    i = col[$2]
    j = row[i]++
    data[i,j] = $1
    if (j > rows)
        rows = j
}
END {
    print head
    for (j=0; j<=rows; ++j) {
        if ((0,j) in data)
            x = data[0,j]
        else
            x = ""
        for (i=1; i<cols; ++i) {
            if ((i,j) in data)
                x = x OFS data[i,j]
            else
                x = x OFS
        }
        print x
    }
}

As a bonus, here is a script to reproduce the detailed output from one of your pastebins.

#! /usr/bin/awk -f
BEGIN {
    FS = ","
    split("", data) # accumulated data for a project
    split("", line) # keep track of textline for data
    split("", idx)  # index into above to maintain input order
    sz = 0
}
$2 in idx { # have seen this projectname
    i = idx[$2]
    x =   ORS "textline = " ++line[i]
    x = x ORS "textdata = " $1
    data[i] = data[i] x
    next
}
{ # new projectname
    i = sz++
    idx[$2] = i
    x =       "textline = 1"
    x = x ORS "projectname = " $2
    x = x ORS "textline = 2"
    x = x ORS "projectdata = " $1
    data[i] = x
    line[i] = 2
}
END {
    for (i=0; i<sz; ++i)
        print data[i]
} 
like image 100
Michael Back Avatar answered Nov 12 '22 18:11

Michael Back