Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting csv columns in bash, reading bash output into python variables

Tags:

python

bash

shell

Hi I have a ton of data in multiple csv files and filter out a data set using grep:

user@machine:~/$ cat data.csv | grep -a "63[789]\...;"
637.05;1450.2
637.32;1448.7
637.60;1447.7
637.87;1451.5
638.14;1454.2
638.41;1448.6
638.69;1445.8
638.96;1440.0
639.23;1431.9
639.50;1428.8
639.77;1427.3

I want to figure out the data set which has the highest count, the column right of the ; and then know the corresponding value (left of the ;). In this case the set I'm looking for would be 638.14;1454.2

I tried different things and ended up using a combination of bash and python, which works, but isn't very pretty:

os.system('ls | grep csv > filelist')
files = open("filelist")
files = files.read()
files = files.split("\n")

for filename in files[0:-1]:
  os.system('cat ' + filename + ' | grep -a "63[6789]\...;" > filtered.csv')
  filtered = csv.reader(open('filtered.csv'), delimiter=';')
  sortedlist = sorted(filtered_file, key=operator.itemgetter(1), reverse=True)
  dataset = sortedlist[0][0] + ';' + sortedlist[0][1] + '\n'

I would love to have a bash only solution (cut, awk, arrays?!?) but couldn't figure it out. Also I don't like the work around writing the bash commands into files and then reading them into python variables. Can I read them into variables directly or are there better solutions to this problem? (probably perl etc... but I am really interested in a bash solution..)

Thank you very much!!

like image 840
gletscher Avatar asked Dec 16 '22 20:12

gletscher


1 Answers

A quick one-liner would be:

grep -a "63[789]\...;" data.csv | sort -n -r -t ';' -k 2 | head --lines=1

This simply sorts the file numerically based on the second column and then prints out the first row. Hope that helps.

like image 150
Hakop Palyan Avatar answered Mar 29 '23 23:03

Hakop Palyan