I have a CSV file, million_songs_metadata_and_sales.csv, with the following schema:
track_id
sales_date
sales_count
title
song_id
release
artist_id
artist_mbid
artist_name
duration
artist_familiarity
artist_hotttnesss
year
Sample data:
TRZZZZZ12903D05E3A,2014-06-19,79,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium,495.22893,0.69652442519,0.498471038842,2001
I need to write a Bash command to find the artist_name with the maximum sales, using the file million_songs_metadata_and_sales.csv.
I have written the following script, but it does not give me the correct result:
awk 'max=="" || $3 > max {max=$3} END{ print $9}' FS="," million_songs_metadata_and_sales.csv
Is there a workaround for this? Thanks!
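Field references such as $N are only meaningful while awk is processing a line. In the END block, $9 refers at best to the last line read, not to the line that held the maximum, so the name has to be saved in a variable at the moment the maximum is updated: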
$ cat file.csv
TRZZZZZ12903D05E3A,2014-06-19,77,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium 1,495.22893,0.69652442519,0.498471038842,2001
TRZZZZZ12903D05E3A,2014-06-19,79,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium,495.22893,0.69652442519,0.498471038842,2001
TRZZZZZ12903D05E3A,2014-06-19,78,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium 2,495.22893,0.69652442519,0.498471038842,2001
$ awk 'BEGIN { max=0 } $3 > max { max=$3; name=$9 } END { print name }' FS="," file.csv
Delerium
$
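If "maximum sales" is meant as the total over all rows for an artist rather than the single largest sales_count, a sketch along these lines may be closer to what is wanted. The function name top_artist_by_total is made up for illustration, and the code assumes no header row and no commas inside fields.

```shell
# Hypothetical helper: print the artist_name whose summed sales_count is largest.
# Assumes a headerless file and no commas inside any field.
top_artist_by_total() {
    awk -F, '
        { total[$9] += $3 }          # accumulate sales_count per artist_name
        END {
            best = ""; max = -1
            for (a in total)         # scan the per-artist totals
                if (total[a] > max) { max = total[a]; best = a }
            print best
        }
    ' "$1"
}
```

It would be called as top_artist_by_total million_songs_metadata_and_sales.csv. If two artists tie on the total, which one is printed depends on awk's array traversal order.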
The pipeline
cut -d, -f3,9 < data.csv | sort -nr | head -1
will do it; it prints the highest sales_count together with its artist_name.
And it will fail as soon as any column contains a comma. For correct CSV parsing you need to use a CSV-parsing library.
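If pulling in a library is not an option, a hand-rolled splitter in plain awk can at least cope with double-quoted fields that contain commas. The following is only a sketch: csv_split and top_artist_csv are made-up names, quoted fields that span several lines are not handled, and walking each line character by character will be slow on a large file.

```shell
# Hypothetical helper: print the artist_name (field 9) of the row with the
# largest sales_count (field 3), honouring double-quoted fields.
top_artist_csv() {
    awk '
        # Split one CSV line into out[1..n]; "" inside quotes is a literal quote.
        function csv_split(line, out,    n, i, c, len, inq, cur) {
            split("", out)                       # clear the array
            n = 0; cur = ""; inq = 0; len = length(line)
            for (i = 1; i <= len; i++) {
                c = substr(line, i, 1)
                if (inq) {
                    if (c == "\"") {
                        if (substr(line, i + 1, 1) == "\"") { cur = cur "\""; i++ }
                        else inq = 0             # closing quote
                    } else cur = cur c
                } else if (c == "\"") inq = 1    # opening quote
                else if (c == ",") { out[++n] = cur; cur = "" }
                else cur = cur c
            }
            out[++n] = cur                       # last field on the line
            return n
        }
        {
            csv_split($0, f)
            if (NR == 1 || f[3] + 0 > max) { max = f[3] + 0; name = f[9] }
        }
        END { print name }
    ' "$1"
}
```

It would be called as top_artist_csv million_songs_metadata_and_sales.csv. A dedicated CSV parser is still the safer choice for data you do not control.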