
Bash - How to count occurrences in a column of a .csv file (without awk)

Tags:

bash

Recently I've started to learn bash scripting, and I'm wondering how I can count occurrences in a column of a .csv file. The file is structured like this:

    DAYS,SOMEVALUE,SOMEVALUE
    sunday,something,something
    monday,something,something
    wednesday,something,something
    sunday,something,something
    monday,something,something

So my question is: how can I count how many times each value in the first column (days) appears? In this case the output should be:

    Sunday : 2
    Monday : 2
    Wednesday : 1

The first column is named DAYS, so the script must not count the single value DAYS; DAYS is just the header that identifies the column.

If possible, I want to see a solution without the awk command and without Python etc.

Thanks guys, and sorry for my bad English.

Edit: I thought of doing this:

    count="$(grep -c "OCCURRENCE" "${FILE}")"
    echo "OCCURRENCE : ${count}"

Where OCCURRENCE is one of the single values (sunday, monday, ...). But this solution is not automatic: I'd need to build a list of the unique values in the first column of the .csv file, put each one in an array, and then count each one with the code I wrote above. I need some help with this, thanks.
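The grep-per-value idea above can be automated in pure bash with an associative array, so no list of values has to be maintained by hand. This is a sketch assuming bash 4+ and a simple CSV with no quoted fields containing commas; the sample data is inlined here instead of read from a file:

```shell
#!/usr/bin/env bash
# Sample data shaped like the question's file.
csv='DAYS,SOMEVALUE,SOMEVALUE
sunday,something,something
monday,something,something
wednesday,something,something
sunday,something,something
monday,something,something'

declare -A counts                      # associative array: value -> count (bash 4+)
while IFS=, read -r day _; do          # split on commas, keep only the first field
    counts["$day"]=$(( ${counts["$day"]:-0} + 1 ))
done < <(printf '%s\n' "$csv" | tail -n +2)   # tail -n +2 skips the DAYS header

for day in "${!counts[@]}"; do
    echo "$day : ${counts[$day]}"
done
```

To read from a real file instead, replace the process substitution with `done < <(tail -n +2 "$FILE")`. Note that iteration order over an associative array is unspecified, so the output lines may appear in any order.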

mark Franz asked Mar 29 '18



2 Answers

    cut -f1 -d, test.csv | tail -n +2 | sort | uniq -c

This gets you this far:

  2 monday
  2 sunday
  1 wednesday

To get your format (Sunday : 2), I think awk would be an easy and clear way (something like awk '{print $2 " : " $1}'), but if you really must avoid it, here's a complete non-awk version:

    cut -f1 -d, test.csv | tail -n +2 | sort | uniq -c | while read line; do words=($line); echo "${words[1]} : ${words[0]}"; done
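The same pipeline is easier to follow spelled out one stage per line. This sketch feeds the question's sample data in via printf instead of reading test.csv, so it is self-contained; note the sort before uniq -c, which is required because uniq only collapses adjacent duplicate lines:

```shell
printf '%s\n' 'DAYS,a,b' 'sunday,a,b' 'monday,a,b' 'wednesday,a,b' 'sunday,a,b' 'monday,a,b' |
  cut -f1 -d, |   # keep only the first comma-separated field
  tail -n +2  |   # drop the DAYS header row
  sort        |   # group identical values so uniq -c can count them
  uniq -c     |   # prefix each value with its count
  while read -r count day; do
      echo "$day : $count"    # reorder "count value" into "value : count"
  done
```

Since read strips leading whitespace and splits on the first space, `count` picks up the number and `day` the value, even though uniq -c pads its counts with spaces.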
sneep answered Oct 16 '22

A variation of @sneep's answer that uses sed to format the result (note that \u, which uppercases the next character, is a GNU sed extension):

    cut -f1 -d, /tmp/data | tail -n +2 | sort | uniq -c | sed 's|^ *\([0-9]*\) \(.*\)|\u\2: \1|g'

Output:

Monday: 2
Sunday: 2
Wednesday: 1

The sed pattern matches:

  • ^ *: Beginning of line, then any number of spaces
  • \([0-9]*\): Any number of digits, stored in group \1
  • A single literal space
  • \(.*\): Any characters until the end of the line, stored in group \2

And replaces the match with:

  • \u\2: Second group, capitalizing first character
  • : \1: Colon, space and the first group
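The substitution can be tried in isolation on a single line shaped like uniq -c output (this assumes GNU sed; BSD sed does not support \u):

```shell
# Feed one sample "count value" line through the GNU sed expression:
# group \1 captures the count, group \2 the value, \u capitalizes it.
echo '      2 sunday' | sed 's|^ *\([0-9]*\) \(.*\)|\u\2: \1|g'
```

With GNU sed this prints `Sunday: 2`, matching the format asked for in the question.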
urban answered Oct 16 '22