Getting the count of unique values in a column in bash

I have tab-delimited files with several columns. I want to count how often each distinct value occurs in one column across all the files in a folder, and sort the results in decreasing order of count (highest count first). How would I accomplish this in a Linux command-line environment?

The solution can use any common command-line tool such as awk, perl, or python.

487

asked Feb 07 '11 13:02

sfactor

3 Answers

To see a frequency count for column two (for example):

awk -F '\t' '{print $2}' * | sort | uniq -c | sort -nr

fileA.txt

z    z    a
a    b    c
w    d    e

fileB.txt

t    r    e
z    d    a
a    g    c

fileC.txt

z    r    a
v    d    c
a    m    c
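
To try the command on this data, the three sample files can be recreated with real tab characters (the listings above may show spaces in place of tabs). This is only a sketch: the temporary directory and the variable names workdir and result are assumptions made for illustration, and the glob is pointed at that directory so that no unrelated files are picked up.

```shell
# Throwaway directory so the glob matches only the sample files.
workdir=$(mktemp -d)

# printf expands \t and \n, producing real tab-delimited rows.
printf 'z\tz\ta\na\tb\tc\nw\td\te\n' > "$workdir/fileA.txt"
printf 't\tr\te\nz\td\ta\na\tg\tc\n' > "$workdir/fileB.txt"
printf 'z\tr\ta\nv\td\tc\na\tm\tc\n' > "$workdir/fileC.txt"

# Same pipeline as the one-liner: extract column 2, group equal values,
# count each group, then sort numerically with the highest count first.
result=$(awk -F '\t' '{print $2}' "$workdir"/* | sort | uniq -c | sort -nr)
printf '%s\n' "$result"

rm -rf "$workdir"
```

The first sort is needed because uniq -c only merges adjacent identical lines.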

Result:

      3 d
      2 r
      1 z
      1 m
      1 g
      1 b

185

answered Oct 16 '22 14:10

Dennis Williamson

Here is a way to do it in the shell:

FIELD=2
cut -f "$FIELD" * | sort | uniq -c | sort -nr

This is the sort of thing bash is great at.
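
If this gets used often, the pipeline could be wrapped in a small shell function. The name colfreq, the variable colfreq_out, and the demo file are hypothetical, chosen only for this sketch; cut splits on tabs by default, so no delimiter option is needed for tab-delimited input.

```shell
# colfreq COLUMN FILE...
# Prints a frequency table for one tab-delimited column, highest count first.
# (colfreq is a hypothetical helper name, not a standard tool.)
colfreq() {
    local field=$1
    shift
    # "--" stops option parsing so file names starting with "-" are safe.
    cut -f "$field" -- "$@" | sort | uniq -c | sort -nr
}

# Small demonstration on a temporary three-row file.
demo=$(mktemp)
printf 'a\tx\nb\ty\nc\tx\n' > "$demo"
colfreq_out=$(colfreq 2 "$demo")
printf '%s\n' "$colfreq_out"
rm -f "$demo"
```

In a real folder the call would look like colfreq 2 *, with the column number as the first argument.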

answered Oct 16 '22 13:10

Thedward

The GNU site suggests this nice awk script, which prints both the words and their frequency.

Possible changes:

To see the results in descending order, print freq[word] before word in the printf and pipe the output through sort -nr.
If you only want a specific column, omit the for loop and write freq[$3]++, replacing 3 with the column number. Note the $: freq[3]++ would just count lines under the literal key 3.

Here goes:

 # wordfreq.awk --- print list of word frequencies

 {
     $0 = tolower($0)    # remove case distinctions
     # remove punctuation
     gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
     for (i = 1; i <= NF; i++)
         freq[$i]++
 }

 END {
     for (word in freq)
         printf "%s\t%d\n", word, freq[word]
 }
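
Putting both of those changes together for the original question might look like the sketch below. It assumes tab-delimited input and column 2, and it drops the lowercasing and punctuation stripping so that values are counted exactly as they appear. The awk variable col, the shell variables sample and awk_out, and the sample rows are all made up for illustration.

```shell
# Temporary sample file with four tab-delimited rows.
sample=$(mktemp)
printf 'z\tz\ta\na\tb\tc\nw\td\te\nt\td\te\n' > "$sample"

awk_out=$(awk -F '\t' -v col=2 '
    { freq[$col]++ }    # tally each distinct value seen in the chosen column
    END {
        # print the count first so that sort -nr can order on it
        for (value in freq)
            printf "%d\t%s\n", freq[value], value
    }
' "$sample" | sort -nr)
printf '%s\n' "$awk_out"

rm -f "$sample"
```

Here awk does the counting in a single pass, so only the short summary needs to be sorted rather than every input line.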

answered Oct 16 '22 13:10

Adam Matan

Tags:

bash

command-line

frequency