
List all the words in a text file with occurrence counts?

Tags: bash, sed, awk

Suppose I have file text.txt as below:

she likes cats, and he likes cats too.

I'd like my result to look like:

she 1
likes 2
cats 2
and 1
he 1
too 1

If including the space, comma, and period characters in the result would make the script easier, that would be fine.

Is there a simple shell pipeline that could achieve this?

JackWM asked Mar 14 '13 03:03



1 Answer

Here's a one-liner near and dear to my heart:

cat text.txt | sed 's|[,.]||g' | tr ' ' '\n' | sort | uniq -c

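The sed strips punctuation (tune the regex to taste), and the tr puts one word on each line. sort then groups identical words together so that uniq -c can count each group. Note that uniq -c prints the count before the word, and the output is in sorted order rather than in order of appearance.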
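If the exact layout from the question matters (word first, then count, in order of first appearance), one possible variant swaps sort and uniq for awk. This is only a sketch, not part of the original answer: the function name count_words is made up for illustration, and it assumes a word is any run of letters, so "don't" would be split into "don" and "t".

```shell
# Hypothetical helper; the name count_words is an assumption for illustration.
# Usage: count_words FILE
count_words() {
  # Replace every run of non-letter characters with a single newline,
  # leaving one word per line.
  tr -cs '[:alpha:]' '\n' < "$1" |
    # Count each word and remember the order in which words first appear.
    # NF skips any empty line produced by leading punctuation.
    awk 'NF { if (!($0 in count)) order[++n] = $0; count[$0]++ }
         END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}
```

Running count_words text.txt on the sample file should print "she 1", "likes 2", "cats 2", "and 1", "he 1", "too 1", one pair per line. To ignore case, a tr '[:upper:]' '[:lower:]' stage could be added in front of awk.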

phs answered Nov 15 '22 05:11