
Awk: How to work on multiple .txt files in a folder and its subfolders?

Given a folder whose subfolders contain multilingual .txt files such as:

But where is Esope the holly Bastard
But where is 생 지 옥 이 군
지 옥 이
지 옥
지
我 是 你 的 爸 爸 !
爸 爸 ! ! !
你 不 會 的 !

I already know how to count space-separated word frequencies within ONE file.txt:

$ grep -o '\w*' myfile.txt | awk '{a[$1]++}END{for(k in a)print a[k],k}' | sort > myoutput.txt

This gives the elegant output:

1 생
1 군
1 Bastard
1 Esope
1 holly
1 the
1 不
1 我
1 是
1 會
2 이
2 But
2 is
2 where
2 你
2 的
3 옥
4 지
4 爸
5 !

How can I change the code to work on multiple files within a folder and its subfolders, all matching a similar pattern (*.txt at least)?

Hugolpz asked Mar 24 '13 22:03



1 Answer

You can use the find command for that. Like this:

find . -iname '*.txt' -exec cat {} \; | grep -o '\w*' | awk '{a[$1]++}END{for(k in a)print a[k],k}' | sort

I'm using the -exec option to cat every *.txt file in the current directory and its subdirectories. The output is then piped into your grep | awk | sort pipeline.
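
If you want something reusable, here is a rough sketch of the same idea wrapped in a function. The name count_words is made up for illustration. It swaps in a few things that are not in the one-liner above: awk splits each line on whitespace instead of going through grep -o '\w*' (so punctuation such as ! is counted as a token too), -exec cat {} + hands many files to each cat call, and sort orders by count numerically, since a plain sort would put 10 before 2. It assumes a find that supports -iname (GNU and BSD do) and that every file ends with a newline.

```shell
#!/bin/sh
# count_words DIR
# Hypothetical helper (not part of any tool): prints "count token" for every
# whitespace-separated token found in *.txt files under DIR (default: .).
count_words() {
    # -type f skips directories; "{} +" batches many files per cat call.
    find "${1:-.}" -type f -iname '*.txt' -exec cat {} + |
        # Count every field on every line, then print the totals.
        awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
             END { for (w in count) print count[w], w }' |
        # Numeric sort on the count; ties broken by token in byte order
        # (LC_ALL=C) so the result is reproducible across locales.
        LC_ALL=C sort -k1,1n -k2
}
```

Calling count_words /path/to/folder > myoutput.txt would then write the combined counts for the whole tree to one file.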

hek2mgl answered Sep 20 '22 01:09
