 

Edited: Grep/Awk - Print specific info from a table

(This example was edited, following a user's recommendation, to correct a mistake in how I displayed my table.)

I have a .csv table from which I need to extract certain information. My table looks like this:

Name, Birth

James,2001/02/03 California
Patrick,2001/02/03 Texas
Sarah,2000/03/01 Alabama
Sean,2002/02/01 New York
Michael,2002/02/01 Ontario

From this, I need to print only the unique birth dates, in ascending order, like this:

2000/03/01
2001/02/03
2002/02/01

I have come up with a regular expression to identify the dates:

awk '/[0-9]{4}/[0-9]{2}/[0-9]/{2}/' students.csv

However, I get a syntax error in the regex, and I don't know how to proceed from here.

Any hints?
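About the syntax error: in awk, an unescaped / ends a regex literal. The first /[0-9]{4}/ is read as a complete regex, and what follows it is not valid awk. There is also a stray / before the last {2}.

Below is a minimal sketch that writes each / as \/. It assumes the sample rows above are saved to a temporary file; the mktemp file and the variable result are illustrative only. It spells out [0-9] instead of using {4}, because some older awk implementations do not support interval expressions. It also uses match() and substr() to pull out just the date, since a bare /regex/ pattern would print the whole matching line.

```shell
#!/bin/sh
# Hypothetical temp file holding the sample rows from the question.
file=$(mktemp)
printf '%s\n' 'Name, Birth' 'James,2001/02/03 California' 'Patrick,2001/02/03 Texas' \
  'Sarah,2000/03/01 Alabama' 'Sean,2002/02/01 New York' 'Michael,2002/02/01 Ontario' > "$file"

# Each "/" inside the regex is escaped as "\/" so it no longer ends the regex literal.
# match() sets RSTART and RLENGTH; substr() extracts only the matched date.
result=$(awk 'match($0, /[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/) {
    print substr($0, RSTART, RLENGTH)
}' "$file" | sort -u)

rm -f "$file"
printf '%s\n' "$result"
```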

robert_gonzalez asked Mar 25 '21 18:03


4 Answers

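Use tail to skip the header line, cut to extract the date (the text between the comma and the first space), and sort with the -u option to print unique values:

tail -n +2 students.csv | cut -d',' -f2 | cut -d' ' -f1 | sort -u > out_file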
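Why does sort -u also give ascending dates? A plain text sort matches chronological order only because the dates are zero-padded YYYY/MM/DD. The small sketch below uses hypothetical values; the variable names good and bad are illustrative. It shows the padded case and a non-padded counter-example.

```shell
#!/bin/sh
# Hypothetical dates in the question's zero-padded YYYY/MM/DD format, unsorted and with a duplicate.
good=$(printf '%s\n' 2002/02/01 2000/03/01 2001/02/03 2000/03/01 | LC_ALL=C sort -u)

# Hypothetical counter-example: without zero padding, text order is not date order,
# so October ("10") sorts before February ("2").
bad=$(printf '%s\n' 2001/2/3 2001/10/01 | LC_ALL=C sort -u)

printf '%s\n' "$good" "" "$bad"
```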

You can also use grep instead of cut:

grep -Po '\d\d\d\d/\d\d/\d\d' students.csv | sort -u > out_file

Here, GNU grep is called with the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.

SEE ALSO:
perlre - Perl regular expressions
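-P may not be available in every grep build. The sketch below swaps in -E (POSIX extended regexes), which supports the same {n} intervals. It also shows what "1 match per line" means for -o. The input is hypothetical: its second line deliberately contains two dates, and -o prints each one on its own line.

```shell
#!/bin/sh
# Hypothetical input; the second line holds two dates to show -o's one-match-per-line output.
result=$(printf '%s\n' 'James,2001/02/03 California' 'Ann,2000/03/01 Texas, moved 2002/02/01' |
  grep -Eo '[0-9]{4}/[0-9]{2}/[0-9]{2}')

printf '%s\n' "$result"
```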

Timur Shtatland answered Oct 11 '22 19:10


Here is a GNU awk solution that does this in a single command. Fields are split on commas and blanks, so $2 is the date and NF > 2 skips the header line:

awk -F'[, ]+' 'NF > 2 && !seen[$2]++{} END {
PROCINFO["sorted_in"]="@ind_str_asc"; for (i in seen) print i}' file

2000/03/01
2001/02/03
2002/02/01
anubhava answered Oct 11 '22 17:10


This works in any awk, whether your names are one word or more, and whether or not there are blanks after the commas:

$ awk -F', *' 'NR>1{sub(/ .*/,"",$2); print $2}' file | sort -u
2000/03/01
2001/02/03
2002/02/01
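Here is a sketch that checks those two claims on hypothetical rows: a two-word name with a blank after its comma, and a two-word state. The temporary file and the variable result are illustrative only. -F', *' splits on a comma plus any blanks that follow it, so $2 always starts with the date. sub() then deletes everything from the first blank onward.

```shell
#!/bin/sh
# Hypothetical rows: "Mary Ann" has a blank after the comma and a two-word state.
file=$(mktemp)
printf '%s\n' 'Name, Birth' 'Mary Ann, 2001/02/03 New York' 'James,2000/03/01 California' \
  'Sean,2001/02/03 Texas' > "$file"

# NR>1 skips the header; sub() strips the location that follows the date in $2.
result=$(awk -F', *' 'NR>1{sub(/ .*/,"",$2); print $2}' "$file" | sort -u)

rm -f "$file"
printf '%s\n' "$result"
```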
like image 2
Ed Morton Avatar answered Oct 11 '22 18:10

Ed Morton


With your shown samples, please try the following. Written and tested in GNU awk; it should work in any awk that supports {n} interval expressions in regexes.

awk '
match($0,/[0-9]{4}(\/[0-9]{2}){2}/){
  arrDate[substr($0,RSTART,RLENGTH)]
}
END{
  for(i in arrDate){
    print i
  }
}
'  Input_file | sort

Explanation: a detailed explanation of the above.

awk '                                   ##Starting awk program from here.
match($0,/[0-9]{4}(\/[0-9]{2}){2}/){    ##Using match function with a regex that matches only the date format.
  arrDate[substr($0,RSTART,RLENGTH)]    ##Creating array arrDate, indexed by the matched substring (duplicates collapse into one index).
}
END{                                    ##Starting END block of this awk program from here.
  for(i in arrDate){                    ##Traversing through arrDate here.
    print i                             ##Printing index of array here.
  }
}
'  Input_file | sort                    ##Mentioning Input_file name here; sort gives ascending order, since the for (i in ...) order is unspecified.
RavinderSingh13 answered Oct 11 '22 19:10