(Edited, following a user's recommendation, to fix a mistake in how I displayed my table.)
I have a .csv file from which I need to extract certain information. My table looks like this:
Name, Birth
James,2001/02/03 California
Patrick,2001/02/03 Texas
Sarah,2000/03/01 Alabama
Sean,2002/02/01 New York
Michael,2002/02/01 Ontario
From this, I need to print only the unique birthdates, in ascending order, like this:
2000/03/01
2001/02/03
2002/02/01
I thought of using a regular expression to identify the dates, such as:
awk '/[0-9]{4}/[0-9]{2}/[0-9]/{2}/' students.csv
However, I'm getting a syntax error in the regex, and I don't know how to proceed from this step.
Any hints?
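The syntax error comes from the slashes: an awk regex literal is delimited by /, so every / inside it must be escaped as \/, and the / in /{2} is misplaced (it should be just {2}). A direct fix is /[0-9]{4}\/[0-9]{2}\/[0-9]{2}/. The minimal sketch below spells out each digit instead of using {4} and {2}, because some awks (older mawk releases, for example) do not support interval expressions. The function name print_date_lines is hypothetical, and the here-doc stands in for students.csv.

```shell
#!/bin/sh
# Hypothetical helper: print every line that contains a yyyy/mm/dd date.
# The slashes inside the regex are escaped as \/ so they don't end the literal.
# With no file arguments, awk reads standard input.
print_date_lines() {
  awk '/[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/' "$@"
}

# Usage with some of the question's rows; the header has no date, so it is skipped.
print_date_lines <<'EOF'
Name, Birth
James,2001/02/03 California
Sarah,2000/03/01 Alabama
EOF
```

This only selects the matching lines; extracting just the date and removing duplicates is the remaining step.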
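Use cut to extract the date and sort with the -u option to print only unique values (tail -n +2 skips the header line, and the date is the text between the first comma and the following space):
tail -n +2 students.csv | cut -d, -f2 | cut -d' ' -f1 | sort -u > out_file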
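Plain sort -u is enough for the ascending order here because the dates are written year/month/day with fixed-width, zero-padded fields, so text order matches chronological order. The small sketch below shows that, and shows that the same sort does not give chronological order for a day/month/year layout. sort_unique is a hypothetical name; LC_ALL=C is used only to make the comparison byte-wise and locale-independent.

```shell
#!/bin/sh
# Sort lines and drop duplicates, comparing bytes regardless of locale.
sort_unique() {
  LC_ALL=C sort -u
}

# yyyy/mm/dd: most significant field first, fixed width,
# so sorting the text sorts the dates chronologically.
printf '%s\n' 2002/02/01 2001/02/03 2000/03/01 2001/02/03 | sort_unique

# dd/mm/yyyy: same sort, but the result is NOT in chronological order.
printf '%s\n' 01/02/2002 03/02/2001 01/03/2000 | sort_unique
```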
You can also use grep instead of cut:
grep -Po '\d\d\d\d/\d\d/\d\d' students.csv | sort -u > out_file
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
perlre - Perl regular expressions
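-P is a GNU grep extension, so it may not be available elsewhere. A sketch of the same extraction with a POSIX extended regex (-E) instead: POSIX ERE has no \d, so [0-9] is used. -o is not in POSIX either, but GNU and BSD grep both accept it. extract_dates is a hypothetical name and the here-doc stands in for students.csv.

```shell
#!/bin/sh
# Hypothetical helper: print each yyyy/mm/dd date once, in ascending order.
# -E = extended regex (intervals like {4} are standard ERE), -o = matches only.
extract_dates() {
  grep -Eo '[0-9]{4}/[0-9]{2}/[0-9]{2}' "$@" | LC_ALL=C sort -u
}

extract_dates <<'EOF'
Name, Birth
James,2001/02/03 California
Sarah,2000/03/01 Alabama
Sean,2002/02/01 New York
EOF
```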
Here is a GNU awk solution to get this done in a single command:
awk -F'[, ]+' 'NF > 2 {seen[$2]} END {
PROCINFO["sorted_in"]="@ind_str_asc"; for (i in seen) print i}' file
2000/03/01
2001/02/03
2002/02/01
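PROCINFO["sorted_in"] is specific to GNU awk; other awks traverse for (i in seen) in an unspecified order. A sketch of the same deduplicate-in-awk idea for any awk: the common !seen[key]++ idiom is true only the first time a key appears, so each date is printed once, and sort takes care of the order. Splitting on -F'[, ]+' is an assumption that fits the sample rows (single-word names, with or without a blank after the comma); first_seen_dates is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper. FS '[, ]+' splits on runs of commas and/or spaces,
# so $2 is the date (assumes single-word names).
# NF > 2 skips the header: "Name, Birth" splits into only two fields.
# !seen[$2]++ is true only the first time a given date is seen.
first_seen_dates() {
  awk -F'[, ]+' 'NF > 2 && !seen[$2]++ { print $2 }' "$@" | LC_ALL=C sort
}

first_seen_dates <<'EOF'
Name, Birth
James,2001/02/03 California
Patrick,2001/02/03 Texas
Sarah,2000/03/01 Alabama
Sean,2002/02/01 New York
Michael,2002/02/01 Ontario
EOF
```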
Using any awk, and regardless of whether your names have one word or more or whether blank chars follow the commas:
$ awk -F', *' 'NR>1{sub(/ .*/,"",$2); print $2}' file | sort -u
2000/03/01
2001/02/03
2002/02/01
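To see why this copes with multi-word names and optional blanks, the sketch below wraps the command above in a hypothetical function and feeds it made-up rows (not from the question). FS ', *' means a comma followed by any number of blanks, so a name like "Mary Ann" stays whole in $1, and sub(/ .*/, "", $2) deletes everything from the first blank in $2 onwards, leaving only the date.

```shell
#!/bin/sh
# Hypothetical wrapper around the any-awk command above.
# NR > 1 skips the header line.
dates_any_awk() {
  awk -F', *' 'NR > 1 { sub(/ .*/, "", $2); print $2 }' "$@" | LC_ALL=C sort -u
}

# Made-up rows: a two-word name, extra blanks after a comma, a two-word state.
dates_any_awk <<'EOF'
Name, Birth
Mary Ann,2001/02/03 California
James,   2001/02/03 Texas
Sarah, 2000/03/01 New York
EOF
```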
With your shown samples, could you please try the following. Written and tested in GNU awk; it should work in any awk that supports regex interval expressions such as {4}.
awk '
  match($0,/[0-9]{4}(\/[0-9]{2}){2}/){
    arrDate[substr($0,RSTART,RLENGTH)]
  }
  END{
    for(i in arrDate){
      print i
    }
  }
' Input_file | sort
Explanation: Adding a detailed explanation of the above.
awk '                                    ##Starting awk program from here.
  match($0,/[0-9]{4}(\/[0-9]{2}){2}/){   ##Using match function with a regex that matches only the date format.
    arrDate[substr($0,RSTART,RLENGTH)]   ##Creating array arrDate indexed by the matched substring (the date), so each date is stored once.
  }
  END{                                   ##Starting END block of this awk program from here.
    for(i in arrDate){                   ##Traversing through arrDate here.
      print i                            ##Printing index of array here.
    }
  }
' Input_file | sort                      ##Mentioning Input_file name here; sort gives ascending order, since the for (i in ...) order is unspecified.
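For awks that lack interval support, here is a variant of the same match() approach with each digit spelled out. It also prints RSTART (the 1-based start of the match) and RLENGTH (its length) to show exactly what substr() receives. date_match_info is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# match() sets RSTART and RLENGTH; substr($0, RSTART, RLENGTH) is the matched date.
# The regex spells out each digit, so it does not rely on {n} intervals.
date_match_info() {
  awk 'match($0, /[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/) {
         print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
       }' "$@"
}

# "James," is 6 characters, so the date starts at position 7 and is 10 long.
echo 'James,2001/02/03 California' | date_match_info
```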