Use sed or awk to fix date format

Tags: regex, bash, sed, awk

I'm trying to convert an HTML file containing a table to a .csv file using a bash script.

So far I've accomplished the following steps:

  1. Convert to Unix format (with dos2unix)
  2. Remove all spaces and tabs (with sed 's/[ \t]//g')
  3. Remove all the newlines, and with them the blank lines (with sed ':a;N;$!ba;s/\n//g') (this is necessary because the HTML file has a blank line for each cell of the table... that's not my fault)
  4. Remove the unnecessary <td> and <tr> tags (with sed 's/<t.>//g')
  5. Replace </td> with ',' (with sed 's/<\/td>/,/g')
  6. Replace </tr> with end-of-line (\n) characters (with sed 's/<\/tr>/\n/g')

Of course, I'm putting all this in a pipeline. So far, it's working great. There's one final step I'm stuck on: the table has a column of dates in the format dd/mm/yyyy, and I'd like to convert them to yyyy-mm-dd.

Is there a (simple) way to do it (with sed or awk)?

Data sample (after the whole sed pipe):

500,2,13/09/2007,30000.00,12,B-1
501,2,15/09/2007,14000.00,8,B-2

Expected result:

500,2,2007-09-13,30000.00,12,B-1
501,2,2007-09-15,14000.00,8,B-2

I need to do this because I want to import the data into MySQL. I could open the file in Excel and change the format by hand, but I would like to skip that.

Barranka asked Aug 26 '13 21:08


1 Answer

sed -E 's,([0-9]{2})/([0-9]{2})/([0-9]{4}),\3-\2-\1,g'
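
To see what the command does, here is a small sketch that wraps it in a function and feeds it the sample rows from the question. The function name fix_dates is made up for this example. The substitution itself is unchanged: the three capture groups hold day, month and year, and the replacement writes them back in reverse order with dashes.

```shell
# fix_dates: rewrite every dd/mm/yyyy in the input as yyyy-mm-dd.
# (fix_dates is a hypothetical name used only for this example.)
fix_dates() {
  # -E enables extended regexes, so ( ) and { } need no backslashes.
  # Commas delimit the s command, so the slashes in the pattern need no escaping.
  # \1 = day, \2 = month, \3 = year; the replacement emits them in reverse order.
  sed -E 's,([0-9]{2})/([0-9]{2})/([0-9]{4}),\3-\2-\1,g'
}

printf '%s\n' '500,2,13/09/2007,30000.00,12,B-1' \
              '501,2,15/09/2007,14000.00,8,B-2' | fix_dates
# prints:
# 500,2,2007-09-13,30000.00,12,B-1
# 501,2,2007-09-15,14000.00,8,B-2
```

Note that the pattern is not tied to a column: anything on the line shaped like two digits, slash, two digits, slash, four digits gets rewritten. That is fine for the sample data, where only the third column contains slashes.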
ash answered Sep 30 '22 16:09
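
Since the question also asks about awk, here is a possible awk variant. It is only a sketch, and it assumes the date is always the third comma-separated field, as in the sample data. The function name fix_date_column is made up for this example. Unlike the sed one-liner it only touches that one column, and it leaves a row alone if the third field does not split into exactly three parts.

```shell
# fix_date_column: convert field 3 from dd/mm/yyyy to yyyy-mm-dd.
# (fix_date_column is a hypothetical name; the column position is an assumption.)
fix_date_column() {
  awk -F, -v OFS=, '
    {
      # Split the third field on "/" into d[1]=day, d[2]=month, d[3]=year.
      # Rows whose third field does not have exactly three parts pass through as-is.
      if (split($3, d, "/") == 3)
        $3 = d[3] "-" d[2] "-" d[1]
      print
    }'
}

printf '%s\n' '500,2,13/09/2007,30000.00,12,B-1' \
              '501,2,15/09/2007,14000.00,8,B-2' | fix_date_column
# prints:
# 500,2,2007-09-13,30000.00,12,B-1
# 501,2,2007-09-15,14000.00,8,B-2
```

Setting OFS to a comma matters here: assigning to $3 makes awk rebuild the line, and without it the fields would be joined with spaces.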