I'm trying to convert an HTML file containing a table to a .csv file using a bash
script.
So far I've accomplished the following steps:
1. Convert the file to Unix line endings (with dos2unix)
2. Remove all spaces and tabs (with sed 's/[ \t]//g')
3. Remove all line breaks (with sed ':a;N;$!ba;s/\n//g') (this is necessary because the HTML file has a blank line for each cell of the table... that's not my fault)
4. Remove the <td> and <tr> tags (with sed 's/<t.>//g')
5. Replace </td> with ',' (with sed 's/<\/td/,/g')
6. Replace </tr> with end-of-line (\n) characters (with sed 's/<\/tr/\n/g')

Of course, I'm putting all this in a pipeline. So far, it's working great. There's one final step I'm stuck with: the table has a column with dates in the format dd/mm/yyyy, and I'd like to convert them to yyyy-mm-dd.
Is there a (simple) way to do it (with sed or awk)?
Data sample (after the whole sed pipe):
500,2,13/09/2007,30000.00,12,B-1
501,2,15/09/2007,14000.00,8,B-2
Expected result:
500,2,2007-09-13,30000.00,12,B-1
501,2,2007-09-15,14000.00,8,B-2
I need to do this because I have to import this data into MySQL. I could open the file in Excel and change the format by hand, but I would like to skip that.
AWK, like sed, is a programming language that deals with large bodies of text. But while people use sed to process and modify text, people mostly use AWK as a tool for analysis and reporting. Like sed, AWK was first developed at Bell Labs in the 1970s.
Generally, I would say grep is the fastest and sed the slowest, though of course this depends on what exactly you are doing. I find awk much faster than sed. You can speed up grep if you only need simple fixed strings rather than real regular expressions (option -F).
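To illustrate what -F changes: with that option, grep treats the pattern as a literal string, so characters such as . lose their regex meaning. The following is only a minimal sketch using made-up input (not data from the question); it shows the difference in matching, not in speed.

```shell
#!/bin/sh
# Made-up sample input, not taken from the question.
sample='30000.00
30000x00'

# Regex mode: "." matches any single character, so both lines are counted.
printf '%s\n' "$sample" | grep -c '30000.00'

# Fixed-string mode (-F): only the line with a literal "." is counted.
printf '%s\n' "$sample" | grep -c -F '30000.00'
```

The first command counts two matching lines and the second counts one.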
sed is a command-line utility that parses and transforms text using a simple, compact programming language. awk is a command-line utility designed for text processing that allows writing effective programs in the form of statements.
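This sed command captures the day, month, and year as three groups and writes them back in reverse order, separated by dashes. Using , as the s delimiter avoids having to escape the slashes, and -E enables extended regular expressions (needed for the unescaped parentheses and the {2} and {4} counts):
sed -E 's,([0-9]{2})/([0-9]{2})/([0-9]{4}),\3-\2-\1,g'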
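Since the question also asks about awk, here is a field-aware variant. It is a minimal sketch, not part of the original answer, and it assumes that the date is always the third comma-separated field (as in the sample data) and that no field contains an embedded comma. The function name convert_dates is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: rewrite dd/mm/yyyy in column 3 as yyyy-mm-dd.
# Assumes plain comma-separated input with the date in the third field.
convert_dates() {
    awk 'BEGIN { FS = OFS = "," }
    {
        # split() returns the number of pieces; leave anything that
        # does not split into exactly three parts untouched.
        if (split($3, d, "/") == 3)
            $3 = d[3] "-" d[2] "-" d[1]
        print
    }'
}

# Usage with the sample rows from the question:
printf '%s\n' '500,2,13/09/2007,30000.00,12,B-1' \
              '501,2,15/09/2007,14000.00,8,B-2' | convert_dates
```

Unlike the sed one-liner, this only touches the third column, which may be preferable if other columns could ever hold text that looks like a date. It can be appended to the end of the existing pipeline in the same way.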