I have no bash experience and just want to know how to get started.
I have to write a bash script that properly formats an XHTML document. For example, it should turn this:
<p>Test</p><ol><li>Test
</li><li>
Test</li></ol>
into this:
<p>Test</p>
<ol>
<li>Test</li>
<li>Test</li>
</ol>
Now I believe I have to do something like:
cat > format1 #create file
#!/bin/bash
if tail of a line ends with "</A-a>": (like </li> or </ol> or </p> or </ul>)
add \n
fi
if head of a line = <ol> or <ul>
add \n
fi
Please help me understand it. This is all I can think of and I really would like to know how to solve it.
Given the constraints that the problem must be solved with a bash script and that you cannot use htmltidy, I'd get started by creating a file htmltidy.sh which contains:
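#!/bin/bash
# Requires GNU sed: \s, \+ and \n in the replacement are GNU extensions.
tr -s '[:space:]' ' ' |              # join all lines, squeezing whitespace runs (no glob expansion, unlike an unquoted echo $( cat ))
sed 's/\s*\(<[^>]\+>\)\s*/\1/g' |    # strip whitespace on either side of each tag
sed 's/></>\n</g' |                  # start a new line between two adjacent tags
awk '{
# A line that is only a closing tag: dedent before printing it.
if ( $0 ~ /^<\/[^>]+>$/ ) indent=substr(indent,2);
print indent$0;
# A line that is only an opening tag (including one-letter names such as <p>): indent what follows.
if ( $0 ~ /^<[^\/>][^>]*>$/ ) indent=indent" ";
}'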
To use this program, make it executable with chmod +x htmltidy.sh and pipe the content into it like this:
cat sexist.html | ./htmltidy.sh
This will at least do the trick given the sample input that you provided.
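To try the same idea without creating any files, here is a minimal sketch that wraps the stages in a shell function and feeds it the sample from the question through a here-document. The function name format_xhtml is made up for illustration, and the stages are written with POSIX character classes and awk instead of GNU-only sed escapes, so treat it as one possible variant rather than the script above verbatim.

```shell
#!/bin/bash
# format_xhtml is a hypothetical helper name, not part of the answer above.
# It reads XHTML on stdin and writes the reformatted text to stdout.
format_xhtml() {
  tr -s '[:space:]' ' ' |                               # join lines, squeeze whitespace runs
  sed 's/[[:space:]]*\(<[^>]*>\)[[:space:]]*/\1/g' |    # trim whitespace around each tag
  awk '{ gsub(/></, ">\n<"); print }' |                 # break the line between adjacent tags
  awk '
    /^<\/[^>]+>$/     { indent = substr(indent, 2) }    # lone closing tag: dedent first
                      { print indent $0 }
    /^<[^\/>][^>]*>$/ { indent = indent " " }           # lone opening tag: indent what follows
  '
}

# The sample from the question, fed in through a here-document.
format_xhtml <<'EOF'
<p>Test</p><ol><li>Test
</li><li>
Test</li></ol>
EOF
```

Run on that sample, each element ends up on its own line, with the two list items indented by one space under the ol tag.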
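Some explanation: the first stage reads all of standard input and joins it into a single line, collapsing each run of whitespace into one space. The first sed removes any whitespace immediately before or after a tag. The second sed inserts a newline wherever one tag directly follows another, so each element starts on its own line. The awk script keeps a running indent string: it drops one space before printing a line that consists only of a closing tag, and adds one space after printing a line that consists only of an opening tag.
This toy program will break as soon as the input gets more complex. But it should give you some idea of why it's better to use an off-the-shelf utility than to write your own.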
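One concrete way it breaks, as a small sketch: whitespace inside a pre element is significant, but the line-joining stage squeezes it like any other whitespace. The helper name collapse_ws is hypothetical, and tr -s stands in here for that first stage on its own.

```shell
#!/bin/bash
# collapse_ws is a hypothetical name for the line-joining stage on its own:
# every run of spaces, tabs and newlines becomes a single space.
collapse_ws() {
  tr -s '[:space:]' ' '
}

# The four spaces and the line break inside <pre> are meaningful in XHTML,
# yet they come out as single spaces.
printf '<pre>a    b\n  c</pre>\n' | collapse_ws
echo
```

Comments, CDATA sections and attribute values that contain a > character would trip up the later stages in similar ways.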
Use html-tidy. If you wish to use tidy, it would be a good idea to add this to your .bashrc:
alias tidy="tidy -xml --indent auto --indent-spaces 1 --quiet yes -im"
The above command creates an alias for tidy that treats the file as XML (so every tag is expected to have a closing tag), indents with a single space, and modifies the file in place.