I am building an HTML table from a list with lxml.builder and trying to put a link in one of the table cells.
The list is generated in the following way:
from lxml import etree

with open('some_file.html', 'r') as f:
    table = etree.parse(f)
p_list = list()
# header row: the text of every <div> in the document
rows = table.iter('div')
p_list.append([c.text for c in rows])
# data rows: the text of every cell
rows = table.xpath("body/table")[0].findall("tr")
for row in rows[2:]:
    p_list.append([c.text for c in row.getchildren()])
The HTML file that I parse is the same one that lxml generates further on, i.e. I set up a kind of round trip for testing purposes.
And here is how I build the table:
from lxml.builder import E
page = (
    E.html(
        E.head(
            E.title("title")
        ),
        E.body(
            ....
            *[E.tr(
                *[
                    E.td(E.a(E.img(src=str(col)))) if ind == 8 else
                    E.td(E.a(str(col), href=str(col))) if ind == 9 else
                    E.td(str(col)) for ind, col in enumerate(row)
                ]
            ) for row in p_list ]
When I specify the link via literals, everything works fine:
E.td(E.a("link", href="url_address"))
However, when I try to output a list element value (which is https://blahblahblah.com) as a link
E.td(E.a(str(col), href=str(col)))
the cell is empty; nothing is shown in it.
If I specify the link text as a literal and put str(col) into href, the link is shown normally, but instead of the real href it contains the name of the generated HTML file.
If I output just that col value as a string
E.td(str(col))
it is shown normally, i.e. it is not empty. What is wrong with the E.a and E.img elements?
I just noticed that this happens only if I build the list from the HTML file. When I build the list manually, like this, everything is output fine.
p_list = []
p_element = ['id']
p_element.append('value')
p_element.append('value2')
p_list.append(p_element)
Current output (pay attention to the <a> tags and their href attributes, and to the src attribute of <img>)
<html>
<head>
<title>page</title>
</head>
<body>
<style type="text/css">
th {
background-color: DeepSkyBlue;
text-align: center;
vertical-align: bottom;
height: 150px;
padding-bottom: 3px;
padding-left: 5px;
padding-right: 5px;
}
.vertical {
text-align: center;
vertical-align: middle;
width: 20px;
margin: 0px;
padding: 0px;
padding-left: 3px;
padding-right: 3px;
padding-top: 10px;
white-space: nowrap;
-webkit-transform: rotate(-90deg);
-moz-transform: rotate(-90deg);
}</style>
<h1>title</h1>
<p>This is another paragraph, with a</p>
<table border="2">
<tr>
<th>
<div class="vertical">ID</div>
</th>
...
<th>
<div class="vertical">I blacklisted him</div>
</th>
</tr>
<tr>
<td>1020</td>
<td>ТаисияСтрахолет</td>
<td>No</td>
<td>Female</td>
<td>None</td>
<td>Санкт-Петербург</td>
<td>Росiя</td>
<td>None</td>
<td>
<a>
<img src=" "/>
</a>
</td>
<td>
<a href=" ">
</a>
</td>
...
</tr>
</table>
</body>
</html>
Desired output
<html>
<head>
<title>page</title>
</head>
<body>
<style type="text/css">
th {
background-color: DeepSkyBlue;
text-align: center;
vertical-align: bottom;
height: 150px;
padding-bottom: 3px;
padding-left: 5px;
padding-right: 5px;
}
.vertical {
text-align: center;
vertical-align: middle;
width: 20px;
margin: 0px;
padding: 0px;
padding-left: 3px;
padding-right: 3px;
padding-top: 10px;
white-space: nowrap;
-webkit-transform: rotate(-90deg);
-moz-transform: rotate(-90deg);
}</style>
<h1>title</h1>
<p>This is another paragraph, with a</p>
<table border="2">
<tr>
<th>
<div class="vertical">ID</div>
</th>
...
<th>
<div class="vertical">I blacklisted him</div>
</th>
</tr>
<tr>
<td>1019</td>
<td>МихаилПавлов</td>
<td>No</td>
<td>Male</td>
<td>None</td>
<td>Санкт-Петербург</td>
<td>Росiя</td>
<td>C.-Петербург</td>
<td>
<a>
<img src="http://i.imgur.com/rejChZW.jpg"/>
</a>
</td>
<td>
<a href="http://i.imgur.com/rejChZW.jpg">link</a>
</td>
...
</tr>
</table>
</body>
</html>
Got it myself. The problem was not in generating the HTML but in parsing it. The parsing function didn't fetch the IMG and A tags nested in TD, so the corresponding elements of the list were empty. Because of the involved logic of the program (fetching from a file + fetching from the site API) I wasn't able to detect the cause of the issue right away.
The correct parsing logic should be:
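To make the cause concrete, here is a minimal sketch (not the original program) of what c.text returns for a cell whose content is a nested element. The SAMPLE markup is a made-up one-row fragment laid out like the generated file, and the fallback import is only there so the snippet also runs without lxml; the behaviour shown is the same in both libraries. A whitespace-only href is also most likely why the browser showed the name of the generated file: such an href resolves to the current document.

```python
try:
    from lxml import etree
except ImportError:  # the standard library behaves the same way for this demo
    import xml.etree.ElementTree as etree

# Hypothetical sample row, pretty-printed like the generated HTML file.
SAMPLE = """<tr>
  <td>1019</td>
  <td>
    <a href="http://i.imgur.com/rejChZW.jpg">link</a>
  </td>
</tr>"""

row = etree.fromstring(SAMPLE)
plain_cell, link_cell = list(row)

# The naive approach from the question: take .text of every cell.
naive = [c.text for c in row]

# .text only holds the text before the first child element, so for the
# second cell it is just the newline and indentation in front of <a>.
print(repr(naive[0]))
print(repr(naive[1]))

# The real data lives on the nested element.
link = link_cell.find("a")
print(link.text, link.get("href"))
```

So the builder received whitespace instead of the URL, which matches the empty-looking href and src values in the current output.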
for row in rows[1:]:
    data.append([
        c.find("a").text if c.find("a") is not None else
        c.find("img").attrib['src'] if c.find("img") is not None else
        c.text
        for c in row.getchildren()
    ])
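The same idea can be sketched as a small helper, which may be easier to test in isolation. This is a variation under stated assumptions, not the exact code above: cell_value and SAMPLE are hypothetical names, the image is assumed to be nested inside the link (as in the desired output), and the helper returns the href rather than the link text. Since find("img") only matches direct children of the cell, the sketch uses ".//img" and checks it first. It also iterates over the row directly, because getchildren() is deprecated in lxml.

```python
try:
    from lxml import etree
except ImportError:  # same API for what is used below
    import xml.etree.ElementTree as etree

# Hypothetical table with one header row and one data row.
SAMPLE = """<table>
  <tr><th>ID</th><th>Photo</th><th>Link</th></tr>
  <tr>
    <td>1019</td>
    <td>
      <a>
        <img src="http://i.imgur.com/rejChZW.jpg"/>
      </a>
    </td>
    <td>
      <a href="http://i.imgur.com/rejChZW.jpg">link</a>
    </td>
  </tr>
</table>"""


def cell_value(cell):
    """Return the useful value of a <td>: image src, link target, or plain text."""
    img = cell.find(".//img")        # also finds an <img> nested inside <a>
    if img is not None:
        return img.get("src")
    link = cell.find("a")
    if link is not None:
        # Assumption: prefer the href; fall back to the link text.
        return link.get("href") or link.text
    return cell.text


table = etree.fromstring(SAMPLE)
rows = table.findall("tr")
data = [[cell_value(c) for c in row] for row in rows[1:]]  # skip the header row
print(data)
```

With values extracted this way, the list contains real URLs again, so E.a(str(col), href=str(col)) and E.img(src=str(col)) receive the strings they need.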