When I try to parse an HTMLTableCell, the innerText value is incorrect, and it seems that I'm getting the class name instead of the text.
The strange thing is that when I inspect the cell in the debugger (VS2010), I see the proper value. What am I doing wrong?
Further investigation turned this up: in the VS2010 debugger, cell.innerText is "center time", while ((mshtml.HTMLTableCellClass)(cell)).innerText is "23:45". The problem is that the code won't compile when I cast to mshtml.HTMLTableCellClass, so I have to use the interface (why is that?).
See the code below:
mshtml.HTMLDocument doc = MainBrowser.Document as mshtml.HTMLDocument;
if (doc != null)
{
    mshtml.HTMLTable table = doc.getElementById("ecEventsTable") as mshtml.HTMLTable;
    List<List<string>> textRows = new List<List<string>>();
    foreach (mshtml.HTMLTableRow row in table.rows)
    {
        if (row != null && row.id != null && row.id.Contains("eventRowId"))
        {
            List<string> temp = new List<string>();
            foreach (mshtml.HTMLTableCell cell in row.cells)
            {
                string text = cell.innerText;
                if (text != null && text != "" && text != " ")
                {
                    if (text.Contains("\r\n"))
                        text = text.Replace("\r\n", "");
                    temp.Add(cell.innerText);
                }
            }
            if (temp.Count > 0)
                textRows.Add(temp);
        }
    }
    foreach (var row in textRows)
    {
        string str = String.Join(" ", row);
    }
}
}
HTML example row:
<tr id="eventRowId_34599" onclick="javascript:changeEventDisplay(34599, this, 'overview');" event_timestamp="2014-02-24 01:30:00" event_attr_id="752">
<td class="center time">01:30</td>
<td class="flagCur"><span title="China" class=" ceFlags China"> </span>CNY</td>
<td title="" class="sentiment"><i class="newSiteIconsSprite grayFullBullishIcon middle"></i><i class="newSiteIconsSprite grayEmptyBullishIcon middle"></i>
<i class="newSiteIconsSprite grayEmptyBullishIcon middle"></i></td>
<td class="left event">China House Prices (YoY)</td>
<td title="" class="bold act blackFont" id="eventActual_34599">9.6%</td>
<td class="fore" id="eventForecast_34599"> </td>
<td class="prev blackFont" id="eventPrevious_34599">9.9%</td>
<td class="diamond" id="eventRevisedFrom_34599"> </td> </tr>
Instead of iterating over the cells as mshtml.HTMLTableCell, I now iterate over them as mshtml.IHTMLElement, and innerText returns the expected text.
Code after the fix (see the old version in the question):
foreach (mshtml.IHTMLElement cell in row.cells)
{
    string text = cell.innerText;
    if (text != null && text != "" && text != " ")
    {
        if (text.Contains("\r\n"))
            text = text.Replace("\r\n", "");
        temp.Add(text);
    }
}
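The filtering inside that loop could also be pulled into a small helper, which makes it easy to check without a browser control. The sketch below is only an illustration: CellText and Clean are hypothetical names that do not appear in the original code, and the helper is slightly stricter than the checks above, because it skips any whitespace-only string and trims the result. It assumes .NET 4 or later, where string.IsNullOrWhiteSpace is available.

```csharp
using System;

// Hypothetical helper (not part of the original code): normalizes the text of one cell.
static class CellText
{
    // Returns the cleaned text, or null when the cell holds no visible text.
    public static string Clean(string raw)
    {
        // Covers null, "", " " and any other whitespace-only value.
        if (string.IsNullOrWhiteSpace(raw))
            return null;

        // Drop embedded line breaks, then trim leading/trailing whitespace.
        return raw.Replace("\r\n", "").Trim();
    }
}

static class Program
{
    static void Main()
    {
        // Sample values modeled on the example row in the question.
        string[] samples = { "01:30", " ", "China House Prices (YoY)\r\n", null };
        foreach (string sample in samples)
        {
            string cleaned = CellText.Clean(sample);
            Console.WriteLine(cleaned ?? "(skipped)");
        }
    }
}
```

Inside the loop this would be used as string text = CellText.Clean(cell.innerText); followed by if (text != null) temp.Add(text);.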