
Convert an HTML Table to JSON

I'm trying to convert a table I have extracted via BeautifulSoup into JSON.

So far I've managed to isolate all the rows, though I'm not sure how to work with the data from here. Any advice would be very much appreciated.

[<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>, 
<tr><td>Card name</td><td>Name</td></tr>, 
<tr><td>Account holder</td><td>NAME</td></tr>, 
<tr><td>Card number</td><td>1234</td></tr>, 
<tr><td>Status</td><td>Active</td></tr>]

(Line breaks mine for readability)

This was my attempt:

result = []
allrows = table.tbody.findAll('tr')
for row in allrows:
    result.append([])
    allcols = row.findAll('td')
    for col in allcols:
        thestrings = [unicode(s) for s in col.findAll(text=True)]
        thetext = ''.join(thestrings)
        result[-1].append(thetext)

which gave me the following result:

[
 [u'Card balance', u'$18.30'],
 [u'Card name', u'NAMEn'],
 [u'Account holder', u'NAME'],
 [u'Card number', u'1234'],
 [u'Status', u'Active']
]
declanjscott asked Aug 31 '13 04:08



1 Answer

Your data probably looks something like this:

html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>$18.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""

From that, we can get your result as a list using this code:

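from bs4 import BeautifulSoup

# Calling the soup or a tag like a function is shorthand for find_all().
soup = BeautifulSoup(html_data, "html.parser")
table_data = [[cell.text for cell in row("td")] for row in soup("tr")]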
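
The cells in the question's sample contain nested tags such as <strong>, and real pages often carry stray whitespace around the text. Here is a minimal sketch of one way to handle both, assuming bs4 is installed and using the standard-library "html.parser" backend. The names sample_html and rows are illustrative and not from the original post:

```python
from bs4 import BeautifulSoup

# Sample rows modelled on the question's table (values are placeholders).
sample_html = """
<table>
  <tr><td><strong>Balance</strong></td><td><strong> $18.30 </strong></td></tr>
  <tr><td>Status</td><td>
    Active
  </td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# get_text(strip=True) collects the text of nested tags and trims
# surrounding whitespace from each piece of text.
rows = [[cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in soup.find_all("tr")]

print(rows)
```

The resulting list of [label, value] pairs can be passed to dict() or OrderedDict() in the same way as table_data.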

To convert the result to JSON, if you don't care about the order (before Python 3.7, a plain dict does not guarantee insertion order):

import json

# indent=4 pretty-prints the object, one key per line
print(json.dumps(dict(table_data), indent=4))

Result:

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": "$18.30"
}
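
If the original row order does not matter but you still want output that is identical from run to run, one option is the sort_keys parameter of json.dumps. This is a small sketch, not part of the original answer. The pairs list simply repeats the values of table_data so that the block runs on its own:

```python
import json

# Same key/value pairs as table_data in the answer above.
pairs = [
    ["Card balance", "$18.30"],
    ["Card name", "NAMEn"],
    ["Account holder", "NAME"],
    ["Card number", "1234"],
    ["Status", "Active"],
]

# sort_keys=True orders the keys alphabetically, so the output does not
# depend on the dict's internal ordering.
sorted_json = json.dumps(dict(pairs), sort_keys=True, indent=4)
print(sorted_json)
```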

If you need the same order, use this:

from collections import OrderedDict
import json

# OrderedDict keeps the rows in the order they appear in the table
print(json.dumps(OrderedDict(table_data), indent=4))

Which gives you:

{
    "Card balance": "$18.30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}
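
Both dict(table_data) and OrderedDict(table_data) assume that every row has exactly two cells and that no label appears twice; a repeated label would silently overwrite the earlier value. A hedged sketch of a guard for that follows. The helper name rows_to_json is hypothetical, and raising ValueError is only one possible policy:

```python
import json
from collections import OrderedDict


def rows_to_json(rows):
    """Hypothetical helper: turn [label, value] rows into an ordered JSON object."""
    result = OrderedDict()
    for row in rows:
        # A row with a different number of cells cannot be a key/value pair.
        if len(row) != 2:
            raise ValueError("expected 2 cells, got %d: %r" % (len(row), row))
        label, value = row
        # Refuse to overwrite an earlier row that used the same label.
        if label in result:
            raise ValueError("duplicate label: %r" % (label,))
        result[label] = value
    return json.dumps(result, indent=4)


print(rows_to_json([["Card number", "1234"], ["Status", "Active"]]))
```

For tables where repeated labels are legitimate, dumping the list of pairs directly with json.dumps(rows) keeps every row at the cost of producing a JSON array instead of an object.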
H.D. answered Sep 19 '22 11:09
