BeautifulSoup, a dictionary from an HTML table

I am trying to scrape table data from a website.

Here is a simple example table:

t = '<html><table>' +\
    '<tr><td class="label"> a </td> <td> 1 </td></tr>' +\
    '<tr><td class="label"> b </td> <td> 2 </td></tr>' +\
    '<tr><td class="label"> c </td> <td> 3 </td></tr>' +\
    '<tr><td class="label"> d </td> <td> 4 </td></tr>' +\
    '</table></html>'

The desired parse result is {' a ': ' 1 ', ' b ': ' 2 ', ' c ': ' 3 ', ' d ': ' 4 '}

This is my closest attempt so far:

for tr in s.findAll('tr'):
  k, v = BeautifulSoup(str(tr)).findAll('td')
  d[str(k)] = str(v)

Result is:

{'<td class="label"> a </td>': '<td> 1 </td>', '<td class="label"> d </td>': '<td> 4 </td>', '<td class="label"> b </td>': '<td> 2 </td>', '<td class="label"> c </td>': '<td> 3 </td>'}

I'm aware of the text=True parameter of findAll() but I'm not getting the expected results when I use it.

I'm using Python 2.6 and BeautifulSoup 3.

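Try this. There is no need to re-parse each row with BeautifulSoup(str(tr)); call findAll('td') on the row itself, and read each cell's .string attribute, which holds the cell's text rather than its markup: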

from BeautifulSoup import BeautifulSoup

t = '<html><table>' +\
    '<tr><td class="label"> a </td> <td> 1 </td></tr>' +\
    '<tr><td class="label"> b </td> <td> 2 </td></tr>' +\
    '<tr><td class="label"> c </td> <td> 3 </td></tr>' +\
    '<tr><td class="label"> d </td> <td> 4 </td></tr>' +\
    '</table></html>'

bs = BeautifulSoup(t)

# Map the text of each row's first cell to the text of its second cell.
results = {}
for row in bs.findAll('tr'):
    aux = row.findAll('td')
    results[aux[0].string] = aux[1].string

print results
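
For readers on Python 3 with Beautiful Soup 4 (the bs4 package), the same idea can be sketched roughly as follows. This is a port under assumptions, not part of the original answer: the helper name table_to_dict and its strip flag are made up for illustration, and the built-in "html.parser" backend is chosen only to avoid extra dependencies. It uses find_all() and get_text(), the bs4 spellings of the calls above.

```python
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

html = (
    '<html><table>'
    '<tr><td class="label"> a </td> <td> 1 </td></tr>'
    '<tr><td class="label"> b </td> <td> 2 </td></tr>'
    '<tr><td class="label"> c </td> <td> 3 </td></tr>'
    '<tr><td class="label"> d </td> <td> 4 </td></tr>'
    '</table></html>'
)


def table_to_dict(markup, strip=False):
    """Hypothetical helper: map each row's first cell text to its second."""
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip rows that do not have a label and a value
        key, value = cells[0].get_text(), cells[1].get_text()
        if strip:
            # Optionally drop the surrounding whitespace kept by get_text().
            key, value = key.strip(), value.strip()
        result[key] = value
    return result


results = table_to_dict(html)
print(results)
```

Calling table_to_dict(html, strip=True) instead gives keys and values without the padding spaces, which is usually more convenient than the whitespace-preserving result asked for in the question.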

Tags: python, beautifulsoup

jon

1 Answers

mvillaress
