BeautifulSoup HTML table parsing

Tags: python, html-table, html-parsing, beautifulsoup, mechanize

I am trying to parse HTML tables from this site: http://www.511virginia.org/RoadConditions.aspx?j=All&r=1

Currently I am using BeautifulSoup, and my code looks like this:

from mechanize import Browser
from BeautifulSoup import BeautifulSoup

mech = Browser()

url = "http://www.511virginia.org/RoadConditions.aspx?j=All&r=1"
page = mech.open(url)

html = page.read()
soup = BeautifulSoup(html)

table = soup.find("table")

rows = table.findAll('tr')[3]

cols = rows.findAll('td')

roadtype = cols[0].string
start = cols[1].string
end = cols[2].string
condition = cols[3].string
reason = cols[4].string
update = cols[5].string

entry = (roadtype, start, end, condition, reason, update)

print entry

The issue is with the start and end columns. They are just printed as "None".

Output:

(u'Rt. 613N (Giles County)', None, None, u'Moderate', u'snow or ice', u'01/13/2010 10:50 AM')

I know that the cells are stored in the cols list, but it seems that the extra link tag inside them is messing up the parsing. The original HTML looks like this:

<td headers="road-type" class="ConditionsCellText">Rt. 613N (Giles County)</td>
<td headers="start" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>
<td headers="end" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)</a></td>
<td headers="condition" class="ConditionsCellText">Moderate</td>
<td headers="reason" class="ConditionsCellText">snow or ice</td>
<td headers="update" class="ConditionsCellText">01/13/2010 10:50 AM</td>

So what should be printed is:

(u'Rt. 613N (Giles County)', u'Big Stony Ck Rd; Rt. 635E/W (Giles County)', u'Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)', u'Moderate', u'snow or ice', u'01/13/2010 10:50 AM')

Any suggestions or help is appreciated, and thank you in advance.

asked Jan 13 '10 18:01

Stephen Tanner

1 Answer

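In BeautifulSoup 3, .string is only set when a tag's single child is a text node. The start and end cells each contain an <a> tag, so .string on the <td> returns None. Read the text from the link instead:

start = cols[1].find('a').string

or, more simply:

start = cols[1].a.string

or, better, take the first text node found anywhere inside the cell, which works whether or not the cell contains a link:

start = str(cols[1].find(text=True))

and, to build the whole entry at once (cols is a plain list of cells, so the lookup has to be done per cell):

entry = [str(col.find(text=True)) for col in cols]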
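
For comparison, here is a minimal sketch of the same idea without BeautifulSoup. It assumes Python 3 and uses only the standard library's html.parser, so it is a swapped-in technique rather than the library used in the question. The names CellTextParser and ROW_HTML are hypothetical, and the sample row is copied from the question's HTML. The point it illustrates is the one above: collect the text nodes found anywhere inside each <td>, so a nested <a> no longer matters.

```python
from html.parser import HTMLParser

# Sample row copied from the question (assumption: wrapped in a single <tr>).
ROW_HTML = """
<tr>
<td headers="road-type" class="ConditionsCellText">Rt. 613N (Giles County)</td>
<td headers="start" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Big Stony Ck Rd; Rt. 635E/W (Giles County)</a></td>
<td headers="end" class="ConditionsCellText"><a href="conditions.aspx?lat=37.43036753&long=-80.51118005#viewmap">Cabin Ln; Rocky Mount Rd; Rt. 721E/W (Giles County)</a></td>
<td headers="condition" class="ConditionsCellText">Moderate</td>
<td headers="reason" class="ConditionsCellText">snow or ice</td>
<td headers="update" class="ConditionsCellText">01/13/2010 10:50 AM</td>
</tr>
"""


class CellTextParser(HTMLParser):
    """Collect the text of every <td>, including text inside nested tags."""

    def __init__(self):
        super().__init__()
        self.cells = []     # finished cell strings, in document order
        self._parts = None  # text fragments of the open <td>, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self._parts is not None:
            self.cells.append("".join(self._parts).strip())
            self._parts = None

    def handle_data(self, data):
        # Text inside a nested <a> arrives here too, because the <td> is still open.
        if self._parts is not None:
            self._parts.append(data)


parser = CellTextParser()
parser.feed(ROW_HTML)
parser.close()

entry = tuple(parser.cells)
print(entry)
```

Unlike find(text=True), which returns only the first text node, this sketch joins every text fragment in a cell, so it would also cope with cells that mix plain text and links.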

answered Oct 30 '22 02:10

Antony Hatchkins
