I have the following HTML excerpt, stored as a Python list, that I'd like to turn into a dictionary. It is a timetable for every day of the week.
[u'
<table class="hours table">\n
<tbody>\n
<tr>\n
<th scope="row">Mon</th>\n
<td>\n <span class="nowrap">2:00 pm</span> - <span class="nowrap">3:00 pm</span>
<br><span class="nowrap">5:00 pm</span> - <span class="nowrap">10:00 pm</span>\n </td>\n
<td class="extra">\n </td>\n </tr>\n\n
<tr>\n
<th scope="row">Tue</th>\n
<td>\n <span class="nowrap">2:00 pm</span> - <span class="nowrap">3:00 pm</span>
<br><span class="nowrap">5:00 pm</span> - <span class="nowrap">10:00 pm</span>\n </td>\n
<td class="extra">\n </td>\n </tr>\n\n
<tr>\n
<th scope="row">Wed</th>\n
<td>\n <span class="nowrap">2:00 pm</span> - <span class="nowrap">3:00 pm</span>
<br><span class="nowrap">5:00 pm</span> - <span class="nowrap">10:00 pm</span>\n </td>\n
<td class="extra">\n <span class="nowrap open">Open now</span>\n </td>\n </tr>\n\n
<tr>\n
<th scope="row">Thu</th>\n
<td>\n <span class="nowrap">2:00 pm</span> - <span class="nowrap">3:00 pm</span>
<br><span class="nowrap">5:00 pm</span> - <span class="nowrap">10:00 pm</span>\n </td>\n
<td class="extra">\n </td>\n </tr>\n\n
<tr>\n
<th scope="row">Fri</th>\n
<td>\n <span class="nowrap">2:00 pm</span> - <span class="nowrap">3:00 pm</span>
<br><span class="nowrap">5:00 pm</span> - <span class="nowrap">10:00 pm</span>\n </td>\n
<td class="extra">\n </td>\n </tr>\n\n
<tr>\n
<th scope="row">Sat</th>\n
<td>\n <span class="nowrap">5:00 pm</span> - <span class="nowrap">10:00 pm</span>\n </td>\n
<td class="extra">\n </td>\n </tr>\n\n
<tr>\n
<th scope="row">Sun</th>\n
<td>\n Closed\n </td>\n
<td class="extra">\n </td>\n </tr>\n\n </tbody>\n </table>']
The desired output is:
{
'Mon': ['2:00pm - 3:00pm', '5:00pm - 10:00pm'],
'Tue': ['2:00pm - 3:00pm', '5:00pm - 10:00pm'],
'Wed': ['2:00pm - 3:00pm', '5:00pm - 10:00pm'],
'Thu': ['2:00pm - 3:00pm', '5:00pm - 10:00pm'],
'Fri': ['2:00pm - 3:00pm', '5:00pm - 10:00pm'],
'Sat': '5:00pm - 10:00pm',
'Sun': 'Closed'
}
How would you achieve this in Python 3.x? I would not mind if the 'Sat' and 'Sun' values were lists as well, if that helps at all. Thank you in advance for your thoughts.
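Here's a solution that first reads the table into a pandas DataFrame and then converts it to a dictionary like your desired output:
from io import StringIO
import pandas as pd
# html_string is the HTML text itself, i.e. the single element of the list in the question
dfs = pd.read_html(StringIO(html_string))  # recent pandas versions expect a file-like object rather than a literal HTML string
df = dfs[0] # pd.read_html reads in all tables and returns a list of DataFrames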
Giving:
     0                                      1         2
0  Mon  2:00 pm - 3:00 pm  5:00 pm - 10:00 pm       NaN
1  Tue  2:00 pm - 3:00 pm  5:00 pm - 10:00 pm       NaN
2  Wed  2:00 pm - 3:00 pm  5:00 pm - 10:00 pm  Open now
3  Thu  2:00 pm - 3:00 pm  5:00 pm - 10:00 pm       NaN
4  Fri  2:00 pm - 3:00 pm  5:00 pm - 10:00 pm       NaN
5  Sat                     5:00 pm - 10:00 pm       NaN
6  Sun                                 Closed       NaN
Then use groupby and a dictionary comprehension:
summary = {k: v.iloc[0, 1].split('  ') for k, v in df.groupby(0)}  # split on two spaces, the separator between the time ranges
Giving:
{'Fri': ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm'],
'Mon': ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm'],
'Sat': ['5:00 pm - 10:00 pm'],
'Sun': ['Closed'],
'Thu': ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm'],
'Tue': ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm'],
'Wed': ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm']}
You may need to edit slightly if splitting on exactly two spaces won't always work for your opening times data format.
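If the two-space separator turns out to be unreliable, one alternative is to pull the time ranges out of each cell with a regular expression instead of splitting on whitespace. The following is a minimal sketch, not part of the answer above: the helper name split_hours and the pattern TIME_RANGE are hypothetical, and the pattern assumes times always look like "2:00 pm - 3:00 pm".

```python
import re

# Matches one "start - end" range such as "2:00 pm - 3:00 pm".
# The pattern is an assumption about the data; adjust it if your times differ.
TIME_RANGE = re.compile(
    r"\d{1,2}:\d{2}\s*[ap]m\s*-\s*\d{1,2}:\d{2}\s*[ap]m",
    re.IGNORECASE,
)


def split_hours(cell):
    """Return the time ranges found in a cell, or the stripped text if none match."""
    ranges = TIME_RANGE.findall(cell)
    return ranges if ranges else [cell.strip()]
```

With the DataFrame from above, something like dict(zip(df[0], df[1].map(split_hours))) should then build the dictionary regardless of how many spaces separate the ranges, and it keeps the days in table order rather than sorting them as groupby does.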
from bs4 import BeautifulSoup
from collections import OrderedDict
from pprint import pprint
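# data is the HTML string, i.e. the single element of the list in the question
soup = BeautifulSoup(data, 'lxml')
d = OrderedDict()
# every row has two <td> cells; [::2] keeps the hours cell and skips the "extra" cell
for th, td in zip(soup.select('th'), soup.select('td')[::2]):
    d[th.text.strip()] = td.text.strip().splitlines()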
pprint(d)
Prints:
OrderedDict([('Mon', ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm']),
('Tue', ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm']),
('Wed', ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm']),
('Thu', ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm']),
('Fri', ['2:00 pm - 3:00 pm', '5:00 pm - 10:00 pm']),
('Sat', ['5:00 pm - 10:00 pm']),
('Sun', ['Closed'])])
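Both results keep the space before "pm" and wrap every value in a list, whereas the question shows '2:00pm' and plain strings for 'Sat' and 'Sun'. If that exact shape matters, a small post-processing step can convert one into the other. This is only a sketch under those assumptions; the function name to_desired is hypothetical.

```python
import re


def to_desired(hours):
    """Map {'Mon': ['2:00 pm - 3:00 pm', ...]} to the format shown in the question."""
    result = {}
    for day, ranges in hours.items():
        # Drop the whitespace before am/pm: '2:00 pm' -> '2:00pm'.
        compact = [
            re.sub(r"\s+(?=[ap]m\b)", "", r, flags=re.IGNORECASE) for r in ranges
        ]
        # A single entry becomes a plain string, as for 'Sat' and 'Sun' in the question.
        result[day] = compact[0] if len(compact) == 1 else compact
    return result
```

Calling to_desired(d) on the dictionary built above would then give 'Sat': '5:00pm - 10:00pm' and 'Sun': 'Closed'. Keeping every value as a list is usually easier to work with downstream, so this step is optional.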