I am trying to scrape a dynamic page using BeautifulSoup. After accessing the page from https://www.nemlig.com/ with the help of Selenium (and thanks to the code advice from @cruisepandey), like this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
driver = webdriver.Chrome(executable_path = r'C:\Users\user\lib\chromedriver_77.0.3865.40.exe')
wait = WebDriverWait(driver,10)
driver.maximize_window()
driver.get("https://www.nemlig.com/")
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".timeslot-prompt.initial-animation-done")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[type='tel'][class^='pro']"))).send_keys('2300')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".btn.prompt__button"))).click()
I am prompted with this page that I want to scrape.

More precisely, at this point I want to scrape the rows from the right-hand side of the page. If you look through the HTML code behind them, you will notice that the div class time-block__row has 3 different data-automation attributes for the 3 main times of the day.
<div class="time-block__row" data-automation="beforDinnerRowTmSlt">
<div class="time-block__row-header">Formiddag</div>
<div class="no-timeslots ng-hide" ng-show="$ctrl.timeslotDays[$ctrl.selectedDateIndex].morningHours == 0">
Ingen levering..
</div>
<!----><!----><div class="time-block__item duration-1 disabled" ng-repeat="item in $ctrl.selectedHours track by $index" ng-if="item.StartHour >= 0 && item.StartHour < 12" ng-click="$ctrl.setActiveTimeslot(item, $index)" ng-class="['duration-1', {'cheapest': item.IsCheapHour, 'event': item.IsEventSlot, 'selected': $ctrl.selectedTimeId == item.Id || $ctrl.selectedTimeIndex == $index, 'disabled': item.isUnavailable()}]" data-automation="notActiveSltTmSlt">
<div class="time-block__inner-container">
<div class="time-block__time">8-9</div>
<div class="time-block__attributes">
<!----></div>
<div class="time-block__cost">29 kr.</div>
So Formiddag (Morning) has data-automation = "beforDinnerRowTmSlt", Eftermiddag (Afternoon) has data-automation = "afternoonRowTmSlt" and Aften (Evening) has data-automation = "eveningRowTmSlt".
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
time_of_the_day = soup.find('div', class_='time-block__row').text
Using the code above, time_of_the_day only contains information from the Morning rows.
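For illustration, the snippet below reproduces this behaviour on a trimmed, hypothetical copy of the page's markup (only the two row divs, with everything else stripped out): find() stops at the first match, while filtering on data-automation targets a specific row.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical stand-in for the page's markup.
html = """
<div class="time-block__row" data-automation="beforDinnerRowTmSlt">
  <div class="time-block__row-header">Formiddag</div>
</div>
<div class="time-block__row" data-automation="afternoonRowTmSlt">
  <div class="time-block__row-header">Eftermiddag</div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# find() stops at the first matching element, hence only the Morning row:
first = soup.find('div', class_='time-block__row')
print(first['data-automation'])   # beforDinnerRowTmSlt

# Filtering on the data-automation attribute targets one specific row:
afternoon = soup.find('div', attrs={'data-automation': 'afternoonRowTmSlt'})
print(afternoon.get_text(strip=True))   # Eftermiddag

# find_all() returns every matching row:
print(len(soup.find_all('div', class_='time-block__row')))   # 2
```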
How can I scrape these rows properly using the data-automation attribute? How can I possibly access the other 2 div classes and their child divs? My plan is to create a dataframe containing something like this:
Time_of_the_day   Hours   Price    Day
Formiddag         8-9     29 kr.   Tor. 10/10
....              ....    ....     ....
Eftermiddag       12-13   29 kr.   Tor. 10/10
....              ....    ....     ....
The day column will contain the output from here: day = soup.find('div', class_='content').text
I know this is quite a lengthy post, but hopefully I've made the task easy to understand and you will be able to help me out with advice, tips or code!
Here is code to get all those values.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time
import pandas as pd
driver = webdriver.Chrome(executable_path = r'C:\Users\user\lib\chromedriver_77.0.3865.40.exe')
wait = WebDriverWait(driver,10)
driver.maximize_window()
driver.get("https://www.nemlig.com/")
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".timeslot-prompt.initial-animation-done")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[type='tel'][class^='pro']"))).send_keys('2300')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".btn.prompt__button"))).click()
time.sleep(3)
soup=BeautifulSoup(driver.page_source,'html.parser')
time_of_day=[]
price=[]
Hours=[]
day=[]
for morn in soup.select_one('[data-automation="beforDinnerRowTmSlt"]').select('.time-block__time'):
    time_of_day.append(soup.select_one('[data-automation="beforDinnerRowTmSlt"] > .time-block__row-header').text)
    Hours.append(morn.text)
    price.append(morn.find_next(class_="time-block__cost").text)
    day.append(soup.select_one('.date-block.selected [data-automation="dayNmTmSlt"]').text + " " + soup.select_one('.date-block.selected [data-automation="dayDateTmSlt"]').text)
df = pd.DataFrame({"time_of_day":time_of_day,"Hours":Hours,"price":price,"Day":day})
print(df)
time_of_day=[]
price=[]
Hours=[]
day=[]
for after in soup.select_one('[data-automation="afternoonRowTmSlt"]').select('.time-block__time'):
    time_of_day.append(soup.select_one('[data-automation="afternoonRowTmSlt"] > .time-block__row-header').text)
    Hours.append(after.text)
    price.append(after.find_next(class_="time-block__cost").text)
    day.append(soup.select_one('.date-block.selected [data-automation="dayNmTmSlt"]').text + " " + soup.select_one('.date-block.selected [data-automation="dayDateTmSlt"]').text)
df = pd.DataFrame({"time_of_day":time_of_day,"Hours":Hours,"price":price,"Day":day})
print(df)
time_of_day=[]
price=[]
Hours=[]
day=[]
for evenin in soup.select_one('[data-automation="eveningRowTmSlt"]').select('.time-block__time'):
    time_of_day.append(soup.select_one('[data-automation="eveningRowTmSlt"] > .time-block__row-header').text)
    Hours.append(evenin.text)
    price.append(evenin.find_next(class_="time-block__cost").text)
    day.append(soup.select_one('.date-block.selected [data-automation="dayNmTmSlt"]').text + " " + soup.select_one('.date-block.selected [data-automation="dayDateTmSlt"]').text)
df = pd.DataFrame({"time_of_day":time_of_day,"Hours":Hours,"price":price,"Day":day})
print(df)
Output:
Day Hours price time_of_day
0 fre. 11/10 8-9 29 kr. Formiddag
1 fre. 11/10 9-10 29 kr. Formiddag
2 fre. 11/10 10-11 39 kr. Formiddag
3 fre. 11/10 11-12 39 kr. Formiddag
Day Hours price time_of_day
0 fre. 11/10 12-13 29 kr. Eftermiddag
1 fre. 11/10 13-14 29 kr. Eftermiddag
2 fre. 11/10 14-15 29 kr. Eftermiddag
3 fre. 11/10 15-16 29 kr. Eftermiddag
4 fre. 11/10 16-17 29 kr. Eftermiddag
5 fre. 11/10 17-18 19 kr. Eftermiddag
Day Hours price time_of_day
0 fre. 11/10 18-19 29 kr. Aften
1 fre. 11/10 19-20 19 kr. Aften
2 fre. 11/10 20-21 29 kr. Aften
3 fre. 11/10 21-22 19 kr. Aften
Edited
soup=BeautifulSoup(driver.page_source,'html.parser')
time_of_day=[]
price=[]
Hours=[]
day=[]
disabled=[]
for morn, d in zip(soup.select_one('[data-automation="beforDinnerRowTmSlt"]').select('.time-block__time'), soup.select_one('[data-automation="beforDinnerRowTmSlt"]').select('.time-block__item')):
    time_of_day.append(soup.select_one('[data-automation="beforDinnerRowTmSlt"] > .time-block__row-header').text)
    Hours.append(morn.text)
    price.append(morn.find_next(class_="time-block__cost").text)
    day.append(soup.select_one('.date-block.selected [data-automation="dayNmTmSlt"]').text + " " + soup.select_one('.date-block.selected [data-automation="dayDateTmSlt"]').text)
    if 'disabled' in d['class']:
        disabled.append('1')
    else:
        disabled.append('0')
for after, d in zip(soup.select_one('[data-automation="afternoonRowTmSlt"]').select('.time-block__time'), soup.select_one('[data-automation="afternoonRowTmSlt"]').select('.time-block__item')):
    time_of_day.append(soup.select_one('[data-automation="afternoonRowTmSlt"] > .time-block__row-header').text)
    Hours.append(after.text)
    price.append(after.find_next(class_="time-block__cost").text)
    day.append(soup.select_one('.date-block.selected [data-automation="dayNmTmSlt"]').text + " " + soup.select_one('.date-block.selected [data-automation="dayDateTmSlt"]').text)
    if 'disabled' in d['class']:
        disabled.append('1')
    else:
        disabled.append('0')
for evenin, d in zip(soup.select_one('[data-automation="eveningRowTmSlt"]').select('.time-block__time'), soup.select_one('[data-automation="eveningRowTmSlt"]').select('.time-block__item')):
    time_of_day.append(soup.select_one('[data-automation="eveningRowTmSlt"] > .time-block__row-header').text)
    Hours.append(evenin.text)
    price.append(evenin.find_next(class_="time-block__cost").text)
    day.append(soup.select_one('.date-block.selected [data-automation="dayNmTmSlt"]').text + " " + soup.select_one('.date-block.selected [data-automation="dayDateTmSlt"]').text)
    if 'disabled' in d['class']:
        disabled.append('1')
    else:
        disabled.append('0')
df = pd.DataFrame({"time_of_day":time_of_day,"Hours":Hours,"price":price,"Day":day,"Disabled" : disabled})
print(df)
Output:
Day Disabled Hours price time_of_day
0 fre. 11/10 1 8-9 29 kr. Formiddag
1 fre. 11/10 1 9-10 29 kr. Formiddag
2 fre. 11/10 0 10-11 39 kr. Formiddag
3 fre. 11/10 0 11-12 39 kr. Formiddag
4 fre. 11/10 0 12-13 29 kr. Eftermiddag
5 fre. 11/10 0 13-14 29 kr. Eftermiddag
6 fre. 11/10 0 14-15 19 kr. Eftermiddag
7 fre. 11/10 0 15-16 29 kr. Eftermiddag
8 fre. 11/10 0 16-17 29 kr. Eftermiddag
9 fre. 11/10 0 17-18 29 kr. Eftermiddag
10 fre. 11/10 0 18-19 29 kr. Aften
11 fre. 11/10 0 19-20 19 kr. Aften
12 fre. 11/10 0 20-21 29 kr. Aften
13 fre. 11/10 0 21-22 19 kr. Aften
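Since the three loops above differ only in the data-automation value, they can be folded into a single loop over the three row names. Below is a sketch of that refactor; to keep it self-contained it runs against a tiny hypothetical stand-in for driver.page_source (on the live page you would keep soup = BeautifulSoup(driver.page_source, 'html.parser')).

```python
from bs4 import BeautifulSoup
import pandas as pd

# Tiny, hypothetical stand-in for driver.page_source.
html = """
<div class="time-block__row" data-automation="beforDinnerRowTmSlt">
  <div class="time-block__row-header">Formiddag</div>
  <div class="time-block__item disabled">
    <div class="time-block__time">8-9</div><div class="time-block__cost">29 kr.</div>
  </div>
</div>
<div class="time-block__row" data-automation="eveningRowTmSlt">
  <div class="time-block__row-header">Aften</div>
  <div class="time-block__item">
    <div class="time-block__time">18-19</div><div class="time-block__cost">29 kr.</div>
  </div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

rows = []
for section in ('beforDinnerRowTmSlt', 'afternoonRowTmSlt', 'eveningRowTmSlt'):
    block = soup.select_one('[data-automation="%s"]' % section)
    if block is None:  # this section is absent from the page
        continue
    header = block.select_one('.time-block__row-header').text
    for item in block.select('.time-block__item'):
        rows.append({
            'time_of_day': header,
            'Hours': item.select_one('.time-block__time').text,
            'price': item.select_one('.time-block__cost').text,
            'Disabled': int('disabled' in item.get('class', [])),
        })

df = pd.DataFrame(rows)
print(df)
```

Iterating over the .time-block__item divs directly (instead of zipping two parallel select() calls) keeps each slot's hours, price and disabled flag together in one record.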
You can use soup.find_all:
from bs4 import BeautifulSoup as soup
import re
... #rest of your current selenium code
d = soup(driver.page_source, 'html.parser')
r = [[i.div.text, [['disabled' in k['class'], k.find_all('div', {'class': re.compile('time-block__time|time-block__cost')})] for k in i.find_all('div', {'class': 'time-block__item'})]] for i in d.find_all('div', {'class': 'time-block__row'})]
_day = d.find('div', {'class': 'content'}).get_text(strip=True)
new_r = [[a, [[int(j), *[i.text for i in b]] for j, b in k]] for a, k in r]
new_data = [[a, *i, _day] for a, b in new_r for i in b]
To convert your results to a dataframe:
import pandas as pd
df = pd.DataFrame([dict(zip(['Time_of_the_day', 'Disabled', 'Hours', 'Price', 'Day'], i)) for i in new_data])
Output:
Day Disabled Hours Price Time_of_the_day
0 fre.11/10 1 8-9 29 kr. Formiddag
1 fre.11/10 1 9-10 29 kr. Formiddag
2 fre.11/10 1 10-11 39 kr. Formiddag
3 fre.11/10 0 11-12 39 kr. Formiddag
4 fre.11/10 0 12-13 29 kr. Eftermiddag
....
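The dict(zip(...)) in the DataFrame line simply pairs each column name with the matching field of one flattened row, so every inner list becomes a labelled record. A minimal illustration with hypothetical values:

```python
# Column names used above, paired with one hypothetical flattened row.
keys = ['Time_of_the_day', 'Disabled', 'Hours', 'Price', 'Day']
row = ['Formiddag', 1, '8-9', '29 kr.', 'fre. 11/10']

record = dict(zip(keys, row))
print(record)
# {'Time_of_the_day': 'Formiddag', 'Disabled': 1, 'Hours': '8-9', 'Price': '29 kr.', 'Day': 'fre. 11/10'}
```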