I want to delete all rows after the row containing the string "End of the 4th Quarter". Currently, this is row 474 but it will change depending on the game.
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
url = "http://www.espn.com/nba/playbyplay?gameId=400900395"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data,"html.parser")
data_rows = soup.findAll("tr")[4:]
play_data = []
for i in range(len(data_rows)):
    play_row = []
    for td in data_rows[i].findAll('td'):
        play_row.append(td.getText())
    play_data.append(play_row)
df = pd.DataFrame(play_data)
df.to_html("pbp_data")
Here is how I would tackle it:
ur_row = df.loc[df['Column_Name_Here'] == 'End of the 4th Quarter'].index.tolist()
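Here ur_row holds the index labels of the rows that meet the condition. We then slice to keep everything up to the first of those rows. (The +1 is there so that the "End of the 4th Quarter" row itself is kept.) Because df has the default 0-based integer index, those labels are also valid positions for iloc.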
df = df.iloc[:ur_row[0] + 1]
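For what it's worth, here is a minimal, self-contained sketch of the same idea on a toy DataFrame, so it can be run without hitting the ESPN page. The column names ("time", "play"), the sample rows, and the helper name truncate_after are all made up for illustration. In the scraped frame the columns are unnamed, so the label to pass would probably be an integer, which is an assumption to check against your own data. The sketch matches on a substring rather than exact equality, in case the cell carries extra whitespace or text.

```python
import pandas as pd

# Toy stand-in for the scraped play-by-play table.
# Column names and rows are hypothetical, not taken from the real page.
df = pd.DataFrame({
    "time": ["0:05", "0:00", "0:00", "", ""],
    "play": [
        "Made 3-pt jump shot",
        "End of the 4th Quarter",
        "End of Game",
        "Footer text",
        "More footer text",
    ],
})


def truncate_after(frame, column, marker):
    """Return the rows up to and including the first row whose
    `column` contains `marker` (hypothetical helper)."""
    # astype(str) guards against None/NaN cells from ragged scraped rows.
    hits = frame[column].astype(str).str.contains(marker, regex=False).to_numpy()
    if not hits.any():
        return frame  # marker not found: leave the frame unchanged
    first_pos = hits.argmax()  # position of the first True
    # Positional slice, so this works whatever the index labels are.
    return frame.iloc[: first_pos + 1]


trimmed = truncate_after(df, "play", "End of the 4th Quarter")
print(trimmed)
```

Working with positions rather than index labels means the slice still behaves if the frame has been filtered or re-indexed beforehand.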
Hope this is simple to follow. I will gladly explain more if need be!