
Removing rows after a certain string in pandas

I want to delete all rows after the row containing the string "End of the 4th Quarter". Currently, this is row 474 but it will change depending on the game.

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re

url = "http://www.espn.com/nba/playbyplay?gameId=400900395"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data,"html.parser")

data_rows = soup.findAll("tr")[4:]

play_data = []
for i in range(len(data_rows)):
    play_row = []

    for td in data_rows[i].findAll('td'):
        play_row.append(td.getText())

    play_data.append(play_row)

df = pd.DataFrame(play_data)

df.to_html("pbp_data")

jhaywoo8 asked Mar 15 '17 17:03



1 Answer

Here is how I would tackle it:

ur_row = df.index[df['Column_Name_Here'] == 'End of the 4th Quarter'].tolist()

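ur_row holds the index labels of the rows that meet the condition. We then use slicing to keep everything up to the first such row. (The +1 is there so that the "End of the 4th Quarter" row itself is included. Since df has the default integer index, the label doubles as the row position that iloc expects.)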

df = df.iloc[:ur_row[0] + 1]
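
For anyone who wants to try this without scraping the page, here is a minimal, self-contained sketch of the same idea. The helper name truncate_after_marker and the sample rows are made up for illustration. It also assumes a substring match (str.contains) is wanted rather than an exact ==, since a scraped cell may carry extra text around the marker; swap in == if your cells match exactly.

```python
import pandas as pd


def truncate_after_marker(df, column, marker):
    """Return the rows up to and including the first row whose `column`
    contains `marker`. Hypothetical helper, not part of pandas."""
    # astype(str) guards against None/NaN cells from ragged scraped rows.
    hits = df[column].astype(str).str.contains(marker, regex=False).to_numpy()
    if not hits.any():
        # Marker not found: return the frame unchanged.
        return df
    first_pos = hits.argmax()  # position of the first True
    return df.iloc[: first_pos + 1]


# Made-up sample data standing in for the scraped play-by-play table.
sample = pd.DataFrame({
    0: ["12:00", "0:01", "0:00", None, None],
    1: ["Jump ball", "Shot missed", "End of the 4th Quarter", "extra row", "footer"],
})

trimmed = truncate_after_marker(sample, 1, "End of the 4th Quarter")
print(trimmed)
```

Because this version works from row positions rather than index labels, it should also behave if the DataFrame's index is not the default 0, 1, 2, ...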

Hope this is simple to follow. I will gladly explain more if need be!

MattR answered Oct 04 '22 21:10