
Python 3.x - iloc throws error - "single positional indexer is out-of-bounds"

I am scraping election data from a website and trying to store it in a DataFrame:

import pandas as pd
import bs4
import requests

columns = ['Candidate','Party','Criminal Cases','Education','Age','Total Assets','Liabilities']

df = pd.DataFrame(columns = columns)

ind=1

url = requests.get("http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341")
soup = bs4.BeautifulSoup(url.content)

for content in soup.findAll("td")[16:]:
    df.iloc[ind//7,ind%7-1] = content.text
    ind=ind+1
print(df)

Essentially, each iteration of content.text will provide me a value which I will populate in the table. The loop will populate values to df in the following sequence -

df[0,0]
df[0,1]
df[0,2]
.
.
.
df[1,0]
df[1,1]
.
.

and so on. Unfortunately, iloc is throwing an error: "single positional indexer is out-of-bounds". The funny part is that when I try df.iloc[0,0] = content.text outside the for loop (in a separate cell, for testing purposes), the code works properly, but inside the for loop it raises the error. I believe it might be something trivial, but I am unable to understand it. Please help.

Rohan Bapat asked Jun 22 '16 05:06


1 Answer

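DataFrame.iloc cannot enlarge its target object. Your DataFrame is created with column labels only, so it has zero rows and every positional row index is out of bounds. "iloc cannot enlarge its target object" used to be the error message, but the wording has changed since version 0.15.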
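To make the point concrete, here is a minimal sketch with made-up column names and values (not the real scraped data). It assumes only that assigning through iloc past the end of the frame raises an IndexError; the exact message text depends on the pandas version. Allocating the rows up front makes the same assignment valid.

```python
import pandas as pd

# Hypothetical, shortened column list for illustration only.
columns = ['Candidate', 'Party', 'Age']

# A frame built from column labels alone has zero rows.
empty = pd.DataFrame(columns=columns)

try:
    empty.iloc[0, 0] = 'A. Example'   # row 0 does not exist yet
    raised = False
except IndexError:
    raised = True                     # iloc will not add the missing row

# With rows allocated in advance, positional assignment works.
sized = pd.DataFrame(index=range(2), columns=columns)
sized.iloc[0, 0] = 'A. Example'
```

Preallocating is shown only to explain the error; it still fills the frame one cell at a time.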

In general, a DataFrame is not meant to be built one row at a time. It is very inefficient. Instead you should create a more traditional data structure and populate a DataFrame from it:

table = soup.find(id='table1')
rows = table.find_all('tr')[1:]
data = [[cell.text for cell in row.find_all('td')] for row in rows]
df = pd.DataFrame(data=data, columns=columns)

Inspecting the page from your request, it seems you were after the table with the id "table1", whose first row is the header (a poor choice by the authors of that page; it should have been in <thead>, not the body). So skip the first row ([1:]) and then build a list of lists from the cells of the remaining rows.
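The same approach can be tried without hitting the site. The sketch below runs against a small inline HTML string that I made up to mimic the layout described above (header as the first body row); the real page has more columns and different values. It uses the parser from Python's standard library, "html.parser".

```python
import bs4
import pandas as pd

# Invented sample markup; only the structure mirrors the real table.
html = """
<table id="table1">
  <tr><td>Candidate</td><td>Party</td><td>Age</td></tr>
  <tr><td>A. Example</td><td>IND</td><td>45</td></tr>
  <tr><td>B. Sample</td><td>XYZ</td><td>52</td></tr>
</table>
"""
columns = ['Candidate', 'Party', 'Age']

soup = bs4.BeautifulSoup(html, "html.parser")
table = soup.find(id="table1")

# Skip the header row, then collect one list of cell texts per row.
rows = table.find_all("tr")[1:]
data = [[cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in rows]

# Build the whole frame in a single step.
df = pd.DataFrame(data, columns=columns)
```

Every value arrives as a string here, so numeric columns such as Age would still need converting, for example with pd.to_numeric.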

Of course you could also just let pandas worry about parsing and all:

url = "http://myneta.info/up2007/index.php?action=show_candidates&constituency_id=341"
df = pd.read_html(url, header=0)[2]  # Pick the 3rd table in the page
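For experimenting offline, read_html also accepts a file-like object. This is a rough sketch on an invented HTML snippet, assuming an HTML parser that pandas can use (such as lxml) is installed. It shows what header=0 does when the header sits in the first body row.

```python
from io import StringIO

import pandas as pd

# Invented sample markup; the header is an ordinary first row, as on the page.
html = """
<table id="table1">
  <tr><td>Candidate</td><td>Party</td><td>Age</td></tr>
  <tr><td>A. Example</td><td>IND</td><td>45</td></tr>
  <tr><td>B. Sample</td><td>XYZ</td><td>52</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per table it finds.
tables = pd.read_html(StringIO(html), header=0)
df = tables[0]   # the snippet holds a single table, so index 0
```

On the real page the index differs ([2] above) because that page contains several tables.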

Ilja Everilä answered Nov 20 '22 06:11