I got this error message:
5205
(5219, 25)
5221
(5219, 25)
Traceback (most recent call last):
File "/Users/Chu/Documents/dssg2018/sa4.py", line 44, in <module>
df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
IndexError: index 5221 is out of bounds for axis 0 with size 5219
This happens while I'm iterating over the data frame, and the index comes from the iterator. I don't understand how this is even possible, since idx comes directly from the data frame:
bt = BallTree(df[['lat','lng']], metric="haversine")
indices = bt.query_radius(df[['lat','lng']],r=(float(10)/40000)*360)
for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            print(idx)
            print(df.shape)
            df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
                np.max([1,len(df.iloc[indices[idx]][df[word]!=1])])
Changing iloc to loc gives:
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/Chu/Documents/dssg2018/sa4.py
(-124.60334244261675, 49.36453144316216, -121.67106179949566, 50.863501888419826)
27
(5219, 25)
/Users/Chu/Documents/dssg2018/sa4.py:42: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
df.loc[idx,word]=len(df.loc[indices[idx]][df[word]==1])/\
/Users/Chu/Documents/dssg2018/sa4.py:42: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
df.loc[idx,word]=len(df.loc[indices[idx]][df[word]==1])/\
Traceback (most recent call last):
File "/Users/Chu/Documents/dssg2018/sa4.py", line 42, in <module>
df.loc[idx,word]=len(df.loc[indices[idx]][df[word]==1])/\
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 2133, in __getitem__
return self._getitem_array(key)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 2173, in _getitem_array
key = check_bool_indexer(self.index, key)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 2023, in check_bool_indexer
raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
Your index does not run from 0 to len(df)-1, so using idx as a position (as in df.iloc[idx]) goes out of bounds whenever a label is larger than the number of rows.
For example:
df = pd.DataFrame({'a': [0, 1]},index=[1,100])
for idx,row in df.iterrows():
    print(idx)
    print(row)
1
a 0
Name: 1, dtype: int64
100
a 1
Name: 100, dtype: int64
Then a positional lookup with that label fails, because the frame has only two rows:
df.iloc[100]
IndexError: single positional indexer is out-of-bounds
But with .loc, which looks up the label 100, you get the expected output:
df.loc[100]
Out[23]:
a 1
Name: 100, dtype: int64
From the documentation:
.iloc[] is primarily integer position based.
.loc[] is primarily label based.
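To make the distinction concrete, here is a minimal sketch that reuses the two-row frame from the example above. The variable names (by_label, by_position, pos) are only illustrative; Index.get_loc is one way to turn a label into a position when some other API expects positions.

```python
import pandas as pd

# Same toy frame as above: labels are 1 and 100, positions are 0 and 1.
df = pd.DataFrame({"a": [0, 1]}, index=[1, 100])

# Label-based lookup: the row whose index label is 100.
by_label = df.loc[100, "a"]

# Position-based lookup: the second row, whatever its label is.
by_position = df.iloc[1]["a"]

# Translate a label into its position in the index.
pos = df.index.get_loc(100)

print(by_label, by_position, pos)
```

Both lookups reach the same row here; they only differ in how the row is addressed.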
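Solution:
Use .loc wherever you index by the labels that iterrows() yields, or reset the index with df=df.reset_index(drop=True) so that labels and positions coincide.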
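As a rough sketch of the reset_index route applied to a loop like the one in the question: the data below is made up, the column name "beach" is hypothetical, and the hand-written indices list only stands in for what BallTree.query_radius returns (arrays of row positions, not labels). The boolean masks are built from the neighbor subset itself, which also sidesteps the "Unalignable boolean Series" error.

```python
import numpy as np
import pandas as pd

# Toy frame with a non-contiguous index, e.g. left behind by filtering rows.
df = pd.DataFrame({"beach": [1, 1, 0]}, index=[3, 10, 5221])

# Stand-in for BallTree.query_radius output (assumption for illustration):
# one array of neighbor row POSITIONS per row of df.
indices = [np.array([0, 1]), np.array([0, 1, 2]), np.array([1, 2])]

# Make labels equal positions, so idx from iterrows() is valid for both.
df = df.reset_index(drop=True)

ratios = {}
for idx, row in df.iterrows():
    neighbors = df.iloc[indices[idx]]        # positional lookup of neighbors
    hits = (neighbors["beach"] == 1).sum()   # mask comes from neighbors itself
    misses = (neighbors["beach"] != 1).sum()
    ratios[idx] = hits / max(1, misses)      # avoid division by zero

print(ratios)
```

If the original labels need to be kept, an alternative is to loop with enumerate(df.index) and use the position for indices and iloc and the label for loc.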