I am doing what seems to be a simple groupby in pandas. The column is a string column with no NaNs or odd strings. However, I keep getting the error below. Does anyone know why this might happen? I feel like it may have something to do with my data, but it all seems to be OK.
I am running by_user = df.groupby('User')
and the stack trace:
by_user = df.groupby('User')
File "c:\Anaconda\lib\site-packages\pandas\core\generic.py", line 2773, in groupby
sort=sort, group_keys=group_keys, squeeze=squeeze)
File "c:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 1142, in groupby
return klass(obj, by, **kwds)
File "c:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 388, in __init__
level=level, sort=sort)
File "c:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2041, in _get_grouper
gpr = obj[gpr]
File "c:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1678, in __getitem__
return self._getitem_column(key)
File "c:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1685, in _getitem_column
return self._get_item_cache(key)
File "c:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1052, in _get_item_cache
values = self._data.get(item)
File "c:\Anaconda\lib\site-packages\pandas\core\internals.py", line 2565, in get
loc = self.items.get_loc(item)
File "c:\Anaconda\lib\site-packages\pandas\core\index.py", line 1181, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas\index.c:3656)
File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas\index.c:3534)
File "hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11911)
File "hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11864)
KeyError: 'User'
df.info():
User Code 175167 non-null object
Version 175167 non-null object
Date Accessed 175167 non-null datetime64[ns]
Series 175167 non-null object
Software 175167 non-null object
User 175167 non-null object
[moved from comments]
It's easy to miss trailing whitespace in column names, but you can check df.columns manually:
>>> df = pd.DataFrame({"User": [1,2]})
>>> df2 = pd.DataFrame({"User ": [1,2]})
>>> df
User
0 1
1 2
>>> df2
User
0 1
1 2
>>> df.columns
Index([u'User'], dtype='object')
>>> df2.columns
Index([u'User '], dtype='object')
(To peel back the curtain a bit, I suspected something like this might be going on because when I mocked up my own DataFrame and looked at df.info(), I didn't see as much space between the column names and the numbers as your output seemed to show.)
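If the column name does turn out to have stray whitespace, one possible fix is to strip it from every column name before grouping. The snippet below is only a minimal sketch: the DataFrame is a made-up stand-in for your data (the "User " name with a trailing space and the "Version" values are assumptions for illustration), and it assumes the whitespace is the only problem.

```python
import pandas as pd

# Hypothetical frame reproducing the problem: note the trailing space in "User ".
df = pd.DataFrame({"User ": ["a", "b", "a"], "Version": [1, 2, 3]})

# repr() puts quotes around each name, which makes stray whitespace visible.
print([repr(c) for c in df.columns])

# Strip leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# The groupby that previously raised KeyError: 'User' now finds the column.
by_user = df.groupby("User")
print(by_user.size())
```

If the data comes from a CSV file, the extra whitespace often originates in the header row, so it may be worth checking the file itself as well.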