Not sure if I'm doing something wrong, or if this is an issue with Pandas. I am seeing a problem where I can set an individual cell to a tuple with .loc[] in some cases but not in others.
For example, start with a 4x3 dataframe filled with None's:
In [1]: import pandas as pd
In [2]: nones = [None]*4
In [3]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))
In [4]: df
Out[4]:
A B C
0 None None None
1 None None None
2 None None None
3 None None None
Now set an individual cell to a tuple:
In [5]: df.loc[0,'A'] = ('x','y')
In [6]: df
Out[6]:
A B C
0 (x, y) None None
1 None None None
2 None None None
3 None None None
No problem. But if I repeat the process and set an entire column first, it doesn't work:
In [1]: import pandas as pd
In [2]: nones = [None]*4
In [3]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))
In [4]: df
Out[4]:
A B C
0 None None None
1 None None None
2 None None None
3 None None None
In [5]: df['B'] = [4,3,2,1]
In [6]: df
Out[6]:
A B C
0 None 4 None
1 None 3 None
2 None 2 None
3 None 1 None
In [7]: df.loc[0,'A'] = ('x','y')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-32-767de72f6ae1> in <module>
----> 1 df.loc[0,'A'] = ('x','y')
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
669 key = com.apply_if_callable(key, self.obj)
670 indexer = self._get_setitem_indexer(key)
--> 671 self._setitem_with_indexer(indexer, value)
672
673 def _validate_key(self, key, axis: int):
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
1017 if len(labels) != len(value):
1018 raise ValueError(
-> 1019 "Must have equal len keys and value "
1020 "when setting with an iterable"
1021 )
ValueError: Must have equal len keys and value when setting with an iterable
(Judging by the code that raises the exception, pandas seems to assume that, because the right-hand side is a tuple, I am trying to set more than one cell and have supplied the wrong number of elements.)
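For anyone who wants to reproduce this outside IPython, here is a minimal standalone sketch of the failing case. It only assumes the error is a ValueError, as in the traceback above; whether the assignment fails at all may depend on the pandas version, so the script records either outcome instead of assuming one. The variable name `outcome` is mine.

```python
import pandas as pd

nones = [None] * 4
df = pd.DataFrame(dict(A=nones, B=nones, C=nones))
df["B"] = [4, 3, 2, 1]  # overwrite the whole column with integers first

try:
    # Same single-cell assignment as In [7] above.
    df.loc[0, "A"] = ("x", "y")
    outcome = "assignment succeeded"
except ValueError as exc:
    # With the pandas version from the traceback, this branch is taken.
    outcome = f"ValueError: {exc}"

print(outcome)
print(df.dtypes)  # shows which columns are still object after the column assignment
```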
Notice also that if, when setting the entire column, I leave at least one of the elements as None, the problem does not appear and I can set a single cell to an iterable:
In [1]: import pandas as pd
In [2]: nones = [None]*4
In [3]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))
In [4]: df
Out[4]:
A B C
0 None None None
1 None None None
2 None None None
3 None None None
In [5]: df['B'] = (4,3,None,1)
In [6]: df
Out[6]:
A B C
0 None 4 None
1 None 3 None
2 None None None
3 None 1 None
In [7]: df.loc[0,'A'] = ('x','y')
In [8]: df
Out[8]:
A B C
0 (x, y) 4 None
1 None 3 None
2 None None None
3 None 1 None
There also seems to be a difference if I set the entire column using a list instead of a tuple:
In [4]: df
Out[4]:
A B C
0 None None None
1 None None None
2 None None None
3 None None None
In [5]: df['B'] = [4,3,None,1]
In [6]: df
Out[6]:
A B C
0 None 4 None
1 None 3 None
2 None NaN None
3 None 1 None
In [7]: df.loc[0,'A'] = ('x','y')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
... SAME ERROR STUFF AS BEFORE
ValueError: Must have equal len keys and value when setting with an iterable
I notice also that setting the entire column with a tuple that includes None results in None in that cell, but setting it with a list that includes None results in NaN.
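One way to look into the None versus NaN difference is to print the column dtypes after each kind of assignment. This is only a diagnostic sketch: the helper name `column_dtypes` is made up for illustration, and the guess that the dtype of column B is what distinguishes the working and failing cases is an assumption, not something confirmed above. The result for the tuple case may differ between pandas versions, so it is printed without any expectation.

```python
import pandas as pd

nones = [None] * 4

def column_dtypes(b_values):
    """Build the 4x3 frame of Nones, overwrite column B, and return each dtype as a string."""
    df = pd.DataFrame(dict(A=nones, B=nones, C=nones))
    df["B"] = b_values
    return {name: str(dtype) for name, dtype in df.dtypes.items()}

print(column_dtypes([4, 3, 2, 1]))     # all-integer list
print(column_dtypes([4, 3, None, 1]))  # list with None (displayed as NaN above)
print(column_dtypes((4, 3, None, 1)))  # tuple with None (displayed as None above)
```

A NaN in a column that otherwise holds numbers points to a floating-point column, whereas a surviving None points to an object column, which would match the two displays above.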
Does anyone know what's going on here? Why does the behavior of setting an individual cell with .loc[] appear to be inconsistent, depending upon what I did to the dataframe BEFORE that??
Thanks in advance.
P.S. I also tried the above using .loc[] to set the entire column:
In [5]: df.loc[:,'B'] = [4,3,2,1]
And I get the exact same result as when using simply df['B'] = [4,3,2,1].
P.P.S. I've noticed that df.at works:
In [7]: df.at[0,'A'] = ('x','y')
In [8]: df
Out[8]:
A B C
0 (x, y) 4 None
1 None 3 None
2 None 2 None
3 None 1 None
But the question still remains (even if .at[] is the preferred method): why does .loc[] work sometimes and not others??
You could always try using df.at to set values.
at always accesses a single value for a row/column label pair. It is similar to loc in that both provide label-based lookups.
In all cases you described above, it will not fail. Please check this:
In [458]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))
In [459]: df['B'] = [4,3,2,1]
In [461]: df.at[0,'A'] = ('x','y')
In [462]: df
Out[462]:
A B C
0 (x, y) 4 None
1 None 3 None
2 None 2 None
3 None 1 None
From the df.at docs:
Use at if you only need to get or set a single value in a DataFrame or Series.
This seems to be exactly the case in your question: you are trying to set just one value.
Advantage: at is much faster than loc.
The question arises: why use loc at all, then?
Answer: at is meant to access a scalar, i.e., a single element in the dataframe, while loc is meant to access several elements at the same time, potentially to perform vectorized operations.
Disadvantage: You can't use arrays for indexers with at as you can with loc.
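To illustrate that last point, here is a small sketch contrasting the two accessors when given a list of row labels. The frame is the same shape as in the question; the variable names `several` and `at_error` are mine. The exact exception class that at raises for a list indexer varies between pandas versions, so the sketch catches broadly and only reports the class name.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[None] * 4, B=[4, 3, 2, 1], C=[None] * 4))

# loc accepts a list of row labels and returns several values at once.
several = df.loc[[0, 1], "B"].tolist()
print(several)

# at is scalar-only, so a list of row labels is rejected.
try:
    df.at[[0, 1], "B"]
    at_error = None
except Exception as exc:  # exact exception class depends on the pandas version
    at_error = type(exc).__name__

print(at_error)
```

So at is the natural choice for one cell, and loc is the one to reach for when the indexer is a list, slice, or mask.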