
pandas setting column prevents setting individual cell to iterable

Tags:

pandas

I'm not sure whether I'm doing something wrong or whether this is an issue with pandas. I am seeing the following behavior:

  • I can set an individual cell to a value that is an iterable (for example, a tuple) using .loc[].
  • BUT if I first set an entire column using [], then afterwards I can no longer set individual cells to an iterable using .loc[].

For example, start with a 4x3 dataframe filled with None's:

In [1]: import pandas as pd
In [2]: nones = [None]*4
In [3]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))
In [4]: df
Out[4]:
      A     B     C
0  None  None  None
1  None  None  None
2  None  None  None
3  None  None  None

Now set an individual cell to a tuple:

In [5]: df.loc[0,'A'] = ('x','y')
In [6]: df
Out[6]:
        A     B     C
0  (x, y)  None  None
1    None  None  None
2    None  None  None
3    None  None  None

No problem. But if I repeat the process and set a column first, it doesn't work:

In [1]: import pandas as pd
In [2]: nones = [None]*4
In [3]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))
In [4]: df
Out[4]:
      A     B     C
0  None  None  None
1  None  None  None
2  None  None  None
3  None  None  None

In [5]: df['B'] = [4,3,2,1]
In [6]: df
Out[6]:
      A  B     C
0  None  4  None
1  None  3  None
2  None  2  None
3  None  1  None


In [7]: df.loc[0,'A'] = ('x','y')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-767de72f6ae1> in <module>
----> 1 df.loc[0,'A'] = ('x','y')

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    669             key = com.apply_if_callable(key, self.obj)
    670         indexer = self._get_setitem_indexer(key)
--> 671         self._setitem_with_indexer(indexer, value)
    672
    673     def _validate_key(self, key, axis: int):

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self,     indexer, value)
   1017                     if len(labels) != len(value):
   1018                         raise ValueError(
-> 1019                             "Must have equal len keys and value "
   1020                             "when setting with an iterable"
   1021                         )

ValueError: Must have equal len keys and value when setting with an iterable

(Based on the code raising the exception, it seems that pandas now assumes that, because there is a tuple on the right-hand side, I am trying to set more than one cell and have supplied the wrong number of elements.)
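To illustrate the ambiguity described in the parenthetical above, here is a minimal sketch of the case where .loc[] is expected to spread a tuple over several cells. It uses the same setup as the failing example; the names cell_a and cell_c are only for illustration. When two column labels are selected and the tuple has two elements, the lengths match and each element lands in its own cell, which is presumably the rule the length check in the traceback enforces.

```python
import pandas as pd

nones = [None] * 4
df = pd.DataFrame(dict(A=nones, B=nones, C=nones))
df['B'] = [4, 3, 2, 1]  # same setup as the failing case

# Two labels on the left, a 2-tuple on the right: the lengths match,
# so the tuple is spread across the two cells instead of being stored
# as a single object in one cell.
df.loc[0, ['A', 'C']] = ('x', 'y')

cell_a = df.loc[0, 'A']
cell_c = df.loc[0, 'C']
print(cell_a, cell_c)
```

With a single label on the left and a 2-tuple on the right, the same length comparison would give 1 versus 2, which matches the error message in the question.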

Notice also that if I leave at least one of the elements as None when setting the entire column, the problem does not appear, and I can set a single cell to an iterable:

In [1]: import pandas as pd
In [2]: nones = [None]*4
In [3]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))
In [4]: df
Out[4]:
      A     B     C
0  None  None  None
1  None  None  None
2  None  None  None
3  None  None  None

In [5]: df['B'] = (4,3,None,1)
In [6]: df
Out[6]:
      A     B     C
0  None     4  None
1  None     3  None
2  None  None  None
3  None     1  None

In [7]: df.loc[0,'A'] = ('x','y')
In [8]: df
Out[8]:
        A     B     C
0  (x, y)     4  None
1    None     3  None
2    None  None  None
3    None     1  None

There also seems to be a difference if I set the entire column using a list instead of a tuple:

In [4]: df
Out[4]:
      A     B     C
0  None  None  None
1  None  None  None
2  None  None  None
3  None  None  None

In [5]: df['B'] = [4,3,None,1]
In [6]: df
Out[6]:
      A    B     C
0  None  4    None
1  None  3    None
2  None  NaN  None
3  None  1    None

In [7]: df.loc[0,'A'] = ('x','y')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
... SAME ERROR STUFF AS BEFORE
ValueError: Must have equal len keys and value when setting with an iterable

I also notice that setting the entire column with a tuple that includes None results in None in that cell, whereas setting it with a list that includes None results in NaN.
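One way to investigate the None-versus-NaN difference is to compare the column dtypes after each kind of assignment. The sketch below only inspects dtypes and does not by itself prove what causes the .loc[] error; the frame names (df_all_none, df_ints, df_list_none) are made up for illustration.

```python
import pandas as pd

nones = [None] * 4

# Starting point: every column holds Python objects (None).
df_all_none = pd.DataFrame(dict(A=nones, B=nones, C=nones))

# Column B replaced with a list of plain integers.
df_ints = df_all_none.copy()
df_ints['B'] = [4, 3, 2, 1]

# Column B replaced with a list that contains None.
df_list_none = df_all_none.copy()
df_list_none['B'] = [4, 3, None, 1]

for name, frame in [('all None', df_all_none),
                    ('list of ints', df_ints),
                    ('list with None', df_list_none)]:
    # Print each column's dtype as a plain string for easy comparison.
    print(name, {col: str(dtype) for col, dtype in frame.dtypes.items()})
```

In the two list cases column B becomes numeric (the NaN is a float), so the frame ends up with a mix of dtypes, while the all-None frame stays object-typed throughout. That lines up with which examples above fail, though it is only a correlation as far as this sketch goes.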

Does anyone know what's going on here? Why does the behavior of setting an individual cell with .loc[] appear to be inconsistent, depending upon what I did to the dataframe BEFORE that??

Thanks in advance.


P.S. I also tried the above using .loc[] to set the entire column:

In [5]: df.loc[:,'B'] = [4,3,2,1]

And I get the exact same result as when using simply df['B'] = [4,3,2,1]


P.P.S. I've noticed that df.at works:

In [7]: df.at[0,'A'] = ('x','y')
In [8]: df
Out[8]:
        A  B     C
0  (x, y)  4  None
1    None  3  None
2    None  2  None
3    None  1  None

But the question still remains (even if .at[] is the preferred method) why does .loc[] work sometimes and not others??

Daniel Goldfarb asked Sep 06 '25 19:09


1 Answer

You can use df.at to set the value instead.

at always accesses a single value for a row/column label pair. It is similar to loc in that both provide label-based lookups.

It does not fail in any of the cases you described above. For example:

In [458]: df = pd.DataFrame(dict(A=nones,B=nones,C=nones))

In [459]: df['B'] = [4,3,2,1]

In [461]: df.at[0,'A'] = ('x','y')

In [462]: df
Out[462]:
        A  B     C
0  (x, y)  4  None
1    None  3  None
2    None  2  None
3    None  1  None

From the df.at docs:

Use at if you only need to get or set a single value in a DataFrame or Series.

That is exactly the case in your question: you are trying to set just one value.

Advantage: at is typically faster than loc for single-value access.

So why use loc at all?

Answer: at is meant to access a scalar, i.e., a single element of the dataframe, while loc is meant to access several elements at the same time, potentially to perform vectorized operations.

Disadvantage:

You can't use arrays as indexers with at, as you can with loc.
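The difference between the two accessors can be sketched as follows, using the same dataframe as in the question. This is a minimal illustration, not a statement about pandas internals; the flag at_accepts_list is a made-up name, and the exact exception type raised by at for a list indexer is deliberately not assumed, since it may differ between pandas versions.

```python
import pandas as pd

nones = [None] * 4
df = pd.DataFrame(dict(A=nones, B=nones, C=nones))
df['B'] = [4, 3, 2, 1]

# .at addresses exactly one cell, so the tuple is stored as a single object.
df.at[0, 'A'] = ('x', 'y')

# .loc accepts a list of row labels and assigns one value per selected row.
df.loc[[0, 1], 'B'] = [10, 20]

# .at does not accept a list of row labels; catch whatever it raises.
try:
    df.at[[0, 1], 'B']
    at_accepts_list = True
except Exception as exc:
    at_accepts_list = False
    print(type(exc).__name__)

print(df)
```

So at is the natural choice when the target is one known cell, and loc when the selection covers several rows or columns.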

Mayank Porwal answered Sep 11 '25 04:09


