Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cannot set using loc with a string labelled index

I am using pandas v23.4 and python 3.7.0. I have two tables which have identical shapes and columns. I'd like to set an index-selected subset of one table with the same index-selected subset of another table.

It fails to work sometimes with a string index, though I'm not sure it has to do with it being a string index. Commenting out one unused column of the dataframe causes it to work.

The traceback below shows that it's somewhere in the indexing that it gets confused in figuring out if the target and source are the same length. The code is lengthy and a bit convoluted.

$ cat foo.py
import numpy as np
import pandas as pd

m = np.array([1., 2., 1., 3., 5., 5., 6., 2., 2., 1., 7., 2.,
              5., 4., 2., 5., 5., 5., 3., 8., 7., 2., 7., 6.], )
dma_l = [501, 501, 501, 501, 501, 501, 501, 501, 501, 501, 501, 501,
         502, 502, 502, 502, 502, 502, 502, 502, 502, 502, 502, 502]
size_l = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,
          1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

age_l = ['20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45']

df = pd.DataFrame()
df['dma'] = dma_l  # <-- comment out this line and it works
df['size'] = size_l
df['age'] = age_l
df['total'] = m

df2 = df.copy() # Make a second dataframe with the same shape.

# Works with an integer index.
df.set_index('size', inplace=True)
df2.set_index('size', inplace=True)
df.loc[(1,), 'total'] = df2.loc[(1,), 'total']

# Does not work with my string index.  Removing the dma column 
# causes it to work again.    
df.set_index('age', inplace=True)
df2.set_index('age', inplace=True)
df.loc[('20-25',), 'total'] = df2.loc[('20-25',), 'total']

$ python foo.py
Traceback (most recent call last):
  File "...", line 34, in <module>
    df.loc[('20-25',), 'total'] = df2.loc[('20-25',), 'total']
  File ".../lib/python3.7/site-packages/pandas/core/indexing.py", line 189, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File ".../lib/python3.7/site-packages/pandas/core/indexing.py", line 606, in _setitem_with_indexer
    raise ValueError('Must have equal len keys and value '
ValueError: Must have equal len keys and value when setting with an iterable
like image 246
HoosierDaddy Avatar asked Dec 07 '25 10:12

HoosierDaddy


1 Answers

All solutions working if same number of rows and same index values in both DataFrames.


Problem is duplicated index values, solution is create boolean mask if want working with one value of index:

df2['total'] *= 10
df.loc[df.index == 1, 'total'] = df2.loc[1, 'total']
print (df)
      dma    age  total
size                   
1     501  20-25   10.0
1     501  30-35   20.0
1     501  40-45   10.0
2     501  20-25    3.0
2     501  30-35    5.0
2     501  40-45    5.0
3     501  20-25    6.0
3     501  30-35    2.0
3     501  40-45    2.0
4     501  20-25    1.0
4     501  30-35    7.0
4     501  40-45    2.0
1     502  20-25   50.0
1     502  30-35   40.0
1     502  40-45   20.0
2     502  20-25    5.0
2     502  30-35    5.0
2     502  40-45    5.0
3     502  20-25    3.0
3     502  30-35    8.0
3     502  40-45    7.0
4     502  20-25    2.0
4     502  30-35    7.0
4     502  40-45    6.0

Or create new column by assign for more general solution:

df2['total'] *= 10
df['total1'] = df2['total']
#working with one DataFrame
df.loc[[1, 4], 'total'] = df.loc[[1, 4], 'total1']
print (df)
      dma    age  total  total1
size                           
1     501  20-25   10.0    10.0
1     501  30-35   20.0    20.0
1     501  40-45   10.0    10.0
2     501  20-25    3.0    30.0
2     501  30-35    5.0    50.0
2     501  40-45    5.0    50.0
3     501  20-25    6.0    60.0
3     501  30-35    2.0    20.0
3     501  40-45    2.0    20.0
4     501  20-25   10.0    10.0
4     501  30-35   70.0    70.0
4     501  40-45   20.0    20.0
1     502  20-25   50.0    50.0
1     502  30-35   40.0    40.0
1     502  40-45   20.0    20.0
2     502  20-25    5.0    50.0
2     502  30-35    5.0    50.0
2     502  40-45    5.0    50.0
3     502  20-25    3.0    30.0
3     502  30-35    8.0    80.0
3     502  40-45    7.0    70.0
4     502  20-25   20.0    20.0
4     502  30-35   70.0    70.0
4     502  40-45   60.0    60.0

Another solution is create mask and filter in both DataFrames:

df2['total'] *= 10

mask = df.index.isin([1,4])
df.loc[mask, 'total'] = df2.loc[mask, 'total']
print (df)
      dma    age  total
size                   
1     501  20-25   10.0
1     501  30-35   20.0
1     501  40-45   10.0
2     501  20-25    3.0
2     501  30-35    5.0
2     501  40-45    5.0
3     501  20-25    6.0
3     501  30-35    2.0
3     501  40-45    2.0
4     501  20-25   10.0
4     501  30-35   70.0
4     501  40-45   20.0
1     502  20-25   50.0
1     502  30-35   40.0
1     502  40-45   20.0
2     502  20-25    5.0
2     502  30-35    5.0
2     502  40-45    5.0
3     502  20-25    3.0
3     502  30-35    8.0
3     502  40-45    7.0
4     502  20-25   20.0
4     502  30-35   70.0
4     502  40-45   60.0
like image 167
jezrael Avatar answered Dec 10 '25 03:12

jezrael