
Assign new values to slice from MultiIndex DataFrame

I would like to modify some values in a column of my DataFrame. At the moment I have a view obtained by selecting via the MultiIndex of my original df (and modifying it does change df).

Here's an example:

In [1]: arrays = [np.array(['bar', 'bar', 'baz', 'qux', 'qux', 'bar']),
                  np.array(['one', 'two', 'one', 'one', 'two', 'one']),
                  np.arange(0, 6, 1)]
In [2]: df = pd.DataFrame(np.random.randn(6, 3), index=arrays, columns=['A', 'B', 'C'])

In [3]: df
                  A         B         C
bar one 0 -0.088671  1.902021 -0.540959
    two 1  0.782919 -0.733581 -0.824522
baz one 2 -0.827128 -0.849712  0.072431
qux one 3 -0.328493  1.456945  0.587793
    two 4 -1.466625  0.720638  0.976438
bar one 5 -0.456558  1.163404  0.464295

I try to set a slice of df to a scalar value:

In [4]: df.ix['bar', 'two', :]['A']
Out[4]:
1    0.782919
Name: A, dtype: float64

In [5]: df.ix['bar', 'two', :]['A'] = 9999
# df is unchanged

What I really want is to modify several values in the column (since the indexing returns a vector rather than a scalar, I think this makes more sense):

In [6]: df.ix['bar', 'one', :]['A'] = [999, 888]
# again df remains unchanged

I'm using pandas 0.11. Is there a simple way to do this?

My current workaround is to rebuild df from a new frame that contains the values I want to change. That is not elegant and can be very expensive on a complex DataFrame. My guess is that the problem comes from .ix and .loc returning a copy rather than a view.

asked May 30 '13 10:05 by hadim




1 Answer

Sort the frame, then select/set using a tuple for the MultiIndex:

In [12]: df = pd.DataFrame(np.random.randn(6, 3), index=arrays, columns=['A', 'B', 'C'])

In [13]: df
Out[13]: 
                  A         B         C
bar one 0 -0.694240  0.725163  0.131891
    two 1 -0.729186  0.244860  0.530870
baz one 2  0.757816  1.129989  0.893080
qux one 3 -2.275694  0.680023 -1.054816
    two 4  0.291889 -0.409024 -0.307302
bar one 5  1.697974 -1.828872 -1.004187

In [14]: df = df.sortlevel(0)

In [15]: df
Out[15]: 
                  A         B         C
bar one 0 -0.694240  0.725163  0.131891
        5  1.697974 -1.828872 -1.004187
    two 1 -0.729186  0.244860  0.530870
baz one 2  0.757816  1.129989  0.893080
qux one 3 -2.275694  0.680023 -1.054816
    two 4  0.291889 -0.409024 -0.307302

In [16]: df.loc[('bar','two'),'A'] = 9999

In [17]: df
Out[17]: 
                     A         B         C
bar one 0    -0.694240  0.725163  0.131891
        5     1.697974 -1.828872 -1.004187
    two 1  9999.000000  0.244860  0.530870
baz one 2     0.757816  1.129989  0.893080
qux one 3    -2.275694  0.680023 -1.054816
    two 4     0.291889 -0.409024 -0.307302
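
For anyone on a newer pandas where `sortlevel` is no longer available, here is a minimal sketch of the same sort-then-set step. It assumes `sort_index()` is an acceptable stand-in for `sortlevel(0)`, and it uses placeholder values (`np.arange`) instead of random data so the result is reproducible. The name `df_sorted` is only for illustration.

```python
import numpy as np
import pandas as pd

# Same index layout as in the question; the values are placeholders.
arrays = [np.array(['bar', 'bar', 'baz', 'qux', 'qux', 'bar']),
          np.array(['one', 'two', 'one', 'one', 'two', 'one']),
          np.arange(0, 6, 1)]
df = pd.DataFrame(np.arange(18, dtype=float).reshape(6, 3),
                  index=arrays, columns=['A', 'B', 'C'])

# Sort on all index levels (assumed stand-in for df.sortlevel(0)).
df_sorted = df.sort_index()

# One .loc call: partial row key as a tuple, then the column label.
# Doing both in a single indexing step is what makes the assignment stick.
df_sorted.loc[('bar', 'two'), 'A'] = 9999

print(df_sorted)
```

The key point is that rows and column are selected in a single `.loc[...]` call, rather than indexing the rows first and then `['A']` on the result.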

You can also do it without sorting if you specify the complete index, e.g.

In [23]: df.loc[('bar','two',1),'A'] = 999

In [24]: df
Out[24]: 
                    A         B         C
bar one 0   -0.113216  0.878715 -0.183941
    two 1  999.000000 -1.405693  0.253388
baz one 2    0.441543  0.470768  1.155103
qux one 3   -0.008763  0.917800 -0.699279
    two 4    0.061586  0.537913  0.380175
bar one 5    0.857231  1.144246 -2.369694
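
As a self-contained sketch of the unsorted case, assuming the same index layout as the question with placeholder values: a full three-level key identifies exactly one row here, so the frame is left in its original order.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'qux', 'qux', 'bar']),
          np.array(['one', 'two', 'one', 'one', 'two', 'one']),
          np.arange(0, 6, 1)]
# Placeholder values instead of random data, so the result is reproducible.
df = pd.DataFrame(np.arange(18, dtype=float).reshape(6, 3),
                  index=arrays, columns=['A', 'B', 'C'])

# Full key (all three levels) plus the column label, no sorting beforehand.
df.loc[('bar', 'two', 1), 'A'] = 999

# Row order is untouched: the third level still reads 0..5.
print(df)
```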

To check the sort depth:

In [27]: df.index.lexsort_depth
Out[27]: 0

In [28]: df.sortlevel(0).index.lexsort_depth
Out[28]: 3
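
If `lexsort_depth` is not exposed in the pandas version at hand (I am not certain which releases keep it public), a similar yes/no check can be sketched with `is_monotonic_increasing`. This is a swapped-in check, not the same attribute: it only tells you whether the whole index is sorted, not how many levels are. The names `sorted_before` and `sorted_after` are illustrative.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'qux', 'qux', 'bar']),
          np.array(['one', 'two', 'one', 'one', 'two', 'one']),
          np.arange(0, 6, 1)]
df = pd.DataFrame(np.zeros((6, 3)), index=arrays, columns=['A', 'B', 'C'])

# The original index has a trailing 'bar' row, so it is not sorted.
sorted_before = df.index.is_monotonic_increasing

# After sorting on all levels the index is in lexicographic order.
sorted_after = df.sort_index().index.is_monotonic_increasing

print(sorted_before, sorted_after)
```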

For the last part of your question, assigning with a list: the list must have the same number of elements as the rows you are replacing, and the frame MUST be sorted for this to work:

In [12]: df.loc[('bar','one'),'A'] = [999,888]

In [13]: df
Out[13]: 
                    A         B         C
bar one 0  999.000000 -0.645641  0.369443
        5  888.000000 -0.990632 -0.577401
    two 1   -1.071410  2.308711  2.018476
baz one 2    1.211887  1.516925  0.064023
qux one 3   -0.862670 -0.770585 -0.843773
    two 4   -0.644855 -1.431962  0.232528
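
A minimal sketch of the list assignment, under the same assumptions as before (placeholder values, `sort_index()` standing in for `sortlevel(0)`). The second half shows a different technique, a boolean mask built from the index levels, which is my own addition and not part of the approach above. It does not rely on sort order, but the values then follow the frame's row order.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'qux', 'qux', 'bar']),
          np.array(['one', 'two', 'one', 'one', 'two', 'one']),
          np.arange(0, 6, 1)]
df = pd.DataFrame(np.arange(18, dtype=float).reshape(6, 3),
                  index=arrays, columns=['A', 'B', 'C'])

# 1) Sorted frame: ('bar', 'one') matches two adjacent rows,
#    so a two-element list lines up with them.
df_sorted = df.sort_index()
df_sorted.loc[('bar', 'one'), 'A'] = [999, 888]

# 2) Alternative (not from the answer above): boolean mask on the unsorted frame.
level0 = df.index.get_level_values(0)
level1 = df.index.get_level_values(1)
mask = (level0 == 'bar') & (level1 == 'one')
df.loc[mask, 'A'] = [999, 888]  # assigned in the frame's own row order

print(df_sorted)
print(df)
```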

answered Oct 10 '22 02:10 by Jeff