Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Index updating and changing value accessed by location

I have two index-related questions on Python Pandas dataframes.

import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                'B' : ['one', 'one', 'two', 'three',
                       'two', 'three', 'one', 'two'],
                'amount' : np.random.randn(8)})

df = df.ix[df.B != 'three'] # remove where B = three
df.index
>>  Int64Index([0, 1, 2, 4, 6, 7], dtype=int64) # the original index is preserved.

1) I do not understand why the indexing is not automatically updated after I modify the dataframe. Is there a way to automatically update the indexing while modifying a dataframe? If not, what is the most efficient manual way to do this?

2) I want to be able to set the B column of the 5th element of df to 'three'. But df.iloc[5]['B'] = 'three' does not do that. I checked on the manual but it does not cover how to change a specific cell value accessed by location.

If I were accessing by row name, I could do: df.loc[5,'B'] = 'three' but I don't know what the index access equivalent is.

P.S. Link1 and link2 are relevant answers to my second question. However, they do not answer my question.

like image 463
Zhubarb Avatar asked Jan 08 '14 13:01

Zhubarb


People also ask

How do I change the index value of a column in pandas?

To reset the index in pandas, you simply need to chain the function . reset_index() with the dataframe object. On applying the . reset_index() function, the index gets shifted to the dataframe as a separate column.

How do you replace a value in an index in a data frame?

You can easily replace a value in pandas data frames by just specifying its column and its index. Having the dataframe above, we will replace some of its values. We are using the loc function of pandas. The first variable is the index of the value we want to replace and the second is its column.

How can you change the index of pandas series?

It is possible to specify or change the index labels of a pandas Series object after creation also. It can be done by using the index attribute of the pandas series constructor.

Does pandas apply work in place?

No, the apply() method doesn't contain an inplace parameter, unlike these pandas methods which have an inplace parameter: df.


2 Answers

1) I do not understand why the indexing is not automatically updated after I modify the dataframe.

If you want to reset the index after removing/adding rows you can do this:

df = df[df.B != 'three'] # remove where B = three
df.reset_index(drop=True)

       B    amount  id
0    one    -1.176137    1
1    one     0.434470    2
2    two    -0.887526    3
3    two     0.126969    5
4    one     0.090442    7
5    two    -1.511353    8

Indexes are meant to label/tag/id a row... so you might think about making your 'id' column the index, and then you'll appreciate that Pandas doesn't 'automatically update' the index when deleting rows.

df.set_index('id')

       B    amount
id      
1    one    -0.410671
2    one     0.092931
3    two    -0.100324
4    three   0.322580
5    two    -0.546932
6    three  -2.018198
7    one    -0.459551
8    two     1.254597

2) I want to be able to set the B column of the 5th element of df to 'three'. But df.iloc[5]['B'] = 'three' does not do that. I checked on the manual but it does not cover how to change a specific cell value accessed by location.

Jeff already answered this...

like image 181
Brian Wylie Avatar answered Oct 13 '22 08:10

Brian Wylie


In [5]: df = pd.DataFrame({'id' : range(1,9),
   ...:                 'B' : ['one', 'one', 'two', 'three',
   ...:                        'two', 'three', 'one', 'two'],
   ...:                 'amount' : np.random.randn(8)})

In [6]: df
Out[6]: 
       B    amount  id
0    one -1.236735   1
1    one -0.427070   2
2    two -2.330888   3
3  three -0.654062   4
4    two  0.587660   5
5  three -0.719589   6
6    one  0.860739   7
7    two -2.041390   8

[8 rows x 3 columns]

Your question 1) your code above is correct (see @Briford Wylie for resetting the index, which is what I think you want)

In [7]: df.ix[df.B!='three']
Out[7]: 
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7  two -2.041390   8

[6 rows x 3 columns]

In [8]: df = df.ix[df.B!='three']

In [9]: df.index
Out[9]: Int64Index([0, 1, 2, 4, 6, 7], dtype='int64')

In [10]: df.iloc[5]
Out[10]: 
B             two
amount   -2.04139
id              8
Name: 7, dtype: object

Question 2):

You are trying to set a copy; In 0.13 this will raise/warn. see here

In [11]: df.iloc[5]['B'] = 5
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.

In [24]: df.iloc[5,df.columns.get_indexer(['B'])] = 'foo'

In [25]: df
Out[25]: 
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7  foo -2.041390   8

[6 rows x 3 columns]

You can also do this. This is NOT setting a copy and since it selects a Series (that is what df['B'] is, then it CAN be set directly

In [30]: df['B'].iloc[5] = 5

In [31]: df
Out[31]: 
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7    5 -2.041390   8

[6 rows x 3 columns]
like image 42
Jeff Avatar answered Oct 13 '22 06:10

Jeff