I have two index-related questions on Python Pandas dataframes.
import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
'B' : ['one', 'one', 'two', 'three',
'two', 'three', 'one', 'two'],
'amount' : np.random.randn(8)})
df = df.ix[df.B != 'three'] # remove where B = three
df.index
>> Int64Index([0, 1, 2, 4, 6, 7], dtype=int64) # the original index is preserved.
1) I do not understand why the indexing is not automatically updated after I modify the dataframe. Is there a way to automatically update the indexing while modifying a dataframe? If not, what is the most efficient manual way to do this?
2) I want to be able to set the B
column of the 5th element of df
to 'three'. But df.iloc[5]['B'] = 'three'
does not do that. I checked on the manual but it does not cover how to change a specific cell value accessed by location.
If I were accessing by row name, I could do: df.loc[5,'B'] = 'three'
but I don't know what the index access equivalent is.
P.S. Link1 and link2 are relevant answers to my second question. However, they do not answer my question.
To reset the index in pandas, you simply need to chain the function . reset_index() with the dataframe object. On applying the . reset_index() function, the index gets shifted to the dataframe as a separate column.
You can easily replace a value in pandas data frames by just specifying its column and its index. Having the dataframe above, we will replace some of its values. We are using the loc function of pandas. The first variable is the index of the value we want to replace and the second is its column.
It is possible to specify or change the index labels of a pandas Series object after creation also. It can be done by using the index attribute of the pandas series constructor.
No, the apply() method doesn't contain an inplace parameter, unlike these pandas methods which have an inplace parameter: df.
1) I do not understand why the indexing is not automatically updated after I modify the dataframe.
If you want to reset the index after removing/adding rows you can do this:
df = df[df.B != 'three'] # remove where B = three
df.reset_index(drop=True)
B amount id
0 one -1.176137 1
1 one 0.434470 2
2 two -0.887526 3
3 two 0.126969 5
4 one 0.090442 7
5 two -1.511353 8
Indexes are meant to label/tag/id a row... so you might think about making your 'id' column the index, and then you'll appreciate that Pandas doesn't 'automatically update' the index when deleting rows.
df.set_index('id')
B amount
id
1 one -0.410671
2 one 0.092931
3 two -0.100324
4 three 0.322580
5 two -0.546932
6 three -2.018198
7 one -0.459551
8 two 1.254597
2) I want to be able to set the B column of the 5th element of df to 'three'. But df.iloc[5]['B'] = 'three' does not do that. I checked on the manual but it does not cover how to change a specific cell value accessed by location.
Jeff already answered this...
In [5]: df = pd.DataFrame({'id' : range(1,9),
...: 'B' : ['one', 'one', 'two', 'three',
...: 'two', 'three', 'one', 'two'],
...: 'amount' : np.random.randn(8)})
In [6]: df
Out[6]:
B amount id
0 one -1.236735 1
1 one -0.427070 2
2 two -2.330888 3
3 three -0.654062 4
4 two 0.587660 5
5 three -0.719589 6
6 one 0.860739 7
7 two -2.041390 8
[8 rows x 3 columns]
Your question 1) your code above is correct (see @Briford Wylie for resetting the index, which is what I think you want)
In [7]: df.ix[df.B!='three']
Out[7]:
B amount id
0 one -1.236735 1
1 one -0.427070 2
2 two -2.330888 3
4 two 0.587660 5
6 one 0.860739 7
7 two -2.041390 8
[6 rows x 3 columns]
In [8]: df = df.ix[df.B!='three']
In [9]: df.index
Out[9]: Int64Index([0, 1, 2, 4, 6, 7], dtype='int64')
In [10]: df.iloc[5]
Out[10]:
B two
amount -2.04139
id 8
Name: 7, dtype: object
Question 2):
You are trying to set a copy; In 0.13 this will raise/warn. see here
In [11]: df.iloc[5]['B'] = 5
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
In [24]: df.iloc[5,df.columns.get_indexer(['B'])] = 'foo'
In [25]: df
Out[25]:
B amount id
0 one -1.236735 1
1 one -0.427070 2
2 two -2.330888 3
4 two 0.587660 5
6 one 0.860739 7
7 foo -2.041390 8
[6 rows x 3 columns]
You can also do this. This is NOT setting a copy and since it selects a Series (that is what df['B']
is, then it CAN be set directly
In [30]: df['B'].iloc[5] = 5
In [31]: df
Out[31]:
B amount id
0 one -1.236735 1
1 one -0.427070 2
2 two -2.330888 3
4 two 0.587660 5
6 one 0.860739 7
7 5 -2.041390 8
[6 rows x 3 columns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With