Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set value of first item in slice in python pandas

So I would like make a slice of a dataframe and then set the value of the first item in that slice without copying the dataframe. For example:

df = pandas.DataFrame(numpy.random.rand(3,1))
df[df[0]>0][0] = 0

The slice here is irrelevant and just for the example and will return the whole data frame again. Point being, by doing it like it is in the example you get a setting with copy warning (understandably). I have also tried slicing first and then using ILOC/IX/LOC and using ILOC twice, i.e. something like:

df.iloc[df[0]>0,:][0] = 0
df[df[0]>0,:].iloc[0] = 0

And neither of these work. Again- I don't want to make a copy of the dataframe even if it id just the sliced version.

EDIT: It seems there are two ways, using a mask or IdxMax. The IdxMax method seems to work if your index is unique, and the mask method if not. In my case, the index is not unique which I forgot to mention in the initial post.

like image 620
RexFuzzle Avatar asked Feb 28 '17 18:02

RexFuzzle


People also ask

How do you slice first row in Pandas?

Get the First Row of Pandas using iloc[] This method is used to access the row by using row numbers. We can get the first row by using 0 indexes.

How do you get the first element in a Pandas series?

Accessing the First Element The first element is at the index 0 position. So it is accessed by mentioning the index value in the series. We can use both 0 or the custom index to fetch the value.

What is first () in Pandas?

Pandas DataFrame first() Method The first() method returns the first n rows, based on the specified value. The index have to be dates for this method to work as expected.

How do I change the index of a data frame?

To reset the index in pandas, you simply need to chain the function . reset_index() with the dataframe object. On applying the . reset_index() function, the index gets shifted to the dataframe as a separate column.


1 Answers

I think you can use idxmax for get index of first True value and then set by loc:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print (df)
   0
0  1
1  3
2  0
3  0
4  3

print ((df[0] == 0).idxmax())
2

df.loc[(df[0] == 0).idxmax(), 0] = 100
print (df)
     0
0    1
1    3
2  100
3    0
4    3

df.loc[(df[0] == 3).idxmax(), 0] = 200
print (df)
     0
0    1
1  200
2    0
3    0
4    3

EDIT:

Solution with not unique index:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df = df.reset_index()
df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.set_index('index')
df.index.name = None
print (df)
     0
1    1
2  200
2    0
3    0
4    3

EDIT1:

Solution with MultiIndex:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df.index = [np.arange(len(df.index)), df.index]
print (df)
     0
0 1  1
1 2  3
2 2  0
3 3  0
4 4  3

df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.reset_index(level=0, drop=True)

print (df)
     0
1    1
2  200
2    0
3    0
4    3

EDIT2:

Solution with double cumsum:

np.random.seed(1)
df = pd.DataFrame([4,0,4,7,4], index=[1,2,2,3,4])
print (df)
   0
1  4
2  0
2  4
3  7
4  4

mask = (df[0] == 0).cumsum().cumsum()
print (mask)
1    0
2    1
2    2
3    3
4    4
Name: 0, dtype: int32

df.loc[mask == 1, 0] = 200
print (df)
     0
1    4
2  200
2    4
3    7
4    4
like image 117
jezrael Avatar answered Sep 20 '22 21:09

jezrael