
Taking the first records for each group in pandas dataframe and putting 0 in other records

Tags: python, pandas

I have a pandas dataframe df:

import pandas as pd

s = {'id': [243, 243, 243, 243, 443, 443, 443],
     'st': [1, 3, 5, 9, 2, 6, 7],
     'value': [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1]}
df = pd.DataFrame(s)

which looks like this:

    id  st  value
0  243   1    2.4
1  243   3    3.8
2  243   5    3.7
3  243   9    5.6
4  443   2    1.2
5  443   6    0.2
6  443   7    2.1

I want to set value to 0 for every record except the first record of each id. My expected output is:

    id  st  value
0  243   1    2.4
1  243   3    0
2  243   5    0
3  243   9    0
4  443   2    1.2
5  443   6    0
6  443   7    0

How can I do this with a pandas dataframe?

asked May 08 '19 08:05 by Archit



2 Answers

Here's one way: mark the first occurrence of each id with duplicated(), then multiply that boolean mask by value, so every later row of the same id becomes 0:

df['value'] = (~df.id.duplicated('first')).mul(df.value)

    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0
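
The same mask can also be passed to Series.where instead of being multiplied. The following is a minimal sketch of that variant, not part of the original answer; the name is_first is chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [243, 243, 243, 243, 443, 443, 443],
    'st': [1, 3, 5, 9, 2, 6, 7],
    'value': [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1],
})

# True for the first row of each id, False for every later occurrence.
is_first = ~df['id'].duplicated(keep='first')

# Keep value where is_first is True; replace it with 0 elsewhere.
df['value'] = df['value'].where(is_first, 0)
print(df)
```

Using where states the intent (keep or replace) directly rather than relying on True and False acting as 1 and 0 in the multiplication.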
answered Oct 19 '22 05:10 by yatu


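Another way is to compare each id with the id in the previous row and set value to 0 wherever they match. This assumes rows with the same id are adjacent, as they are here: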

df.loc[df.id.eq(df.id.shift()),'value']=0
print(df)

    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0
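
The shift comparison above only looks at the row directly before each row. If the same id could reappear later in the frame, one possible alternative is groupby().cumcount(), which numbers rows within each id regardless of their position. The sketch below uses made-up, interleaved data to show the difference; the data and the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: the rows for each id are interleaved, not adjacent.
df = pd.DataFrame({
    'id': [243, 443, 243, 443],
    'value': [2.4, 1.2, 3.8, 0.2],
})

# Comparing with the previous row finds no repeats here, because no id
# equals the id directly above it.
adjacent_repeat = df['id'].eq(df['id'].shift())

# cumcount() numbers the rows within each id (0, 1, 2, ...) in order of
# appearance, so anything above 0 is a later occurrence of that id.
later_occurrence = df.groupby('id').cumcount() > 0

df.loc[later_occurrence, 'value'] = 0
print(df)
```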
answered Oct 19 '22 03:10 by anky