I have a pandas dataframe df:
import pandas as pd

s = {'id': [243, 243, 243, 243, 443, 443, 443],
     'st': [1, 3, 5, 9, 2, 6, 7],
     'value': [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1]}
df = pd.DataFrame(s)
which looks like this:
    id  st  value
0  243   1    2.4
1  243   3    3.8
2  243   5    3.7
3  243   9    5.6
4  443   2    1.2
5  443   6    0.2
6  443   7    2.1
I want to set value to 0 for all records except the first record of each id. My expected output is:
    id  st  value
0  243   1    2.4
1  243   3      0
2  243   5      0
3  243   9      0
4  443   2    1.2
5  443   6      0
6  443   7      0
How can I do this with a pandas dataframe?
Here's one way: check for duplicates in id and multiply the resulting boolean mask by value:
df['value'] = (~df.id.duplicated('first')).mul(df.value)
    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0
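To make the one-liner easier to follow, here is a rough step-by-step sketch of the same idea. The intermediate names (is_repeat, is_first, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [243, 243, 243, 243, 443, 443, 443],
    'st': [1, 3, 5, 9, 2, 6, 7],
    'value': [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1],
})

# True for every repeat of an id, False for its first occurrence
is_repeat = df['id'].duplicated(keep='first')

# Invert the mask so that only the first row of each id is True
is_first = ~is_repeat

# True behaves as 1 and False as 0, so repeated rows are multiplied down to 0
result = is_first.mul(df['value'])
```

Assigning result back to df['value'] should give the same frame as the one-liner above.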
Another way of doing this is:
df.loc[df.id.eq(df.id.shift()), 'value'] = 0
print(df)
    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0
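One thing to keep in mind: shift() compares each row only with the row directly before it, so this second approach relies on the rows of each id being adjacent, as they are in the question. A small sketch of the difference, using a made-up frame (mixed) in which the ids are interleaved:

```python
import pandas as pd

# Hypothetical frame where the rows of one id are not adjacent
mixed = pd.DataFrame({
    'id': [243, 443, 243, 443],
    'value': [2.4, 1.2, 3.8, 0.2],
})

# shift() only looks at the previous row, so no row is flagged here
by_shift = mixed['id'].eq(mixed['id'].shift())

# duplicated() looks at the whole column, so later repeats are flagged
by_dup = mixed['id'].duplicated()

# Zero out every row that is not the first occurrence of its id
mixed.loc[by_dup, 'value'] = 0
```

If the ids might not be contiguous, the duplicated() mask (or sorting by id first) is probably the safer choice.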