Taking the first records for each group in pandas dataframe and putting 0 in other records

Tags: python, pandas

I have a pandas dataframe df:

import pandas as pd

s = {'id': [243, 243, 243, 243, 443, 443, 443],
     'st': [1, 3, 5, 9, 2, 6, 7],
     'value': [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1]}
df = pd.DataFrame(s)

which looks like this:

    id  st  value
0  243   1    2.4
1  243   3    3.8
2  243   5    3.7
3  243   9    5.6
4  443   2    1.2
5  443   6    0.2
6  443   7    2.1

I want to set value to 0 for every record except the first record of each id. My expected output is:

    id  st  value
0  243   1    2.4
1  243   3    0
2  243   5    0
3  243   9    0
4  443   2    1.2
5  443   6    0
6  443   7    0

How can I do this with a pandas dataframe?


asked May 08 '19 08:05

Archit

2 Answers

Here's one way: flag the duplicated rows in id, negate that mask so only the first row of each id is True, and multiply the boolean result by value:

df['value'] = (~df.id.duplicated('first')).mul(df.value)

    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0
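
The same mask can also be applied with Series.where instead of a multiplication, which some readers may find easier to follow. This is only a minimal sketch that assumes the sample data from the question; the name first_of_id is illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [243, 243, 243, 243, 443, 443, 443],
    'st': [1, 3, 5, 9, 2, 6, 7],
    'value': [2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1],
})

# True on the first row of each id, False on every later occurrence
first_of_id = ~df['id'].duplicated(keep='first')

# Keep value where the mask is True, otherwise replace it with 0
df['value'] = df['value'].where(first_of_id, 0)
print(df)
```

Because duplicated looks at the whole column, this does not depend on rows with the same id sitting next to each other.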


answered Oct 19 '22 05:10

yatu

Another way is to compare each id with the id in the previous row and set value to 0 wherever they match:

df.loc[df.id.eq(df.id.shift()),'value']=0
print(df)

    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0
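
Note that the shift comparison only detects a repeat that directly follows a row with the same id, so it relies on the rows of each id being contiguous, as they are in the question. If that cannot be guaranteed, a groupby-based mask is one possible alternative. The sketch below uses hypothetical interleaved data (not from the question) and illustrative variable names to show the difference.

```python
import pandas as pd

# Hypothetical data where the ids are interleaved rather than contiguous
df = pd.DataFrame({
    'id': [243, 443, 243, 443],
    'value': [2.4, 1.2, 3.8, 0.2],
})

# shift-based mask: True only when the previous row has the same id
adjacent_repeat = df['id'].eq(df['id'].shift())

# groupby-based mask: True for every row after the first within its id
later_in_group = df.groupby('id').cumcount() > 0

# Zero out value on every non-first row of each id
df.loc[later_in_group, 'value'] = 0
print(df)
```

Here adjacent_repeat is False on every row, so the shift approach would leave this frame unchanged, while the cumcount mask zeroes the second occurrence of each id.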

answered Oct 19 '22 03:10

anky
