Pandas: get the first occurrence grouping by keys

Tags:

python

pandas

I have the following dataframe:

| id | timestamp           | code | id2
| 10 | 2017-07-12 13:37:00 | 206  | a1
| 10 | 2017-07-12 13:40:00 | 206  | a1
| 10 | 2017-07-12 13:55:00 | 206  | a1
| 10 | 2017-07-12 19:00:00 | 206  | a2
| 11 | 2017-07-12 13:37:00 | 206  | a1
...

I need to group by the id and id2 columns and get the first occurrence of the timestamp value in each group, e.g. for id=10, id2=a1 the result should be timestamp=2017-07-12 13:37:00.

I googled it and found some possible solutions, but I can't figure out how to implement them properly. It should probably be something like:

df.groupby(["id", "id2"])["timestamp"].apply(lambda x: ....)


asked Jul 12 '17 12:07

Novitoll

1 Answer

I think you need GroupBy.first:

df.groupby(["id", "id2"])["timestamp"].first()

Or drop_duplicates:

df.drop_duplicates(subset=['id','id2'])
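
The two are not fully interchangeable when a group contains missing values: GroupBy.first returns the first non-null value in each group, while drop_duplicates keeps the first row exactly as it is. A minimal sketch of that difference, using a small made-up frame (the missing timestamp is hypothetical and does not appear in the question's data):

```python
import pandas as pd

# Hypothetical data: the first row of the group has a missing timestamp.
df = pd.DataFrame({
    "id": [10, 10],
    "id2": ["a1", "a1"],
    "timestamp": pd.to_datetime([None, "2017-07-12 13:40:00"]),
})

# GroupBy.first skips the missing value and returns the first non-null one.
first_non_null = df.groupby(["id", "id2"])["timestamp"].first()

# drop_duplicates keeps the first row unchanged, so its timestamp stays NaT.
first_row = df.drop_duplicates(subset=["id", "id2"])
```

If the data has no missing timestamps, as in the question, both give the same values.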

To get the same output from both approaches:

df1 = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
print(df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00

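# drop_duplicates keeps the original row labels, so reset the index to match the output shown
df1 = df.drop_duplicates(subset=['id','id2'])[['id','id2','timestamp']].reset_index(drop=True)
print(df1)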
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00
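
The comparison can be checked end to end with a short script. This is only a sketch: it rebuilds just the five rows shown in the question (the elided rows are left out) and assumes the timestamp column is parsed with pd.to_datetime. Keep in mind that "first" means first in row order, so if the rows might not already be sorted by time, taking min() (or sorting by timestamp beforehand) is a safer way to get the earliest value; the shuffled frame below is there only to illustrate that point.

```python
import pandas as pd

# The five rows visible in the question; the rows behind "..." are omitted.
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:37:00",
        "2017-07-12 13:40:00",
        "2017-07-12 13:55:00",
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
    "code": [206, 206, 206, 206, 206],
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# Approach 1: first timestamp per (id, id2) group, keys kept as columns.
by_groupby = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()

# Approach 2: keep the first row per key pair, then renumber the rows,
# because drop_duplicates preserves the original row labels.
by_dedup = (
    df.drop_duplicates(subset=["id", "id2"])[["id", "id2", "timestamp"]]
    .reset_index(drop=True)
)

# Order-independent variant: min() gives the earliest timestamp per group
# even when the rows arrive unsorted (simulated here by shuffling).
shuffled = df.sample(frac=1, random_state=0)
earliest = shuffled.groupby(["id", "id2"], as_index=False)["timestamp"].min()

print(by_groupby)
```

On this sample all three results hold the same three rows, because the rows are already in time order within each group.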

answered Sep 30 '22 16:09

jezrael
