I am loading a CSV file with pandas. It has three columns: a date and time, a user id, and a campaign ID. Example rows:
date user_id campaign_id
2018-01-10 0:21:09 151312395 GOOGLE
2018-01-10 0:21:19 151312395 GOOGLE
2018-01-10 0:21:32 151312395 GOOGLE
I want to group the data by user id and then, for each user id, group the rows by time and campaign ID. It should look as follows:
user_id date ad_campaign
151312395 2018-01-10 0:21:09 GOOGLE
2018-01-10 0:21:19 GOOGLE
2018-01-10 0:21:32 GOOGLE
This is what I have made until now:
import pandas as pd
import numpy as np
import datetime
def dateparse(time_in_secs):
    return datetime.datetime.fromtimestamp(float(time_in_secs))
columnnames = ['date','user_id', 'ad_campaign']
# columnnames, sep='\t' ,usecols=[0,1,3],index_col = 'date')
df=pd.read_csv(r'C:\Users\L\Desktop\Data.csv' ,
sep='\t',names = columnnames, usecols=[0,1,3],
parse_dates=True,date_parser=dateparse)
df.date = pd.to_datetime(df.date)
df = df.sort_values(by = 'date')
g = df.groupby('user_id')['ad_campaign']
print(g)
This gives the following output:
<pandas.core.groupby.SeriesGroupBy object at 0x04EF26F0>
[Finished in 0.6s]
Why doesn't the print show the sorted columns?
To group a pandas DataFrame, use groupby(). To sort the grouped result in ascending or descending order, use sort_values(). On a grouped object, size() returns the number of rows in each group.
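As a minimal sketch of these three calls together, assuming a small made-up frame that reuses the column names from the question (the second user and its campaign are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user_id": [151312395, 151312395, 151312395, 987654321],
    "ad_campaign": ["GOOGLE", "GOOGLE", "GOOGLE", "BING"],
})

# size() on a grouped object counts the rows in each group;
# sort_values() then orders those counts, largest first here.
counts = df.groupby("user_id").size().sort_values(ascending=False)
print(counts)
```

The result is a Series indexed by user_id, so it can be printed directly, unlike the bare groupby object.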
Sort within groups of a groupby() result: DataFrame.sort_values() sorts a DataFrame in ascending or descending order, and DataFrame.groupby() groups its rows, so to sort within groups, sort the DataFrame and then group it. Note that groupby preserves the order of rows within each group.
We will use the sort_values() method to sort the dataset, passing it the name of the column to sort by.
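A small sketch of sorting before grouping, assuming hypothetical rows that start out of time order (timestamps are zero-padded here and the values are illustrative only):

```python
import pandas as pd

# Hypothetical rows, deliberately out of time order.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-10 00:21:32",
        "2018-01-10 00:21:09",
        "2018-01-10 00:21:19",
    ]),
    "user_id": [151312395, 151312395, 151312395],
    "ad_campaign": ["GOOGLE", "GOOGLE", "GOOGLE"],
})

# Sort by the column name first; groupby then keeps that row order
# inside each group.
sorted_df = df.sort_values(by="date")
for user_id, group in sorted_df.groupby("user_id"):
    print(user_id)
    print(group[["date", "ad_campaign"]])
```

Iterating over the groupby object is one way to see the rows of each group; printing the object itself only shows its type and memory address.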
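Firstly, if you are doing groupby, you don't need to sort the column explicitly, because groupby sorts the group keys by default. Printing the groupby object itself only shows its repr; apply an operation such as first() to get the grouped data.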
You can do:
Method 1:
df.date = pd.to_datetime(df.date)
g = df.groupby(['user_id','date'])['ad_campaign']
print(g.first())
Method 2:
df.set_index(['user_id','date']).sort_index()
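Both methods can be tried on a small frame without the CSV file. The sketch below assumes hypothetical rows shaped like the example in the question (timestamps zero-padded); the names method1 and method2 are only for illustration:

```python
import pandas as pd

# Hypothetical sample rows in the shape described in the question.
df = pd.DataFrame({
    "date": ["2018-01-10 00:21:32", "2018-01-10 00:21:09", "2018-01-10 00:21:19"],
    "user_id": [151312395, 151312395, 151312395],
    "ad_campaign": ["GOOGLE", "GOOGLE", "GOOGLE"],
})
df["date"] = pd.to_datetime(df["date"])

# Method 1: group on both keys, then reduce each group with first()
# so there is actual data to print instead of a groupby object.
method1 = df.groupby(["user_id", "date"])["ad_campaign"].first()
print(method1)

# Method 2: move both keys into the index and sort the index.
method2 = df.set_index(["user_id", "date"]).sort_index()
print(method2)
```

One difference worth noting: Method 1 keeps a single row per (user_id, date) pair, so repeated timestamps for the same user would collapse into one, while Method 2 keeps every row.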