I have a dataframe with the following columns:
- diff - difference between registration date and payment date, in days
- country - country of the user
- user_id
- campaign_id - another categorical column; we will use it in the groupby

I need to calculate the count of distinct users for every country + campaign_id group who have diff <= n.
For example, for country 'A', campaign 'abc' and diff 7, I need to get the count of distinct users from country 'A', campaign 'abc' with diff <= 7.
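In other words, each cell of the result is just a filtered distinct count. A minimal sketch for the example cell above (hypothetical values 'A', 'abc' and 7; column names taken from the test dataframe defined below):

n = 7
mask = (df['country'] == 'A') & (df['campaign'] == 'abc') & (df['diff'] <= n)
cell_value = df.loc[mask, 'user_id'].nunique()  # distinct users meeting all three conditions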
My current solution (below) takes too long:
import pandas as pd
import numpy as np

## generate test dataframe
df = pd.DataFrame({
    'country': np.random.choice(['A', 'B', 'C', 'D'], 10000),
    'campaign': np.random.choice(['camp1', 'camp2', 'camp3', 'camp4', 'camp5', 'camp6'], 10000),
    'diff': np.random.choice(range(10), 10000),
    'user_id': np.random.choice(range(1000), 10000)
})

## main
result_df = pd.DataFrame()
for diff in df['diff'].unique():
    tmp_df = df.loc[df['diff'] <= diff, :]
    tmp_df = tmp_df.groupby(['country', 'campaign'], as_index=False).apply(lambda x: x.user_id.nunique()).reset_index()
    tmp_df['diff'] = diff
    tmp_df.columns = ['country', 'campaign', 'unique_ppl', 'diff']
    result_df = pd.concat([result_df, tmp_df], ignore_index=True, axis=0)
Maybe there is a better way to do this?
First use a list comprehension with concat and assign to join everything together, then groupby with nunique while adding the diff column; last, rename the columns and, if necessary, add reindex for a custom column order:
df1 = pd.concat([df.loc[df['diff']<=x].assign(diff=x) for x in df['diff'].unique()])
df2 = (df1.groupby(['diff','country', 'campaign'], sort=False)['user_id']
          .nunique()
          .reset_index()
          .rename(columns={'user_id':'unique_ppl'})
          .reindex(columns=['country', 'campaign', 'unique_ppl', 'diff']))
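One thing to note about this approach: df1 materialises one copy of every row per threshold it qualifies for, so the intermediate frame grows with the number of distinct diff values. A quick way to see the blow-up on the test data (a sketch, assuming df and df1 from the code above):

print(len(df), len(df1))
# with diff uniform over 0..9, df1 has roughly 5.5 * len(df) rows (about 55,000 here)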
One alternative is below, but @jezrael's solution is optimal.
Performance benchmarking
%timeit original(df) # 149ms
%timeit jp(df) # 81ms
%timeit jez(df) # 47ms
def original(df):
    result_df = pd.DataFrame()
    for diff in df['diff'].unique():
        tmp_df = df.loc[df['diff']<=diff,:]
        tmp_df = tmp_df.groupby(['country', 'campaign'], as_index=False).apply(lambda x: x.user_id.nunique()).reset_index()
        tmp_df['diff'] = diff
        tmp_df.columns=['country', 'campaign', 'unique_ppl', 'diff']
        result_df = pd.concat([result_df, tmp_df],ignore_index=True, axis=0)
    return result_df
def jp(df):
    lst = []
    lst_append = lst.append
    for diff in df['diff'].unique():
        tmp_df = df.loc[df['diff']<=diff,:]
        tmp_df = tmp_df.groupby(['country', 'campaign'], as_index=False).agg({'user_id': 'nunique'})
        tmp_df['diff'] = diff
        tmp_df.columns=['country', 'campaign', 'unique_ppl', 'diff']
        lst_append(tmp_df)
    result_df = pd.concat(lst, ignore_index=True, axis=0)
    return result_df
def jez(df):
    df1 = pd.concat([df.loc[df['diff']<=x].assign(diff=x) for x in df['diff'].unique()])
    df2 = (df1.groupby(['diff','country', 'campaign'], sort=False)['user_id']
              .nunique()
              .reset_index()
              .rename(columns={'user_id':'unique_ppl'})
              .reindex(columns=['country', 'campaign', 'unique_ppl', 'diff']))
    return df2
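As a quick sanity check (not part of the original benchmark), the three functions should agree once their rows are put into a common order; a minimal sketch, assuming the test dataframe df from the question:

cols = ['country', 'campaign', 'diff']
a = original(df).sort_values(cols).reset_index(drop=True)
b = jp(df).sort_values(cols).reset_index(drop=True)
c = jez(df).sort_values(cols).reset_index(drop=True)
print(a.equals(b) and a.equals(c))  # expected: True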