I want to find the average difficulty level in the first vs. second half of each session, but I cannot figure out a proper way to solve this. I have to divide each session into two halves using the epoch time and then find the average difficulty level.
session_id question_difficulty attempt_updated_at
5c822af21c1fba22 2 1557470128000
5c822af21c1fba22 3 1557469685000
5c822af21c1fba22 4 1557470079000
5c822af21c1fba22 5 1557472999000
5c822af21c1fba22 3 1557474145000
5c822af21c1fba22 3 1557474441000
5c822af21c1fba22 4 1557474299000
5c822af21c1fba22 4 1557474738000
5c822af21c1fba22 3 1557475430000
5c822af21c1fba22 4 1557476960000
5c822af21c1fba22 5 1557477458000
5c822af21c1fba22 2 1557478118000
5c822af21c1fba22 5 1557482556000
5c822af21c1fba22 4 1557482809000
5c822af21c1fba22 5 1557482886000
5c822af21c1fba22 5 1557484232000
I am working in Python pandas (Jupyter Notebook).
Code-wise I don't know where to start (newbie alert).
I expect the output like:
session_id first-half-difficulty second-half-difficulty
IIUC, you could use pandas.qcut to cut the epoch timestamps into 2 equally sized bins (first half / second half), then use groupby.mean:
df.groupby(['session_id', pd.qcut(df.attempt_updated_at, q=2)])['question_difficulty'].mean()
[out]
session_id attempt_updated_at
5c822af21c1fba22 (1557469684999.999, 1557475084000.0] 3.500
(1557475084000.0, 1557484232000.0] 4.125
Name: question_difficulty, dtype: float64
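If you then want the output in the exact two-column layout from your question, you can pass labels to qcut and unstack the result. A runnable sketch on your sample data (the column labels are just the names you asked for, not anything pandas requires):

```python
import pandas as pd

# rebuild the sample data from the question
ts = [1557470128000, 1557469685000, 1557470079000, 1557472999000,
      1557474145000, 1557474441000, 1557474299000, 1557474738000,
      1557475430000, 1557476960000, 1557477458000, 1557478118000,
      1557482556000, 1557482809000, 1557482886000, 1557484232000]
diff = [2, 3, 4, 5, 3, 3, 4, 4, 3, 4, 5, 2, 5, 4, 5, 5]
df = pd.DataFrame({'session_id': ['5c822af21c1fba22'] * 16,
                   'question_difficulty': diff,
                   'attempt_updated_at': ts})

# label the two quantile bins, then pivot the bin label into columns
halves = pd.qcut(df['attempt_updated_at'], q=2,
                 labels=['first-half-difficulty', 'second-half-difficulty'])
out = df.groupby(['session_id', halves])['question_difficulty'].mean().unstack(1)
print(out)
# same numbers as the qcut result above: 3.500 and 4.125
```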
Alternatively, depending on how you define 'first half' / 'second half', you may want pandas.cut instead, with the bins=2 argument (in this case the time bins are equally spaced, as opposed to equally sized as with qcut above):
df.groupby(['session_id', pd.cut(df.attempt_updated_at, bins=2)])['question_difficulty'].mean()
[out]
session_id attempt_updated_at
5c822af21c1fba22 (1557469670453.0, 1557476958500.0] 3.444444
(1557476958500.0, 1557484232000.0] 4.285714
Name: question_difficulty, dtype: float64
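For exactly two equally spaced halves, the cut approach is equivalent to comparing each timestamp against the midpoint of the session's time range, which may be easier to follow as a beginner and handles a per-session midpoint directly. A sketch on your sample data (the 'first-half-difficulty' / 'second-half-difficulty' labels are just the names from your question):

```python
import numpy as np
import pandas as pd

# rebuild the sample data from the question
ts = [1557470128000, 1557469685000, 1557470079000, 1557472999000,
      1557474145000, 1557474441000, 1557474299000, 1557474738000,
      1557475430000, 1557476960000, 1557477458000, 1557478118000,
      1557482556000, 1557482809000, 1557482886000, 1557484232000]
diff = [2, 3, 4, 5, 3, 3, 4, 4, 3, 4, 5, 2, 5, 4, 5, 5]
df = pd.DataFrame({'session_id': ['5c822af21c1fba22'] * 16,
                   'question_difficulty': diff,
                   'attempt_updated_at': ts})

# midpoint of each session's time range; <= midpoint mirrors cut's
# right-closed intervals, so the split matches pd.cut(..., bins=2)
mid = df.groupby('session_id')['attempt_updated_at'] \
        .transform(lambda s: (s.min() + s.max()) / 2)
half = np.where(df['attempt_updated_at'] <= mid,
                'first-half-difficulty', 'second-half-difficulty')
out = df.groupby(['session_id', half])['question_difficulty'].mean().unstack(1)
print(out)
# same numbers as the cut result above: 3.444444 and 4.285714
```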
To calculate separate time buckets for each unique session_id, you may first have to group by session_id, run the above method over each group, then finally concat the results. Here is an example using a list comprehension:
groups_session_id = df.groupby('session_id')
pd.concat([g.groupby(['session_id', pd.cut(g['attempt_updated_at'], bins=2).astype(str)])
['question_difficulty'].mean() for _, g in groups_session_id])
To add these average values back into your original DataFrame, you can use DataFrame.merge:
df_avg_question_difficulty = pd.concat([g.groupby(['session_id', pd.cut(g['attempt_updated_at'], bins=2, labels=[1, 2]).astype(str)])
                                        ['question_difficulty'].mean().unstack(1) for _, g in groups_session_id])
df = df.merge(df_avg_question_difficulty, left_on='session_id', right_index=True)