I've been working on a DataFrame with User_IDs, DateTime objects and other information, like the following extract:
User_ID;Latitude;Longitude;Datetime
222583401;41.4020375;2.1478710;2014-07-06 20:49:20
287280509;41.3671346;2.0793115;2013-01-30 09:25:47
329757763;41.5453577;2.1175164;2012-09-25 08:40:59
189757330;41.5844998;2.5621569;2013-10-01 11:55:20
624921653;41.5931846;2.3030671;2013-07-09 20:12:20
414673119;41.5550136;2.0965829;2014-02-24 20:15:30
414673119;41.5550136;2.0975829;2014-02-24 20:16:30
414673119;41.5550136;2.0985829;2014-02-24 20:17:30
I've grouped Users with:
g = df.groupby(['User_ID','Datetime'])
and then check for no-single DataTime objects:
df = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
I've obtained the following boolean DataFrame:
User_ID
189757330 False
222583401 False
287280509 False
329757763 False
414673119 True
624921653 False
Name: Datetime, dtype: bool
which is fine for my purposes to keep only User_ID with a True masked value. Now I would like to keep only the User_ID values associated to the True values, and write them to a new DataFrame with pandas.to_csv
, for instance. The expected DataFrame would contain only the User_ID with more than one DateTime object:
User_ID;Latitude;Longitude;Datetime
414673119;41.5550136;2.0965829;2014-02-24 20:15:30
414673119;41.5550136;2.0975829;2014-02-24 20:16:30
414673119;41.5550136;2.0985829;2014-02-24 20:17:30
How may I have access to the boolean values for each User_ID? Thanks for your kind help.
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
Groupby preserves the order of rows within each group.
So a groupby() operation can downcast to a Series, or if given a Series as input, can upcast to dataframe. For your first dataframe, you run unequal groupings (or unequal index lengths) coercing a series return which in the "combine" processing does not adequately yield a data frame.
DataFrame doesn't preserve the column order when converting from a DataFrames. DataFrame #72.
Assign the result of df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
to a variable so you can perform boolean indexing and then use the index from this to call isin
and filter your orig df:
In [366]:
users = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
users
Out[366]:
User_ID
189757330 False
222583401 False
287280509 False
329757763 False
414673119 True
624921653 False
Name: Datetime, dtype: bool
In [367]:
users[users]
Out[367]:
User_ID
414673119 True
Name: Datetime, dtype: bool
In [368]:
users[users].index
Out[368]:
Int64Index([414673119], dtype='int64')
In [361]:
df[df['User_ID'].isin(users[users].index)]
Out[361]:
User_ID Latitude Longitude Datetime
5 414673119 41.555014 2.096583 2014-02-24 20:15:30
6 414673119 41.555014 2.097583 2014-02-24 20:16:30
7 414673119 41.555014 2.098583 2014-02-24 20:17:30
You can then call to_csv
on the above as normal
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With