pandas - keep only True values after a groupby on a DataFrame

Tags:

pandas

I've been working on a DataFrame with User_IDs, DateTime objects and other information, like the following extract:

User_ID;Latitude;Longitude;Datetime
222583401;41.4020375;2.1478710;2014-07-06 20:49:20
287280509;41.3671346;2.0793115;2013-01-30 09:25:47
329757763;41.5453577;2.1175164;2012-09-25 08:40:59
189757330;41.5844998;2.5621569;2013-10-01 11:55:20
624921653;41.5931846;2.3030671;2013-07-09 20:12:20
414673119;41.5550136;2.0965829;2014-02-24 20:15:30
414673119;41.5550136;2.0975829;2014-02-24 20:16:30
414673119;41.5550136;2.0985829;2014-02-24 20:17:30
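To reproduce the examples, the extract can be loaded straight from text. This is a minimal sketch that assumes the data really is semicolon-separated as shown. `DATA` is just a hypothetical name for the pasted string:

```python
import io

import pandas as pd

# The extract from the question, pasted verbatim (DATA is a hypothetical name).
DATA = """User_ID;Latitude;Longitude;Datetime
222583401;41.4020375;2.1478710;2014-07-06 20:49:20
287280509;41.3671346;2.0793115;2013-01-30 09:25:47
329757763;41.5453577;2.1175164;2012-09-25 08:40:59
189757330;41.5844998;2.5621569;2013-10-01 11:55:20
624921653;41.5931846;2.3030671;2013-07-09 20:12:20
414673119;41.5550136;2.0965829;2014-02-24 20:15:30
414673119;41.5550136;2.0975829;2014-02-24 20:16:30
414673119;41.5550136;2.0985829;2014-02-24 20:17:30
"""

# sep=';' matches the delimiter; parse_dates converts Datetime to datetime64 values.
df = pd.read_csv(io.StringIO(DATA), sep=';', parse_dates=['Datetime'])
print(df.dtypes)
```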

I've grouped users with:

g = df.groupby(['User_ID','Datetime'])

and then checked for users with more than one Datetime value:

df = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)

This gives the following boolean Series:

User_ID
189757330    False
222583401    False
287280509    False
329757763    False
414673119     True
624921653    False
Name: Datetime, dtype: bool
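The following sketch, which is not part of the original question, illustrates two points. It rebuilds only the User_ID and Datetime columns of the sample, because the coordinates do not affect the counts. First, grouping on both keys creates one group per distinct (User_ID, Datetime) pair. Every pair in the sample is unique, so each of those groups has a single row and that `groupby` cannot reveal repeated users. Second, counting rows per user with `GroupBy.size()` yields the same mask as `apply(lambda g: len(g) > 1)` without needing a Python-level lambda:

```python
import pandas as pd

# Rebuild the User_ID and Datetime columns of the sample.
df = pd.DataFrame({
    'User_ID': [222583401, 287280509, 329757763, 189757330,
                624921653, 414673119, 414673119, 414673119],
    'Datetime': pd.to_datetime([
        '2014-07-06 20:49:20', '2013-01-30 09:25:47', '2012-09-25 08:40:59',
        '2013-10-01 11:55:20', '2013-07-09 20:12:20', '2014-02-24 20:15:30',
        '2014-02-24 20:16:30', '2014-02-24 20:17:30',
    ]),
})

# One group per (User_ID, Datetime) pair; here every pair is unique,
# so every group holds exactly one row.
pair_sizes = df.groupby(['User_ID', 'Datetime']).size()
print(pair_sizes.unique())

# Rows per user compared with 1: the same boolean Series as the apply/lambda version.
via_apply = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g) > 1)
via_size = df.groupby('User_ID').size() > 1
print(via_size[via_size])
```

`size()` counts every row, like `len(g)`. `count()` would instead skip missing Datetime values.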

This is fine for my purposes, since I only want to keep the User_IDs with a True value. Now I would like to keep only the rows of the original DataFrame whose User_ID is associated with True, put them in a new DataFrame and write it out, for instance with pandas.to_csv. The expected DataFrame would contain only the User_ID with more than one Datetime value:

User_ID;Latitude;Longitude;Datetime
414673119;41.5550136;2.0965829;2014-02-24 20:15:30
414673119;41.5550136;2.0975829;2014-02-24 20:16:30
414673119;41.5550136;2.0985829;2014-02-24 20:17:30

How can I access the boolean value for each User_ID and use it to select those rows? Thanks for your kind help.


asked Mar 04 '15 16:03

Fabio Lamanna

1 Answer

Assign the result of df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1) to a variable so you can perform boolean indexing on it. Then pass the index of the True entries to isin to filter your original df:

In [366]:

users = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
users

Out[366]:
User_ID
189757330    False
222583401    False
287280509    False
329757763    False
414673119     True
624921653    False
Name: Datetime, dtype: bool

In [367]:   
users[users]

Out[367]:
User_ID
414673119    True
Name: Datetime, dtype: bool

In [368]:
users[users].index

Out[368]:
Int64Index([414673119], dtype='int64')

In [361]:
df[df['User_ID'].isin(users[users].index)]

Out[361]:
     User_ID   Latitude  Longitude            Datetime
5  414673119  41.555014   2.096583 2014-02-24 20:15:30
6  414673119  41.555014   2.097583 2014-02-24 20:16:30
7  414673119  41.555014   2.098583 2014-02-24 20:17:30
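If the intermediate index is not needed, two other standard pandas idioms give the same rows. This sketch runs on the sample data and goes beyond the original answer. `transform('size')` broadcasts each user's row count back onto every row, so the resulting mask lines up with `df` directly. `GroupBy.filter` keeps every row of the groups for which the function returns True:

```python
import io

import pandas as pd

DATA = """User_ID;Latitude;Longitude;Datetime
222583401;41.4020375;2.1478710;2014-07-06 20:49:20
287280509;41.3671346;2.0793115;2013-01-30 09:25:47
329757763;41.5453577;2.1175164;2012-09-25 08:40:59
189757330;41.5844998;2.5621569;2013-10-01 11:55:20
624921653;41.5931846;2.3030671;2013-07-09 20:12:20
414673119;41.5550136;2.0965829;2014-02-24 20:15:30
414673119;41.5550136;2.0975829;2014-02-24 20:16:30
414673119;41.5550136;2.0985829;2014-02-24 20:17:30
"""
df = pd.read_csv(io.StringIO(DATA), sep=';', parse_dates=['Datetime'])

# Approach from the answer: boolean-index the per-user Series, then isin on its index.
users = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g) > 1)
via_isin = df[df['User_ID'].isin(users[users].index)]

# transform('size') puts each user's row count on every row, aligned with df.
via_transform = df[df.groupby('User_ID')['Datetime'].transform('size') > 1]

# filter keeps all rows of the groups for which the function returns True.
via_filter = df.groupby('User_ID').filter(lambda g: len(g) > 1)

print(via_filter)
```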

You can then call to_csv on the result as usual.
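For example, the filtered rows can be written in the same semicolon-separated layout as the input. This sketch assumes you want to drop the pandas row labels, which `index=False` does:

```python
import io

import pandas as pd

DATA = """User_ID;Latitude;Longitude;Datetime
222583401;41.4020375;2.1478710;2014-07-06 20:49:20
287280509;41.3671346;2.0793115;2013-01-30 09:25:47
329757763;41.5453577;2.1175164;2012-09-25 08:40:59
189757330;41.5844998;2.5621569;2013-10-01 11:55:20
624921653;41.5931846;2.3030671;2013-07-09 20:12:20
414673119;41.5550136;2.0965829;2014-02-24 20:15:30
414673119;41.5550136;2.0975829;2014-02-24 20:16:30
414673119;41.5550136;2.0985829;2014-02-24 20:17:30
"""
df = pd.read_csv(io.StringIO(DATA), sep=';', parse_dates=['Datetime'])

users = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g) > 1)
out = df[df['User_ID'].isin(users[users].index)]

# With no path argument, to_csv returns the CSV text as a string.
# sep=';' mirrors the input; index=False drops the row labels 5, 6, 7.
csv_text = out.to_csv(sep=';', index=False)
print(csv_text)
```

Passing a path instead writes the same text to disk, e.g. `out.to_csv('multi_datetime_users.csv', sep=';', index=False)` (hypothetical file name).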

answered Oct 16 '22 13:10

EdChum
