Pandas: create a new df from another df, keeping only the groups that contain a specific value

I have a df:

import pandas as pd

df2 = pd.DataFrame({
    'ID': ['James', 'James', 'James',
           'Max', 'Max', 'Max', 'Max', 'Max',
           'Park', 'Park', 'Park',
           'Tom', 'Tom', 'Tom', 'Tom'],
    'From_num': [78, 420, 'Started', 298, 78, 36, 298, 'Started', 28, 311, 'Started', 60, 520, 99, 'Started'],
    'To_num': [96, 78, 420, 36, 78, 78, 36, 298, 112, 28, 311, 150, 520, 78, 99],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18',
             '2019-08-26', '2019-06-20', '2019-01-30', '2018-10-23',
             '2018-08-29', '2020-05-21', '2019-11-22',
             '2019-04-12', '2019-10-16', '2019-08-26', '2018-12-11', '2018-10-09']})

And it looks like this:

       ID From_num  To_num        Date
0   James       78      96  2020-05-12
1   James      420      78  2020-02-02
2   James  Started     420  2019-06-18
3     Max      298      36  2019-08-26
4     Max       78      78  2019-06-20
5     Max       36      78  2019-01-30
6     Max      298      36  2018-10-23
7     Max  Started     298  2018-08-29
8    Park       28     112  2020-05-21
9    Park      311      28  2019-11-22
10   Park  Started     311  2019-04-12
11    Tom       60     150  2019-10-16
12    Tom      520     520  2019-08-26
13    Tom       99      78  2018-12-11
14    Tom  Started      99  2018-10-09

I want to create a new dataframe that keeps every ID (person's name) whose group contains the number 78 in at least one column (it does not matter whether 78 appears in From_num, To_num, or both), and drops every person for whom neither column contains 78, in this case 'Park'. I have written code like this:

find_nn = df2.groupby('ID').apply(lambda x: x[['From_num', 'To_num']].isin([78]).any())
find_nn.columns = ['from_bool', 'to_bool']
find_nn['bool_result'] = find_nn['from_bool'] | find_nn['to_bool']
bool_nn = find_nn['bool_result'].reset_index()
df2_new = pd.merge(left=df2, right=bool_nn, on='ID', copy=False)
df2_new = df2_new[df2_new['bool_result'] == True]

It works, but it is redundant and slow, and my real dataset is more complex. If you have a better idea, please help. Many thanks! The expected output is:

       ID From_num  To_num        Date
0   James       78      96  2020-05-12
1   James      420      78  2020-02-02
2   James  Started     420  2019-06-18
3     Max      298      36  2019-08-26
4     Max       78      78  2019-06-20
5     Max       36      78  2019-01-30
6     Max      298      36  2018-10-23
7     Max  Started     298  2018-08-29
11    Tom       60     150  2019-10-16
12    Tom      520     520  2019-08-26
13    Tom       99      78  2018-12-11
14    Tom  Started      99  2018-10-09

asked Jul 23 '20 02:07

XaviorL

2 Answers

Let us try groupby with filter, which keeps every group for which the function returns True:

df1 = df2.groupby('ID').filter(lambda x: x[['From_num', 'To_num']].eq(78).any().any())

Output:
       ID From_num  To_num        Date
0   James       78      96  2020-05-12
1   James      420      78  2020-02-02
2   James  Started     420  2019-06-18
3     Max      298      36  2019-08-26
4     Max       78      78  2019-06-20
5     Max       36      78  2019-01-30
6     Max      298      36  2018-10-23
7     Max  Started     298  2018-08-29
11    Tom       60     150  2019-10-16
12    Tom      520     520  2019-08-26
13    Tom       99      78  2018-12-11
14    Tom  Started      99  2018-10-09
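
To see what the lambda hands back to filter, it may help to evaluate it one group at a time. This is only a sketch on a small made-up frame (demo and its values are assumptions, not the data from the question): the first any() reduces each column to one boolean, and the second reduces those two booleans to a single flag per ID.

```python
import pandas as pd

# Hypothetical mini frame reusing the column names from the question.
demo = pd.DataFrame({
    'ID': ['A', 'A', 'B', 'B', 'C'],
    'From_num': [78, 'Started', 5, 'Started', 'Started'],
    'To_num': [1, 2, 3, 4, 78],
})

# Evaluate the filter condition separately for each ID.
flags = {}
for name, group in demo.groupby('ID'):
    per_column = group[['From_num', 'To_num']].eq(78).any()  # one bool per column
    flags[name] = bool(per_column.any())                     # one bool per group

print(flags)
```

Groups whose flag is True are the ones filter would keep whole; the rest are dropped whole.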

For speed, build a row-level mask and broadcast it to every row of the same ID with transform:

m = df2[['From_num', 'To_num']].eq(78).any(axis=1).groupby(df2['ID']).transform('any')
df1 = df2[m]
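
The one-liner above chains three steps, and splitting them apart may make the intermediate masks easier to inspect. The following is a minimal sketch on a made-up frame (demo, row_hit, group_hit and kept are illustrative names, not part of the original answer).

```python
import pandas as pd

# Hypothetical mini frame reusing the column names from the question.
demo = pd.DataFrame({
    'ID': ['A', 'A', 'B', 'B', 'C'],
    'From_num': [78, 'Started', 5, 'Started', 'Started'],
    'To_num': [1, 2, 3, 4, 78],
})

# Step 1: True for each row where either column equals 78.
row_hit = demo[['From_num', 'To_num']].eq(78).any(axis=1)

# Step 2: broadcast "some row of this ID matched" back onto every row of that ID.
group_hit = row_hit.groupby(demo['ID']).transform('any')

# Step 3: boolean indexing then keeps or drops whole groups.
kept = demo[group_hit]
print(kept)
```

Because group_hit has the same index as demo, no merge is needed to line the flag up with the original rows.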

answered Oct 04 '22 01:10

BENY

Here is a simpler way to get the same data by applying two filters to df2. The first line selects the rows where either From_num or To_num equals 78 and collects the unique IDs of those rows. The second line keeps every row of df2 whose ID is in that set.

ids = df2[(df2.From_num == 78) | (df2.To_num == 78)]['ID'].unique()
df2_new = df2[df2['ID'].isin(ids)]
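
If the same check is needed for other columns or values, the two lines can be wrapped in a small helper. This is only a sketch: keep_groups_with_value and its parameters are hypothetical names introduced here, and the demo frame is made up.

```python
import pandas as pd

def keep_groups_with_value(df, key, cols, value):
    """Return the rows of df whose `key` group has `value` in any of `cols`."""
    # Rows where at least one of the listed columns equals the value.
    row_hit = df[cols].eq(value).any(axis=1)
    # Group labels that own at least one matching row.
    hit_keys = df.loc[row_hit, key].unique()
    # Keep every row that belongs to one of those groups.
    return df[df[key].isin(hit_keys)]

# Hypothetical mini frame reusing the column names from the question.
demo = pd.DataFrame({
    'ID': ['A', 'A', 'B', 'B', 'C'],
    'From_num': [78, 'Started', 5, 'Started', 'Started'],
    'To_num': [1, 2, 3, 4, 78],
})

result = keep_groups_with_value(demo, 'ID', ['From_num', 'To_num'], 78)
print(result)
```

The original index is preserved, so the result lines up with the rows of the input frame.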

answered Oct 04 '22 01:10

ruby

Tags: python, pandas, dataframe, group-by, filter
