How to merge and groupby between separate dataframes

Tags:

python

pandas

I have two dataframes that I want to merge/groupby. They are below:

df_1

    words  start   stop
0     Oh,   6.72   7.21
1   okay,   7.26   8.01
2      go  12.82  12.90
3  ahead.  12.91  12.94
4     NaN  15.29  15.62
5     NaN  15.63  15.99
6     NaN  16.09  16.36
7     NaN  16.37  16.96
8     NaN  17.88  18.36
9     NaN  18.37  19.36

df_2

data  start  stop
  10    1.0   3.5
  14    4.0   8.5
  11    9.0  13.5
  12   14.0  20.5

I want to merge df_1.words onto df_2, but group all values in df_1.words where df_1.start is in between df_2.start and df_2.stop. It should look like this:

df_2

data  start  stop  words
  10    1.0   3.5  NaN
  14    4.0   8.5  Oh, okay,
  11    9.0  13.5  go ahead.
  12   14.0  20.5  NaN, NaN, NaN, NaN, NaN, NaN
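For reference, the rule itself can be written as a plain row-by-row loop over df_2. This is a minimal sketch, assuming the sample data above; the helper name `words_in_range` is hypothetical, matched words are joined with a single space, and missing words are spelled out as the string "NaN". It compares every df_2 row against every df_1 row, so it is only meant for small frames.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables above
df_1 = pd.DataFrame({
    "words": ["Oh,", "okay,", "go", "ahead."] + [np.nan] * 6,
    "start": [6.72, 7.26, 12.82, 12.91, 15.29, 15.63, 16.09, 16.37, 17.88, 18.37],
    "stop": [7.21, 8.01, 12.90, 12.94, 15.62, 15.99, 16.36, 16.96, 18.36, 19.36],
})
df_2 = pd.DataFrame({
    "data": [10, 14, 11, 12],
    "start": [1.0, 4.0, 9.0, 14.0],
    "stop": [3.5, 8.5, 13.5, 20.5],
})


def words_in_range(row):
    # Hypothetical helper: Series.between is inclusive on both ends by default
    mask = df_1["start"].between(row["start"], row["stop"])
    # Assumption: missing words are shown as the literal string "NaN"
    matched = df_1.loc[mask, "words"].fillna("NaN").tolist()
    # No word starts inside this range -> leave the cell missing
    return " ".join(matched) if matched else np.nan


df_2["words"] = df_2.apply(words_in_range, axis=1)
print(df_2)
```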


asked Dec 09 '19 21:12

connor449

2 Answers

If the two dataframes are not too long, we can do a cross-join (it builds all len(df_1) * len(df_2) row pairs before filtering, so memory use grows quickly):

(df_2.assign(dummy=1)
     .merge(df_1.assign(dummy=1), on='dummy',
            how='left', suffixes=('', '_r'))
     # keep only pairs where the word starts inside [start, stop]
     .query('start <= start_r <= stop')
     .groupby(['data', 'start', 'stop'], as_index=False)
     .agg({'words': list})
)

Output (the row with data 10 has no matching words, so the filter drops it):

   data  start  stop                           words
0    11    9.0  13.5                    [go, ahead.]
1    12   14.0  20.5  [nan, nan, nan, nan, nan, nan]
2    14    4.0   8.5                    [Oh,, okay,]
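On pandas 1.2 or later, `merge(how="cross")` can replace the dummy-key trick. The sketch below is a variant of the answer above, not its exact code: it also maps the grouped lists back onto df_2, so the row with no matching words (data 10) is kept as NaN and the original row order is preserved. It assumes the values in `data` are unique, because they serve as the group key.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df_1 = pd.DataFrame({
    "words": ["Oh,", "okay,", "go", "ahead."] + [np.nan] * 6,
    "start": [6.72, 7.26, 12.82, 12.91, 15.29, 15.63, 16.09, 16.37, 17.88, 18.37],
    "stop": [7.21, 8.01, 12.90, 12.94, 15.62, 15.99, 16.36, 16.96, 18.36, 19.36],
})
df_2 = pd.DataFrame({
    "data": [10, 14, 11, 12],
    "start": [1.0, 4.0, 9.0, 14.0],
    "stop": [3.5, 8.5, 13.5, 20.5],
})

# Every (df_2 row, df_1 row) pair; df_1's overlapping columns get the "_r" suffix
pairs = df_2.merge(df_1, how="cross", suffixes=("", "_r"))

# Keep pairs whose word starts inside the closed interval [start, stop]
matched = pairs.query("start <= start_r <= stop")

# One list of words per df_2 row (assumes "data" is unique)
grouped = matched.groupby("data")["words"].agg(list)

# map keeps df_2's order; rows without any match become NaN
result = df_2.assign(words=df_2["data"].map(grouped))
print(result)
```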


answered Oct 16 '22 11:10

Quang Hoang

If the bin edges do not overlap, as in your example, use pd.cut with an IntervalIndex to group the first DataFrame. Building the intervals yourself lets them be closed on both edges. Then index the grouped result with the 'stop' column of df_2 to get the aggregated words for each row.

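import pandas as pd

idx = pd.Index([pd.Interval(*x, closed='both') for x in zip(df_2.start, df_2.stop)])

# observed=False keeps empty intervals (e.g. [1.0, 3.5]) in the result as []
s = df_1.groupby(pd.cut(df_1.start, idx), observed=False).words.agg(list)

# Intervals are closed on both ends, so each df_2.stop lies in its own interval
# and can be used to look up the aggregated words
df_2['words'] = s[df_2.stop].to_list()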

print(df_2)
   data  start  stop                           words
0    10    1.0   3.5                              []
1    14    4.0   8.5                    [Oh,, okay,]
2    11    9.0  13.5                    [go, ahead.]
3    12   14.0  20.5  [nan, nan, nan, nan, nan, nan]
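A related sketch, assuming the intervals in df_2 do not overlap (get_indexer requires that): `IntervalIndex.get_indexer` returns, for each df_1.start, the position of the interval containing it, or -1 if none does. Grouping by that position avoids looking values up in the categorical index produced by pd.cut. Unlike the answer above, rows of df_2 with no words come out as NaN rather than [].

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df_1 = pd.DataFrame({
    "words": ["Oh,", "okay,", "go", "ahead."] + [np.nan] * 6,
    "start": [6.72, 7.26, 12.82, 12.91, 15.29, 15.63, 16.09, 16.37, 17.88, 18.37],
    "stop": [7.21, 8.01, 12.90, 12.94, 15.62, 15.99, 16.36, 16.96, 18.36, 19.36],
})
df_2 = pd.DataFrame({
    "data": [10, 14, 11, 12],
    "start": [1.0, 4.0, 9.0, 14.0],
    "stop": [3.5, 8.5, 13.5, 20.5],
})

# One closed interval per df_2 row, in df_2's row order
intervals = pd.IntervalIndex.from_arrays(df_2["start"], df_2["stop"], closed="both")

# Position of the interval containing each df_1.start; -1 means "no interval"
pos = intervals.get_indexer(df_1["start"])
inside = pos >= 0

# Group the words that fall inside some interval by that interval's position
grouped = df_1["words"][inside].groupby(pos[inside]).agg(list)

# Reindex to every df_2 position so rows without words get NaN, then assign positionally
df_2["words"] = grouped.reindex(range(len(df_2))).to_numpy()
print(df_2)
```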

answered Oct 16 '22 12:10

ALollz
