My problem is the following:
Let's say I have two pandas DataFrames with the same number of columns, for instance:
A = 1 2
    3 4
    8 9
and
B = 7 8
    4 0
I also have a boolean vector whose length is exactly the number of rows of A plus the number of rows of B (5 here), with as many 1s as B has rows, i.e. two 1s in this example. Let's say Bool = 0 1 0 1 0.
My goal is to merge A and B into a bigger DataFrame called C such that the rows of B correspond to the 1s in Bool, so with this example it would give me:
C = 1 2
    7 8
    3 4
    4 0
    8 9
Does anyone know how to do this? It would help me tremendously. Thanks for reading.
Here's a pandas-only solution that reindexes the original dataframes and then concatenates them:
import pandas as pd

A = pd.DataFrame([[1, 2], [3, 4], [8, 9]])
B = pd.DataFrame([[7, 8], [4, 0]])
Bool = pd.Series([0, 1, 0, 1, 0], dtype=bool)
B.index = Bool[Bool].index    # positions where Bool is True
A.index = Bool[~Bool].index   # positions where Bool is False
pd.concat([A, B]).sort_index()  # sort_index() puts the rows in the interleaved order
# 0 1
#0 1 2
#1 7 8
#2 3 4
#3 4 0
#4 8 9
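If you need this repeatedly, the same idea can be wrapped in a small helper. The sketch below is only illustrative (interleave_by_mask is not a pandas function); it also checks that the mask has exactly as many 1s as B has rows, as the question requires:
import pandas as pd

def interleave_by_mask(A, B, mask):
    # mask: boolean vector of length len(A) + len(B); True marks a row of B
    mask = pd.Series(mask, dtype=bool).reset_index(drop=True)
    assert mask.sum() == len(B) and (~mask).sum() == len(A)
    A = A.set_axis(mask[~mask].index)   # A gets the positions of the 0s
    B = B.set_axis(mask[mask].index)    # B gets the positions of the 1s
    return pd.concat([A, B]).sort_index()

interleave_by_mask(A, B, [0, 1, 0, 1, 0])   # same output as above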
One option is to create an empty DataFrame with the expected shape and then fill in the values from A and B:
import pandas as pd
import numpy as np

A = pd.DataFrame([[1, 2], [3, 4], [8, 9]])
B = pd.DataFrame([[7, 8], [4, 0]])

# initialize a data frame with the same dtype as A (thanks to @piRSquared);
# note this assumes all columns of A share a single dtype
df = pd.DataFrame(np.empty((A.shape[0] + B.shape[0], A.shape[1])), dtype=A.dtypes.iloc[0])
Bool = np.array([0, 1, 0, 1, 0]).astype(bool)
df.loc[Bool, :] = B.values    # rows where Bool is True come from B
df.loc[~Bool, :] = A.values   # rows where Bool is False come from A
df
# 0 1
#0 1 2
#1 7 8
#2 3 4
#3 4 0
#4 8 9
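The same fill-by-mask idea also works directly on a NumPy array, which avoids the dtype bookkeeping. A minimal sketch, assuming A and B have compatible dtypes and the same columns:
import numpy as np
import pandas as pd

Bool = np.array([0, 1, 0, 1, 0], dtype=bool)
out = np.empty((len(A) + len(B), A.shape[1]), dtype=A.values.dtype)
out[Bool] = B.values      # rows of B go where the mask is 1
out[~Bool] = A.values     # rows of A fill the remaining positions
C = pd.DataFrame(out, columns=A.columns)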
The following approach generalizes to more than two groups. Starting from
A = pd.DataFrame([[1,2],[3,4],[8,9]])
B = pd.DataFrame([[7,8],[4,0]])
C = pd.DataFrame([[9,9],[5,5]])
bb = pd.Series([0, 1, 0, 1, 2, 2, 0])
we can use
pd.concat([A, B, C]).iloc[(bb.rank(method='first') - 1).astype(int)].reset_index(drop=True)
which gives
In [269]: pd.concat([A, B, C]).iloc[(bb.rank(method='first') - 1).astype(int)].reset_index(drop=True)
Out[269]:
0 1
0 1 2
1 7 8
2 3 4
3 4 0
4 9 9
5 5 5
6 8 9
This works because with method='first', values are ranked by value first, and ties are broken by the order in which they appear. This means that we get things like
In [270]: pd.Series([1, 0, 0, 1, 0]).rank(method='first')
Out[270]:
0 4.0
1 1.0
2 2.0
3 5.0
4 3.0
dtype: float64
which is exactly (after subtracting one) the iloc order in which we want to select the rows.
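For what it's worth, the same rank trick also reproduces the C from the question in the original two-group case, treating Bool as the group labels:
A = pd.DataFrame([[1, 2], [3, 4], [8, 9]])
B = pd.DataFrame([[7, 8], [4, 0]])
Bool = pd.Series([0, 1, 0, 1, 0])
pd.concat([A, B]).iloc[(Bool.rank(method='first') - 1).astype(int)].reset_index(drop=True)
#    0  1
# 0  1  2
# 1  7  8
# 2  3  4
# 3  4  0
# 4  8  9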