I'm doing joining of two dataframe (A and B) in python's pandas.
The goal is to receive all the pure rows from B (sql analogue- right join B on A.client_id=B.client_id where A.client_id is null)
In pandas all I know for this operation is to do merging but I don't know how to set up the conditions (where clause):
x=pd.merge(A,B,how='right',on=['client_id','client_id']
Inner Join. Left Outer Join. Right Outer Join. Full Outer Join or simply Outer Join.
The Fastest Ways As it turns out, join always tends to perform well, and merge will perform almost exactly the same given the syntax is optimal.
An INNER JOIN between two pandas DataFrames will result into a set of records that have a mutual value in the specified joining column(s). In order to perform an inner join between two DataFrames using a single column, all we need is to provide the on argument when calling merge() .
We can join columns from two Dataframes using the merge() function. This is similar to the SQL 'join' functionality. A detailed discussion of different join types is given in the SQL lesson. You specify the type of join you want using the how parameter.
option 1indicator=True
A.merge(B, on='client_id', how='right', indicator=True) \
.query('_merge == "right_only"').drop('_merge', 1)
setup
A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))
results
more explanationindicator=True
puts another column in the results of the merge that indicates whether that rows results are from the left, right, or both.
A.merge(B, on='client_id', how='outer', indicator=True)
So, I just use query
to filter out the right_only
indicator then drop that column.
option 2
not really a merge. You can use query
again to only pull rows of B
where its 'client_id'
s are not in A
B.query('client_id not in @A.client_id')
or an equivalent way of saying the same thing (but faster)
B[~B.client_id.isin(A.client_id)]
For me, this is also a bit unsatisfying, but I think the recommended way is something like:
x = pd.merge(A[A["client_ID"].isnull()], B,
how='right', on=['client_id', 'client_id'])
More information can be found in the pandas documentation
Additionally, you might use something like A.where(A["client_ID"].isnull())
for filtering. Also, note my mistake in the previous version. I was comparing to None
but you should use the isnull()
function
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With