Now I need to merge two DataFrames on a greater-than-or-equal (>=) condition, but merge only supports equality. Is there any way to deal with this? Thanks!
The pd.merge() function recognizes that each DataFrame has an "employee" column, and automatically joins using this column as a key. The result of the merge is a new DataFrame that combines the information from the two inputs.
pandas.DataFrame.merge (similar to a SQL join) is case sensitive, as are most Python functions.
A merge is also just as efficient as a join as long as: Merging is done on indexes if possible. The “on” parameter is avoided, and instead, both columns to merge on are explicitly stated using the keywords left_on, left_index, right_on, and right_index (when applicable).
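As a rough illustration of the keywords mentioned above, the sketch below merges once on explicitly named columns and once on a column against an index. The frames and column names (staff, depts_col, dept_id, id) are made up for the example, and it makes no claim about relative speed.

```python
import pandas as pd

# Hypothetical sample frames, for illustration only.
staff = pd.DataFrame({"name": ["Ann", "Bob"], "dept_id": [10, 20]})
depts_col = pd.DataFrame({"id": [10, 20], "dept": ["HR", "IT"]})

# Key columns with different names: state both sides explicitly.
by_columns = staff.merge(depts_col, left_on="dept_id", right_on="id")

# Same lookup, but the right-hand key lives in the index.
depts_idx = depts_col.set_index("id")
by_index = staff.merge(depts_idx, left_on="dept_id", right_index=True)

print(by_columns)
print(by_index)
```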
merge() for combining data on common columns or indices. .join() for combining data on a key column or an index. concat() for combining DataFrames across rows or columns.
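A small sketch of the three functions side by side may help. The data is invented for the example (the "employee" column only echoes the description above), so treat it as an illustration rather than a reference.

```python
import pandas as pd

# Hypothetical sample frames, for illustration only.
left = pd.DataFrame({"employee": ["Ann", "Bob"], "group": ["HR", "IT"]})
right = pd.DataFrame({"employee": ["Bob", "Ann"], "hire_year": [2012, 2008]})

# merge(): joins on the column both frames share ("employee").
merged = pd.merge(left, right)

# .join(): joins on the index, so move the key into the index first.
joined = left.set_index("employee").join(right.set_index("employee"))

# concat(): stacks frames along an axis (rows here).
stacked = pd.concat([left, left], ignore_index=True)

print(merged)
print(joined)
print(stacked)
```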
I don't know how to achieve the following with merge and join syntax in pandas:
SELECT *
FROM a
INNER JOIN b
ON a.column1 >= b.column1 AND a.column1 <= b.column2
But the query above can also be written implicitly as:
SELECT *
FROM a, b
WHERE a.column1 >= b.column1 AND a.column1 <= b.column2
This is basically the older syntax and should perform the same. It takes the Cartesian product of the two tables (a cross join) and then selects from it using the WHERE condition, which is easy to implement in pandas. This can be heavy on memory, since the intermediate frame has len(a) * len(b) rows, but it should be fast.
First, the FROM a, b clause (we temporarily assign a column with the same value in every row so that we can cross join on it):
df = pd.merge(a.assign(key=0), b.assign(key=0), on='key').drop('key', axis=1)
and then use boolean indexing (our WHERE clause) to filter the frame:
df[(df["column1_x"] >= df["column1_y"]) & (df["column1_x"] <= df["column2_y"])]
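To make the two steps concrete, here is a minimal end-to-end sketch on made-up data. The sample values are assumptions; the column names follow the SQL query. Note that pandas only adds the _x/_y suffixes to names that exist in both frames, so in this sample (where only b has column2) the upper bound is plain column2 rather than column2_y.

```python
import pandas as pd

# Hypothetical sample data; a has column1, b has column1 and column2.
a = pd.DataFrame({"column1": [1, 5, 10]})
b = pd.DataFrame({"column1": [0, 4, 20], "column2": [2, 6, 30]})

# Step 1 (FROM a, b): cross join through a temporary constant key.
cross = pd.merge(a.assign(key=0), b.assign(key=0), on="key").drop("key", axis=1)

# Step 2 (WHERE): keep rows where a.column1 lies within [b.column1, b.column2].
# Only column1 exists in both frames, so only it gets the _x/_y suffixes.
mask = (cross["column1_x"] >= cross["column1_y"]) & (cross["column1_x"] <= cross["column2"])
result = cross[mask].reset_index(drop=True)

print(result)
```

pandas 1.2 and later also accept pd.merge(a, b, how="cross"), which avoids the temporary key column.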
If you don't want the Cartesian product and only want to compare rows that share the same index in both tables, you can merge on the index like this:
df = a.merge(b, left_index=True, right_index=True)
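or concat on axis 1 if they have the same length and index (concat adds no suffixes, so add them explicitly to keep the column names distinct):
df = pd.concat([a.add_suffix("_x"), b.add_suffix("_y")], axis=1)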
And use boolean indexing again to filter out the rows that don't match:
df[(df["column1_x"] >= df["column1_y"]) & (df["column1_x"] <= df["column2_y"])]
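The index-aligned variant can be sketched the same way. The sample values are again assumptions, and because only b has column2 in this sample, the merged frame names it column2 without a suffix.

```python
import pandas as pd

# Hypothetical sample data with matching default indexes (0, 1, 2).
a = pd.DataFrame({"column1": [1, 5, 10]})
b = pd.DataFrame({"column1": [0, 6, 8], "column2": [2, 9, 12]})

# Pair each row of a with the row of b that has the same index label.
paired = a.merge(b, left_index=True, right_index=True)

# Keep rows where a.column1 lies within [b.column1, b.column2].
mask = (paired["column1_x"] >= paired["column1_y"]) & (paired["column1_x"] <= paired["column2"])
result = paired[mask]

print(result)
```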