Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can one use comparisons to merge two pandas data-frames?

With the following command:

pandas.merge(df_1, df_2, left_on=['date'], right_on=['from_date'])

I combine two rows from two tables if a value in date-column of the first table is equal to the value in the from_date-column of the second table.

Now I would like to make it slightly more complex. I need to combine a row from the first table with a row from the second table if the value in the date column of the first table is equal or lager than a value of the from_date-column of the second table and smaller than value in the upto_date-column of the second column.

In SQL one would use something like that:

select
    *
from
    table_1
join
    table_2
on
    table_1.date >= table_2.from_date
    and
    table_1.date <  table_2.upto_date

Is it possible to do it in pandas.

like image 215
Roman Avatar asked Sep 05 '14 14:09

Roman


1 Answers

pandasql is a pretty useful tool for querying pandas DataFrames using SQLite query syntax.

Resources

  • pandasql - PyPI Documentation
  • yhat/pandasql - Source on Github
  • Blog post with more examples

    pip install -U pandasql

Here's an example similar to the one you describe.

Imports

#!/usr/bin/env python
# -*- coding: utf-8 -*- 
import pandas as pd
from pandas.io.parsers import StringIO
from pandasql import sqldf

# helper func useful for saving keystrokes
# when running multiple queries
def dbGetQuery(q):
    return sqldf(q, globals())

Fake some data

sample_a = """timepoint,measure
2014-01-01 00:00:00,78
2014-01-03 00:00:00,5
2014-01-04 00:00:00,73
2014-01-05 00:00:00,40
2014-01-06 00:00:00,45
2014-01-08 00:00:00,2
2014-01-09 00:00:00,96
2014-01-10 00:00:00,82
2014-01-11 00:00:00,61
2014-01-12 00:00:00,68
2014-01-13 00:00:00,8
2014-01-14 00:00:00,94
2014-01-15 00:00:00,16
2014-01-16 00:00:00,31
2014-01-17 00:00:00,10
2014-01-18 00:00:00,34
2014-01-19 00:00:00,27
2014-01-20 00:00:00,75
2014-01-21 00:00:00,49
2014-01-23 00:00:00,28
2014-01-24 00:00:00,91
2014-01-25 00:00:00,88
2014-01-27 00:00:00,98
2014-01-28 00:00:00,39
2014-01-29 00:00:00,90
2014-01-30 00:00:00,63
2014-01-31 00:00:00,77
"""

sample_b = """from_date,to_date,measure
2014-01-02 00:00:00,2014-01-06 00:00:00,89
2014-01-03 00:00:00,2014-01-07 00:00:00,80
2014-01-04 00:00:00,2014-01-05 00:00:00,44
2014-01-05 00:00:00,2014-01-12 00:00:00,68
2014-01-06 00:00:00,2014-01-11 00:00:00,62
2014-01-07 00:00:00,2014-01-14 00:00:00,5
2014-01-08 00:00:00,2014-01-09 00:00:00,23
"""

Read datasets to create 2 DataFrames

df1 = pd.read_csv(StringIO(sample_a), parse_dates=['timepoint'])
df2 = pd.read_csv(StringIO(sample_b), parse_dates=['from_date', 'to_date'])

Write a SQL query

Note that this one uses the SQLite BETWEEN operator. You can also swap that out and use something like ON timepoint >= from_date AND timepoint < to_date if you prefer.

query = """
SELECT
    DATE(df1.timepoint) AS timepoint
    , DATE(df2.from_date) AS start
    , DATE(df2.to_date) AS end
    , df1.measure AS measure_a
    , df2.measure AS measure_b
FROM
    df1 
INNER JOIN df2
    ON df1.timepoint BETWEEN 
        df2.from_date AND df2.to_date
ORDER BY
    df1.timepoint;
"""

Run the query using the helper func

df3 = dbGetQuery(query)

df3
     timepoint       start         end  measure_a  measure_b
0   2014-01-03  2014-01-02  2014-01-06          5         89
1   2014-01-03  2014-01-03  2014-01-07          5         80
2   2014-01-04  2014-01-02  2014-01-06         73         89
3   2014-01-04  2014-01-03  2014-01-07         73         80
4   2014-01-04  2014-01-04  2014-01-05         73         44
5   2014-01-05  2014-01-02  2014-01-06         40         89
6   2014-01-05  2014-01-03  2014-01-07         40         80
7   2014-01-05  2014-01-04  2014-01-05         40         44
8   2014-01-05  2014-01-05  2014-01-12         40         68
9   2014-01-06  2014-01-02  2014-01-06         45         89
10  2014-01-06  2014-01-03  2014-01-07         45         80
11  2014-01-06  2014-01-05  2014-01-12         45         68
12  2014-01-06  2014-01-06  2014-01-11         45         62
13  2014-01-08  2014-01-05  2014-01-12          2         68
14  2014-01-08  2014-01-06  2014-01-11          2         62
15  2014-01-08  2014-01-07  2014-01-14          2          5
16  2014-01-08  2014-01-08  2014-01-09          2         23
17  2014-01-09  2014-01-05  2014-01-12         96         68
18  2014-01-09  2014-01-06  2014-01-11         96         62
19  2014-01-09  2014-01-07  2014-01-14         96          5
20  2014-01-09  2014-01-08  2014-01-09         96         23
21  2014-01-10  2014-01-05  2014-01-12         82         68
22  2014-01-10  2014-01-06  2014-01-11         82         62
23  2014-01-10  2014-01-07  2014-01-14         82          5
24  2014-01-11  2014-01-05  2014-01-12         61         68
25  2014-01-11  2014-01-06  2014-01-11         61         62
26  2014-01-11  2014-01-07  2014-01-14         61          5
27  2014-01-12  2014-01-05  2014-01-12         68         68
28  2014-01-12  2014-01-07  2014-01-14         68          5
29  2014-01-13  2014-01-07  2014-01-14          8          5
30  2014-01-14  2014-01-07  2014-01-14         94          5
like image 144
hernamesbarbara Avatar answered Nov 18 '22 16:11

hernamesbarbara