Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Merge and filter

Tags:

python

pandas

I used merge to create a dataframe between 2 of my query's. Now in the dataframe I want to put a filter on a column but I cannot get it to work. I am trying to add filter component date is null or blank.

    ##reading data from sql
    package = pd.read_sql(sql,con)
    component = pd.read_sql(sqla,con)

    ##doing the left join
    test2 = pd.merge(package,component, on = ['identifier','date'], how='left')

    ##shrinking the dataframe   
    test3 = test2[['identifier_x']].copy()

i've tried doing the below but cannot get it to work. Getting date_y not defined. I also tried with just date, as when I merge my dataframes the like columns get labelled x and y.

 test2 = pd.merge(package,component, on = ['identifier','date'], how='left'), component.query(date_y == '')

Now trying:

test2 = pd.merge(package,component, on = ['identifier','date'], how='left')
test2.query('date_y == \'\'')

And

test2 = pd.merge(package,component, on = ['identifier','date'], how='left')
test2[test2.date_y == '']

Also tried:

test2 = pd.merge(package,component, on = ['identifier','date'], how='left')
test2 = test2.date_y == ''

Sample data:

+------------+------------+------------+------------+
|   date_x   | identifier |   date_y   | identifier |
+------------+------------+------------+------------+
| 13/03/2019 | 3582191409 | 13/03/2019 | 3582191410 |
| 13/03/2019 | 3582191289 | 13/03/2019 | 3582191290 |
| 13/03/2019 | 3582190137 | 13/03/2019 | 3582190138 |
| 13/03/2019 | 3582185931 | 13/03/2019 | 3582185930 |
| 13/03/2019 | 3582184503 |            | 3582184502 |
| 13/03/2019 | 3582195631 |            | 3582195632 |
| 13/03/2019 | 3582191374 |            | 3582191373 |
| 13/03/2019 | 3582185917 |            | 3582185916 |
| 13/03/2019 | 3582185863 |            | 3582185862 |
+------------+------------+------------+------------+

Im trying to filter date y = blanks

like image 990
excelguy Avatar asked May 18 '19 18:05

excelguy


1 Answers

You're most likely reading in blanks. I would first make sure date_x and date_y are timestamps and replace all blanks with np.nan:

test2['date_y']=test2['date_y'].replace(' ',np.nan)
test2['date_x']=pd.to_datetime(test2['date_x'])
test2['date_y']=pd.to_datetime(test2['date_y'])
test2_filtered=test2.loc[test2['date_y'].isnull()]
like image 124
Nev1111 Avatar answered Oct 17 '22 13:10

Nev1111