 

How to remove rows of a DataFrame based off of data from another DataFrame?

I'm new to pandas and I'm trying to figure out this scenario. I have a sample DataFrame with two products, df1:

  Product_Num     Date   Description  Price 
          10    1-1-18   Fruit Snacks  2.99
          10    1-2-18   Fruit Snacks  2.99
          10    1-5-18   Fruit Snacks  1.99
          10    1-8-18   Fruit Snacks  1.99
          10    1-10-18  Fruit Snacks  2.99
          45    1-1-18         Apples  2.99 
          45    1-3-18         Apples  2.99
          45    1-5-18         Apples  2.99
          45    1-9-18         Apples  1.49
          45    1-10-18        Apples  1.49
          45    1-13-18        Apples  1.49
          45    1-15-18        Apples  2.99 

I also have another, small DataFrame, df2, that shows the promotional prices of the same products:

  Product_Num   Price 
          10    1.99
          45    1.49 

Notice that df2 contains neither 'Date' nor 'Description'. What I want to do is remove all promo-priced rows from df1 (for every date on which the product is on promo), using the data from df2. What is the best way to do this?

So, I want to see this:

  Product_Num     Date   Description  Price 
          10    1-1-18   Fruit Snacks  2.99
          10    1-2-18   Fruit Snacks  2.99
          10    1-10-18  Fruit Snacks  2.99
          45    1-1-18         Apples  2.99 
          45    1-3-18         Apples  2.99
          45    1-5-18         Apples  2.99
          45    1-15-18        Apples  2.99 

I was thinking of doing a merge on columns Price and Product_Num, then seeing what I can do from there. But I was getting confused because of the multiple dates.

Hana asked Jan 30 '18 22:01
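The merge idea from the question does work: merging on Product_Num and Price keeps one output row per dated row of df1, so the multiple dates are simply carried along. Below is a minimal sketch of it as an anti-join with `indicator=True`, assuming the question's tables are re-typed as literals (`merged` and `result` are illustrative names, not from the question):

```python
import pandas as pd

# Sample data re-typed from the question (dates kept as plain strings).
df1 = pd.DataFrame({
    "Product_Num": [10] * 5 + [45] * 7,
    "Date": ["1-1-18", "1-2-18", "1-5-18", "1-8-18", "1-10-18",
             "1-1-18", "1-3-18", "1-5-18", "1-9-18", "1-10-18", "1-13-18", "1-15-18"],
    "Description": ["Fruit Snacks"] * 5 + ["Apples"] * 7,
    "Price": [2.99, 2.99, 1.99, 1.99, 2.99,
              2.99, 2.99, 2.99, 1.49, 1.49, 1.49, 2.99],
})
df2 = pd.DataFrame({"Product_Num": [10, 45], "Price": [1.99, 1.49]})

# Left-merge on both key columns. indicator=True adds a "_merge" column that is
# "both" when the row's (Product_Num, Price) pair appears in df2 and
# "left_only" otherwise; every dated row of df1 stays a separate row.
merged = df1.merge(df2.drop_duplicates(), on=["Product_Num", "Price"],
                   how="left", indicator=True)

# Keep only rows with no promo match, then drop the helper column.
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

The `drop_duplicates()` call on df2 is a precaution so that a repeated promo row cannot duplicate rows of df1 during the merge; with the sample df2 it changes nothing. Note that `result` carries the merge's fresh 0..n-1 labels, which match df1's labels here only because df1 has a default index.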




2 Answers

Use isin combined with &:

df.loc[~((df.Product_Num.isin(df2['Product_Num']))&(df.Price.isin(df2['Price']))),:]
Out[246]: 
    Product_Num     Date  Description  Price
0            10   1-1-18  FruitSnacks   2.99
1            10   1-2-18  FruitSnacks   2.99
4            10  1-10-18  FruitSnacks   2.99
5            45   1-1-18       Apples   2.99
6            45   1-3-18       Apples   2.99
7            45   1-5-18       Apples   2.99
11           45  1-15-18       Apples   2.99
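The `isin` version tests Product_Num and Price separately, so it matches the sample only because neither product is ever sold at the other product's promo price. A minimal sketch contrasting it with a pairwise test, on a small hypothetical frame that adds a product 10 row priced 1.49 on 1-6-18 (that row is invented for illustration):

```python
import pandas as pd

# Hypothetical data: the 1-6-18 row (product 10 at 1.49) is NOT a promo for product 10.
df = pd.DataFrame({
    "Product_Num": [10, 10, 10, 45, 45],
    "Date": ["1-1-18", "1-5-18", "1-6-18", "1-9-18", "1-15-18"],
    "Price": [2.99, 1.99, 1.49, 1.49, 2.99],
})
df2 = pd.DataFrame({"Product_Num": [10, 45], "Price": [1.99, 1.49]})

# Column-by-column test: each column is checked on its own.
independent = df.Product_Num.isin(df2["Product_Num"]) & df.Price.isin(df2["Price"])

# Pairwise test: build (Product_Num, Price) MultiIndexes and compare whole tuples.
keys = ["Product_Num", "Price"]
pairwise = df.set_index(keys).index.isin(df2.set_index(keys).index)

print(df[~independent])  # the 1-6-18 row is wrongly removed as well
print(df[~pairwise])     # the 1-6-18 row is kept
```

`set_index(keys).index` is a MultiIndex of (Product_Num, Price) tuples, and `MultiIndex.isin` compares whole tuples, so only exact promo pairs are removed.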

Update: merge on both columns so that whole (Product_Num, Price) pairs are matched:

df.loc[~df.index.isin(df.merge(df2.assign(a='key'),how='left').dropna().index)]
Out[260]: 
    Product_Num     Date  Description  Price
0            10   1-1-18  FruitSnacks   2.99
1            10   1-2-18  FruitSnacks   2.99
4            10  1-10-18  FruitSnacks   2.99
5            45   1-1-18       Apples   2.99
6            45   1-3-18       Apples   2.99
7            45   1-5-18       Apples   2.99
11           45  1-15-18       Apples   2.99
BENY answered Nov 01 '22 14:11
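The `Update` one-liner in the answer above relies on the merged frame's fresh 0..n-1 index lining up with df's index labels. That holds for the question's default index, but not after, for example, filtering or `set_index`: on a frame labelled 100 to 103 it would compare merge positions against those labels and remove nothing, and `dropna()` would also treat any row with an unrelated missing value as a match. A minimal sketch of a variant that carries the original labels through the merge (the frame, its index values, and the `row_id` column name are hypothetical):

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (e.g. left over from earlier filtering).
df = pd.DataFrame(
    {"Product_Num": [10, 10, 45, 45],
     "Date": ["1-1-18", "1-5-18", "1-9-18", "1-15-18"],
     "Price": [2.99, 1.99, 1.49, 2.99]},
    index=[100, 101, 102, 103],
)
df2 = pd.DataFrame({"Product_Num": [10, 45], "Price": [1.99, 1.49]})

# Move the original row labels into a named column so they survive the merge,
# which otherwise returns a brand-new 0..n-1 index.
labelled = df.rename_axis("row_id").reset_index()

# Inner merge keeps only rows whose (Product_Num, Price) pair is in df2.
promo_rows = labelled.merge(df2, on=["Product_Num", "Price"], how="inner")["row_id"]

# Filter the original frame by label, keeping every non-promo row.
result = df[~df.index.isin(promo_rows)]
print(result)
```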


With Product_Num as the index of both DataFrames, you can drop from df1 every row whose index label appears in df2's index, then concatenate the two DataFrames:

import pandas as pd

# df1: the original rows, indexed by Product_Num
df1 = pd.DataFrame({'Product_Num':[1,2,3,4], 'Date': ['01/01/2012','01/02/2013','02/03/2013','04/02/2013'], 'Price': [10,10,10,10]})
df1 = df1.set_index('Product_Num')
# df2: the replacement row(s), also indexed by Product_Num
df2 = pd.DataFrame({'Product_Num':[2], 'Date':['03/3/2012'], 'Price': [5]})
df2 = df2.set_index('Product_Num')

Drop and concatenate:

df_new = df1.drop(df2.index)
df_new = pd.concat([df_new, df2])

Result:

               Date  Price
Product_Num                   
1            01/01/2012     10
3            02/03/2013     10
4            04/02/2013     10
2             03/3/2012      5
piratefache answered Nov 01 '22 13:11