 

python pandas: how to find rows in one dataframe but not in another?

Let's say that I have two tables: people_all and people_usa, both with the same structure and therefore the same primary key.

How can I get a table of the people not in the USA? In SQL I'd do something like:

select a.*
from people_all a
left outer join people_usa u
  on a.id = u.id
where u.id is null

What would be the Python equivalent? I cannot think of a way to translate this WHERE clause into pandas syntax.

The only way I can think of is to add an arbitrary field to people_usa (e.g. people_usa['dummy']=1), do a left join, keep only the records where 'dummy' is NaN, and then delete the dummy field, which seems a bit convoluted.
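For comparison, pandas' merge() can produce that helper column by itself: with indicator=True it adds a '_merge' column whose value is 'left_only' for rows that found no match in the right-hand frame. Below is a minimal sketch with hypothetical sample frames keyed on an 'id' column; it mirrors the SQL left outer join plus the "u.id is null" filter.

```python
import pandas as pd

# Hypothetical sample data: people_usa is a subset of people_all, keyed on 'id'
people_all = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["Ann", "Bo", "Cy", "Di"]})
people_usa = pd.DataFrame({"id": [2, 4], "name": ["Bo", "Di"]})

# Left join on the key only (avoids name_x/name_y suffixes).
# indicator=True adds a '_merge' column that plays the role of the dummy field.
merged = people_all.merge(people_usa[["id"]], on="id", how="left", indicator=True)

# Keep unmatched rows (the SQL "where u.id is null") and drop the helper column
not_usa = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(not_usa)
```

Merging on people_usa[["id"]] rather than the whole frame keeps the result's columns identical to people_all.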

Thanks!

Pythonista anonymous asked Sep 18 '15 12:09



People also ask

How do you find uncommon rows between two DataFrames in Python?

Concatenate the two DataFrames with concat(), passing them to it as a list, and then call drop_duplicates(keep=False) on the result. With keep=False every copy of a duplicated row is dropped, so only the rows that appear in just one of the two DataFrames remain (this assumes neither DataFrame contains duplicate rows of its own).
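A minimal sketch of that concat-then-deduplicate approach, using two hypothetical frames with identical columns and no internal duplicates:

```python
import pandas as pd

# Hypothetical frames with the same columns; neither has duplicate rows of its own
df1 = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Rome", "Lima"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Rome", "Lima", "Kyiv"]})

# Stack both frames, then drop every row that occurs more than once.
# keep=False removes all copies of a duplicated row, so only rows present
# in exactly one of the two frames survive.
uncommon = pd.concat([df1, df2], ignore_index=True).drop_duplicates(keep=False)
print(uncommon)
```

With the default keep='first', one copy of each shared row would be kept and the result would be the union rather than the uncommon rows.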

How do loc and iloc differ from each other?

The main distinction between loc and iloc is: loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
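The difference is easiest to see when the index labels are not in positional order. A small sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame whose index labels are not in positional order
df = pd.DataFrame({"x": [10, 20, 30]}, index=[2, 0, 1])

by_label = df.loc[0, "x"]     # row *labelled* 0, column 'x' -> 20
by_position = df.iloc[0, 0]   # *first* row, first column -> 10
print(by_label, by_position)
```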

How do you filter rows in a DataFrame in Python?

You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame. Note that this expression returns a new DataFrame containing the selected rows. You can also hold the value you compare against in a variable, e.g. df[df["Courses"] == course].
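A minimal sketch of those filters; the data is hypothetical (only the 'Courses' column name comes from the answer above), and the query() variant is an extra, equivalent spelling:

```python
import pandas as pd

# Hypothetical course data; the 'Fee' column is illustrative only
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Spark", "Pandas"],
                   "Fee": [20000, 25000, 22000, 24000]})

# Boolean mask: True where Courses == 'Spark'; indexing with it returns a new DataFrame
spark_rows = df[df["Courses"] == "Spark"]

# Same filter with the value held in a variable
course = "Spark"
spark_rows_var = df[df["Courses"] == course]

# query() can refer to Python variables with the @ prefix
spark_rows_query = df.query("Courses == @course")
print(spark_rows)
```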

How do I get rows from a DataFrame in pandas?

We can use .loc[] to get rows. Note the square brackets here rather than parentheses. The syntax is df.loc[row, column]; column is optional, and if it is left out you get the entire row. .loc is label-based, so df.loc[0] returns the row whose index label is 0. That is the first row only when the DataFrame has the default RangeIndex; to get the first row by position, use df.iloc[0].
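A short sketch of the .loc forms described above, on a hypothetical frame with the default RangeIndex:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex (labels 0, 1, 2)
df = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "age": [31, 25, 47]})

row = df.loc[1]              # entire row labelled 1, returned as a Series
cell = df.loc[1, "age"]      # single value at row label 1, column 'age'
subset = df.loc[[0, 2]]      # a list of labels returns a DataFrame
first = df.iloc[0]           # first row by position, whatever the labels are
print(row, cell, subset, first, sep="\n\n")
```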

How many rows are there in a DataFrame in Python?

The number of rows is len(df), or equivalently df.shape[0]. In the original example, Table 1 shows the console output for the sample data, which has six rows and three columns; the example then selects all rows where the column x3 is equal to the value 1.
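A minimal sketch with hypothetical data of the same shape (six rows, three columns x1, x2, x3; the values are invented for illustration):

```python
import pandas as pd

# Hypothetical data with six rows and three columns
df = pd.DataFrame({"x1": [5, 2, 7, 1, 8, 3],
                   "x2": list("abcdef"),
                   "x3": [1, 0, 1, 1, 0, 2]})

n_rows = len(df)          # number of rows
n_rows_alt = df.shape[0]  # same value: first element of the (rows, columns) tuple

# Rows where x3 equals 1, and how many there are
x3_is_1 = df[df["x3"] == 1]
print(n_rows, n_rows_alt, len(x3_is_1))
```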

How do I get DataFrame records that do not exist in a second DataFrame?

Related questions:
Getting dataframe records that do not exist in second data frame
Look for value in df1('col1') is equal to any value in df2('col3') and remove row from df1 if True [Python]
Comparing two different dataframes of different sizes using Pandas

What is the difference between df.columns and df.shape in pandas?

df.columns returns the column (header) labels as an Index object. df.shape returns the dimensions of the DataFrame as a (rows, columns) tuple; for example, (4, 5) means 4 rows by 5 columns. There are several ways to get columns in pandas.
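A quick sketch contrasting the two attributes on a hypothetical 4-row, 5-column frame:

```python
import pandas as pd

# Hypothetical 4-row, 5-column frame
df = pd.DataFrame({c: range(4) for c in ["a", "b", "c", "d", "e"]})

print(df.columns)        # an Index object holding the column labels
print(list(df.columns))  # a plain Python list of the column names
print(df.shape)          # (4, 5): 4 rows by 5 columns
```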


2 Answers

Use isin and negate the boolean mask:

people_all[~people_all['ID'].isin(people_usa['ID'])]

Example:

In [364]:
import numpy as np
import pandas as pd
people_all = pd.DataFrame({ 'ID' : np.arange(5)})
people_usa = pd.DataFrame({ 'ID' : [3,4,6,7,100]})
people_all[~people_all['ID'].isin(people_usa['ID'])]

Out[364]:
   ID
0   0
1   1
2   2

so 3 and 4 are removed from the result; the boolean mask looks like this:

In [366]:
people_all['ID'].isin(people_usa['ID'])

Out[366]:
0    False
1    False
2    False
3     True
4     True
Name: ID, dtype: bool

Using ~ inverts the mask.

EdChum answered Oct 11 '22 05:10



Here is another, more SQL-like pandas method, .query():

people_all.query('ID not in @people_usa.ID')

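or using NumPy's in1d() function (deprecated since NumPy 2.0 in favor of np.isin(), which is used here with the same arguments):

people_all[~np.isin(people_all['ID'], people_usa['ID'])]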
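To check that both one-liners select the same rows, here is a small sketch on hypothetical data where people_usa holds a subset of the IDs in people_all:

```python
import numpy as np
import pandas as pd

# Hypothetical data: people_usa holds a subset of the IDs in people_all
people_all = pd.DataFrame({"ID": np.arange(5)})
people_usa = pd.DataFrame({"ID": [3, 4]})

# SQL-like filter; @ refers to the Python variable people_usa
via_query = people_all.query("ID not in @people_usa.ID")

# NumPy membership test on the key column, negated with ~
via_numpy = people_all[~np.isin(people_all["ID"], people_usa["ID"])]

print(via_query.equals(via_numpy))
```

Both keep only the rows with IDs 0, 1 and 2, i.e. the people not in the USA.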

NOTE: for those with SQL experience, it may be worth reading the pandas documentation page "Comparison with SQL".

MaxU - stop WAR against UA answered Oct 11 '22 06:10
