Pandas/Python: How to concatenate two dataframes without duplicates?

I'd like to concatenate two dataframes A and B into a new one without duplicate rows (if a row in B already exists in A, don't add it):

Dataframe A:          Dataframe B:

   I  II                 I  II
0  1   2              0  5   6
1  3   1              1  3   1

New Dataframe:

   I  II
0  1   2
1  3   1
2  5   6

How can I do this?

asked Jan 23 '14 by MJP


People also ask

How do I avoid duplicates in a pandas merge?

Use the merge() function to join the two data frames with an inner join. Then add a suffix such as 'remove' to the newly joined columns that have the same name in both data frames, and use the drop() function to remove the columns carrying that suffix. This ensures that identical columns don't end up duplicated in the new dataframe.
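A minimal sketch of that suffix trick; the frame contents and the '_remove' suffix name are illustrative assumptions, not from the question:

import pandas as pd

left = pd.DataFrame({'key': [1, 2], 'val': ['a', 'b']})
right = pd.DataFrame({'key': [1, 2], 'val': ['a', 'b']})

# Inner join; clashing column names on the right side get the '_remove' suffix.
merged = left.merge(right, on='key', how='inner', suffixes=('', '_remove'))

# Drop every column that picked up the suffix, leaving one copy of each clashing column.
merged = merged.drop(columns=[c for c in merged.columns if c.endswith('_remove')])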

How do I get rid of duplicates in a pandas DataFrame?

Consider a dataset containing ramen ratings. By default, drop_duplicates() removes duplicate rows based on all columns. To remove duplicates based on specific column(s), use the subset parameter. To remove duplicates and keep the last occurrences, use the keep parameter.
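For example (a small made-up frame standing in for the ramen data):

import pandas as pd

df = pd.DataFrame({'brand': ['A', 'A', 'B'], 'rating': [5, 5, 3]})

df.drop_duplicates()                    # compares all columns (default)
df.drop_duplicates(subset=['brand'])    # compares only the 'brand' column
df.drop_duplicates(keep='last')         # keeps the last occurrence instead of the first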

How do you concatenate two different DataFrames in Python?

The concat() function in pandas is used to append either columns or rows from one DataFrame to another. The concat() function does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.
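A short sketch, reusing the question's column names as an assumption:

import pandas as pd

df_a = pd.DataFrame({'I': [1, 3], 'II': [2, 1]})
df_b = pd.DataFrame({'I': [5, 3], 'II': [6, 1]})

rows = pd.concat([df_a, df_b])                 # stack as extra rows (axis=0, the default)
cols = pd.concat([df_a, df_b], axis=1)         # place side by side as extra columns
both = pd.concat([df_a, df_b], join='inner')   # 'inner' keeps only columns present in both frames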

What is the difference between pandas concat and merge?

The concat() function concatenates dataframes along rows or columns; we can think of it as stacking up multiple dataframes. merge() combines dataframes based on values in shared columns. merge() offers more flexibility than concat() because it allows combinations based on a matching condition.
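To make the contrast concrete (hypothetical frames):

import pandas as pd

sales = pd.DataFrame({'id': [1, 2], 'amount': [10, 20]})
names = pd.DataFrame({'id': [1, 2], 'name': ['x', 'y']})

stacked = pd.concat([sales, names])   # simply stacks the rows; no matching is done
joined = sales.merge(names, on='id')  # combines rows whose 'id' values match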


2 Answers

The simplest way is to just do the concatenation, and then drop duplicates.

>>> df1
   A  B
0  1  2
1  3  1
>>> df2
   A  B
0  5  6
1  3  1
>>> pandas.concat([df1, df2]).drop_duplicates().reset_index(drop=True)
   A  B
0  1  2
1  3  1
2  5  6

The reset_index(drop=True) is to fix up the index after the concat() and drop_duplicates(). Without it you will have an index of [0,1,0] instead of [0,1,2]. This could cause problems for further operations on this dataframe down the road if it isn't reset right away.
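A quick check of that index behaviour, assuming two small frames shaped like the ones above:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 3], 'B': [2, 1]})
df2 = pd.DataFrame({'A': [5, 3], 'B': [6, 1]})

combined = pd.concat([df1, df2]).drop_duplicates()
list(combined.index)   # [0, 1, 0] -- the original labels survive the concat

combined = combined.reset_index(drop=True)
list(combined.index)   # [0, 1, 2]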

answered Sep 22 '22 by Ryan G


If DataFrame A already contains a duplicate row, then concatenating and dropping duplicate rows will also remove rows from DataFrame A that you might want to keep.

In this case, you will need to create a new column with a cumulative count and then drop duplicates. It all depends on your use case, but this is common with time-series data.

Here is an example:

df_1 = pd.DataFrame([
    {'date': '11/20/2015', 'id': 4, 'value': 24},
    {'date': '11/20/2015', 'id': 4, 'value': 24},
    {'date': '11/20/2015', 'id': 6, 'value': 34},
])

df_2 = pd.DataFrame([
    {'date': '11/20/2015', 'id': 4, 'value': 24},
    {'date': '11/20/2015', 'id': 6, 'value': 14},
])

df_1['count'] = df_1.groupby(['date', 'id', 'value']).cumcount()
df_2['count'] = df_2.groupby(['date', 'id', 'value']).cumcount()

df_tot = pd.concat([df_1, df_2], ignore_index=False)
df_tot = df_tot.drop_duplicates()
df_tot = df_tot.drop(['count'], axis=1)

>>> df_tot
         date  id  value
0  11/20/2015   4     24
1  11/20/2015   4     24
2  11/20/2015   6     34
1  11/20/2015   6     14
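For contrast, a plain concat plus drop_duplicates on the same data (with the helper column removed first) would collapse the legitimate duplicate in df_1. This sketch reuses df_1 and df_2 from the code above:

naive = pd.concat([df_1.drop(columns='count'), df_2.drop(columns='count')])
naive = naive.drop_duplicates()
# Only three rows remain -- the second (id=4, value=24) row from df_1 is gone.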
answered Sep 23 '22 by marwan