Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: unique dataframe

Tags:

python

pandas

I have a DataFrame that has duplicated rows. I'd like to get a DataFrame with a unique index and no duplicates. It's ok to discard the duplicated values. Is this possible? Would it be a done by groupby?

like image 768
Adam Greenhall Avatar asked Sep 07 '12 17:09

Adam Greenhall


People also ask

What does unique () do in pandas?

The unique function in pandas is used to find the unique values from a series. A series is a single column of a data frame. We can use the unique function on any possible set of elements in Python. It can be used on a series of strings, integers, tuples, or mixed elements.

What does unique () pandas Return?

unique. Return unique values based on a hash table. Uniques are returned in order of appearance.

How do I get unique records in pandas?

And you can use the following syntax to select unique rows across specific columns in a pandas DataFrame: df = df. drop_duplicates(subset=['col1', 'col2', ...])

How do I display only unique values in pandas DataFrame?

You can use the pandas unique() function to get the different unique values present in a column. It returns a numpy array of the unique values in the column.


2 Answers

In [29]: df.drop_duplicates()
Out[29]: 
   b  c
1  2  3
3  4  0
7  5  9
like image 98
Wouter Overmeire Avatar answered Oct 05 '22 20:10

Wouter Overmeire


Figured out one way to do it by reading the split-apply-combine documentation examples.

df = pandas.DataFrame({'b':[2,2,4,5], 'c': [3,3,0,9]}, index=[1,1,3,7])
df_unique = df.groupby(level=0).first()

df
   b  c
1  2  3
1  2  3
3  4  0
7  5  9

df_unique
   b  c
1  2  3
3  4  0
7  5  9
like image 37
Adam Greenhall Avatar answered Oct 05 '22 20:10

Adam Greenhall