
Efficient way to get the unique values from 2 or more columns in a DataFrame

Given a matrix from an SFrame:

>>> from sframe import SFrame
>>> sf = SFrame({'x':[1,1,2,5,7], 'y':[2,4,6,8,2], 'z':[2,5,8,6,2]})
>>> sf
Columns:
    x   int
    y   int
    z   int

Rows: 5

Data:
+---+---+---+
| x | y | z |
+---+---+---+
| 1 | 2 | 2 |
| 1 | 4 | 5 |
| 2 | 6 | 8 |
| 5 | 8 | 6 |
| 7 | 2 | 2 |
+---+---+---+
[5 rows x 3 columns]

I want the unique values across the x and y columns, and I can get them like this:

>>> sf['x'].unique().append(sf['y'].unique()).unique()
dtype: int
Rows: 7
[2, 8, 5, 4, 1, 7, 6]

This way, I take the unique values of x and the unique values of y, append them, and then take the unique values of the appended result.

I could also do it like this:

>>> sf['x'].append(sf['y']).unique()
dtype: int
Rows: 7
[2, 8, 5, 4, 1, 7, 6]

But that way, if my x and y columns are huge and contain lots of duplicates, I would be appending everything into one very large container before taking the unique values.
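The trade-off between the two approaches can be sketched with plain Python lists and sets. This is only an illustration of the reasoning: the lists `x` and `y` stand in for the columns, and nothing here reflects how SFrame stores or deduplicates data internally.

```python
# Plain-Python stand-ins for the two columns (assumption: not SFrame objects).
x = [1, 1, 2, 5, 7]
y = [2, 4, 6, 8, 2]

# Approach 1: deduplicate each column, append, then deduplicate again.
# The appended container holds at most len(set(x)) + len(set(y)) items.
per_column_first = set(list(set(x)) + list(set(y)))

# Approach 2: append the full columns, then deduplicate once.
# The appended container holds len(x) + len(y) items, duplicates included.
append_first = set(x + y)

# Approach 3: a set union consumes y item by item,
# so the appended container is never built at all.
streaming_union = set(x).union(y)
```

All three produce the same set of values; they differ only in how large the intermediate container gets.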

Is there a more efficient way to get the unique values of a combined column built from 2 or more columns in an SFrame?

What is the equivalent efficient way in pandas to get the unique values from 2 or more columns?

alvas asked Aug 03 '16 03:08



1 Answer

I don't have SFrame, but I tested this on a pd.DataFrame:

  sf[["x", "y"]].stack().value_counts().index.tolist()
  [2, 1, 8, 7, 6, 5, 4]
Merlin answered Oct 06 '22 11:10
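For the pandas half of the question, a possible alternative is to flatten the selected columns into one array and deduplicate it once. This is a sketch, not the answerer's method: it assumes the same data loaded into a pandas DataFrame named `df` (a name introduced here for illustration), and it does not cover SFrame.

```python
import numpy as np
import pandas as pd

# Same data as in the question, held in a pandas DataFrame (assumption).
df = pd.DataFrame({'x': [1, 1, 2, 5, 7],
                   'y': [2, 4, 6, 8, 2],
                   'z': [2, 5, 8, 6, 2]})

# Flatten the chosen columns into a 1-D array, then deduplicate once.
# pd.unique returns values in order of first appearance, without sorting.
unique_xy = pd.unique(df[['x', 'y']].to_numpy().ravel())

# np.unique flattens its input itself and returns the values sorted.
sorted_unique_xy = np.unique(df[['x', 'y']].to_numpy())
```

Unlike `value_counts()`, neither call counts occurrences, so this may do less work when only the distinct values are needed.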