print the unique values in every column in a pandas dataframe

I have a dataframe (df) and want to print the unique values from each column in the dataframe.

I need to substitute the loop variable (the column name) into the print statement:

column_list = df.columns.values.tolist()
for column_name in column_list:
    print(df."[column_name]".unique()

Update

When I use this, I get "Unexpected EOF Parsing" with no extra details:

column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
    print(sorted_data[column_name].unique()

What is the difference between your syntax, YS-L, and the following?

for column_name in sorted_data:
    print(column_name)
    s = sorted_data[column_name].unique()
    for i in s:
        print(str(i))
yoshiserry asked Dec 02 '14 03:12



People also ask

How do you get a list of all unique values in a column in pandas?

Unique values are also called distinct values. To get them for a single column, use the pandas Series.unique() method: select the column with df['column_name'], which gives a Series, and call unique() on it. The result is an array of the distinct values.

How do I get unique values in multiple columns in pandas?

A pandas Series (that is, a single column) has a unique() method that returns only the distinct values in that column. To extend this to several columns, use pandas.concat() to combine the desired columns into a single Series and then call unique() on the result.

How do you find unique values in a data frame?

To get the unique values across multiple columns of a DataFrame, combine the contents of those columns into a single Series and call unique() on it. To get the number of unique elements instead, call nunique() on that Series.
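The multi-column approach described above could be sketched as follows. This is only an illustration: the sample data and the column names "first" and "second" are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "first": ["Ann", "Bob", "Ann"],
    "second": ["Bob", "Cy", "Cy"],
})

# Stack the chosen columns end to end into one Series, then deduplicate.
combined = pd.concat([df["first"], df["second"]], ignore_index=True)

unique_values = combined.unique()   # distinct values, in order of first appearance
unique_count = combined.nunique()   # number of distinct values

print(unique_values)
print(unique_count)
```

Here unique() gives the values themselves, while nunique() gives how many there are.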


1 Answer

It can be written more concisely like this:

for col in df:
    print(df[col].unique())
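As a runnable sketch of the same loop, assuming a small made-up DataFrame (the "city" and "year" columns are hypothetical): iterating over a DataFrame yields its column labels, so `for col in df` visits the same names as looping over `df.columns.values.tolist()`. Note that every opening parenthesis is closed here; the "Unexpected EOF" error in the question is consistent with the missing closing parenthesis on the print call.

```python
import pandas as pd

# Hypothetical sample frame; column names and values are illustrative only.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "year": [2013, 2014, 2014],
})

# Iterating over a DataFrame yields its column labels, one per pass.
uniques = {col: df[col].unique().tolist() for col in df}

# Print each column name followed by its distinct values.
for col, values in uniques.items():
    print(col, values)
```

Collecting the results in a dict is just one option; it keeps the values available for later use instead of only printing them.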

Generally, you can access a column of the DataFrame through indexing with the [] operator (e.g. df['col']), or through attribute access (e.g. df.col).

Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats. For example, it does not work when the column name is not a valid Python identifier (e.g. df.123), or when it clashes with an existing DataFrame attribute (e.g. df.index). On the other hand, the [] notation should always work.
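A minimal sketch of these caveats, using hypothetical column names picked to trigger each case:

```python
import pandas as pd

# Hypothetical columns chosen to illustrate each caveat.
df = pd.DataFrame({
    "price": [1.0, 2.0, 2.0],
    "index": [10, 20, 30],     # clashes with the DataFrame.index attribute
    "unit cost": [5, 5, 6],    # not a valid Python identifier
})

# Attribute access works for a plain, identifier-like column name.
same = df.price.equals(df["price"])

# df.index is the row index, not the "index" column; use brackets for the column.
row_index = df.index
index_column = df["index"]

# "df.unit cost" would be a syntax error, so brackets are required.
unit_cost_uniques = df["unit cost"].unique().tolist()

# Brackets also accept a variable, which is what a loop over column names needs.
column_name = "price"
price_uniques = df[column_name].unique().tolist()

print(same, index_column.tolist(), unit_cost_uniques, price_uniques)
```

This is also why the loop in the question needs df[column_name] rather than a dotted form: the column name is held in a variable, and only the [] operator looks up the variable's value.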

YS-L answered Sep 20 '22 17:09
