Extract unique value with multiple columns from DataFrame

I have a DataFrame from which I want to extract values from two columns, keeping one row per unique value of one of them. In the image below, I want the unique values of 'education' along with the corresponding values from 'education-num'. I can easily get the unique values with df['education'].unique(), but I am stuck on how to extract the matching 'education-num' values.

image of the dataframe.

(Originally the task was to compute the population of people whose education is Bachelors, Masters, or Doctorate, and I assumed this would be easier by comparing 'education-num' than by using logical operators on strings. But if there is a way to do it directly from 'education', that would also be helpful.

Edit: It turns out DataFrame.isin can select rows by a list of strings, as given in the solution here.)
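As a rough sketch of the isin approach mentioned in the edit: the sample rows below are made up for illustration (the real data is only shown in the image), and the names levels, mask, selected, and population are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the real frame is only visible in the linked image.
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "Doctorate", "HS-grad"],
    "education-num": [13, 9, 14, 13, 16, 9],
})

# Education levels of interest, matched directly against the string column.
levels = ["Bachelors", "Masters", "Doctorate"]

# Series.isin returns a boolean mask: True where the value is in the list.
mask = df["education"].isin(levels)

# Rows with one of the selected education levels.
selected = df[mask]

# Number of matching rows (True counts as 1 when summed).
population = int(mask.sum())
print(population)
```

This filters on 'education' directly, so no comparison on 'education-num' is needed for this particular task.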

P.S. Stack Overflow didn't allow me to post the image directly and posted a link to it instead...😒

HoppyHOP asked May 04 '26 08:05

1 Answer

Select columns by subset and call DataFrame.drop_duplicates:

df1 = df[['education', 'education-num']].drop_duplicates()

If you need to count the population in each group, use:

df2 = df.groupby(['education', 'education-num']).size().reset_index(name='count')
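For illustration, here is how both statements might behave on a small frame. The sample rows are invented (the original data is only shown as an image), so treat this as a sketch rather than the asker's actual output.

```python
import pandas as pd

# Hypothetical sample data standing in for the frame in the question.
df = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "Bachelors", "HS-grad", "HS-grad"],
    "education-num": [13, 9, 14, 13, 9, 9],
})

# One row per distinct (education, education-num) pair; the first occurrence is kept.
df1 = df[["education", "education-num"]].drop_duplicates()

# Number of rows per pair, returned as a regular column named 'count'.
df2 = df.groupby(["education", "education-num"]).size().reset_index(name="count")

print(df1)
print(df2)
```

Note that drop_duplicates keeps the original index labels of the retained rows; chaining .reset_index(drop=True) gives a fresh 0..n-1 index if that is preferred.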
jezrael answered May 06 '26 22:05
