I want to group all qualifications (as a delimiter-separated list) against a job title.
In the following dataset, the same type of job (.Net Developer) requires different sets of qualifications, and another job does not require any qualification.
JobID     Job Title                    Qualification ID  Qualification Name
34455226  .Net Developer               ICT50715          Diploma of Software Development
34455226  .Net Developer               ICT40515          Certificate IV in Programming
34466933  .Net Developer               ICT50715          Diploma of Software Development
34466111  .Net Developer               ICT50655          Diploma of Software Testing
34479964  Snr Finance Systems Analyst
I want a consolidated view of all unique qualifications that might be required for a particular type of job, as below:
Job Title                    Qualifications
.Net Developer               Diploma of Software Development,Certificate IV in Programming,Diploma of Software Testing
Snr Finance Systems Analyst  N/A
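For reference, a minimal sketch that rebuilds the sample data as a DataFrame so the snippets below can be run (the construction is an assumption, not the original data load; None stands in for the missing qualification):

import pandas as pd
import numpy as np  # used further below for np.nan

# Sample data reconstructed from the table above; None marks the missing qualification.
df_jobs_qualifications = pd.DataFrame({
    "JobID": [34455226, 34455226, 34466933, 34466111, 34479964],
    "Job Title": [".Net Developer", ".Net Developer", ".Net Developer",
                  ".Net Developer", "Snr Finance Systems Analyst"],
    "Qualification ID": ["ICT50715", "ICT40515", "ICT50715", "ICT50655", None],
    "Qualification Name": ["Diploma of Software Development", "Certificate IV in Programming",
                           "Diploma of Software Development", "Diploma of Software Testing", None],
})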
This is what I have done so far.
def f(x):
    # join all qualification names in the group into one comma-separated string
    return pd.Series(dict(Qualifications=",".join(map(str, x["Qualification Name"]))))

df_jobs_qualifications\
    .groupby("Job Title")[['Qualification Name']]\
    .apply(f)
But it gives me repeated qualification names (see below, where Diploma of Software Development is repeated), whereas I want unique qualification names:
Job Title                    Qualifications
.Net Developer               Diploma of Software Development,Certificate IV in Programming,Diploma of Software Development,Diploma of Software Testing
Snr Finance Systems Analyst  N/A
UPDATE
My question is different from this question: I still do not get unique values even after following the steps described in that question.
If you need unique strings, you can add set or unique, and if there may be some Nones or NaNs, add dropna:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(set(x.dropna())))
         .reset_index())
print (df1)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  Diploma of Software Development,Diploma of Sof...
1
If order is important:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(x.dropna().unique()))
         .reset_index())
print (df1)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  Diploma of Software Development,Certificate IV...
1
If you want NaNs for no values:
def f(x):
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        # np.nan requires numpy imported as np
        val = np.nan
    return val

df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print (df2)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  Diploma of Software Development,Diploma of Sof...
1                                                NaN
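And if you want the literal N/A string from the desired output instead of NaN, a small follow-up sketch (the fillna step and the column rename are additions here, not part of the solution above):

# reuse f from above, then replace NaN with the literal string 'N/A'
# and rename the column to match the desired view
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(f)
         .reset_index()
         .rename(columns={'Qualification Name': 'Qualifications'}))
df2['Qualifications'] = df2['Qualifications'].fillna('N/A')
print (df2)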
If you need unique lists:
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(set(x)))
         .reset_index())
print (df2)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  [Diploma of Software Development, Diploma of S...
1                                             [None]
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(x.unique()))
         .reset_index())
print (df2)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  [Diploma of Software Development, Certificate ...
1                                             [None]