I want to group all qualifications (as a delimiter-separated list) against a job title.
In the following dataset, the same type of job (.Net Developer) requires different sets of qualifications, and another job does not require any qualification.
JobID     Job Title                    Qualification ID  Qualification Name
34455226  .Net Developer               ICT50715          Diploma of Software Development
34455226  .Net Developer               ICT40515          Certificate IV in Programming
34466933  .Net Developer               ICT50715          Diploma of Software Development
34466111  .Net Developer               ICT50655          Diploma of Software Testing
34479964  Snr Finance Systems Analyst
I want a consolidated view of all unique qualifications that might be required for a particular type of job, as below:
Job Title                    Qualifications
.Net Developer               Diploma of Software Development,Certificate IV in Programming,Diploma of Software Testing
Snr Finance Systems Analyst  N/A
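For reference, a minimal sketch that rebuilds the sample data as a DataFrame so the snippets below can be run (the construction is an assumption, not the original data load; None stands in for the missing qualification):

import pandas as pd
import numpy as np  # used further below for np.nan

# Sample data reconstructed from the table above; None marks the missing qualification.
df_jobs_qualifications = pd.DataFrame({
    "JobID": [34455226, 34455226, 34466933, 34466111, 34479964],
    "Job Title": [".Net Developer", ".Net Developer", ".Net Developer",
                  ".Net Developer", "Snr Finance Systems Analyst"],
    "Qualification ID": ["ICT50715", "ICT40515", "ICT50715", "ICT50655", None],
    "Qualification Name": ["Diploma of Software Development", "Certificate IV in Programming",
                           "Diploma of Software Development", "Diploma of Software Testing", None],
})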
This is what I have done so far.
def f(x):
    # join all qualification names in the group into one comma-separated string
    return pd.Series(dict(Qualifications=",".join(map(str, x["Qualification Name"]))))

df_jobs_qualifications\
    .groupby("Job Title")[['Qualification Name']]\
    .apply(f)
But it gives me repeated qualification names (see below, where Diploma of Software Development is repeated), whereas I want unique qualification names:
Job Title                    Qualifications
.Net Developer               Diploma of Software Development,Certificate IV in Programming,Diploma of Software Development,Diploma of Software Testing
Snr Finance Systems Analyst  N/A
UPDATE
My question is different from this question: I still do not get unique values even after following the steps described in that question.
If you need unique strings, you can add set or unique, and if there may be some Nones or NaNs, add dropna:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(set(x.dropna())))
         .reset_index())
print (df1)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  Diploma of Software Development,Diploma of Sof...
1
If order is important:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(x.dropna().unique()))
         .reset_index())
print (df1)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  Diploma of Software Development,Certificate IV...
1
If you want NaNs for no values:
def f(x):
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        # np.nan requires numpy imported as np
        val = np.nan
    return val

df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print (df2)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  Diploma of Software Development,Diploma of Sof...
1                                                NaN
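And if you want the literal N/A string from the desired output instead of NaN, a small follow-up sketch (the fillna step and the column rename are additions here, not part of the solution above):

# reuse f from above, then replace NaN with the literal string 'N/A'
# and rename the column to match the desired view
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(f)
         .reset_index()
         .rename(columns={'Qualification Name': 'Qualifications'}))
df2['Qualifications'] = df2['Qualifications'].fillna('N/A')
print (df2)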
If you need unique lists:
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(set(x)))
         .reset_index())
print (df2)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  [Diploma of Software Development, Diploma of S...
1                                             [None]
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(x.unique()))
         .reset_index())
print (df2)
                     Job Title  \
0               .Net Developer
1  Snr Finance Systems Analyst

                                  Qualification Name
0  [Diploma of Software Development, Certificate ...
1                                             [None]