I would like to take a dataframe and duplicate certain rows.
One column, called name, may have multiple names.
An example dataframe is contructed below:
data = [
['Joe', '17-11-2018', '2'],
['Karen', '17-11-2018', '4'],
['Bill, Avery', '17-11-2018', '6'],
['Sam', '18-11-2018', '4'],
['Alex, Frank', '18-11-2018', '6'],
['Chris', '18-11-2018', '8'],
]
df = pd.DataFrame(data, columns = ['name','date','number'])
This yields the following dataframe:
name date number
0 Joe 17-11-2018 2
1 Karen 17-11-2018 4
2 Bill, Avery 17-11-2018 6
3 Sam 18-11-2018 4
4 Alex, Frank 18-11-2018 6
5 Chris 18-11-2018 8
I would like to take all rows where there are multiple names (comma-separated) and duplicate them for each individual name. The resulting dataframe should look like this:
name date number
0 Joe 17-11-2018 2
1 Karen 17-11-2018 4
2 Bill 17-11-2018 6
3 Avery 17-11-2018 6
4 Sam 18-11-2018 4
5 Alex 18-11-2018 6
6 Frank 18-11-2018 6
7 Chris 18-11-2018 8
After str.split , it become a unnest problem
df['name']=df.name.str.split(',')
unnesting(df,['name'])
Out[97]:
name date number
0 Joe 17-11-2018 2
1 Karen 17-11-2018 4
2 Bill 17-11-2018 6
2 Avery 17-11-2018 6
3 Sam 18-11-2018 4
4 Alex 18-11-2018 6
4 Frank 18-11-2018 6
5 Chris 18-11-2018 8
def unnesting(df, explode):
idx = df.index.repeat(df[explode[0]].str.len())
df1 = pd.concat([
pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
df1.index = idx
return df1.join(df.drop(explode, 1), how='left')
Jack. I don't use dataframes much, but the following code should work before df = pd.DataFrame(data, columns = ['name','date','number'])
new_data = []
for item in data:
if "," in item[0]:
new_data.append([item[0].split(", ")[0], item[1], item[2]])
new_data.append([item[0].split(", ")[1], item[1], item[2]])
else:
new_data.append(item)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With