Duplicating rows where a cell contains multiple pieces of data

Question

I would like to take a dataframe and duplicate certain rows. One column, called name, may have multiple names. An example dataframe is contructed below:

data = [
    ['Joe', '17-11-2018', '2'],
    ['Karen', '17-11-2018', '4'],
    ['Bill, Avery', '17-11-2018', '6'],
    ['Sam', '18-11-2018', '4'],
    ['Alex, Frank', '18-11-2018', '6'],
    ['Chris', '18-11-2018', '8'],
]
df = pd.DataFrame(data, columns = ['name','date','number'])

This yields the following dataframe:

          name        date number
0          Joe  17-11-2018      2
1        Karen  17-11-2018      4
2  Bill, Avery  17-11-2018      6
3          Sam  18-11-2018      4
4  Alex, Frank  18-11-2018      6
5        Chris  18-11-2018      8

I would like to take all rows where there are multiple names (comma-separated) and duplicate them for each individual name. The resulting dataframe should look like this:

    name        date number
0    Joe  17-11-2018      2
1  Karen  17-11-2018      4
2   Bill  17-11-2018      6
3  Avery  17-11-2018      6
4    Sam  18-11-2018      4
5   Alex  18-11-2018      6
6  Frank  18-11-2018      6
7  Chris  18-11-2018      8

BENY · Accepted Answer

After str.split , it become a unnest problem

df['name']=df.name.str.split(',')

unnesting(df,['name'])
Out[97]: 
     name        date number
0     Joe  17-11-2018      2
1   Karen  17-11-2018      4
2    Bill  17-11-2018      6
2   Avery  17-11-2018      6
3     Sam  18-11-2018      4
4    Alex  18-11-2018      6
4   Frank  18-11-2018      6
5   Chris  18-11-2018      8

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(explode, 1), how='left')

TechPerson · Answer

Jack. I don't use dataframes much, but the following code should work before df = pd.DataFrame(data, columns = ['name','date','number'])

new_data = []
for item in data:
    if "," in item[0]:
        new_data.append([item[0].split(", ")[0], item[1], item[2]])
        new_data.append([item[0].split(", ")[1], item[1], item[2]])
    else:
        new_data.append(item)

Duplicating rows where a cell contains multiple pieces of data

Tags:

python

python-3.x

pandas

Jack Walsh

2 Answers

BENY

TechPerson

Recent Activity

Donate For Us

Duplicating rows where a cell contains multiple pieces of data

Tags:

python

python-3.x

pandas

Jack Walsh

2 Answers

BENY

TechPerson

Related questions

Recent Activity

Donate For Us