combining different columns

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                   'vote':[5,4,5,1,10,1,9],
                   'doggo': [None,"doggo",None,None,"doggo",None,None],
                   'floofer': ["floofer",None,None,"floofer",None,None,None],
                   'pupper': [None,None,"pupper",None,None,None,None],
                   'puppo':[None,None,None,None,None,None,"puppo"]})

I want to combine the last 4 columns into a single column and generate:

df = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                    'vote':[5,4,5,1,10,1,9],
                    'categories': ["floofer","doggo","pupper","floofer","doggo",None,"puppo"]})

Any guidance is appreciated.

asked Feb 21 '26 15:02 by Pouyan Ebrahimi


1 Answer

Solution if each row has at most one non-None value across the categorical columns:

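cols = ['doggo','floofer','pupper','puppo']
# remaining columns (id, vote); Index.difference sorts them by default
cols1 = df.columns.difference(cols)
# forward fill along each row, take the last column and name it Categories
df2 = df[cols1].join(df[cols].ffill(axis=1).iloc[:, -1].rename('Categories'))
print (df2)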
   id  vote Categories
0   1     5    floofer
1   2     4      doggo
2   3     5     pupper
3   4     1    floofer
4   5    10      doggo
5   6     1       None
6   7     9      puppo
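If a row has more than one non-None value, the ffill approach silently keeps only the right-most one, so it may be worth checking that assumption before relying on it. A minimal sketch that rebuilds the question's data; the `has_single_category` name is hypothetical, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'vote': [5, 4, 5, 1, 10, 1, 9],
                   'doggo': [None, "doggo", None, None, "doggo", None, None],
                   'floofer': ["floofer", None, None, "floofer", None, None, None],
                   'pupper': [None, None, "pupper", None, None, None, None],
                   'puppo': [None, None, None, None, None, None, "puppo"]})
cols = ['doggo', 'floofer', 'pupper', 'puppo']

# count the non-missing values in each row across the categorical columns
counts = df[cols].notnull().sum(axis=1)
# hypothetical flag: True when every row has at most one category
has_single_category = bool(counts.le(1).all())
print(has_single_category)  # True
```

If the flag is False, one of the multi-value solutions further down is the safer choice.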

Explanation:

First, select only the columns with categorical data and forward fill the missing values along each row (axis=1), so the expected value ends up in the last column:

print (df[cols].ffill(axis=1))
  doggo  floofer   pupper    puppo
0   None  floofer  floofer  floofer
1  doggo    doggo    doggo    doggo
2   None     None   pupper   pupper
3   None  floofer  floofer  floofer
4  doggo    doggo    doggo    doggo
5   None     None     None     None
6   None     None     None    puppo

Then select the last column by position:

print (df[cols].ffill(axis=1).iloc[:, -1])
0    floofer
1      doggo
2     pupper
3    floofer
4      doggo
5       None
6      puppo
Name: puppo, dtype: object
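The same values can be obtained by filling in the other direction: `bfill(axis=1)` pulls each value leftwards, so it lands in the first column instead. Either way the selected Series keeps the name of the column it came from (`puppo` above), which is why the answer renames it to `Categories`. A small sketch on a made-up two-row frame:

```python
import pandas as pd

demo = pd.DataFrame({'doggo': [None, 'doggo'],
                     'floofer': ['floofer', None],
                     'puppo': [None, None]})

# forward fill along rows, then take the last column
last = demo.ffill(axis=1).iloc[:, -1]
# backward fill along rows, then take the first column
first = demo.bfill(axis=1).iloc[:, 0]

print(last.name, first.name)  # puppo doggo
print(last.tolist(), first.tolist())
```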

Solution if a row can have multiple values, with the output built from the names of the categorical columns:

df = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                   'vote':[5,4,5,1,10,1,9],
                   'doggo': [None,"doggo1",None,"doggo2","doggo3",None,None],
                   'floofer': ["floofer1",None,None,"floofer2",None,None,None],
                   'pupper': [None,None,"pupper1",None,None,None,None],
                   'puppo':["puppo1",None,None,None,None,None,"puppo2"]})
print (df)
   id  vote   doggo   floofer   pupper   puppo
0   1     5    None  floofer1     None  puppo1
1   2     4  doggo1      None     None    None
2   3     5    None      None  pupper1    None
3   4     1  doggo2  floofer2     None    None
4   5    10  doggo3      None     None    None
5   6     1    None      None     None    None
6   7     9    None      None     None  puppo2


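import numpy as np

# True/False times "name, " keeps or drops each column name; then strip the trailing separator
s = (df[cols].notnull()
            .dot(pd.Index(cols) + ', ')
            .str.strip(', ')
            .rename('Categories')
            .replace('', np.nan)
            )
# assign to df2 so df keeps its categorical columns for the next solution
df2 = df[cols1].join(s)
print (df2)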
   id  vote      Categories
0   1     5  floofer, puppo
1   2     4           doggo
2   3     5          pupper
3   4     1  doggo, floofer
4   5    10           doggo
5   6     1             NaN
6   7     9           puppo
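The `dot` step works because multiplying a boolean by a string is valid Python: `True * 'puppo, '` is `'puppo, '` and `False * 'puppo, '` is `''`, so the matrix product concatenates the names of the non-null columns in each row. A plain-Python sketch of what the product computes for a single row (the mask here is illustrative, matching row 0 above):

```python
labels = ['doggo', 'floofer', 'pupper', 'puppo']
row_mask = [False, True, False, True]  # which columns are non-null in one row

# bool * str keeps the string for True and gives '' for False
pieces = [flag * (name + ', ') for flag, name in zip(row_mask, labels)]
joined = ''.join(pieces)  # 'floofer, puppo, '
print(joined.strip(', '))  # floofer, puppo
```

`str.strip(', ')` removes any leading or trailing commas and spaces, which is how the answer drops the final separator.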

Another solution, if the output should be built from the values rather than from the column names:

s = pd.Series(df[cols].add(', ').fillna('').values.sum(axis=1), 
                  index=df.index, name='Categories').str.strip(', ')
df = df[cols1].join(s)
print (df)
   id  vote        Categories
0   1     5  floofer1, puppo1
1   2     4            doggo1
2   3     5           pupper1
3   4     1  doggo2, floofer2
4   5    10            doggo3
5   6     1                  
6   7     9            puppo2
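A more readable, though usually slower, alternative is to drop the missing values in each row and join what is left with `', '` via `apply`. This is a swapped-in technique, not the answer's; a minimal sketch assuming the second sample frame from the answer (the names `joined` and `out` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'vote': [5, 4, 5, 1, 10, 1, 9],
                   'doggo': [None, "doggo1", None, "doggo2", "doggo3", None, None],
                   'floofer': ["floofer1", None, None, "floofer2", None, None, None],
                   'pupper': [None, None, "pupper1", None, None, None, None],
                   'puppo': ["puppo1", None, None, None, None, None, "puppo2"]})
cols = ['doggo', 'floofer', 'pupper', 'puppo']

# join the non-missing values of each row; rows with no values give ''
joined = df[cols].apply(lambda row: ', '.join(row.dropna()), axis=1)
# turn empty strings into NaN and attach the result to the other columns
out = df.drop(columns=cols).join(joined.replace('', np.nan).rename('Categories'))
print(out)
```

`apply` with `axis=1` calls the lambda once per row in Python, so on large frames the vectorized versions above are likely to be faster.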
answered Feb 23 '26 05:02 by jezrael