 

Merge multiple pandas columns into new column

I have a dataframe where some of the columns indicate whether or not a set of survey questions was seen. For example:

Q1_Seen    Q2_Seen    Q3_Seen    Q4_Seen
    Q1a        nan        nan        nan
    nan        Q2a        nan        nan
    nan        nan        Q3d        nan
    nan        Q2c        nan        nan

I would like to collapse these columns into a single column, say Q_Seen, of the form:

Q_Seen
   Q1a
   Q2a
   Q3d
   Q2c

Note that the columns are mutually exclusive within each row: if one column has a value, all the others are NaN.

I tried doing this with pd.concat, but it didn't seem to produce correct results.
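Every approach for this relies on that mutual-exclusivity assumption, so it may be worth verifying it before collapsing anything. A minimal sketch, assuming the blanks in the example are real missing values (NaN) rather than the string 'nan':

```python
import numpy as np
import pandas as pd

# Rebuild the example from the question; np.nan marks an unseen question.
df = pd.DataFrame({
    "Q1_Seen": ["Q1a", np.nan, np.nan, np.nan],
    "Q2_Seen": [np.nan, "Q2a", np.nan, "Q2c"],
    "Q3_Seen": [np.nan, np.nan, "Q3d", np.nan],
    "Q4_Seen": [np.nan, np.nan, np.nan, np.nan],
})

# Count the non-missing values in each row.
seen_per_row = df.notna().sum(axis=1)

# True only if every row has exactly one seen question.
mutually_exclusive = bool(seen_per_row.eq(1).all())
print(seen_per_row.tolist(), mutually_exclusive)  # [1, 1, 1, 1] True
```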

asked Jun 17 '15 at 18:06 by redct



3 Answers

Try this:

df['Q_Seen'] = df.stack().values

>>> df

Q1_Seen    Q2_Seen    Q3_Seen     Q4_Seen     Q_Seen
    Q1a        nan        nan         nan        Q1a
    nan        Q2a        nan         nan        Q2a
    nan        nan        Q3d         nan        Q3d
    nan        Q2c        nan         nan        Q2c
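This answer depends on `stack()` discarding missing values, which has long been its default; newer pandas releases add an opt-in `future_stack=True` implementation that keeps them, in which case `.values` would be longer than the DataFrame. A sketch that should behave the same either way, assuming at most one value per row, drops missing values explicitly and assigns by row label instead of by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Q1_Seen": ["Q1a", np.nan, np.nan, np.nan],
    "Q2_Seen": [np.nan, "Q2a", np.nan, "Q2c"],
    "Q3_Seen": [np.nan, np.nan, "Q3d", np.nan],
    "Q4_Seen": [np.nan, np.nan, np.nan, np.nan],
})

# stack() turns each (row, column) cell into one entry of a Series with a
# two-level index; dropna() removes unseen questions whether or not this
# pandas version already dropped them while stacking.
stacked = df.stack().dropna()

# Drop the column level so the index is the original row labels again.
q_seen = stacked.droplevel(1)

# A repeated row label would mean that row had more than one seen question.
assert q_seen.index.is_unique

# Assignment aligns on row labels; a row with no value would get NaN.
df["Q_Seen"] = q_seen
print(df["Q_Seen"].tolist())
```

Assigning by label rather than with `.values` also means a row with no seen question simply ends up as NaN instead of shifting every later value up by one.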
answered Sep 27 '22 at 22:09 by vk1011



Taking the maximum across each row, i.e. max(axis=1), collapses all of the values into a single column:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"Q1_Seen": ['Q1a', None, None, None], "Q2_Seen": [None, "Q2a", None, "Q2c"], "Q3_Seen": [None, None, "Q3d", None],"Q4_Seen": [None, None, None, None]})

In [3]: df
Out[3]: 
  Q1_Seen Q2_Seen Q3_Seen Q4_Seen
0     Q1a    None    None    None
1    None     Q2a    None    None
2    None    None     Q3d    None
3    None     Q2c    None    None

In [4]: df['Q_Seen'] = df.max(axis=1)

In [5]: df
Out[5]: 
  Q1_Seen Q2_Seen Q3_Seen Q4_Seen Q_Seen
0     Q1a    None    None    None    Q1a
1    None     Q2a    None    None    Q2a
2    None    None     Q3d    None    Q3d
3    None     Q2c    None    None    Q2c
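`max(axis=1)` works by comparing the values within each row, so it relies on missing values being skipped and on the remaining values being comparable with each other. An alternative that needs no comparisons (a swapped-in technique, not part of this answer) is to back-fill along each row, so that the first column picks up the row's first non-missing value:

```python
import pandas as pd

df = pd.DataFrame({
    "Q1_Seen": ["Q1a", None, None, None],
    "Q2_Seen": [None, "Q2a", None, "Q2c"],
    "Q3_Seen": [None, None, "Q3d", None],
    "Q4_Seen": [None, None, None, None],
})

# bfill(axis=1) copies the next non-missing value leftward along each row,
# so column 0 ends up holding the first non-missing value of that row.
q_seen = df.bfill(axis=1).iloc[:, 0]

df["Q_Seen"] = q_seen
print(df["Q_Seen"].tolist())
```

Unlike `max`, this picks the leftmost value if a row ever had more than one, so it does not by itself detect a violation of the one-value-per-row assumption.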
answered Sep 27 '22 at 21:09 by emveebeeare



The following worked for me:

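import pandas as pd
df = pd.DataFrame({'Q1': [1, None, None], 'Q2': [None, 2, None], 'Q3': [None, None, 3]})
# pd.concat is a module-level function; dropna() leaves one value per row label
df['Q'] = pd.concat([df['Q1'], df['Q2'], df['Q3']]).dropna()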

There could be a more elegant solution, but this is what first popped into my head.
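The concat approach works because `pd.concat` stacks the columns end to end into one long Series whose index repeats each row label once per column; after `dropna()` only the seen values remain, each still carrying its original row label, so the assignment lines them up with the right rows. A sketch of the same idea over every column, with an added check (not part of the answer) that no row label survives twice:

```python
import pandas as pd

df = pd.DataFrame({"Q1": [1, None, None], "Q2": [None, 2, None], "Q3": [None, None, 3]})

# Concatenating along axis 0 yields len(df) * number_of_columns entries,
# with each row label repeated once per column.
combined = pd.concat([df[col] for col in df.columns])
print(len(combined))  # 9

# Keep only the non-missing values; each keeps its original row label.
q = combined.dropna()

# A duplicate label would mean some row had more than one value.
assert q.index.is_unique

# Assignment aligns on the row labels.
df["Q"] = q
print(df["Q"].tolist())  # [1.0, 2.0, 3.0]
```

The values come back as floats because the NaNs force each numeric column to float64.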

answered Sep 27 '22 at 22:09 by Brett Patterson
