Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

merging multiple columns into one columns in pandas

I have a dataframe called ref(first dataframe) with columns c1, c2 ,c3 and c4.

ref= pd.DataFrame([[1,3,.3,7],[0,4,.5,4.5],[2,5,.6,3]], columns=['c1','c2','c3','c4'])
print(ref)
   c1  c2   c3   c4
0   1   3  0.3  7.0
1   0   4  0.5  4.5
2   2   5  0.6  3.0

I wanted to create a new column i.e, c5 ( second dataframe) that has all the values from columns c1,c2,c3 and c4.

I tried concat, merge columns but i cannot get it work.

Please let me know if you have a solutions?

pic

like image 446
Sasihci Avatar asked Jan 13 '17 05:01

Sasihci


People also ask

How do I combine multiple columns into one column in pandas?

To combine the values of all the column and append them into a single column, we will use apply() method inside which we will write our expression to do the same. Whenever we want to perform some operation on the entire DataFrame, we use apply() method.

How do I merge 3 columns in pandas?

To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.

How do I merge two DataFrames in pandas based on multiple common column?

To merge two Pandas DataFrame with common column, use the merge() function and set the ON parameter as the column name.

Can you group by multiple columns in pandas?

groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.


2 Answers

You can use unstack for creating Series from DataFrame and then concat to original:

print (pd.concat([ref, ref.unstack().reset_index(drop=True).rename('c5')], axis=1))
     c1   c2   c3   c4   c5
0   1.0  3.0  0.3  7.0  1.0
1   0.0  4.0  0.5  4.5  0.0
2   2.0  5.0  0.6  3.0  2.0
3   NaN  NaN  NaN  NaN  3.0
4   NaN  NaN  NaN  NaN  4.0
5   NaN  NaN  NaN  NaN  5.0
6   NaN  NaN  NaN  NaN  0.3
7   NaN  NaN  NaN  NaN  0.5
8   NaN  NaN  NaN  NaN  0.6
9   NaN  NaN  NaN  NaN  7.0
10  NaN  NaN  NaN  NaN  4.5
11  NaN  NaN  NaN  NaN  3.0

Alternative solution for creating Series is convert df to numpy array by values and then reshape by ravel:

    print (pd.concat([ref, pd.Series(ref.values.ravel('F'), name='c5')], axis=1))
         c1   c2   c3   c4   c5
    0   1.0  3.0  0.3  7.0  1.0
    1   0.0  4.0  0.5  4.5  0.0
    2   2.0  5.0  0.6  3.0  2.0
    3   NaN  NaN  NaN  NaN  3.0
    4   NaN  NaN  NaN  NaN  4.0
    5   NaN  NaN  NaN  NaN  5.0
    6   NaN  NaN  NaN  NaN  0.3
    7   NaN  NaN  NaN  NaN  0.5
    8   NaN  NaN  NaN  NaN  0.6
    9   NaN  NaN  NaN  NaN  7.0
    10  NaN  NaN  NaN  NaN  4.5
    11  NaN  NaN  NaN  NaN  3.0
like image 161
jezrael Avatar answered Nov 09 '22 21:11

jezrael


using join + ravel('F')

ref.join(pd.Series(ref.values.ravel('F')).to_frame('c5'), how='right')

using join + T.ravel()

ref.join(pd.Series(ref.values.T.ravel()).to_frame('c5'), how='right')

pd.concat + T.stack() + rename

pd.concat([ref, ref.T.stack().reset_index(drop=True).rename('c5')], axis=1)

way too many transposes + append

ref.T.append(ref.T.stack().reset_index(drop=True).rename('c5')).T

combine_first + ravel('F') <--- my favorite

ref.combine_first(pd.Series(ref.values.ravel('F')).to_frame('c5'))

All yield

     c1   c2   c3   c4   c5
0   1.0  3.0  0.3  7.0  1.0
1   0.0  4.0  0.5  4.5  0.0
2   2.0  5.0  0.6  3.0  2.0
3   NaN  NaN  NaN  NaN  3.0
4   NaN  NaN  NaN  NaN  4.0
5   NaN  NaN  NaN  NaN  5.0
6   NaN  NaN  NaN  NaN  0.3
7   NaN  NaN  NaN  NaN  0.5
8   NaN  NaN  NaN  NaN  0.6
9   NaN  NaN  NaN  NaN  7.0
10  NaN  NaN  NaN  NaN  4.5
11  NaN  NaN  NaN  NaN  3.0
like image 26
piRSquared Avatar answered Nov 09 '22 22:11

piRSquared