I have a pandas Dataframe y with 1 million rows and 5 columns.
np.shape(y)
(1037889, 5)
The column values are all 0 or 1. Looks something like this:
y.head()
a, b, c, d, e
0, 0, 1, 0, 0
1, 0, 0, 1, 1
0, 1, 1, 1, 1
0, 0, 0, 0, 0
I want a Dataframe with 1 million rows and 1 column.
np.shape(y)
(1037889, )
where the column is just the 5 columns concatenated together.
New column
0, 0, 1, 0, 0
1, 0, 0, 1, 1
0, 1, 1, 1, 1
0, 0, 0, 0, 0
I keep trying different things like merge
, concat
, dstack
, etc...
but can't seem to figure this out.
By use + operator simply you can concatenate two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.
To start, you may use this template to concatenate your column values (for strings only): df['New Column Name'] = df['1st Column Name'] + df['2nd Column Name'] + ... Notice that the plus symbol ('+') is used to perform the concatenation.
Pandas str.cat() is used to concatenate strings to the passed caller series of string. Distinct values from a different series can be passed but the length of both the series has to be same. . str has to be prefixed to differentiate it from the Python's default method.
If you want new column to have all data concatenated to string, it's good case for apply() function:
>>> df = pd.DataFrame({'a':[0,1,0,0], 'b':[0,0,1,0], 'c':[1,0,1,0], 'd':[0,1,1,0], 'c':[0,1,1,0]})
>>> df
a b c d
0 0 0 0 0
1 1 0 1 1
2 0 1 1 1
3 0 0 0 0
>>> df2 = df.apply(lambda row: ','.join(map(str, row)), axis=1)
>>> df2
0 0,0,0,0
1 1,0,1,1
2 0,1,1,1
3 0,0,0,0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With