Given the following:
df = pd.DataFrame({'col1' : ["a","b"],
'col2' : ["ab",np.nan], 'col3' : ["w","e"]})
I would like to be able to create a column that joins the content of all three columns into one string, separated by the character "*" while ignoring NaN
.
so that I would get something like that for example:
a*ab*w
b*e
Any ideas?
Just realised there were a few additional requirements, I needed the method to work with ints and floats and also to be able to deal with special characters (e.g., letters of Spanish alphabet).
Pandas str.cat() is used to concatenate strings to the passed caller series of string. Distinct values from a different series can be passed but the length of both the series has to be same. . str has to be prefixed to differentiate it from the Python's default method.
By use + operator simply you can concatenate two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.
Concatenating string columns in small datasets For relatively small datasets (up to 100–150 rows) you can use pandas.Series.str.cat() method that is used to concatenate strings in the Series using the specified separator (by default the separator is set to '' ).
In [68]:
df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().values.tolist()), axis=1)
df
Out[68]:
col1 col2 col3 new_col
0 a ab w a*ab*w
1 b NaN e b*e
UPDATE
If you have ints or float you can convert these to str
first:
In [74]:
df = pd.DataFrame({'col1' : ["a","b",3],
'col2' : ["ab",np.nan, 4], 'col3' : ["w","e", 6]})
df
Out[74]:
col1 col2 col3
0 a ab w
1 b NaN e
2 3 4 6
In [76]:
df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
df
Out[76]:
col1 col2 col3 new_col
0 a ab w a*ab*w
1 b NaN e b*e
2 3 4 6 3*4*6
Another update
In [81]:
df = pd.DataFrame({'col1' : ["a","b",3,'ñ'],
'col2' : ["ab",np.nan, 4,'ü'], 'col3' : ["w","e", 6,'á']})
df
Out[81]:
col1 col2 col3
0 a ab w
1 b NaN e
2 3 4 6
3 ñ ü á
In [82]:
df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
df
Out[82]:
col1 col2 col3 new_col
0 a ab w a*ab*w
1 b NaN e b*e
2 3 4 6 3*4*6
3 ñ ü á ñ*ü*á
My code still works with Spanish characters
You can use dropna()
df['col4'] = df.apply(lambda row: '*'.join(row.dropna()), axis=1)
UPDATE:
Since, you need to convert numbers and special chars too, you can use astype(unicode)
In [37]: df = pd.DataFrame({'col1': ["a", "b"], 'col2': ["ab", np.nan], "col3": [3, u'\xf3']})
In [38]: df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)
Out[38]:
0 a*ab*3
1 b*ó
dtype: object
In [39]: df['col4'] = df.apply(lambda row: '*'.join(row.dropna().astype(unicode)), axis=1)
In [40]: df
Out[40]:
col1 col2 col3 col4
0 a ab 3 a*ab*3
1 b NaN ó b*ó
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With