I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new column to also have NaN. Here's the example:
import pandas as pd
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})
df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3           None          dessert
4           None             None
I tried to solve this with fillna:
df['foodstuff'].fillna('') + df['type'].fillna('')
and got:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4
dtype: object
Row 4 has become an empty string. What I want in this case is NaN, since both of the columns being combined are NaN:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 None
dtype: object
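A minimal way to patch the fillna('') attempt is to keep the concatenation only in rows where at least one column has a value. This is a sketch using Series.where with a row-wise notna().any(axis=1) mask; it rebuilds the question's sample frame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# The original attempt: missing values become '' and the strings are concatenated
combined = df['foodstuff'].fillna('') + df['type'].fillna('')

# True for rows with at least one non-missing value; where() replaces every
# other row (here: row 4, which concatenated to '') with NaN
combined = combined.where(df.notna().any(axis=1))
print(combined)
```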
You can concatenate two or more text/string columns of a pandas DataFrame simply with the + operator. Note that when you apply + to numeric columns it performs addition instead of concatenation.
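A short sketch of both behaviours, using made-up sample columns (first, last, a, b are hypothetical names, not from the question). It also shows why the question fills missing values first: when either operand of + is missing, the result for that row is NaN.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'first': ['Jane', 'John'], 'last': ['Doe', 'Smith'],
                   'a': [1, 2], 'b': [10, 20]})

# On string (object) columns, + concatenates element-wise
full_name = df['first'] + ' ' + df['last']

# On numeric columns, + adds element-wise
total = df['a'] + df['b']

# A missing value on either side makes that row's result NaN
s = pd.Series(['x', None]) + pd.Series(['y', 'z'])
print(full_name.tolist(), total.tolist(), s.tolist())
```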
Merging pandas DataFrames on a common column and setting NaN for unmatched values: to merge two DataFrames that share a column, use the merge() function and set the on parameter to that column's name. To get NaN for unmatched values, set the how parameter to 'left' or 'right' (or 'outer' to keep unmatched rows from both sides).
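A minimal sketch of a left merge, with two hypothetical frames sharing a column named key (the names and values are illustrative only). The row of left that has no partner in right gets NaN in the column that comes from right.

```python
import pandas as pd

# Hypothetical sample frames sharing the column 'key'
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'price': [1.0, 2.0, 3.0]})
right = pd.DataFrame({'key': ['a', 'c'], 'stock': [10, 30]})

# how='left' keeps every row of `left`; 'b' has no match in `right`,
# so its 'stock' value is NaN (and the column becomes float)
merged = left.merge(right, on='key', how='left')
print(merged)
```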
Use fillna on one column, with the other column as the fill values:
df['foodstuff'].fillna(df['type'])
The resulting output:
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 None
dtype: object
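A self-contained version of the fillna answer, plus Series.combine_first, which expresses the same "take this column, fall back to that one" idea. Both leave row 4 missing, because there is no value in either column to fill it with.

```python
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# Fill the gaps in 'foodstuff' with the value from 'type' in the same row
result = df['foodstuff'].fillna(df['type'])

# combine_first: values from the caller, falling back to `other` where missing
alt = df['foodstuff'].combine_first(df['type'])
print(result)
```

Neither call concatenates; they only pick one value per row, which is enough here because no row has both columns filled.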
You can use the combine method with a lambda:
df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)
(a or "") returns "" if a is None. The same logic is then applied to the concatenation, so the result is None if the concatenation is an empty string.
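The or trick relies on the missing values being None. np.nan is truthy, so (np.nan or "") evaluates to nan and the lambda above would raise a TypeError when it tries to add a string and a float. A sketch that also copes with NaN tests each value with pd.isna instead; join_or_nan is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Same data, but with np.nan as the missing marker instead of None
df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', np.nan, np.nan, np.nan],
                   'type': [np.nan, np.nan, 'strawberry-tart', 'dessert', np.nan]})

def join_or_nan(a, b):
    # Keep only non-missing values (pd.isna handles both None and np.nan)
    parts = [x for x in (a, b) if not pd.isna(x)]
    # Concatenate what is left; NaN if both inputs were missing
    return ''.join(parts) if parts else np.nan

result = df['foodstuff'].combine(df['type'], join_or_nan)
print(result)
```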
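fillna both columns together, use sum(1) to add (concatenate) them, then replace('', np.nan) to turn rows that were entirely missing back into NaN:
import numpy as np
df.fillna('').sum(1).replace('', np.nan)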
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
4 NaN
dtype: object
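A runnable version of that one-liner with each step commented. It relies on sum(axis=1) concatenating strings when the columns hold Python strings (object dtype); unlike the fillna/combine_first approach it would also work if a row had both columns filled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# 1. fillna('') turns every missing value into an empty string
# 2. sum(axis=1) on string columns concatenates each row's values
# 3. rows that were entirely missing sum to '', which becomes NaN again
result = df.fillna('').sum(axis=1).replace('', np.nan)
print(result)
```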
If you are dealing with columns where one holds a value exactly where the others don't, and vice versa, a one-liner that does the job well is:
>>> df.rename(columns={'type': 'foodstuff'}).stack().unstack()
foodstuff
0 apple-martini
1 apple-pie
2 strawberry-tart
3 dessert
This solution also generalises well if you have multiple columns to merge into one, as long as you can define the .rename mapping. The point of the renaming is to create duplicate column labels, which .stack().unstack() will then collapse for you.
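As explained, this solution only suits configurations with orthogonal columns, i.e. columns that are never filled simultaneously. Note also that stack() drops missing values, so row 4, where both columns are missing, does not appear in the result.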
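For the same orthogonal-columns setup with any number of columns, a different technique (not the stack/unstack one) is a row-wise back-fill with DataFrame.bfill(axis=1), then taking the first column. This sketch keeps row 4 as a missing value instead of dropping it; if a row did have several filled columns, it would keep the leftmost value rather than concatenating.

```python
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# Back-fill along each row: a missing cell takes the next non-missing value
# to its right, so the first column ends up holding the row's first value.
# Rows where every column is missing stay missing.
result = df.bfill(axis=1).iloc[:, 0]
print(result)
```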