Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas combine two columns with null values

I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new column to also have NaN. Here's the example:

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
foodstuff   type
0   apple-martini   None
1   apple-pie   None
2   None    strawberry-tart
3   None    dessert
4   None    None

I tried to use fillna and solve this :

df['foodstuff'].fillna('') + df['type'].fillna('')

and I got :

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
dtype: object

The row 4 has become a blank value. What I want in this situation is a NaN value since both the combining columns are NaNs.

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4            None       
dtype: object
like image 664
vagabond Avatar asked Jan 03 '17 17:01

vagabond


People also ask

How do I merge two columns in pandas?

By use + operator simply you can combine/merge two or multiple text/string columns in pandas DataFrame. Note that when you apply + operator on numeric columns it actually does addition instead of concatenation.

Does pandas merge on NaN values?

Merge Python Pandas dataframe with a common column and set NaN for unmatched values. To merge two Pandas DataFrame with common column, use the merge() function and set the ON parameter as the column name. To set NaN for unmatched values, use the “how” parameter and set it left or right.


4 Answers

Use fillna on one column with the fill values being the other column:

df['foodstuff'].fillna(df['type']) 

The resulting output:

0      apple-martini 1          apple-pie 2    strawberry-tart 3            dessert 4               None 
like image 124
root Avatar answered Sep 20 '22 02:09

root


you can use the combine method with a lambda:

df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None) 

(a or "") returns "" if a is None then the same logic is applied on the concatenation (where the result would be None if the concatenation is an empty string).

like image 26
sirfz Avatar answered Sep 22 '22 02:09

sirfz


  • fillna both columns together
  • sum(1) to add them
  • replace('', np.nan)

df.fillna('').sum(1).replace('', np.nan)

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN
dtype: object
like image 40
piRSquared Avatar answered Sep 22 '22 02:09

piRSquared


If you deal with columns that contain something where the others don't and vice-versa, a one-liner that does well the job is

>>> df.rename(columns={'type': 'foodstuff'}).stack().unstack()
         foodstuff
0    apple-martini
1        apple-pie
2  strawberry-tart
3          dessert

... which solution also generalises well if you have multiple columns to "intricate", as long as you can define your ~.rename mapping. The intention behind such renaming is to create duplicates that ~.stack().unstack() will then process for you.

As explained, this solution only suits configuration with orthogonal columns, i.e. columns that never are simultaneously valued.

like image 30
keepAlive Avatar answered Sep 19 '22 02:09

keepAlive