pandas combine two columns with null values

Tags:

python

pandas

dataframe

nonetype

I have a df with two columns and I want to combine them into one, ignoring the NaN values. The catch is that sometimes both columns are NaN, in which case I want the new column to be NaN as well. Here's the example:

import pandas as pd

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
foodstuff   type
0   apple-martini   None
1   apple-pie   None
2   None    strawberry-tart
3   None    dessert
4   None    None

I tried to solve this with fillna:

df['foodstuff'].fillna('') + df['type'].fillna('')

and I got:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
dtype: object

Row 4 has become an empty string. What I want in this situation is a NaN value, since both of the columns being combined are NaN there.

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4            None       
dtype: object


asked Jan 03 '17 17:01

vagabond

4 Answers

Use fillna on one column with the fill values being the other column:

df['foodstuff'].fillna(df['type'])

The resulting output:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
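As a rough sketch of why this works: `fillna` accepts another Series and fills each gap with the value at the same index label, so a row that is missing in both columns has nothing to be filled with and stays missing. The column name `combined` below is only an illustrative choice, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Fill gaps in 'foodstuff' with the value of 'type' from the same row.
# Where both are missing there is nothing to fill with, so the gap remains.
combined = df['foodstuff'].fillna(df['type'])

# 'combined' is a hypothetical column name chosen for this sketch.
df['combined'] = combined

print(df['combined'].tolist())
print(df['combined'].isna().tolist())
```

Note that this picks one value per row rather than concatenating: if both columns held a value in the same row, the one from `foodstuff` would win.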


answered Sep 20 '22 02:09

root

You can use the combine method with a lambda:

df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)

(a or "") evaluates to "" when a is None. The same trick is then applied to the concatenation: if the joined string is empty, the result is None.
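One caveat with the `or` trick is that it relies on the missing values being `None`. A float NaN is truthy, so `np.nan or ""` evaluates to NaN and the following `+` with a string raises a TypeError. Below is a minimal sketch of a variant that checks with `pd.isna` instead; the helper name `join_or_none` and the sample data are made up for this illustration.

```python
import numpy as np
import pandas as pd


def join_or_none(a, b):
    """Concatenate two cells, treating None/NaN as empty.

    Returns None when both cells are missing (hypothetical helper).
    """
    left = '' if pd.isna(a) else a
    right = '' if pd.isna(b) else b
    return (left + right) or None


# Sample data mixing None and NaN as missing markers.
foodstuff = pd.Series(['apple-martini', np.nan, None, 'apple-'])
kind = pd.Series([None, 'dessert', np.nan, 'pie'])

# Series.combine calls the function once per pair of aligned elements.
result = foodstuff.combine(kind, join_or_none)
print(result)
```

Unlike the `fillna` approach, this one concatenates when both cells hold a value, as in the last row of the sample data.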

answered Sep 22 '22 02:09

sirfz

Call fillna('') on both columns together, use sum(1) to concatenate them row-wise, then replace('', np.nan) to turn the remaining empty strings back into NaN:

df.fillna('').sum(1).replace('', np.nan)

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN
dtype: object
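The same idea can be spelled out step by step. This sketch is a variation rather than the exact one-liner above: it assumes `+` on the two filled columns in place of `sum(1)`, and `Series.mask` in place of `replace`, which makes the "empty string means both were missing" rule explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
    'type': [None, None, 'strawberry-tart', 'dessert', None],
})

# Step 1 and 2: replace missing values with '' and concatenate row-wise.
joined = df['foodstuff'].fillna('') + df['type'].fillna('')

# Step 3: an empty result means both inputs were missing, so mark it as NaN.
result = joined.mask(joined == '', np.nan)
print(result)
```

One thing to keep in mind with either spelling is that a genuine empty string in the input would also come out as NaN.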

answered Sep 22 '22 02:09

piRSquared

If your columns are complementary, i.e. each one holds a value only where the others do not, a one-liner that does the job well is

>>> df.rename(columns={'type': 'foodstuff'}).stack().unstack()
         foodstuff
0    apple-martini
1        apple-pie
2  strawberry-tart
3          dessert

This also generalises well to more than two columns to interleave, as long as you can define the .rename mapping. The point of the renaming is to create duplicate column names that .stack().unstack() will then collapse for you.

As noted, this solution only suits orthogonal columns, i.e. columns that never hold a value in the same row. Note also that row 4, where both columns are empty, is absent from the output above.
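If the rows where every column is empty need to be kept, one possible alternative for the multi-column case is to fold `Series.combine_first` over the columns. To be clear, this is a different technique from the stack/unstack approach above, and the third column `course` and the sample data are hypothetical.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    'foodstuff': ['apple-martini', None, None, None],
    'type': [None, 'strawberry-tart', None, None],
    'course': [None, None, 'dessert', None],  # hypothetical third column
})

columns = ['foodstuff', 'type', 'course']

# Start from the first column and let each later column fill the remaining gaps.
merged = reduce(
    lambda acc, name: acc.combine_first(df[name]),
    columns[1:],
    df[columns[0]],
)
print(merged)
```

Earlier columns take priority here, so the order of `columns` matters if the columns are not strictly orthogonal.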

answered Sep 19 '22 02:09

keepAlive
