pandas merge dataframe with NaN (or "unknown") for missing values

Tags: python, pandas, dataframe

I have 2 dataframes, one of which has supplemental information for some (but not all) of the rows in the other.

import pandas as pd

names = pd.DataFrame({'names': ['bob', 'frank', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
                      'position': ['dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup']})
info = pd.DataFrame({'names': ['joe', 'mark', 'tim', 'frank'],
                     'classification': ['thief', 'thief', 'good', 'thief']})

I would like to take the classification column from the info dataframe above and add it to the names dataframe above. However, when I do combined = pd.merge(names, info) the resulting dataframe is only 4 rows long. All of the rows that do not have supplemental info are dropped.

Ideally, the missing values in that column would be set to "unknown", resulting in a dataframe where some people are thieves, some are good, and the rest are unknown.

EDIT: One of the first answers I received suggested using an outer merge, which seems to do some weird things. Here is a code sample:

names = pd.DataFrame({'names': ['bob','frank','bob','bob','bob''james','tim','ricardo','mike','mark','joan','joe'],
                      'position': ['dev','dev','dev','dev','dev','dev''sys','sys','sys','sup','sup','sup']})
info = pd.DataFrame({'names': ['joe','mark','tim','frank','joe','bill'],
                     'classification': ['thief','thief','good','thief','good','thief']})
what = pd.merge(names, info, how="outer")
what.fillna("unknown")

The strange thing is that the output contains a row where the name is "bobjames" and another where the position is "devsys". Also, even though bill does not appear in the names dataframe, he shows up in the resulting dataframe. So what I really need is a way to say "look up each name in this other dataframe and, if you find a match, tack on its columns."

asked Jan 27 '15 16:01

Kevin Thompson

2 Answers

In case you are still looking for an answer for this:

The "strange" things that you described are due to some minor errors in your code. The first (the appearance of "bobjames" and "devsys") happens because there is no comma between those two values in your source lists, so Python joins the adjacent string literals into one. The second happens because pandas merges on the shared column name (both dataframes have a column called "names"; the name of the dataframe variable itself is irrelevant), and an outer merge keeps every key found in either dataframe, so bill is included with no position. If you only want the rows that exist in names, use how="left" instead. Apart from that, the merge seems to do what you are looking for:

import pandas as pd

names = pd.DataFrame({'names': ['bob', 'frank', 'bob', 'bob', 'bob', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
                      'position': ['dev', 'dev', 'dev', 'dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup']})
info = pd.DataFrame({'names': ['joe', 'mark', 'tim', 'frank', 'joe', 'bill'],
                     'classification': ['thief', 'thief', 'good', 'thief', 'good', 'thief']})
what = pd.merge(names, info, how="outer")
what.fillna('unknown', inplace=True)

which will result in:

      names position classification
0       bob      dev        unknown
1       bob      dev        unknown
2       bob      dev        unknown
3       bob      dev        unknown
4     frank      dev          thief
5     james      dev        unknown
6       tim      sys           good
7   ricardo      sys        unknown
8      mike      sys        unknown
9      mark      sup          thief
10     joan      sup        unknown
11      joe      sup          thief
12      joe      sup           good
13     bill  unknown          thief
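As a rough sketch of the two points above: the first assertion checks how Python treats adjacent string literals, and the rest shows a left merge that keeps only the rows of names. The data is a shortened version of the question's dataframes, and filling only the classification column is a choice made for this illustration, not something the question requires.

```python
import pandas as pd

# Adjacent string literals are concatenated by Python, so a missing comma
# silently turns two list items into one.
assert ['bob' 'james'] == ['bobjames']

names = pd.DataFrame({
    'names': ['bob', 'frank', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
    'position': ['dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup'],
})
info = pd.DataFrame({
    'names': ['joe', 'mark', 'tim', 'frank', 'bill'],
    'classification': ['thief', 'thief', 'good', 'thief', 'thief'],
})

# how='left' keeps every row of `names` and only looks up matches in `info`,
# so 'bill' (present only in `info`) is not added to the result.
combined = pd.merge(names, info, on='names', how='left')

# Fill only the looked-up column, leaving the other columns untouched.
combined['classification'] = combined['classification'].fillna('unknown')

print(combined)
```

If info contains the same name more than once (as with joe in the question's second sample), a left merge still returns one row per match. Passing validate='many_to_one' to pd.merge should raise an error in that case rather than duplicating rows silently.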

answered Oct 17 '22 02:10

oxtay

I think you want to perform an outer merge:

In [60]: pd.merge(names, info, how='outer')
Out[60]:
     names position classification
0      bob      dev            NaN
1    frank      dev          thief
2    james      dev            NaN
3      tim      sys           good
4  ricardo      sys            NaN
5     mike      sys            NaN
6     mark      sup          thief
7     joan      sup            NaN
8      joe      sup          thief

The documentation has a section showing the types of merge you can perform: http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
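To make the difference between the merge types concrete, here is a minimal sketch on a small made-up pair of dataframes (the three-row data is an assumption for illustration, not the question's data). It counts the rows each how option produces and uses indicator=True to show where each row of an outer merge came from.

```python
import pandas as pd

names = pd.DataFrame({'names': ['bob', 'frank', 'tim'],
                      'position': ['dev', 'dev', 'sys']})
info = pd.DataFrame({'names': ['frank', 'tim', 'bill'],
                     'classification': ['thief', 'good', 'thief']})

# Count the rows produced by each join type.
row_counts = {}
for how in ('inner', 'left', 'right', 'outer'):
    merged = pd.merge(names, info, on='names', how=how)
    row_counts[how] = len(merged)

# indicator=True adds a `_merge` column recording whether each row's key
# was found in the left frame only, the right frame only, or both.
flagged = pd.merge(names, info, on='names', how='outer', indicator=True)
origin = dict(zip(flagged['names'], flagged['_merge'].astype(str)))

print(row_counts)
print(origin)
```

The default how='inner' keeps only names present in both dataframes, which is why the plain pd.merge(names, info) call in the question dropped the rows without supplemental info.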

answered Oct 17 '22 02:10

EdChum
