I have 2 dataframes, one of which has supplemental information for some (but not all) of the rows in the other.
names = df({'names':['bob','frank','james','tim','ricardo','mike','mark','joan','joe'], 'position':['dev','dev','dev','sys','sys','sys','sup','sup','sup']})
info = df({'names':['joe','mark','tim','frank'], 'classification':['thief','thief','good','thief']})
I would like to take the classification column from the info dataframe above and add it to the names dataframe above. However, when I do combined = pd.merge(names, info), the resulting dataframe is only 4 rows long. All of the rows that do not have supplemental info are dropped.
Ideally, I would have the values in those missing columns set to unknown, resulting in a dataframe where some people are thieves, some are good, and the rest are unknown.
EDIT: One of the first answers I received suggested using an outer merge, which seems to do some weird things. Here is a code sample:
names = df({'names':['bob','frank','bob','bob','bob''james','tim','ricardo','mike','mark','joan','joe'], 'position':['dev','dev','dev','dev','dev','dev''sys','sys','sys','sup','sup','sup']})
info = df({'names':['joe','mark','tim','frank','joe','bill'], 'classification':['thief','thief','good','thief','good','thief']})
what = pd.merge(names, info, how="outer")
what.fillna("unknown")
The strange thing is that in the output I get a row where the resulting name is "bobjames" and another where the position is "devsys". Finally, even though bill does not appear in the names dataframe, he shows up in the resulting dataframe. So I really need a way to say "look up a value in this other dataframe and, if you find something, tack on those columns."
Merge Python Pandas dataframe with a common column and set NaN for unmatched values. To merge two Pandas DataFrames on a common column, use the merge() function and set the on parameter to the column name. To keep unmatched rows and have their missing values set to NaN, set the how parameter to left or right.
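As a minimal sketch of that approach, using the question's original data: a left merge keeps every row of names, and fillna then swaps the NaN placeholders for the "unknown" label the question asks for. Passing on='names' explicitly is a choice made here for clarity; the question's code relied on the default of merging on the shared column.

```python
import pandas as pd

names = pd.DataFrame({
    'names': ['bob', 'frank', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
    'position': ['dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup'],
})
info = pd.DataFrame({
    'names': ['joe', 'mark', 'tim', 'frank'],
    'classification': ['thief', 'thief', 'good', 'thief'],
})

# how='left' keeps every row of `names`; rows with no match in `info` get NaN.
combined = pd.merge(names, info, on='names', how='left')

# Replace the NaN placeholders with the label asked for in the question.
combined['classification'] = combined['classification'].fillna('unknown')

print(combined)
```

Because the left frame drives the result, a name that exists only in info would not appear in combined.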
Checking for missing values using isnull() and notnull(). To check for missing values in a Pandas DataFrame, use the isnull() and notnull() functions. Both help check whether a value is NaN. They can also be used on a Pandas Series to find null values in that series.
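Applied to this question, isnull() and notnull() can show which people ended up without a classification after a left merge. The snippet below is only a sketch on a shortened version of the question's data; the variable names merged and missing are illustrative.

```python
import pandas as pd

names = pd.DataFrame({
    'names': ['bob', 'frank', 'tim'],
    'position': ['dev', 'dev', 'sys'],
})
info = pd.DataFrame({
    'names': ['frank', 'tim'],
    'classification': ['thief', 'good'],
})

merged = pd.merge(names, info, on='names', how='left')

# Boolean mask: True where no classification was found for that name.
missing = merged['classification'].isnull()

# Names that had no supplemental info.
unclassified = merged.loc[missing, 'names'].tolist()

# Number of rows that did receive a classification.
classified_count = int(merged['classification'].notnull().sum())

print(unclassified, classified_count)
```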
Use the merge() function to join the left dataframe with the unique-column dataframe using an 'inner' join. This will ensure that no columns are duplicated in the merged dataset.
In case you are still looking for an answer for this:
The "strange" things that you described are due to some minor errors in your code. The first (the appearance of "bobjames" and "devsys") happens because there is no comma between those two values in your source dataframes, so Python concatenates the adjacent string literals. The second happens because pandas doesn't care about the name of your dataframe but about the name of your columns when merging (you have a dataframe called "names", but your columns are also called "names"), and an outer merge keeps the unmatched rows from both dataframes. Otherwise, it seems that the merge is doing exactly what you are looking for:
import pandas as pd
names = pd.DataFrame({'names':['bob','frank','bob','bob','bob', 'james','tim','ricardo','mike','mark','joan','joe'], 'position':['dev','dev','dev','dev','dev','dev', 'sys','sys','sys','sup','sup','sup']})
info = pd.DataFrame({'names':['joe','mark','tim','frank','joe','bill'], 'classification':['thief','thief','good','thief','good','thief']})
what = pd.merge(names, info, how="outer")
what.fillna('unknown', inplace=True)
which will result in:
      names  position  classification
0       bob       dev         unknown
1       bob       dev         unknown
2       bob       dev         unknown
3       bob       dev         unknown
4     frank       dev           thief
5     james       dev         unknown
6       tim       sys            good
7   ricardo       sys         unknown
8      mike       sys         unknown
9      mark       sup           thief
10     joan       sup         unknown
11      joe       sup           thief
12      joe       sup            good
13     bill   unknown           thief
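To see where each row of an outer merge comes from, merge() accepts indicator=True, which adds a _merge column with the values left_only, right_only, or both. The following is a sketch on a shortened version of the data above, not part of the original answer; it shows that bill is a right_only row and that joe is duplicated because he appears twice in info.

```python
import pandas as pd

names = pd.DataFrame({
    'names': ['bob', 'frank', 'joe'],
    'position': ['dev', 'dev', 'sup'],
})
info = pd.DataFrame({
    'names': ['frank', 'joe', 'joe', 'bill'],
    'classification': ['thief', 'thief', 'good', 'thief'],
})

# indicator=True adds a `_merge` column describing each row's origin.
what = pd.merge(names, info, on='names', how='outer', indicator=True)
print(what)

# Dropping the rows that exist only in `info` removes bill from the result.
kept = what[what['_merge'] != 'right_only'].drop(columns='_merge')
print(kept)
```

Filtering out right_only rows gives the same set of people as a left merge would.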
I think you want to perform an outer merge:
In [60]: pd.merge(names, info, how='outer')
Out[60]:
     names  position  classification
0      bob       dev             NaN
1    frank       dev           thief
2    james       dev             NaN
3      tim       sys            good
4  ricardo       sys             NaN
5     mike       sys             NaN
6     mark       sup           thief
7     joan       sup             NaN
8      joe       sup           thief
There is a section in the docs showing the types of merges you can perform: http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
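For a quick feel of how the merge types differ on this data, the sketch below counts the rows each how value produces. It uses the question's original names frame and adds bill to info (as in the question's edit) so that the right and outer cases differ from the others; the row_counts dictionary is just an illustrative helper.

```python
import pandas as pd

names = pd.DataFrame({
    'names': ['bob', 'frank', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
    'position': ['dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup'],
})
info = pd.DataFrame({
    'names': ['joe', 'mark', 'tim', 'frank', 'bill'],
    'classification': ['thief', 'thief', 'good', 'thief', 'thief'],
})

row_counts = {}
for how in ('inner', 'left', 'right', 'outer'):
    merged = pd.merge(names, info, on='names', how=how)
    row_counts[how] = len(merged)

# inner: only names present in both frames
# left:  every row of `names`
# right: every row of `info`
# outer: every name from either frame
print(row_counts)
```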