Pandas merge how to avoid unnamed column

Tags:

python

pandas

There are two DataFrames that I want to merge:

DataFrame A columns: index, userid, locale  (2000 rows)  
DataFrame B columns: index, userid, age     (300 rows)

When I perform the following:

pd.merge(A, B, on='userid', how='outer')

I got a DataFrame with the following columns:

index, Unnamed:0, userid, locale, age

The index column and the Unnamed:0 column are identical. I guess the Unnamed:0 column is the index column of DataFrame B.

My question is: is there a way to avoid this Unnamed column when merging two DFs?

I can drop the Unnamed column afterwards, but I am wondering whether there is a better way to do it.

491

asked Dec 11 '16 15:12

Cheng

1 Answer

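In short, the DataFrame's index was written to the file when you saved it, and when you read the file back, that saved index is loaded as a regular column. Because the index had no name, its header cell in the file is empty, and pandas labels the resulting column Unnamed: 0.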
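
The round trip described above can be reproduced in memory. This is a minimal sketch, assuming the DataFrames in the question were saved with the default to_csv settings; the sample data and the variable names b, csv_text and reloaded are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for DataFrame B from the question.
b = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})

# to_csv writes the index by default; an unnamed index gets an empty header cell.
csv_text = b.to_csv()

# read_csv treats that first column as ordinary data and labels it "Unnamed: 0".
reloaded = pd.read_csv(io.StringIO(csv_text))
print(reloaded.columns.tolist())
```

The printed column list starts with Unnamed: 0, which is the column that later shows up in the merge result.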

There are a few ways to deal with this:

Method 1

When saving a pandas.DataFrame to disk, use index=False like this:

df.to_csv(path, index=False)

Method 2

When reading from the file, you can specify which column to use as the index (here, a column named index), like this:

df = pd.read_csv(path, index_col='index')
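
If the saved index had no name, the file has no column called index, so the call above would not find it. In that case the column can be selected by position instead. This is a sketch under that assumption, with hypothetical CSV content read from a string rather than from path.

```python
import io

import pandas as pd

# Hypothetical CSV content, as written by to_csv with the default index=True
# and an unnamed index: note the empty first header cell.
csv_text = ",userid,age\n0,A1092,31\n1,B9032,23\n"

# index_col=0 tells read_csv to use the first column as the index
# rather than loading it as a data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.columns.tolist())
```

Only userid and age remain as data columns, so nothing named Unnamed: 0 is carried into a later merge.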

Method 3

If method #2 does not suit you for some reason, you can always set the column to be used as index later on, like this:

df.set_index('index', inplace=True)

After this point, your DataFrame should look like this:

        userid    locale    age
index
    0    A1092     EN-US     31
    1    B9032     SV-SE     23
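
To connect this back to the merge in the question, here is a small end-to-end sketch using Method 1. The sample rows, the third user C7710, and the helper roundtrip are hypothetical; the helper writes to an in-memory buffer instead of a file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-ins for DataFrames A and B from the question.
a = pd.DataFrame({"userid": ["A1092", "B9032", "C7710"],
                  "locale": ["EN-US", "SV-SE", "DE-DE"]})
b = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})


def roundtrip(df):
    """Write df to CSV without its index, then read it back."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)  # Method 1: do not save the index
    buf.seek(0)
    return pd.read_csv(buf)


# Same merge call as in the question, applied to the reloaded frames.
merged = pd.merge(roundtrip(a), roundtrip(b), on="userid", how="outer")
print(merged.columns.tolist())
```

Because the index never reaches the file, the merged result contains only userid, locale and age, and there is nothing to drop afterwards.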

I hope this helps.

answered Sep 18 '22 05:09

Thanos
