
Pandas merge: how to avoid the Unnamed column

Tags: python, pandas

There are two DataFrames that I want to merge:

DataFrame A columns: index, userid, locale  (2000 rows)  
DataFrame B columns: index, userid, age     (300 rows)

When I perform the following:

pd.merge(A, B, on='userid', how='outer')

I get a DataFrame with the following columns:

index, Unnamed: 0, userid, locale, age

The index column and the Unnamed: 0 column are identical. I guess the Unnamed: 0 column is the index column of DataFrame B.

My question is: is there a way to avoid this Unnamed column when merging two DataFrames?

I can drop the Unnamed column afterwards, but I am wondering whether there is a better way to do it.

Cheng asked Dec 11 '16 15:12


People also ask

How do I stop pandas unnamed columns?

An Unnamed: 0 column usually appears when you read a CSV file that was written together with its index. The simplest solution is to read that column back as the index: pass index_col=[0] to read_csv(), and the first column is used as the index instead of being loaded as a regular column.

How do I skip column names in pandas?

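You can exclude columns from a pandas DataFrame with the loc indexer. loc does not remove anything in place and is not a function: it selects by label, so you pass it a boolean mask that leaves out the unwanted columns. For example, df.loc[:, ~df.columns.isin(['name', 'city', 'cost'])] returns the frame without the name, city, and cost columns, as does df.drop(columns=['name', 'city', 'cost']).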
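
A minimal sketch of excluding columns by label, assuming a made-up frame whose name, city, cost, and qty columns are purely illustrative. Note that loc is an indexer used with square brackets and returns a selection rather than modifying the frame; drop(columns=...) is shown alongside it as an equivalent alternative.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only
df = pd.DataFrame(
    {"name": ["a", "b"], "city": ["x", "y"], "cost": [1, 2], "qty": [3, 4]}
)

unwanted = ["name", "city", "cost"]

# Select every row and only the columns that are NOT in the unwanted list
kept_with_loc = df.loc[:, ~df.columns.isin(unwanted)]

# Equivalent result: drop the unwanted columns and return a new frame
kept_with_drop = df.drop(columns=unwanted)

print(kept_with_loc.columns.tolist())
```

Both calls leave df itself unchanged.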

What does index_col=0 mean?

index_col=0 means that the first column of the file (position 0) is used as the row index of the DataFrame instead of being loaded as a regular column.

What does index_col do in pandas?

index_col: This sets which column or columns are used as the index of the DataFrame. The default value is None, in which case pandas normally creates a default integer index starting from 0 rather than taking the index from the file. It can be given as a column name or a column position.




1 Answer

In summary, you are saving the index to the file, and when you read the file back, the column that was saved as the index is loaded as a regular column. If that column has no header in the file, read_csv names it Unnamed: 0.
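
The effect can be reproduced in a few lines. This is a minimal sketch, assuming a small made-up frame and an in-memory buffer (io.StringIO) standing in for the file on disk; the sample values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for one of the DataFrames
df = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})

# Default index=True: the index is written as a first column with an empty header
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# Reading back without index_col loads that header-less column as a regular column
round_tripped = pd.read_csv(buffer)
print(round_tripped.columns.tolist())  # ['Unnamed: 0', 'userid', 'age']
```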

There are a few ways to deal with this:

Method 1

When saving a pandas.DataFrame to disk, use index=False like this:

df.to_csv(path, index=False)
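
As a rough illustration of Method 1, the sketch below writes a made-up frame without its index and reads it back. The in-memory buffer replaces the real path, and the sample values are assumptions for the example.

```python
import io

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"userid": ["A1092", "B9032"], "locale": ["EN-US", "SV-SE"]})

# index=False keeps the index out of the file entirely
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Only the real data columns come back; pandas builds a fresh default index
restored = pd.read_csv(buffer)
print(restored.columns.tolist())  # ['userid', 'locale']
```

This only makes sense when the index carries no information you need, since it is not stored in the file.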

Method 2

When reading from file, you can define the column that is to be used as index, like this:

df = pd.read_csv(path, index_col='index')
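
A short sketch of Method 2 under two assumptions about the file: in the first case the saved index column has the header index, in the second it has an empty header (which is what to_csv writes for an unnamed index). The CSV text is made up and read from memory instead of from path.

```python
import io

import pandas as pd

# Case 1 (assumed): the index column was saved with the header "index"
named_csv = "index,userid,age\n0,A1092,31\n1,B9032,23\n"
df_named = pd.read_csv(io.StringIO(named_csv), index_col="index")

# Case 2 (assumed): the index column was saved with an empty header,
# so it is selected by position instead of by name
unnamed_csv = ",userid,age\n0,A1092,31\n1,B9032,23\n"
df_unnamed = pd.read_csv(io.StringIO(unnamed_csv), index_col=0)

print(df_named.columns.tolist())    # ['userid', 'age']
print(df_unnamed.columns.tolist())  # ['userid', 'age']
```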

Method 3

If method #2 does not suit you for some reason, you can always set the column to be used as index later on, like this:

df.set_index('index', inplace=True)

After this point, your DataFrame should look like this:

        userid    locale    age
index
    0    A1092     EN-US     31
    1    B9032     SV-SE     23
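
To tie this back to the merge in the question, here is a minimal sketch of Method 3 followed by the outer merge. It assumes both frames were loaded with the saved index as a regular column named index; the frames and their values are made up for the example.

```python
import pandas as pd

# Hypothetical frames as they might look right after read_csv,
# with the saved index loaded as a regular column called "index"
A = pd.DataFrame(
    {"index": [0, 1, 2],
     "userid": ["A1092", "B9032", "C5521"],
     "locale": ["EN-US", "SV-SE", "DE-DE"]}
)
B = pd.DataFrame({"index": [0, 1], "userid": ["A1092", "B9032"], "age": [31, 23]})

# Method 3: move the stray column into the index of each frame
A = A.set_index("index")
B = B.set_index("index")

# Merging on a column ignores both indexes, so no extra column appears
merged = pd.merge(A, B, on="userid", how="outer")
print(merged.columns.tolist())  # ['userid', 'locale', 'age']
```

Users that appear in only one frame are kept by the outer merge, with missing values in the columns that come from the other frame.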

I hope this helps.

Thanos answered Sep 18 '22 05:09