There are two DataFrames that I want to merge:
DataFrame A columns: index, userid, locale (2000 rows)
DataFrame B columns: index, userid, age (300 rows)
When I perform the following:
pd.merge(A, B, on='userid', how='outer')
I got a DataFrame with the following columns:
index, Unnamed:0, userid, locale, age
The index column and the Unnamed:0 column are identical. I guess the Unnamed:0 column is the index column of DataFrame B.
My question is: is there a way to avoid this Unnamed column when merging two DFs?
I can drop the Unnamed column afterwards, but just wondering if there is a better way to do it.
An Unnamed: 0 column usually appears in pandas when you read a CSV file that was saved together with its index. The simplest solution is to read the "Unnamed: 0" column back in as the index: pass index_col=[0] to read_csv(), and the first column is read in as the index.
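To make the cause concrete, here is a minimal sketch that reproduces the Unnamed: 0 column and then avoids it. The sample rows are made up, an in-memory buffer (io.StringIO) stands in for a file on disk, and index_col=0 is used as the scalar form of the argument.

```python
import io
import pandas as pd

# Made-up sample data standing in for DataFrame B.
df = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})

# With no path, to_csv returns the CSV text. The index is written
# as a first column whose header is empty.
csv_text = df.to_csv()

# Read back without index_col: the headerless first column
# becomes a regular column named "Unnamed: 0".
with_unnamed = pd.read_csv(io.StringIO(csv_text))

# Read back with index_col=0: the first column is used as the index.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_unnamed.columns))
print(list(clean.columns))
```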
You can also exclude columns from a pandas DataFrame with the loc indexer. loc selects rows and columns by label, so you can keep every column except the ones you name, for example columns called name, city, and cost.
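As a rough illustration of excluding columns with loc, the sketch below builds a small made-up DataFrame (the values and the extra age column are assumptions, not from the original post) and keeps everything except name, city, and cost.

```python
import pandas as pd

# Hypothetical data; only the column names come from the text above.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "city": ["Oslo", "Lima"],
    "cost": [10, 20],
    "age": [31, 23],
})

excluded = ["name", "city", "cost"]

# Boolean mask over the column labels: True for columns to keep.
kept = df.loc[:, ~df.columns.isin(excluded)]

print(list(kept.columns))
```

The same idea removes a leftover Unnamed: 0 column after the fact, and df.drop(columns=excluded) gives an equivalent result.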
If index_col is 0, the first column of the file (position 0) is used as the index; 1 would select the second column, 2 the third, and so on.
index_col: This sets which column or columns are used as the index of the DataFrame. The default value is None, in which case pandas creates a default integer index starting from 0. It can be set to a column name or a column position, and that column is then used as the index.
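The three cases described for index_col can be compared side by side. This is only a sketch: the CSV text is invented, and it assumes the index was saved under the header "index".

```python
import io
import pandas as pd

# Invented CSV content with the saved index under the header "index".
csv_text = "index,userid,age\n0,A1092,31\n1,B9032,23\n"

# Default (index_col=None): "index" stays a regular column and
# pandas creates its own integer index starting from 0.
default = pd.read_csv(io.StringIO(csv_text))

# By column name.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="index")

# By column position (0 is the first column).
by_pos = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))
print(by_name.index.name, list(by_name.columns))
```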
In summary, what you're doing is saving the index to file, and when you're reading back from the file, the column previously saved as index is loaded as a regular column.
There are a few ways to deal with this:
Method 1
When saving a pandas.DataFrame to disk, use index=False like this:
df.to_csv(path, index=False)
Method 2
When reading from file, you can define the column that is to be used as index, like this:
df = pd.read_csv(path, index_col='index')
Method 3
If method #2 does not suit you for some reason, you can always set the column to be used as index later on, like this:
df.set_index('index', inplace=True)
After this point, your DataFrame should look like this:
       userid  locale  age
index
0      A1092   EN-US   31
1      B9032   SV-SE   23
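To tie this back to the merge in the question, here is a small sketch under stated assumptions: the rows are made up, the roundtrip helper is hypothetical, and an in-memory buffer replaces the files on disk. Once both frames are saved with index=False (Method 1), the outer merge contains only the expected columns.

```python
import io
import pandas as pd

# Made-up stand-ins for DataFrames A and B from the question.
A = pd.DataFrame({
    "userid": ["A1092", "B9032", "C5555"],
    "locale": ["EN-US", "SV-SE", "DE-DE"],
})
B = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})


def roundtrip(df):
    """Hypothetical helper: save without the index, then read back."""
    return pd.read_csv(io.StringIO(df.to_csv(index=False)))


# No index column was written, so none comes back as "Unnamed: 0".
merged = pd.merge(roundtrip(A), roundtrip(B), on="userid", how="outer")

print(list(merged.columns))
```

Users that appear only in A keep their row in the outer merge, with a missing value in age.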
I hope this helps.