
Pandas displays extra unnamed columns for an Excel file

Tags: python, pandas

I'm working on a project using the pandas library, in which I need to read an Excel file that has the following columns:

'invoiceid', 'locationid', 'timestamp', 'customerid', 'discount', 'tax',
   'total', 'subtotal', 'productid', 'quantity', 'productprice',
   'productdiscount', 'invoice_products_id', 'producttax',
   'invoice_payments_id', 'paymentmethod', 'paymentdetails', 'amount'

But when I read this file by using the Python code below:

import pandas as pd

df_full = pd.read_excel('input/invoiced_products_noinvoiceids_inproduct_v2.0.xlsx', sheet_name=0)
df_full.head()

it returns some rows along with 6 unnamed columns whose values are all NaN. I don't know why these columns are being displayed.

Below is the link to a sample file as requested:

https://mega.nz/#!0MlXCBYJ!Oim9RF56h6hUitTwqSG1354dIKLZEgIszzPrVpfHas8

Why are these extra columns appearing?

Asked by Abdul Rehman on Apr 04 '18 at 07:04


1 Answer

As discussed in the comments, the problem seems to be that there is extra data after the last named column. That is why you are getting Unnamed columns.
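
To illustrate the cause, here is a minimal sketch that reproduces the same naming behaviour. It uses read_csv on in-memory text rather than read_excel so that it runs without a workbook; that substitution, and the two-column sample data, are assumptions made for the example. As far as I know, read_excel labels blank header cells in the same "Unnamed: N" style.

```python
import io

import pandas as pd

# Hypothetical data: two named columns followed by two blank header cells,
# standing in for stray cells to the right of the real table.
raw = "invoiceid,total,,\n1,9.5,,\n2,20.0,,\n"

df = pd.read_csv(io.StringIO(raw))

# Blank header cells are given placeholder names of the form "Unnamed: <position>".
print(list(df.columns))

# The placeholder columns hold no data, so every value in them is NaN.
print(df.isna().all())
```

If this matches what you see, the extra columns come from the sheet itself (leftover values or cells beyond the last header), not from pandas inventing data.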

If you want to drop these columns, you can filter them out like this:

df_full = df_full[df_full.filter(regex='^(?!Unnamed)').columns]
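
For comparison, here is a small sketch of that filter next to two alternatives, run on a hypothetical stand-in for df_full (the sample values and the variable names by_regex, by_prefix and by_empty are made up for illustration). It assumes the unwanted columns are entirely NaN, as described in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_full: two real columns plus two empty "Unnamed" ones.
df = pd.DataFrame({
    "invoiceid": [1, 2],
    "total": [9.5, 20.0],
    "Unnamed: 2": [np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan],
})

# Same idea as the line above: keep columns whose name does not start with "Unnamed".
by_regex = df.filter(regex=r"^(?!Unnamed)")

# Equivalent selection using string methods on the column index.
by_prefix = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Alternative: drop every column that is entirely NaN, whatever its name.
by_empty = df.dropna(axis=1, how="all")

print(list(by_regex.columns))
```

Note that the dropna variant would also remove a legitimately named column that happens to be completely empty, so the name-based filters are the safer choice when that can occur. Another option may be to limit the columns at read time with the usecols argument of read_excel.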

Answered by rock321987 on Oct 25 '22 at 16:10