
pandas read_excel doesn't read all rows

Tags: python, pandas

I have a problem with pandas `read_excel`. This is my code:

import pandas as pd

df = pd.read_excel('myExcelfile.xlsx', 'Table1', engine='openpyxl', header=1)
print(len(df))

If I run this code in PyCharm on a Windows PC, I get the correct DataFrame length, which is 28757. If I run the same code on my Linux server, I get only 26645.

Any idea what the reason for that could be?

Thanks

Reinhard Halusa asked Sep 01 '25 16:09



2 Answers

Try this way:

import pandas as pd

data = pd.read_excel('Advertising.xlsx')

data.head()
Ankit Rai answered Sep 04 '25 05:09



I found the solution. The problem was an empty first row in my .xlsx file.

My file is generated automatically by another program, so I used openpyxl to delete the first row and save the result as a new .xlsx file.

import openpyxl

path = 'myExcelFile.xlsx'
book = openpyxl.load_workbook(path)
sheet = book['Tabelle1']
# openpyxl row indices are 1-based, so the first row is index 1;
# delete 1 row starting there:
sheet.delete_rows(1, 1)
# save to a new file:
book.save('myExcelFile_new.xlsx')

Note that this code sample does not check whether the first row is empty. It deletes the first row regardless of whether it contains data.
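
If the unconditional delete is a concern, a guarded version could look roughly like the sketch below. The helper name `drop_leading_empty_row` is made up for illustration, and the demo builds a small in-memory workbook rather than loading the real file. It treats a row as empty when every cell value in it is `None`, which is an assumption; cells holding only whitespace would count as non-empty here.

```python
from openpyxl import Workbook


def drop_leading_empty_row(sheet):
    """Delete row 1 only if every cell in it is empty.

    Hypothetical helper, not part of openpyxl. Returns True if a row was deleted.
    """
    # Read the values of the first row only (openpyxl rows are 1-based).
    first_row = next(sheet.iter_rows(min_row=1, max_row=1, values_only=True), ())
    if all(value is None for value in first_row):
        sheet.delete_rows(1, 1)
        return True
    return False


# Small in-memory demo: row 1 is left empty, the data starts in row 2.
book = Workbook()
sheet = book.active
sheet["A2"] = "name"
sheet["B2"] = "value"
sheet["A3"] = "x"
sheet["B3"] = 1

deleted = drop_leading_empty_row(sheet)        # row 1 is empty, so it is removed
deleted_again = drop_leading_empty_row(sheet)  # row 1 now holds data, so nothing happens
```

With a real file, the same helper would be called on the sheet obtained from `openpyxl.load_workbook(path)` before `book.save(...)`.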

Reinhard Halusa answered Sep 04 '25 06:09
