I have a 140 MB Excel file I need to analyze using pandas. The problem is that if I open this file as xlsx, it takes Python 5 minutes simply to read it. When I manually save the same file as CSV, Python opens and reads it in about a second! The various 2012-2014 solutions floating around don't really work on my end, presumably because they predate Python 3.
Can somebody suggest how to very quickly convert the file 'C:\master_file.xlsx' to 'C:\master_file.csv'?
Create a variable to store the path of the input Excel file. Pass that path to the openpyxl module's load_workbook() function to create/load a workbook object. Then open an output CSV file in write mode with open() and use the csv module's writer() to write the worksheet rows out, converting the input Excel file into a CSV file.
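A minimal sketch of that approach (the file names are placeholders, and I'm assuming the first/active sheet is the one you want):

import csv
import openpyxl

# Path of the input Excel file (hypothetical name)
input_path = "master_file.xlsx"

# Load the workbook and take its active (first) sheet
wb = openpyxl.load_workbook(input_path)
ws = wb.active

# Open the output CSV in write mode and stream the rows into it
with open("master_file.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for row in ws.iter_rows(values_only=True):
        writer.writerow(row)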
There is a project called "rows" that aims to be very Pythonic about dealing with data. It relies on "openpyxl" for xlsx, though, so I don't know if this will be faster than Pandas, but anyway:
$ pip install rows openpyxl
And:
import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))
I faced the same problem as you; Pandas and openpyxl didn't work for me.
I came across this solution and it worked great for me:
import win32com.client

# Start Excel via COM and suppress its dialog boxes
xl = win32com.client.Dispatch("Excel.Application")
xl.DisplayAlerts = False

# Open the workbook read-only and save it back out as CSV
xl.Workbooks.Open(Filename=your_file_path, ReadOnly=1)
wb = xl.Workbooks(1)
wb.SaveAs(Filename='new_file.csv', FileFormat=6)  # 6 = xlCSV
wb.Close(False)

xl.Application.Quit()
wb = None
xl = None
Here you convert the file to CSV by means of Excel itself (so Excel must be installed, and this is Windows-only). All the other ways I tried refused to work.
Use read-only mode in openpyxl. Something like the following should work.
import csv
import openpyxl

wb = openpyxl.load_workbook("myfile.xlsx", read_only=True)
ws = wb['sheetname']

# In Python 3 the CSV file must be opened in text mode with newline=''
with open("myfile.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for row in ws:
        values = (cell.value for cell in row)
        writer.writerow(values)
Fastest way that pops to mind:
As an added benefit, you'll be able to do cleanup of the data before saving it to csv.
import pandas as pd

# Raw strings avoid surprises with backslashes in Windows paths
df = pd.read_excel(r'C:\master_file.xlsx', header=0)  # , sheet_name='<your sheet>'
df.to_csv(r'C:\master_file.csv', index=False, quotechar="'")
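For instance, the cleanup mentioned above could look something like this (the specific operations are just illustrative assumptions, not part of the original answer):

# Illustrative cleanup before writing the CSV
df = df.dropna(how='all')                      # drop completely empty rows
df.columns = [c.strip() for c in df.columns]   # tidy up header whitespace
df.to_csv(r'C:\master_file.csv', index=False, quotechar="'")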
At some point, dealing with lots of data will take lots of time. Just a fact of life. Good to look for options if it's a problem, though.