
How can I quickly convert an xlsx file into a csv file in Python?

I have a 140MB Excel file that I need to analyze using pandas. The problem is that if I open this file as xlsx, it takes Python 5 minutes simply to read it. When I manually saved the file as csv, it took Python about a second to open and read it! There are various solutions from 2012-2014, but they don't really work on my end with Python 3.

Can somebody suggest how to quickly convert the file 'C:\master_file.xlsx' to 'C:\master_file.csv'?

asked Dec 07 '17 21:12 by Andrea



4 Answers

There is a project called "rows" that aims to be very Pythonic in dealing with data. It relies on "openpyxl" for xlsx, though, so I don't know whether it will be faster than pandas, but anyway:

$ pip install rows openpyxl

And:

import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))
answered Oct 12 '22 23:10 by jsbueno


I faced the same problem as you. Pandas and openpyxl didn't work for me.

I came across this solution, and it worked great for me:

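import win32com.client

# Requires Windows with Excel installed; Excel itself does the conversion.
xl = win32com.client.Dispatch("Excel.Application")
xl.DisplayAlerts = False
# your_file_path: path to the xlsx file. Excel resolves paths, not Python,
# so absolute paths are the safest choice for both input and output.
wb = xl.Workbooks.Open(Filename=your_file_path, ReadOnly=1)
wb.SaveAs(Filename='new_file.csv', FileFormat=6)  # 6 is xlCSV
wb.Close(False)
xl.Application.Quit()
wb = None
xl = None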

Here the file is converted to csv by Excel itself. All the other ways that I tried refused to work.

answered Oct 12 '22 22:10 by mlader


Use read-only mode in openpyxl. Something like the following should work.

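import csv
from openpyxl import load_workbook

# read_only=True streams the sheet instead of loading it all into memory
wb = load_workbook("myfile.xlsx", read_only=True)
ws = wb['sheetname']
# Python 3: the csv module needs a text-mode file opened with newline=""
with open("myfile.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for row in ws:
        values = (cell.value for cell in row)
        writer.writerow(values)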
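
If the sheet name is not known in advance, the same read-only approach could be extended to every sheet in the workbook. The following is only a minimal sketch: xlsx_to_csvs is a hypothetical helper name, it assumes openpyxl 2.6 or later (for values_only), and it uses each sheet title as the CSV file name.

```python
import csv
import os

from openpyxl import load_workbook


def xlsx_to_csvs(xlsx_path, out_dir):
    """Write one CSV per worksheet into out_dir and return the CSV paths."""
    wb = load_workbook(xlsx_path, read_only=True)
    written = []
    try:
        for ws in wb.worksheets:
            # Assumption: the sheet title is acceptable as a file name.
            csv_path = os.path.join(out_dir, ws.title + ".csv")
            with open(csv_path, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                # values_only=True yields tuples of cell values, not Cell objects
                for row in ws.iter_rows(values_only=True):
                    writer.writerow(row)
            written.append(csv_path)
    finally:
        # Read-only workbooks keep the source file open until closed.
        wb.close()
    return written
```

Calling xlsx_to_csvs("myfile.xlsx", ".") would then leave one csv file per sheet in the current directory. Whether this is fast enough for a 140MB file would have to be measured.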
answered Oct 12 '22 23:10 by Charlie Clark


The fastest way that comes to mind:

  1. pandas.read_excel
  2. pandas.DataFrame.to_csv

As an added benefit, you'll be able to do cleanup of the data before saving it to csv.

import pandas as pd

# Raw strings keep the backslashes in Windows paths literal.
df = pd.read_excel(r'C:\master_file.xlsx', header=0)  # optionally: sheet_name='<your sheet>'
df.to_csv(r'C:\master_file.csv', index=False, quotechar="'")

At some point, dealing with lots of data will take lots of time. Just a fact of life. Good to look for options if it's a problem, though.
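
Since the slow part is parsing the xlsx, one option is to pay that cost once and reuse the csv on later runs. A minimal sketch, assuming a hypothetical helper cached_csv that accepts any converter function you supply (for example one wrapping the pandas calls above) and reruns it only when the csv is missing or older than the xlsx:

```python
import os


def cached_csv(xlsx_path, csv_path, convert):
    """Return csv_path, calling convert(xlsx_path, csv_path) only when needed.

    convert is any callable that writes the CSV; it is a placeholder here,
    not part of any library.
    """
    # Rebuild when the CSV is missing or older than the workbook.
    stale = (
        not os.path.exists(csv_path)
        or os.path.getmtime(csv_path) < os.path.getmtime(xlsx_path)
    )
    if stale:
        convert(xlsx_path, csv_path)
    return csv_path
```

The analysis script can then always read the path returned by cached_csv, and the slow conversion only runs again after the Excel file changes.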

answered Oct 13 '22 00:10 by RagingRoosevelt