 

How to save a long URL in an *.xlsx cell using Pandas

For example, I read an Excel file with two columns (id and URL) into a DataFrame. The URLs in the input file are plain text (not hyperlinks):

input_f = pd.read_excel("input.xlsx")

Inspecting this DataFrame shows that everything was read successfully and all URLs in input_f are intact. But when I then try to save it with to_excel:

input_f.to_excel("output.xlsx", index=False)

I get this warning:

Path\worksheet.py:836: UserWarning: Ignoring URL 'http:// here long URL' with link or location/anchor > 255 characters since it exceeds Excel's limit for URLS
  force_unicode(url))

In output.xlsx, the cells with long URLs were empty, and the remaining URLs had been turned into hyperlinks.
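To see in advance which rows the warning will affect, the URL lengths can be checked before writing. A minimal sketch; the id and URL column names follow the question, but the data is made up, and the 255 threshold is simply the limit quoted in the warning:

```python
import pandas as pd

# Hypothetical stand-in for input_f: an "id" column and a "URL" column
input_f = pd.DataFrame({
    "id": [1, 2, 3],
    "URL": [
        "http://example.com/short",
        "http://example.com/" + "a" * 300,  # 319 characters in total
        "http://example.com/other",
    ],
})

# Rows whose URL is longer than the 255-character limit from the warning
too_long = input_f[input_f["URL"].str.len() > 255]
print(too_long["id"].tolist())  # [2]
```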

How to fix this?

asked Feb 16 '16 at 18:02 by chinskiy



People also ask

Does pandas work with XLSX files?

Yes. pandas.read_excel() reads an Excel file into a pandas DataFrame. It supports the xls, xlsx, xlsm, xlsb, odf, ods and odt file extensions, read from a local filesystem or a URL, and can read either a single sheet or a list of sheets. Any valid string path is acceptable.

What is read_excel in pandas?

The pandas.read_excel() function reads a sheet from an Excel file (such as an .xlsx file) into a pandas DataFrame. Reading a single sheet returns a DataFrame; reading several sheets (a list of sheet names, or sheet_name=None for all sheets) returns a dict of DataFrames keyed by sheet name.
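A small sketch of both return types, assuming an Excel engine such as openpyxl is installed; the file name and sheet names are made up for the example:

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook in a temporary directory (illustrative data)
path = os.path.join(tempfile.mkdtemp(), "two_sheets.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"id": [1, 2]}).to_excel(writer, sheet_name="first", index=False)
    pd.DataFrame({"id": [3]}).to_excel(writer, sheet_name="second", index=False)

one = pd.read_excel(path)  # default sheet_name=0: first sheet as a DataFrame
some = pd.read_excel(path, sheet_name=["first", "second"])  # list: dict of DataFrames
every = pd.read_excel(path, sheet_name=None)  # None: every sheet, as a dict

print(type(one).__name__)              # DataFrame
print(list(some))                      # ['first', 'second']
print(every["second"]["id"].tolist())  # [3]
```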

How do I save a DataFrame to an Excel file?

Create an ExcelWriter with the name of the desired output Excel file, call to_excel() on the DataFrame with the writer and the sheet name as arguments, and then save the file with the writer's close() method (the older save() method was deprecated and has been removed in pandas 2.0).
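The steps above, sketched with illustrative data and a hypothetical output path, assuming an Excel engine (XlsxWriter or openpyxl) is installed. The second form uses the writer as a context manager, which closes (and therefore saves) the file automatically:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "URL": ["http://example.com/a", "http://example.com/b"]})
path = os.path.join(tempfile.mkdtemp(), "output.xlsx")  # hypothetical output file

# Explicit form: create the writer, write the sheet, then close() to save
writer = pd.ExcelWriter(path)
df.to_excel(writer, sheet_name="urls", index=False)
writer.close()

# Equivalent context-manager form: the file is saved when the block exits
with pd.ExcelWriter(path) as writer:
    df.to_excel(writer, sheet_name="urls", index=False)

back = pd.read_excel(path, sheet_name="urls")
print(back["URL"].tolist() == df["URL"].tolist())  # True
```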


3 Answers

In older pandas versions, you can create an ExcelWriter object with the XlsxWriter option that disables converting strings to URLs:

writer = pandas.ExcelWriter(r'file.xlsx', engine='xlsxwriter', options={'strings_to_urls': False})
df.to_excel(writer)
writer.close()
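The strings_to_urls option itself belongs to XlsxWriter's Workbook constructor; pandas only forwards it. A sketch of the same option used with XlsxWriter directly, with a made-up 319-character URL, plus a read-back check via openpyxl (assumed to be installed):

```python
import os
import tempfile

import xlsxwriter
from openpyxl import load_workbook

long_url = "http://example.com/" + "a" * 300  # illustrative URL longer than 255 chars
path = os.path.join(tempfile.mkdtemp(), "no_links.xlsx")

# strings_to_urls=False: write() no longer turns URL-like strings into hyperlinks
workbook = xlsxwriter.Workbook(path, {"strings_to_urls": False})
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, long_url)  # stored as a plain string
workbook.close()

# Read the cell back: full text is kept and there is no hyperlink attached
cell = load_workbook(path).active["A1"]
print(cell.value == long_url, cell.hyperlink)  # True None
```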
answered Oct 08 '22 at 21:10 by Ophir Yoktan



I tried it myself and got the same problem. One workaround is to write a temporary CSV file, use xlsxwriter to build the Excel file from it, and then delete the temporary file. XlsxWriter's write_string() method writes a value as plain text, bypassing the automatic URL-to-hyperlink conversion that its generic write() method performs. This worked for me.

import pandas as pd
import csv
import os
from xlsxwriter.workbook import Workbook
inData = "C:/Users/martbar/Desktop/test.xlsx"
tmp = "C:/Users/martbar/Desktop/tmp.csv"
exFile = "C:/Users/martbar/Desktop/output.xlsx"

#read in data
df = pd.read_excel(inData)

#send to csv
df.to_csv(tmp, index=False)

#convert to excel
workbook = Workbook(exFile)
worksheet = workbook.add_worksheet()
with open(tmp, 'r', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for r, row in enumerate(reader):
        for c, col in enumerate(row):
            #write() would try to turn long URLs into hyperlinks (and trigger the warning); write_string() keeps them as text
            worksheet.write_string(r, c, col)
workbook.close()

#delete tmp file
os.remove(tmp)
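The temporary CSV is not strictly required; the DataFrame values can be passed to write_string() directly. A minimal sketch of that variation, with made-up data and a temporary output path standing in for the real files. As in the CSV version, every value (including the ids) ends up stored as text:

```python
import os
import tempfile

import pandas as pd
from xlsxwriter.workbook import Workbook

# Illustrative data standing in for the DataFrame read from test.xlsx
long_url = "http://example.com/" + "a" * 300
df = pd.DataFrame({"id": [1, 2], "URL": ["http://example.com/short", long_url]})
exFile = os.path.join(tempfile.mkdtemp(), "output.xlsx")  # hypothetical path

workbook = Workbook(exFile)
worksheet = workbook.add_worksheet()

# Header row, written as plain strings
for c, name in enumerate(df.columns):
    worksheet.write_string(0, c, str(name))

# Data rows start at row 1; every value is written as text,
# so XlsxWriter never tries to turn it into a hyperlink
for r, row in enumerate(df.itertuples(index=False), start=1):
    for c, value in enumerate(row):
        worksheet.write_string(r, c, "" if pd.isna(value) else str(value))

workbook.close()
```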
answered Oct 08 '22 at 20:10 by bvmcode



According to the XlsxWriter docs (section "Passing XlsxWriter constructor options to Pandas"), 'strings_to_urls': False is now passed through the engine_kwargs argument:

writer = pd.ExcelWriter('pandas_example.xlsx',
                        engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_urls': False}})

Then continue as the accepted answer suggests:

df.to_excel(writer)
writer.close()
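Putting the two steps together, a self-contained round trip with a made-up 319-character URL, assuming XlsxWriter and openpyxl are installed and the pandas version accepts engine_kwargs. Using the writer as a context manager closes it automatically:

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

long_url = "http://example.com/" + "a" * 300  # illustrative long URL
df = pd.DataFrame({"id": [1], "URL": [long_url]})
path = os.path.join(tempfile.mkdtemp(), "pandas_example.xlsx")

# engine_kwargs is forwarded to xlsxwriter.Workbook(..., options=...)
with pd.ExcelWriter(path, engine="xlsxwriter",
                    engine_kwargs={"options": {"strings_to_urls": False}}) as writer:
    df.to_excel(writer, index=False)

back = pd.read_excel(path)
print(back.loc[0, "URL"] == long_url)              # True
print(load_workbook(path).active["B2"].hyperlink)  # None
```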
answered Oct 08 '22 at 20:10 by gdiz
