
Openpyxl Unicode Values

I am using openpyxl to read cell values from an Excel spreadsheet. One of the cells contains several values separated by newlines, and I want to split the string using the newline character as the delimiter. However, openpyxl seems to be serializing the carriage return into a non-standard format. See the example below.

Code

import openpyxl

# Open the worksheet
wb = openpyxl.load_workbook(wb_path)
ws = wb["testing"]  # wb.get_sheet_by_name("testing") is deprecated in newer openpyxl releases

# Get the string value
tests_str = ws.cell(row = row, column = column).value

# Split text on newlines into a list
tests = tests_str.splitlines()

Output

>>> tests_str
u'Test1_x000D_\nTest2_x000D_\nTest3_x000D_'
>>> tests
[u'Test1_x000D_', u'Test2_x000D_', u'Test3_x000D_']

openpyxl seems to be serializing the \r character into the literal text _x000D_, which is why splitlines() does not treat it as part of the line ending and leaves it in each item. Is there a reason why openpyxl behaves like this? Am I doing something wrong?

Stefan Bossbaly asked Apr 30 '15 19:04



2 Answers

According to a 2015 support issue posted in the official openpyxl Bitbucket project, this escaping is done by Excel itself and appears to be outside openpyxl's control.

To resolve this, openpyxl provides utility functions for encoding and decoding these escape sequences.

>>> from openpyxl.utils.escape import unescape
>>> unescape(tests_str)
u'Test1\r\nTest2\r\nTest3\r'

Link to documentation: https://openpyxl.readthedocs.io/en/stable/api/openpyxl.utils.escape.html
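Applied to the string from the question, decoding first and splitting afterwards might look like the sketch below. The helper name split_cell_lines is hypothetical, and the regex fallback is only an assumed stand-in for environments where openpyxl is not installed; it is meant to mimic the _xHHHH_ decoding, not to reproduce the library exactly.

```python
import re

try:
    # Real openpyxl helper that decodes _xHHHH_ escapes.
    from openpyxl.utils.escape import unescape
except ImportError:
    # Assumption: minimal stand-in used only when openpyxl is unavailable.
    def unescape(value):
        return re.sub(
            r"_x([0-9A-Fa-f]{4})_",
            lambda match: chr(int(match.group(1), 16)),
            value,
        )


def split_cell_lines(cell_value):
    """Decode escapes, then split on any line ending (\n, \r\n or \r)."""
    if cell_value is None:  # empty cells are read as None
        return []
    return unescape(cell_value).splitlines()


tests = split_cell_lines("Test1_x000D_\nTest2_x000D_\nTest3_x000D_")
print(tests)
```

Because the decoded text contains \r\n pairs, splitlines() treats each pair as a single line break, so no empty entries appear between the values.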

Tim Wißmann answered Oct 22 '22 11:10


It looks like either openpyxl or Excel is encoding carriage returns (\r, ASCII 0Dh) in that manner. You can convert them back or split on them as well:

>>> s=u'Test1_x000D_\nTest2_x000D_\nTest3_x000D_'
>>> s.split('_x000D_\n')
[u'Test1', u'Test2', u'Test3_x000D_']     # This misses the final one.
>>> s.replace('_x000D_','').splitlines()  # Better...
[u'Test1', u'Test2', u'Test3']
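The replace-then-split approach can be wrapped in a small function and applied to several cell values at once. This is only a sketch: clean_lines and the sample cells list are hypothetical, and the function assumes that the literal text _x000D_ never appears as genuine cell content.

```python
def clean_lines(value):
    """Drop Excel's literal '_x000D_' markers, then split on newlines."""
    if not isinstance(value, str):  # e.g. None for empty cells, or numbers
        return []
    return value.replace("_x000D_", "").splitlines()


# Hypothetical cell values: with markers, without markers, and empty.
cells = ["Test1_x000D_\nTest2_x000D_", "Single", None, "A\nB"]
cleaned = [clean_lines(value) for value in cells]
print(cleaned)
```

Unlike decoding the escape, this discards the carriage return entirely, which is usually what is wanted when the goal is just a list of lines.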
Mark Tolonen answered Oct 22 '22 13:10