I am using openpyxl to read cell values from an Excel spreadsheet. One of the cells contains values separated by newlines, and I want to split the string using the newline character as the delimiter. However, openpyxl seems to be serializing the carriage return into a non-standard format. Look at the example below.
Code
import openpyxl

# Placeholder values (hypothetical): point these at your own file and cell
wb_path = "tests.xlsx"
row, column = 1, 1

# Open the worksheet (wb["name"] replaces the deprecated get_sheet_by_name)
wb = openpyxl.load_workbook(wb_path)
ws = wb["testing"]
# Get the string value
tests_str = ws.cell(row=row, column=column).value
# Split text on newlines and add them to the list
tests = []
for test in tests_str.splitlines():
    tests.append(test)
Output
>>> tests_str
u'Test1_x000D_\nTest2_x000D_\nTest3_x000D_'
>>> tests
[u'Test1_x000D_', u'Test2_x000D_', u'Test3_x000D_']
openpyxl seems to be serializing the \r character into _x000D_, which is why splitlines() is not removing it as a newline character. Is there a reason why openpyxl behaves like this? Am I doing something wrong?
As stated in a 2015 support issue posted on the official openpyxl Bitbucket project (see the Google cache entry to avoid logging in), this encoding is done by Excel and appears to be outside openpyxl's control.
To resolve this, openpyxl provides utility functions for escaping and unescaping these sequences:
>>> import openpyxl.utils.escape
>>> openpyxl.utils.escape.unescape(tests_str)
u'Test1\r\nTest2\r\nTest3\r'
Link to documentation: https://openpyxl.readthedocs.io/en/stable/api/openpyxl.utils.escape.html
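To see what such an unescape step does, here is a minimal standalone sketch that needs no openpyxl: it finds each `_xHHHH_` sequence (four hex digits) and replaces it with the character at that code point. This is not openpyxl's own code; the function name `unescape_ooxml` is hypothetical, and the sketch assumes escapes always take the `_xHHHH_` form seen in the question. Note that it decodes every such sequence, not only `_x000D_`.

```python
import re

# Matches escapes such as _x000D_ (four hex digits between "_x" and "_")
_ESCAPE_RE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def unescape_ooxml(value):
    """Hypothetical helper: replace each _xHHHH_ sequence with chr(0xHHHH)."""
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), value)


tests_str = "Test1_x000D_\nTest2_x000D_\nTest3_x000D_"
decoded = unescape_ooxml(tests_str)
print(repr(decoded))         # 'Test1\r\nTest2\r\nTest3\r'
print(decoded.splitlines())  # ['Test1', 'Test2', 'Test3']
```

Once the carriage returns are real `\r` characters again, `splitlines()` treats each `\r\n` pair as a single line break.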
It looks like either openpyxl or Excel is encoding carriage returns (\r, ASCII 0x0D) in that manner. You can convert them back, or split on them as well:
>>> s=u'Test1_x000D_\nTest2_x000D_\nTest3_x000D_'
>>> s.split('_x000D_\n')
[u'Test1', u'Test2', u'Test3_x000D_'] # This misses the final one.
>>> s.replace('_x000D_','').splitlines() # Better...
[u'Test1', u'Test2', u'Test3']
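The replace-then-split approach can be wrapped in a small helper for the loop in the question. This is a sketch only: the name `cell_lines` and the choice to drop blank lines are illustrative assumptions. It also guards against empty cells, whose `.value` openpyxl returns as `None`.

```python
def cell_lines(value):
    """Hypothetical helper: split a cell value on line breaks after removing _x000D_."""
    if value is None:  # empty cells come back as None
        return []
    text = str(value).replace("_x000D_", "")
    # Keep only non-blank lines (drops gaps left by doubled line breaks)
    return [line for line in text.splitlines() if line.strip()]


print(cell_lines("Test1_x000D_\nTest2_x000D_\nTest3_x000D_"))  # ['Test1', 'Test2', 'Test3']
print(cell_lines(None))                                        # []
```

With a worksheet loaded as in the question, usage would be `tests = cell_lines(ws.cell(row=row, column=column).value)`.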