
How to read an Excel cell and retain or detect its format in Python

Tags:

python

excel

text-formatting

I have been given an Excel file that contains some text formatting. Some text is bold, some italic, some superscript¹, and there are a few other formats (though they are less common than these three).

Examples:

Ku'lah ²ku.lah v; definition: some def; usage: some usage;
He'lahsa ²he.lah.sa n; definition: some def; usage: some usage;
And so on

Now, since each cell is to become an entry in a dictionary database (a real dictionary, for human readers), I would like to retain the cell's formatting, because it conveys how the word is used (in the examples above, bold marks the word type, such as v for verb, and italic marks the start of a new section).

But all of this lives inside the Excel cell.

When I read the Excel file directly with a database tool such as Toad for Oracle, the formatting is lost!

1. Is there any way to read the Excel file and still retain the formatting?
2. Alternatively, is there any way to detect the formatting? As long as I can detect it, I can simply replace the text with some HTML markup such as v, and the rest is my own work. I only want to know how to retain or detect the text formatting of an Excel cell in Python (in particular these three formats: bold, italic, and superscript).

Edit:

I tried to get the text format with the xlrd package, but I can't find a way to get the text style, as the cell object only has ctype, value, and xf_index. It carries no information about the text format, and when I open the workbook with formatting_info=True:

book = xlrd.open_workbook("HuluHalaDict.xlsx", sys.stdout, 0, xlrd.USE_MMAP, None, None, \
                          formatting_info=True, on_demand=False, ragged_rows=False)

I got the following error:

NotImplementedError: formatting_info=True not yet implemented

It is raised by these lines in the xlsx.py file of the xlrd package:

if formatting_info:
    raise NotImplementedError("formatting_info=True not yet implemented")

I find this strange, since I am using xlrd 0.9.4 (the latest version) and the documentation suggests that formatting information has been supported since version 0.6.1:

Default Formatting

Default formatting is applied to all empty cells (those not described by a cell record). Firstly row default information (ROW record, Rowinfo class) is used if available. Failing that, column default information (COLINFO record, Colinfo class) is used if available. As a last resort the worksheet/workbook default cell format will be used; this should always be present in an Excel file, described by the XF record with the fixed index 15 (0-based). By default, it uses the worksheet/workbook default cell style, described by the very first XF record (index 0).

Formatting features not included in xlrd version 0.6.1

Rich text i.e. strings containing partial bold italic and underlined text, change of font inside a string, etc. See OOo docs s3.4 and s3.2
Asian phonetic text (known as "ruby"), used for Japanese furigana. See OOo docs s3.4.2 (p15)
Conditional formatting. See OOo docs s5.12, s6.21 (CONDFMT record), s6.16 (CF record)
Miscellaneous sheet-level and book-level items e.g. printing layout, screen panes.
Modern Excel file versions don't keep most of the built-in "number formats" in the file; Excel loads formats according to the user's locale. Currently xlrd's emulation of this is limited to a hard-wired table that applies to the US English locale. This may mean that currency symbols, date order, thousands separator, decimals separator, etc are inappropriate. Note that this does not affect users who are copying XLS files, only those who are visually rendering cells.

Did I make a mistake here? My code is simply:

book = xlrd.open_workbook("HuluHalaDict.xlsx", sys.stdout, 0, xlrd.USE_MMAP, None, None, \
                          formatting_info=True, on_demand=False, ragged_rows=False)

Edit 2:

The example shown in the post creates the class instance (book) with formatting_info=True. But when I try the same thing in my own code, it raises the error above. Any ideas?


asked Apr 20 '16 11:04

Ian

1 Answer

I suggest the xlrd library: https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966

It is also on GitHub: https://github.com/python-excel/xlrd

You can find an easy example of how to use xlrd to determine the font style in "Using XLRD module and Python to determine cell font style (italics or not)".

Here is a practical example:

from xlrd import open_workbook

path = '/Users/.../Desktop/Workbook1.xls'
wb = open_workbook(path, formatting_info=True)
sheet = wb.sheet_by_name("Sheet1")
cell = sheet.cell(0, 0) # The first cell
print("cell.xf_index is", cell.xf_index)
fmt = wb.xf_list[cell.xf_index]
print("type(fmt) is", type(fmt))
print("Dumped Info:")
fmt.dump()

It outputs the following:

cell.xf_index is 62
type(fmt) is <class 'xlrd.formatting.XF'>
Dumped Info:
_alignment_flag: 0
_background_flag: 0
_border_flag: 0
_font_flag: 1
_format_flag: 0
_protection_flag: 0
alignment (XFAlignment object):
    hor_align: 0
    indent_level: 0
    rotation: 0
    shrink_to_fit: 0
    text_direction: 0
    text_wrapped: 0
    vert_align: 2
background (XFBackground object):
    background_colour_index: 65
    fill_pattern: 0
    pattern_colour_index: 64
border (XFBorder object):
    bottom_colour_index: 0
    bottom_line_style: 0
    diag_colour_index: 0
    diag_down: 0
    diag_line_style: 0
    diag_up: 0
    left_colour_index: 0
    left_line_style: 0
    right_colour_index: 0
    right_line_style: 0
    top_colour_index: 0
    top_line_style: 0
font_index: 6
format_key: 0
is_style: 0
lotus_123_prefix: 0
parent_style_index: 0
protection (XFProtection object):
    cell_locked: 1
    formula_hidden: 0
xf_index: 62

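Here font_index: 6 points to the entry in wb.font_list that describes the cell's font; that Font object has bold, italic, and escapement (1 = superscript, 2 = subscript) attributes. Note that _font_flag: 1 only says that the font attributes of this XF record are used rather than those of the parent style; it does not by itself mean the text is bold. Also note that formatting_info=True is only implemented for .xls files, which is why the example opens Workbook1.xls; for an .xlsx file xlrd raises the NotImplementedError shown in the question.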
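To get from the XF record to the HTML markup the question asks for, one option is to follow font_index into wb.font_list and, for cells where only part of the text is formatted, read sheet.rich_text_runlist_map, which maps (row, col) to a list of (offset, font_index) pairs. The following is a minimal sketch under those assumptions, for .xls files only. The helper names (font_tags, wrap, cell_to_html, sheet_to_html) and the stand-in fonts in the demo are made up for illustration and are not part of xlrd.

```python
from html import escape
from types import SimpleNamespace


def font_tags(font):
    """Return the HTML tag names implied by an xlrd Font-like object."""
    tags = []
    if font.bold:
        tags.append("b")
    if font.italic:
        tags.append("i")
    if font.escapement == 1:  # xlrd: 1 = superscript, 2 = subscript
        tags.append("sup")
    return tags


def wrap(text, font):
    """Escape text and wrap it in the tags for one font (first tag outermost)."""
    html = escape(text, quote=False)
    for tag in reversed(font_tags(font)):
        html = f"<{tag}>{html}</{tag}>"
    return html


def cell_to_html(text, cell_font_index, runs, font_list):
    """Convert one cell's text to HTML.

    runs is a list of (offset, font_index) pairs, as stored in
    sheet.rich_text_runlist_map; text before the first offset uses
    the cell's own font.
    """
    pieces = []
    start, font_index = 0, cell_font_index
    # A sentinel run at len(text) flushes the final segment.
    for offset, next_font_index in list(runs) + [(len(text), None)]:
        if offset > start:
            pieces.append(wrap(text[start:offset], font_list[font_index]))
        start, font_index = offset, next_font_index
    return "".join(pieces)


def sheet_to_html(path, sheet_name):
    """Hypothetical driver: yield (row, col, html) for each text cell."""
    import xlrd  # imported lazily so the helpers above work without xlrd

    wb = xlrd.open_workbook(path, formatting_info=True)  # .xls only
    sheet = wb.sheet_by_name(sheet_name)
    for r in range(sheet.nrows):
        for c in range(sheet.ncols):
            cell = sheet.cell(r, c)
            if cell.ctype != xlrd.XL_CELL_TEXT:
                continue
            xf = wb.xf_list[cell.xf_index]
            runs = sheet.rich_text_runlist_map.get((r, c), [])
            yield r, c, cell_to_html(cell.value, xf.font_index, runs, wb.font_list)


# Demo with stand-in fonts instead of a real workbook (assumed layout:
# font 0 = plain, font 1 = superscript, font 2 = bold).
demo_fonts = [
    SimpleNamespace(bold=0, italic=0, escapement=0),
    SimpleNamespace(bold=0, italic=0, escapement=1),
    SimpleNamespace(bold=1, italic=0, escapement=0),
]
demo_runs = [(7, 1), (8, 0), (15, 2), (16, 0)]
demo_html = cell_to_html("Ku'lah 2ku.lah v;", 0, demo_runs, demo_fonts)
print(demo_html)  # Ku'lah <sup>2</sup>ku.lah <b>v</b>;
```

With a real workbook, something like `for r, c, html in sheet_to_html(path, "Sheet1"): ...` would produce one HTML string per text cell. An .xlsx file would first have to be saved as .xls (or read with a different library), since xlrd does not provide formatting information for .xlsx.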


answered Sep 19 '22 01:09

alec_djinn
