Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find the formatting for a subset of text in an Excel document cell

Tags:

python

xlrd

Using Python, I need to find all substrings in a given Excel sheet cell that are either bold or italic.

My problem is similar to this:

Using XLRD module and Python to determine cell font style (italics or not)

..but the solution is not applicable for me as I cannot assume that the same formatting holds for all content in the cell. The value in a single cell can look like this:


1. Some bold text Some normal text. Some italic text.


Is there a way to find the formatting of a range of characters in a cell using xlrd (or any other Python Excel module)?

like image 747
westmark Avatar asked Sep 11 '12 14:09

westmark


People also ask

How do you view formatting in a cell?

Right-click and then select "Format Cells" from the popup menu. When the Format Cells window appears, select the Number tab. In the Category listbox, select your format. A sample of your text will appear on the right portion of the window based on the format that you've selected.

How do I find conditional formatting in a cell?

On the HOME tab, in the Styles group, click the arrow next to Conditional Formatting, and then click Manage Rules. The list of conditional formatting rules are displayed for the current selection including the rule type, the format, the range of cells the rule applies to, and the Stop If True setting.


1 Answers

Thanks to @Vyassa for all of the right pointers, I've been able to write the following code which iterates over the rows in a XLS file and outputs style information for cells with "single" style information (e.g., the whole cell is italic) or style "segments" (e.g., part of the cell is italic, part of it is not).

import xlrd

# accessing Column 'C' in this example
COL_IDX = 2

book = xlrd.open_workbook('your-file.xls', formatting_info=True)
first_sheet = book.sheet_by_index(0)

for row_idx in range(first_sheet.nrows):
  text_cell = first_sheet.cell_value(row_idx, COL_IDX)
  text_cell_xf = book.xf_list[first_sheet.cell_xf_index(row_idx, COL_IDX)]

  # skip rows where cell is empty
  if not text_cell:
    continue
  print text_cell,

  text_cell_runlist = first_sheet.rich_text_runlist_map.get((row_idx, COL_IDX))
  if text_cell_runlist:
    print '(cell multi style) SEGMENTS:'
    segments = []
    for segment_idx in range(len(text_cell_runlist)):
      start = text_cell_runlist[segment_idx][0]
      # the last segment starts at given 'start' and ends at the end of the string
      end = None
      if segment_idx != len(text_cell_runlist) - 1:
        end = text_cell_runlist[segment_idx + 1][0]
      segment_text = text_cell[start:end]
      segments.append({
        'text': segment_text,
        'font': book.font_list[text_cell_runlist[segment_idx][1]]
      })
    # segments did not start at beginning, assume cell starts with text styled as the cell
    if text_cell_runlist[0][0] != 0:
      segments.insert(0, {
        'text': text_cell[:text_cell_runlist[0][0]],
        'font': book.font_list[text_cell_xf.font_index]
      })

    for segment in segments:
      print segment['text'],
      print 'italic:', segment['font'].italic,
      print 'bold:', segment['font'].bold

  else:
    print '(cell single style)',
    print 'italic:', book.font_list[text_cell_xf.font_index].italic,
    print 'bold:', book.font_list[text_cell_xf.font_index].bold
like image 122
Greg Sadetsky Avatar answered Sep 21 '22 02:09

Greg Sadetsky