 

How to detect "strikethrough" style from xlsx file in R

When importing an Excel file into R, I need to identify the cells that have "strikethrough" formatting.

Is there any method to detect them? Both R and Python approaches are welcome.

asked Oct 16 '25 20:10 by rane


2 Answers

R solution

The tidyxl package can help you here.

Example: temp.xlsx, with data in A1:A4 of the first sheet, where cells A2 and A4 are formatted with strikethrough.

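library(tidyxl)

# Read the workbook's format definitions and its cells
formats <- xlsx_formats("temp.xlsx")
cells <- xlsx_cells("temp.xlsx")

# Positions of the local (cell-level) formats whose font has strikethrough;
# these are the values used in cells$local_format_id
strike <- which(formats$local$font$strike)
cells[cells$local_format_id %in% strike, "address"]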

# A tibble: 2 x 1
#   address
#   <chr>  
# 1 A2     
# 2 A4   
answered Oct 18 '25 10:10 by Wimpel


Below is a small sample program that uses the openpyxl package to filter out cells whose font has strikethrough applied (tested with openpyxl 2.5.6 on Python 3.7.0). Sorry it took so long to get back to you.

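import openpyxl as opx


def ignore_strikethrough(cell):
    # Keep the cell only if its font has no strikethrough
    # (font.strike may be True, False or None)
    return not cell.font.strike


wb = opx.load_workbook('test.xlsx')
ws = wb.active
colA = ws['A']  # tuple of every cell in column A
fColA = filter(ignore_strikethrough, colA)
for i in fColA:
    # i.coordinate (e.g. "A1") works whether i.column is a letter
    # (older openpyxl) or an integer (newer openpyxl)
    print("Cell {0} has value {1}".format(i.coordinate, i.value))
    print(i.col_idx)  # 1-based numeric column index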

I tested it on a new workbook with the default worksheets, containing the letters a, b, c, d, e in the first five rows of column A, with strikethrough formatting applied to b and d. The program filters out the cells in column A whose font has strikethrough applied, then prints the location and value of each remaining cell. The col_idx property returns the 1-based numeric column index.
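Since the question asks to detect the struck-through cells rather than drop them, here is a sketch along the same lines that inverts the test and scans every sheet instead of only column A of the active one. The names find_strikethrough and build_demo_workbook are hypothetical, skipping empty cells is an arbitrary choice, and the demo workbook recreates the test data described above (a to e in A1:A5, strikethrough on b and d) in memory instead of on disk.

```python
import io

import openpyxl as opx
from openpyxl.styles import Font


def find_strikethrough(path_or_file):
    """Map each sheet title to the coordinates of non-empty struck-through cells."""
    wb = opx.load_workbook(path_or_file)
    found = {}
    for ws in wb.worksheets:
        hits = [cell.coordinate
                for row in ws.iter_rows()
                for cell in row
                if cell.value is not None and cell.font.strike]
        if hits:
            found[ws.title] = hits
    return found


def build_demo_workbook():
    # Hypothetical helper: a..e in A1:A5 with strikethrough on b and d,
    # saved to an in-memory buffer that load_workbook can read back.
    wb = opx.Workbook()
    ws = wb.active
    for row, letter in enumerate("abcde", start=1):
        cell = ws.cell(row=row, column=1, value=letter)
        if letter in "bd":
            cell.font = Font(strike=True)
    buf = io.BytesIO()
    wb.save(buf)
    buf.seek(0)
    return buf


print(find_strikethrough(build_demo_workbook()))  # {'Sheet': ['A2', 'A4']}
```

Returning coordinates keyed by sheet title makes it easy to join the result back onto data read with pandas or another reader.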

answered Oct 18 '25 10:10 by Jarak