openpyxl read tables from existing data book example?

The openpyxl documentation has an example of how to place a table into a workbook, but no examples of how to retrieve the tables already in a workbook. I have an Excel file that has named tables in it, and I want to open the file, find all of the tables and parse them. I cannot find any documentation on how to do this. Can anyone help?

In the meantime I worked it out and wrote the following class to work with openpyxl:

class NamedArray(object):

    ''' Excel Named range object

        Reproduces the named range feature of Microsoft Excel
        Assumes a definition in the form <Worksheet>!<range>, e.g. PinList!$A$6:$A$52, as provided by openpyxl
        Written for use with, and initialised by the get_names function
        After initialisation named array can be used in the same way as for VBA in excel
        Written for openpyxl version 2.4.1, may not work with earlier versions 
    '''

    C_CAPS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'   

    def __init__(self, wb, named_range_raw):
        ''' Initialise a NamedArray object from the named_range_raw information in the given workbook

        '''
        self.sheet, cellrange_str = str(named_range_raw).split('!')
        self.sheet = self.sheet.replace("'",'') # remove the single quotes if they exist
        self.loc = wb[self.sheet]

        if ':' in cellrange_str:
            self.has_range = True
            self.has_value = False
            lo, hi = cellrange_str.split(':')
            self.ad_lo = lo.replace('$','')
            self.ad_hi = hi.replace('$','')
        else:
            self.has_range = False
            self.has_value = True
            self.ad_lo = cellrange_str.replace('$','')
            self.ad_hi = self.ad_lo

        self.row = self.get_row(self.ad_lo) 
        self.max_row = self.get_row(self.ad_hi)
        self.rows = self.max_row - self.row + 1
        self.min_col = self.col_to_n(self.ad_lo)
        self.max_col = self.col_to_n(self.ad_hi)
        self.cols    = self.max_col - self.min_col + 1


    def size_of(self):
        ''' Returns two dimensional size of named space
        '''
        return self.cols, self.rows 

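    def value(self, row=1, col=1):
        ''' Returns the value at row, col (both 1-based, relative to the named range)
        '''
        assert 1 <= row <= self.rows, 'invalid row number given'
        assert 1 <= col <= self.cols, 'invalid column number given'
        # Address the cell by row/column keywords rather than an "A1"-style string,
        # which newer openpyxl releases no longer accept in cell()
        return self.loc.cell(row=self.row + row - 1, column=self.min_col + col - 1).value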


    def __str__(self):
        ''' printed description of named space
        '''
        locs = 's ' + self.ad_lo + ':' + self.ad_hi if self.has_range else ' ' + self.ad_lo
        return('named range'+ str(self.size_of()) + ' in sheet ' + self.sheet + ' @ location' + locs)  


    def __contains__(self, val):
        rval = False
        for row in range(1,self.rows+1):
            for col in range(1,self.cols+1):
                if self.value(row,col) == val:
                    rval = True
        return rval


    def vlookup(self, key, col):
        ''' excel style vlookup function
        '''
        assert col <= self.cols , 'invalid column number given'
        rval = None
        for row in range(1,self.rows+1):
            if self.value(row,1) == key:
                rval = self.value(row, col)
                break
        return rval


    def hlookup(self, key, row):
        ''' excel style hlookup function
        '''
        assert row <= self.rows , 'invalid row number given'
        rval = None
        for col in range(1,self.cols+1):
            if self.value(1,col) == key:
                rval = self.value(row, col)
                break
        return rval

    @classmethod
    def get_row(cls, ad):
        ''' get row number from cell string
        Cell string is assumed to be in excel format i.e "ABC123" where row is 123
        '''
        row = 0
        for l in ad:
            if l in "1234567890":
                row = row*10 + int(l)
        return row

    @classmethod
    def col_to_n(cls, ad):
        ''' find column number from xl address
            Cell string is assumed to be in excel format i.e "ABC123" where column is ABC
            column number is the 1-based integer representation with A=1 ... Z=26,
            i.e. 1*26*26 + 2*26 + 3 for "ABC"
        '''
        n = 0
        for l in ad:
            if l in cls.C_CAPS:
                n = n*26 + cls.C_CAPS.find(l)+1
        return n

    @classmethod
    def n_to_col(cls, n):
        ''' make xl column address from column number
        '''
        ad = ''
        while n > 0:
            # shift to 0-based before dividing so that 26 -> 'Z' and 27 -> 'AA'
            n, rem = divmod(n - 1, 26)
            ad = cls.C_CAPS[rem] + ad
        return ad



def get_names(workbook, filt='', debug=False):
    ''' Create a structure containing all of the names in the given workbook

        filt is an optional parameter and used to create a subset of names starting with filt
        useful for IO_ring_spreadsheet as all names start with 'n_'
        if present, filt characters are stripped off the front of the name
    '''
    named_ranges = workbook.defined_names.definedName
    name_list = {}

    for named_range in named_ranges:
        name = named_range.name
        if named_range.attr_text.startswith('#REF'):
            print('WARNING: named range "', name, '" is undefined')
        elif filt == '' or name.startswith(filt):
            name_list[name[len(filt):]] = NamedArray(workbook, named_range.attr_text)

    if debug:
        with open("H:\\names.txt",'w') as log:
            for item in name_list:
                print (item, '=', name_list[item])
                log.write(item.ljust(30) + ' = ' + str(name_list[item])+'\n')

    return name_list
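
The column helpers in the class treat column letters as a bijective base-26 number (A=1 ... Z=26, AA=27), where multiples of 26 are the easy case to get wrong. A standalone sketch of that conversion is given here, using hypothetical module-level functions that mirror the methods above (split_address is an illustrative name, not part of the class). openpyxl also ships get_column_letter and column_index_from_string in openpyxl.utils, which would usually be preferable to hand-rolled versions.

```python
import string

LETTERS = string.ascii_uppercase  # 'A'..'Z'


def col_to_n(letters):
    """Column letters to 1-based column number: 'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + LETTERS.index(ch) + 1
    return n


def n_to_col(n):
    """1-based column number to column letters (inverse of col_to_n)."""
    letters = ''
    while n > 0:
        # Shift to 0-based before dividing, otherwise multiples of 26 come out wrong.
        n, rem = divmod(n - 1, 26)
        letters = LETTERS[rem] + letters
    return letters


def split_address(address):
    """Split 'ABC123' or '$ABC$123' into (column letters, row number)."""
    address = address.replace('$', '')
    col = ''.join(ch for ch in address if ch.isalpha())
    row = int(''.join(ch for ch in address if ch.isdigit()))
    return col, row
```

Round-tripping every column number through n_to_col and col_to_n is a cheap way to check the boundary cases (26, 27, 702, 703).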

I agree that the documentation does not really help, and the public API seems to offer only the add_table() method. However, openpyxl issue 844, which asks for a better interface, shows that a worksheet has a _tables property.

This is enough to get a list of all tables in a file, together with some basic properties:

from openpyxl import load_workbook
wb = load_workbook(filename = 'test.xlsx')
for ws in wb.worksheets:
    print("Worksheet %s includes %d tables:" % (ws.title, len(ws._tables)))
    for tbl in ws._tables:
        print(" : " + tbl.displayName)
        print("   -  name = " + tbl.name)
        print("   -  type = " + (tbl.tableType if isinstance(tbl.tableType, str) else 'n/a'))
        print("   - range = " + tbl.ref)
        print("   - #cols = %d" % len(tbl.tableColumns))
        for col in tbl.tableColumns:
            print("     : " + col.name)

Note that the if/else construct is required for tableType, since it can be None (for standard tables), which cannot be concatenated to a str with +.
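
To go from listing the tables to parsing them, the table's ref (for example 'A1:C4') can be used to index the worksheet, since ws[ref] on an openpyxl worksheet yields rows of cells. The following is only a minimal sketch: table_to_records is a hypothetical helper name, and it assumes the first row of the range is the table's header row, which is the default but not guaranteed.

```python
def table_to_records(ws, ref):
    """Return the data rows of the range `ref` (e.g. 'A1:C4') as a list of dicts.

    Assumption: the first row of the range holds the column headers.
    `ws` is expected to behave like an openpyxl worksheet, i.e. ws[ref]
    yields rows of cell objects that each have a .value attribute.
    """
    rows = [[cell.value for cell in row] for row in ws[ref]]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # Pair each data row with the header names.
    return [dict(zip(header, values)) for values in data]
```

Inside the loop above this would be called as table_to_records(ws, tbl.ref). As far as I know, more recent openpyxl releases keep tables in a dict-like ws.tables, in which case the Table objects would come from ws.tables.values() instead of from iterating ws._tables directly.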

Tags:

python

openpyxl

Stephen Ellwood

1 Answers

Michal Kaut
