Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read Excel Tables, not simple named ranges

Tags:

import

r

excel

To avoid "duplicate" close request: I know how to read Excel named ranges; examples are given in the code below. This is about "real" tables in Excel.

Excel2007 and later have the useful concept of tables: you can convert ranges to tables, and avoid hassles when sorting and rearranging. When you create a table in an Excel range, it gets a default name (Tabelle1 in German Version, TableName in the following example), but you can additionally simply name the range of the table (TableAsRangeName); as indicated by the icons in the Excel range name editor, these two seem to be treated differently.

I have not been able to read these tables (in the strict sense) from R. The only known workaround is using CSV intermediate, or converting the table to a normal named range, which has nasty irreversible side effects when you use column names in cell references; these are converted to A1 notation.

The example below shows the problem. You mileage may vary with different combinations of 32/64 bit ODBC drivers and 32/64 bit Java

# Read Excel Tables (not simply named ranges)
# Test Computer: 64 Bit Windows 7, R 32 bit  
# My ODBC drivers are 32 bit
library(RODBC)
# Test file has three ranges
# NonTable Simple named range
# TableName Name of table 
# TableAsRangeName Named Range covering the above table
sampleFile = "ExcelTables.xlsx"
if (!file.exists(sampleFile)){
  download.file("http://www.menne-biomed.de/uni/ExcelTables.xlsx",sampleFile)
  # Or do it manually, if this fails
}
# ODBC
channel = odbcConnectExcel2007(sampleFile)
sqlQuery(channel, "SELECT * from NonTable") # Ok
sqlQuery(channel, "SELECT * from TableName") # Could not find range
sqlQuery(channel, "SELECT * from TableAsRangeName") # Could not find range
close(channel)

# gdata has read.xls, but seems not to support named regions

library(xlsx)
wb = loadWorkbook(sampleFile)
getRanges(wb) # This one fails already with "TableName" does not exist
ws = getSheets(wb)[[1]]
readRange("NonTable",ws) # Invalid range address
readRange("TableName",ws) # Invalid range address
readRange("TableAsRangeName",ws) # Invalid range address

# my machine requires 64 bit for this one; depends on your Java installation
sampleFile = "ExcelTables.xlsx"
library(XLConnect) # requires Java
readNamedRegionFromFile(sampleFile,"NonTable") # OK
readNamedRegionFromFile(sampleFile,"TableName") # "TableName" does not exist
readNamedRegionFromFile(sampleFile,"TableAsRangeName") # NullPointerException

wb <- loadWorkbook(sampleFile)
readNamedRegion(wb,"NonTable") # Ok
readNamedRegion(wb,"TableName") # does not exist
readNamedRegion(wb,"TableAsRangeName") # Null Pointer
like image 529
Dieter Menne Avatar asked Jul 16 '13 08:07

Dieter Menne


People also ask

What is the difference between named range and table in Excel?

A table is a defined grid of cells for data and formulas that automatically expands as you add to it and come with the capacity to sort and filter. Creating a table in Excel, Named ranges are automatically added to your worksheet. A named range is just one or more cells to which you a named is assigned.

What is a dynamic named range in Excel?

A dynamic named range, also called simply a "dynamic range" is a specially constructed range that expands automatically to accommodate new data. In the example shown, we have a small set of data in B5:B13, and two formulas calculating a minimum and maximum value like this: =MAX(data) =MIN(data)

What is the disadvantage of using named ranges?

However there are number of difficulties with range names which mean they can make a model harder to read. 3. If there are a lot of names used, the modeller has to be creative to find distinctive names. In most cases this means very long names with lots of abbreviations that are meaningful only to the model builder.


1 Answers

I've added some initial support for Excel tables in XLConnect. Please find the latest changes on github at https://github.com/miraisolutions/xlconnect

In the following a small sample:

require(XLConnect)
sampleFile = "ExcelTables.xlsx"
wb = loadWorkbook(sampleFile)
readTable(wb, sheet = "ExcelTable", table = "TableName")

Note that Excel tables are associated to a sheet. So as far as I can see it's possible to have multiple tables with the same name associated to different sheets. For this reason there is a sheet-argument to readTable.

like image 182
Martin Studer Avatar answered Oct 08 '22 13:10

Martin Studer