Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast way to get Excel range as array of texts or cell formats in C#?

Array operations are way faster than range operations in VSTO, so currently I'm using

object[,] RangeValues = wsh.get_Range("A1:" + lastCell.Address).Value2;

with pretty good effect. Sadly, I have some inconsistent data. Sometimes there is 0.45, and sometimes 0.45%, and of course later i see it as 0.0045 in code. Sadly, from "business" point of view, both values means 0.45. I cannot force consistency, files come from various sources I don't have any authority over. It's something I need to deal with.

The way, of course, would be to look at format, or at display text, and see if there is a % sign in it. If there is, I just need to multiply value by 100. Sadly, if I try:

object[,] RangeValues = wsh.get_Range("A1:" + lastCell.Address).Text;

I get message that cannot convert DBNull to object[,]. So is there any way that would allow me to load texts or formats all at once, without going thorough code <-> worksheet border at each step of mu loop?

like image 558
Mołot Avatar asked Aug 17 '15 10:08

Mołot


2 Answers

Detecting Excel Cell Formats

To find the format of cells use Excel's Cell("format",A1) function rather that interrogating the datatypes which would be much slower, harder and prone to problems, eg: 0.45% != 45%.

enter image description here

private void button1_Click(object sender, EventArgs e) 
{
    // evaluate the Format of Cells A1 thru to A7
    using (var rnEvaluate = xlApp.Range["C1:C1"].WithComCleanup())
    {
        for (int i = 1; i < 8; i++)
        {
            rnEvaluate.Resource.Value2 = "=CELL(\"format\",A" + i.ToString() + ")";
            string cellFormat = GetExcelCellFormat(rnEvaluate.Resource.Value2);
            System.Diagnostics.Debug.Write(cellFormat);
        }
    } 
}

private string GetExcelCellFormat(string cellFormat = "G") 
{
    switch (cellFormat.Substring(0, 1))
    {
        case "F" :
            return "Number";
            break;
        case "P" :
            return "Percentage";
            break;
        case "C":
            return "Currency";
            break;
        case "D":
            return "Date";
            break;
        default :
            return "General";
            break;
    } 
}

The .WithComCleanup() is because I am using VSTO Contrib.


Detecting All Excel Cell Formats at once

Is there any way that would allow me to load texts or formats all at once?

Simply use the above method to detect all the cell formats (using AutoFill) and add them to a objectArray. Say for example I wanted to know the cell formats for columns A & B:

enter image description here

Using this VBA code I could to get all cell formats (at once without iterating over cells):

Range("C1").Select
ActiveCell.Value2 = "=CELL(""format"",A1)"
'Fill Down
Range("C1").Select
Selection.AutoFill Destination:=Range("C1:C6"), Type:=xlFillDefault
'Fill Across
Range("C1:C6").Select
Selection.AutoFill Destination:=Range("C1:D6"), Type:=xlFillDefault

Here is the above VBA code converted to C# and storing the formats in an object array:

var filepath = @"C:\temp\test\book2.xlsx";
var xlApp = new Microsoft.Office.Interop.Excel.Application();
//Optional but recommended if the user shouldn't see Excel.
xlApp.Visible = false;
xlApp.ScreenUpdating = false;
//AddToMru parameter is optional, but recommended in automation scenarios.
var workbook = xlApp.Workbooks.Open(filepath, AddToMru: false);

//This operation may take a little bit of time but no where near 15 minutes!!!
var cell = xlApp.Range["C1:C1"];
cell.Value2 = "=CELL(\"format\",A1)";
//Fill Down
cell.AutoFill(xlApp.Range["C1:C6"], Microsoft.Office.Interop.Excel.XlAutoFillType.xlFillDefault);
//Fill Across
cell = xlApp.Range["C1:C6"];
cell.AutoFill(xlApp.Range["C1:D6"], Microsoft.Office.Interop.Excel.XlAutoFillType.xlFillDefault);
//Get cell formats into object array
object[,] rangeFormats = xlApp.get_Range("C1:D6").Value2;

The Excel Percentage Conversion Trick

I have some inconsistent data. Sometimes there is 0.45, and sometimes 0.45%

If the only data inconsistencies you're having is with % values then here is a trick.

Presumably the percentage values will be in a column, to convert them, copy the column of values (in column A):

enter image description here

Make sure you set a column with values of 100 (as shown in column B)

Right click the cell in the column of 100's and choose Paste Special:

enter image description here

Choose Values and Multiply:

enter image description here

Excel will convert them to real numbers:

enter image description here

Obviously you can do this programmatically. Simply record the operation as a Macro and convert the VBA to C#.

and of course later i see it as 0.0045 in code.

Note: The code is right, 0.45% is not 45%, 0.45% is less than half a percent! If a particular customer is sending you files expecting you to break the laws of mathematics and treat 0.45% = 45% then there is a good chance they may suddenly start getting 100x more or 100x less. I'd politely point out they need to change it. Don't try and program around this. If this is why you want to look at Cell Formats then all you're doing is troubleshooting a symptom rather than fixing the root cause which will exacerbate the issue and hide a much bigger problem. Just politely point it out to the sources you have no control over there could be some massive problems by x100 fold and insist it needs to be corrected. Otherwise I expect to see a hilarious story about it in DailyWTF featuring this code:

var val = rangeValues[1,1].ToString();
var cellFormat = rangeFormat[1,1].ToString();

if (val.EndsWith("%") && val.Replace("%","") < 1 && cellFormat == "G")  {
   dailyWTFval = val.Replace("%","") * 100;
}
else
   dailyWTFval = val;    
}
like image 146
Jeremy Thompson Avatar answered Oct 28 '22 20:10

Jeremy Thompson


I think the easiest way to read large amounts of non harmonized data from excel will be following

  1. In C# save excel file in XML Spreadsheet 2003(*xml). That will produce xml file with all data and styling. C# method for saving - Workbook.SaveAs with FileFormat value = XlFileFormat.xlXMLSpreadsheet

  2. Parse xml file and extract data with formats

  3. Delete temporary file

Examle: Excel Excel

XML

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
  <Styles>
    <Style ss:ID="Default" ss:Name="Normal">
      <Alignment ss:Vertical="Bottom"/>
      <Borders/>
      <Font ss:FontName="Calibri" x:CharSet="204" x:Family="Swiss" ss:Size="11"
       ss:Color="#000000"/>
      <Interior/>
      <NumberFormat/>
      <Protection/>
    </Style>
    <Style ss:ID="s62">
      <NumberFormat ss:Format="0%"/>
    </Style>
  </Styles>
  <Worksheet ss:Name="Sheet1">
    <Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="5" x:FullColumns="1"
     x:FullRows="1" ss:DefaultRowHeight="15">
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">Data</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="Number">45</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell ss:StyleID="s62">
          <Data ss:Type="Number">0.45</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">String</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="Number">45.5</Data>
        </Cell>
      </Row>
    </Table>
  </Worksheet>

I removed some nodes for simplicity. Following elements are required to be analyzed to correctly extract data

  1. Workbook\Worksheet\Table\Row\Cell\Data - contains data formated to invariant culture
  2. Workbook\Worksheet\Table\Row\Cell\Data, attribute ss:Type - contains data type of Data element contents
  3. Workbook\Worksheet\Table\Row\Cell, attribute ss:StyleID - reference to style, for your case it is only required to correctly identify cells where number is formated as percentage (multiplication to 100 required)
  4. Workbook\Styles\Style, attribute ss:ID - id of style used to reference style from cells
  5. Workbook\Styles\Style\NumberFormat, attrubute ss:Format - if ends with % and data type Number -> it is percentage

Parser logic:

  1. String type in cell -> convert as is
  2. Number type in cell. If format ends with '%' -> multiply by 100, otherwise use as is.

If data volumes are not so big(less then 200-300), it is possible to do that analysis cell by cell from C# without saving file in xml format.

like image 34
Timur Mannapov Avatar answered Oct 28 '22 20:10

Timur Mannapov