Array operations are way faster than range operations in VSTO, so currently I'm using
object[,] RangeValues = wsh.get_Range("A1:" + lastCell.Address).Value2;
with pretty good effect. Sadly, I have some inconsistent data. Sometimes there is 0.45
, and sometimes 0.45%
, and of course later i see it as 0.0045
in code. Sadly, from "business" point of view, both values means 0.45
. I cannot force consistency, files come from various sources I don't have any authority over. It's something I need to deal with.
The way, of course, would be to look at format, or at display text, and see if there is a %
sign in it. If there is, I just need to multiply value by 100. Sadly, if I try:
object[,] RangeValues = wsh.get_Range("A1:" + lastCell.Address).Text;
I get message that cannot convert DBNull
to object[,]
. So is there any way that would allow me to load texts or formats all at once, without going thorough code <-> worksheet border at each step of mu loop?
To find the format of cells use Excel's Cell("format",A1)
function rather that interrogating the datatypes which would be much slower, harder and prone to problems, eg: 0.45% != 45%.
private void button1_Click(object sender, EventArgs e)
{
// evaluate the Format of Cells A1 thru to A7
using (var rnEvaluate = xlApp.Range["C1:C1"].WithComCleanup())
{
for (int i = 1; i < 8; i++)
{
rnEvaluate.Resource.Value2 = "=CELL(\"format\",A" + i.ToString() + ")";
string cellFormat = GetExcelCellFormat(rnEvaluate.Resource.Value2);
System.Diagnostics.Debug.Write(cellFormat);
}
}
}
private string GetExcelCellFormat(string cellFormat = "G")
{
switch (cellFormat.Substring(0, 1))
{
case "F" :
return "Number";
break;
case "P" :
return "Percentage";
break;
case "C":
return "Currency";
break;
case "D":
return "Date";
break;
default :
return "General";
break;
}
}
The .WithComCleanup()
is because I am using VSTO Contrib.
Is there any way that would allow me to load texts or formats all at once?
Simply use the above method to detect all the cell formats (using AutoFill) and add them to a objectArray. Say for example I wanted to know the cell formats for columns A & B:
Using this VBA code I could to get all cell formats (at once without iterating over cells):
Range("C1").Select
ActiveCell.Value2 = "=CELL(""format"",A1)"
'Fill Down
Range("C1").Select
Selection.AutoFill Destination:=Range("C1:C6"), Type:=xlFillDefault
'Fill Across
Range("C1:C6").Select
Selection.AutoFill Destination:=Range("C1:D6"), Type:=xlFillDefault
Here is the above VBA code converted to C# and storing the formats in an object array:
var filepath = @"C:\temp\test\book2.xlsx";
var xlApp = new Microsoft.Office.Interop.Excel.Application();
//Optional but recommended if the user shouldn't see Excel.
xlApp.Visible = false;
xlApp.ScreenUpdating = false;
//AddToMru parameter is optional, but recommended in automation scenarios.
var workbook = xlApp.Workbooks.Open(filepath, AddToMru: false);
//This operation may take a little bit of time but no where near 15 minutes!!!
var cell = xlApp.Range["C1:C1"];
cell.Value2 = "=CELL(\"format\",A1)";
//Fill Down
cell.AutoFill(xlApp.Range["C1:C6"], Microsoft.Office.Interop.Excel.XlAutoFillType.xlFillDefault);
//Fill Across
cell = xlApp.Range["C1:C6"];
cell.AutoFill(xlApp.Range["C1:D6"], Microsoft.Office.Interop.Excel.XlAutoFillType.xlFillDefault);
//Get cell formats into object array
object[,] rangeFormats = xlApp.get_Range("C1:D6").Value2;
I have some inconsistent data. Sometimes there is 0.45, and sometimes 0.45%
If the only data inconsistencies you're having is with % values then here is a trick.
Presumably the percentage values will be in a column, to convert them, copy the column of values (in column A):
Make sure you set a column with values of 100 (as shown in column B)
Right click the cell in the column of 100's and choose Paste Special:
Choose Values and Multiply:
Excel will convert them to real numbers:
Obviously you can do this programmatically. Simply record the operation as a Macro and convert the VBA to C#.
and of course later i see it as 0.0045 in code.
Note: The code is right, 0.45% is not 45%, 0.45% is less than half a percent! If a particular customer is sending you files expecting you to break the laws of mathematics and treat 0.45% = 45% then there is a good chance they may suddenly start getting 100x more or 100x less. I'd politely point out they need to change it. Don't try and program around this. If this is why you want to look at Cell Formats then all you're doing is troubleshooting a symptom rather than fixing the root cause which will exacerbate the issue and hide a much bigger problem. Just politely point it out to the sources you have no control over there could be some massive problems by x100 fold and insist it needs to be corrected. Otherwise I expect to see a hilarious story about it in DailyWTF featuring this code:
var val = rangeValues[1,1].ToString();
var cellFormat = rangeFormat[1,1].ToString();
if (val.EndsWith("%") && val.Replace("%","") < 1 && cellFormat == "G") {
dailyWTFval = val.Replace("%","") * 100;
}
else
dailyWTFval = val;
}
I think the easiest way to read large amounts of non harmonized data from excel will be following
In C# save excel file in XML Spreadsheet 2003(*xml). That will produce xml file with all data and styling. C# method for saving - Workbook.SaveAs with FileFormat value = XlFileFormat.xlXMLSpreadsheet
Parse xml file and extract data with formats
Examle: Excel
XML
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Calibri" x:CharSet="204" x:Family="Swiss" ss:Size="11"
ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s62">
<NumberFormat ss:Format="0%"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="5" x:FullColumns="1"
x:FullRows="1" ss:DefaultRowHeight="15">
<Row ss:AutoFitHeight="0">
<Cell>
<Data ss:Type="String">Data</Data>
</Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell>
<Data ss:Type="Number">45</Data>
</Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s62">
<Data ss:Type="Number">0.45</Data>
</Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell>
<Data ss:Type="String">String</Data>
</Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell>
<Data ss:Type="Number">45.5</Data>
</Cell>
</Row>
</Table>
</Worksheet>
I removed some nodes for simplicity. Following elements are required to be analyzed to correctly extract data
Parser logic:
If data volumes are not so big(less then 200-300), it is possible to do that analysis cell by cell from C# without saving file in xml format.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With