I have data in an Excel spreadsheet with values like this:
The cells are formatted as Percentage, and set to display two decimal places. So they appear in Excel as:
I have a C# program that parses this data off the Clipboard:
var dataObj = Clipboard.GetDataObject();
var format = DataFormats.CommaSeparatedValue;
if (dataObj != null && dataObj.GetDataPresent(format))
{
    var csvData = dataObj.GetData(format);
    // do something
}
The problem is that csvData contains the display values from Excel, i.e. '69.49%' and '31.22%'. It does not contain the full precision of the extra decimal places.
I have tried the various other DataFormats values, but the data only ever contains the display value from Excel, e.g.:
DataFormats.Dif
DataFormats.Rtf
DataFormats.UnicodeText
As a test, I installed LibreOffice Calc and copy/pasted the same cells from Excel into Calc. Calc retains the full precision of the raw data.
So clearly Excel puts this data somewhere that other programs can access. How can I access it from my C# application?
Edit - Next steps.
I've downloaded the LibreOffice Calc source code and will have a poke around to see if I can find out how they get the full context of the copied data from Excel.
I also did a GetFormats() call on the data object returned from the clipboard and got a list of 24 different data formats, some of which are not in the DataFormats class. These include Biff12, Biff8, Biff5 and Format129, among other formats that are unfamiliar to me, so I'll investigate these and respond if I make any discoveries...
Excel allows XLLs to call the C API only when Excel has passed control to the XLL. A worksheet function that is called by Excel can call back into Excel by using the C API. An XLL command that is called by Excel can call the C API.
This is not a complete answer either, but here are some further insights into the problem:
When you copy a single Excel cell, what ends up on the clipboard is a complete Excel workbook containing a single worksheet, which in turn contains a single cell:
var dataObject = Clipboard.GetDataObject();
var mstream = (MemoryStream)dataObject.GetData("XML Spreadsheet");
// Note: For some reason we need to ignore the last byte otherwise
// an exception will occur...
mstream.SetLength(mstream.Length - 1);
var xml = XElement.Load(mstream);
Now, when you dump the content of the XElement to the console, you can see that you do indeed get a complete Excel workbook. The "XML Spreadsheet" format also contains the internal representation of the numbers stored in the cells, so I guess you could use LINQ to XML or similar to fetch the data you need:
XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
var numbers = xml.Descendants(ssNs + "Data")
    .Where(e => (string)e.Attribute(ssNs + "Type") == "Number")
    .Select(e => (double)e);
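To make the shape of that payload concrete, here is a minimal, self-contained sketch that runs the same query against a hand-written sample instead of the live clipboard. The class name, the sample XML, and the two cell values are hypothetical (chosen so they would display as 69.49% and 31.22%). The sketch also assumes that the troublesome last byte is a trailing NUL terminator, so it trims trailing zero bytes rather than blindly dropping one byte.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class XmlSpreadsheetSketch
{
    static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Returns the raw value of every cell whose ss:Type is "Number".
    public static List<double> ReadNumbers(byte[] payload)
    {
        // Assumption: the payload may end in NUL bytes that are not valid XML.
        int length = payload.Length;
        while (length > 0 && payload[length - 1] == 0)
        {
            length--;
        }

        using (var stream = new MemoryStream(payload, 0, length))
        {
            var xml = XElement.Load(stream);
            return xml.Descendants(Ss + "Data")
                .Where(e => (string)e.Attribute(Ss + "Type") == "Number")
                .Select(e => (double)e) // XElement conversion is culture-invariant
                .ToList();
        }
    }

    // Hypothetical stand-in for the bytes found on the clipboard.
    public static byte[] SamplePayload()
    {
        const string xml = @"<?xml version=""1.0""?>
<Workbook xmlns=""urn:schemas-microsoft-com:office:spreadsheet""
          xmlns:ss=""urn:schemas-microsoft-com:office:spreadsheet"">
  <Worksheet ss:Name=""Sheet1"">
    <Table>
      <Row><Cell><Data ss:Type=""Number"">0.694945</Data></Cell></Row>
      <Row><Cell><Data ss:Type=""Number"">0.312179</Data></Cell></Row>
      <Row><Cell><Data ss:Type=""String"">label</Data></Cell></Row>
    </Table>
  </Worksheet>
</Workbook>";
        // Append a NUL byte to mimic the trailing byte described above.
        return Encoding.UTF8.GetBytes(xml + "\0");
    }
}
```

With real clipboard data, the byte array would come from the MemoryStream returned by GetData("XML Spreadsheet"), for example via its ToArray() method.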
I've also tried to read the Biff formats using the Excel Data Reader; however, the resulting DataSets always came out empty...
The BIFF formats are an open specification by Microsoft. (Note that I say specification, not standard.) Give this a read to get an idea of what is going on.
The BIFF formats you see correspond to Excel file formats: BIFF5 is the XLS format of Excel 5.0 and 95, BIFF8 is the XLS format of Excel 97 to 2003, and BIFF12 is XLSB, which was introduced with Excel 2007 (Excel 2010 produces it too). There is some documentation here and also here (from OpenOffice) that may help you make sense of the binary data...
Anyway, some work has been done in the past to parse these documents in C++, Java, VB and, to your taste, C#. For example this BIFF12 Reader, the project NExcel, and ExcelLibrary, to cite a few.
In particular, NExcel will let you pass in a stream, which you can create from the clipboard data, and then query NExcel to get the data. If you are going to work from the source code, then I think ExcelLibrary is much more readable.
You can get the stream like this:
var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
var stream = (System.IO.Stream)dataobject.GetData(format);
And reading from the stream with NExcel would be something like this:
var wb = getWorkbook(stream);
var sheet = wb.Sheets[0];
var somedata = sheet.getCell(0, 0).Contents;
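The stream snippet above leaves the format argument open. One possible way to choose it is to probe the names returned by GetFormats() in a preference order. The sketch below is only that: the class and method names are hypothetical, and the ordering (the XML format first, then the newest BIFF version down to the oldest) is an assumption, not something Excel or any library prescribes. It is written as a pure function over format names so that it does not itself touch the clipboard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClipboardFormatSketch
{
    // Assumed preference order, using format names mentioned in this thread.
    static readonly string[] Preferred =
    {
        "XML Spreadsheet", "Biff12", "Biff8", "Biff5"
    };

    // Returns the first preferred format present in 'available',
    // or null when none of them is offered.
    public static string PickFormat(IEnumerable<string> available)
    {
        var offered = new HashSet<string>(
            available ?? Enumerable.Empty<string>(),
            StringComparer.OrdinalIgnoreCase);

        return Preferred.FirstOrDefault(name => offered.Contains(name));
    }
}

// Possible usage in a Windows Forms application (not executed here):
//   var dataObject = System.Windows.Forms.Clipboard.GetDataObject();
//   var format = ClipboardFormatSketch.PickFormat(dataObject?.GetFormats());
//   if (format != null) { var stream = dataObject.GetData(format); /* ... */ }
```

Whether a given parser actually understands the payload behind the chosen format still has to be checked separately.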
I guess the actual Office libraries from Microsoft would work too.
I know this is not the whole story, so please share how it goes. I will try it myself if I get a chance.