END GOAL: Efficiently (in one pass) read all CellRecords
on a huge (30,000+ row), protected Worksheet
.
Problem:
Using the HSSF.EventUserModel
, how can I read all Record
s (including CellRecords
) for an XLS file with both Workbook and Worksheet protection?
Create Input Spreadsheet (in Excel 2010):
Progress thus far:
new HSSFWorkbook(Stream fs)
. However, I need the efficiency of EventUserModel
for my actual spreadsheet.NPOI.HSSF.Record.Crypto.Biff8EncryptionKey.CurrentUserPassword = "pass1";
did not work.ProcessRecord( )
function catches a PasswordRecord
, but I can't find any documentation on how to properly handle it.EncryptionInfo
or Decryptor
classes may be of some use.Note:
I'm using NPOI. However, I can translate any java examples to C#.
Code:
I use the following code to capture Record
events. My Book1-unprotected.xls
(without protection) shows all Record
events (including cell values). My Book1-protected.xls
displays some records and throws an exception.
I just view processedEvents
in the debugger.
using System;
using System.Collections.Generic;
using System.IO;
using NPOI.HSSF.Record;
using NPOI.HSSF.Model;
using NPOI.HSSF.UserModel;
using NPOI.HSSF.EventUserModel;
using NPOI.POIFS;
using NPOI.POIFS.FileSystem;
namespace NPOI_small {
class myListener : IHSSFListener {
List<Record> processedRecords;
private Stream fs;
public myListener(Stream fs) {
processedRecords = new List<Record>();
this.fs = fs;
HSSFEventFactory factory = new HSSFEventFactory();
HSSFRequest request = new HSSFRequest();
MissingRecordAwareHSSFListener mraListener;
FormatTrackingHSSFListener fmtListener;
EventWorkbookBuilder.SheetRecordCollectingListener recListener;
mraListener = new MissingRecordAwareHSSFListener(this);
fmtListener = new FormatTrackingHSSFListener(mraListener);
recListener = new EventWorkbookBuilder.SheetRecordCollectingListener(fmtListener);
request.AddListenerForAllRecords(recListener);
POIFSFileSystem poifs = new POIFSFileSystem(this.fs);
factory.ProcessWorkbookEvents(request, poifs);
}
public void ProcessRecord(Record record) {
processedRecords.Add(record);
}
}
class Program {
static void Main(string[] args) {
Stream fs = File.OpenRead(@"c:\users\me\desktop\xx\Book1-protected.xls");
myListener testListener = new myListener(fs); // Use EventModel
//HSSFWorkbook book = new HSSFWorkbook(fs); // Use UserModel
Console.Read();
}
}
}
UPDATE (for Juan Mellado):
Below is the exception. My best guess right now (in the answer by Victor Petrykin) is that the HSSFEventFactory
uses RecordInputStream
which cannot natively decrypt protected records. Upon receiving the exception, processedRecords
contains 22 records including the following potentially significant ones:
WriteAccessRecord
with a garbled (probably encrypted) value for .name
RefreshAllRecord
and is the last Record
in the listException:
NPOI.Util.RecordFormatException was unhandled
HResult=-2146233088
Message=Unable to construct record instance
Source=NPOI
StackTrace:
at NPOI.HSSF.Record.RecordFactory.ReflectionConstructorRecordCreator.Create(RecordInputStream in1)
at NPOI.HSSF.Record.RecordFactory.CreateSingleRecord(RecordInputStream in1)
at NPOI.HSSF.Record.RecordFactory.CreateRecord(RecordInputStream in1)
at NPOI.HSSF.EventUserModel.HSSFRecordStream.GetNextRecord()
at NPOI.HSSF.EventUserModel.HSSFRecordStream.NextRecord()
at NPOI.HSSF.EventUserModel.HSSFEventFactory.GenericProcessEvents(HSSFRequest req, RecordInputStream in1)
at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessEvents(HSSFRequest req, Stream in1)
at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessWorkbookEvents(HSSFRequest req, POIFSFileSystem fs)
at NPOI_small.myListener..ctor(Stream fs) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 35
at NPOI_small.Program.Main(String[] args) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 80
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException: NPOI.Util.RecordFormatException
HResult=-2146233088
Message=Expected to find a ContinueRecord in order to read remaining 137 of 144 chars
Source=NPOI
StackTrace:
at NPOI.HSSF.Record.RecordInputStream.ReadStringCommon(Int32 requestedLength, Boolean pIsCompressedEncoding)
at NPOI.HSSF.Record.RecordInputStream.ReadUnicodeLEString(Int32 requestedLength)
at NPOI.HSSF.Record.FontRecord..ctor(RecordInputStream in1)
I think it's the bug in the NPOI library code. As far as I understood they use incorrect stream type for HSSFEventFactory
: it uses RecordInputStream
instead of RecordFactoryInputStream
with decryption function like in the original POI library or in the UserModel
(that's why HSSFWorkbook
is working)
This code is working too but it's not a event logic:
POIFSFileSystem poifs = new POIFSFileSystem(fs);
Entry document = poifs.Root.GetEntry("Workbook");
DocumentInputStream docStream = new DocumentInputStream((DocumentEntry)document);
//RecordFactory factory = new RecordFactory();
//List<Record> records = RecordFactory.CreateRecords(docStream);
RecordFactoryInputStream recFacStream = new RecordFactoryInputStream(docStream, true);
Record currRecord;
while ((currRecord = recFacStream.NextRecord()) != null)
ProcessRecord(currRecord);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With