Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read XLS with Protected Book and Sheet via HSSF.EventUserModel

END GOAL: Efficiently (in one pass) read all CellRecords on a huge (30,000+ row), protected Worksheet.

Problem: Using the HSSF.EventUserModel, how can I read all Records (including CellRecords) for an XLS file with both Workbook and Worksheet protection?

Create Input Spreadsheet (in Excel 2010):

  1. Create new Blank workbook.
  2. Set value of A1 to number: 50
  3. Set value of A2 to string: fifty
  4. Set value of A3 to formula: =25*2
  5. Review (ribbon) -> Protect Sheet -> Password: pass1
  6. Review (ribbon) -> Protect Workbook -> Password: pass1
  7. File (ribbon) ->Save As... -> Save as type: Excel 97-2003 Workbook

Progress thus far:

  • The XLS file opens without a password in Excel. Therefore, you shouldn't need the password to open it in POI.
  • The XLS file opens successfully with new HSSFWorkbook(Stream fs). However, I need the efficiency of EventUserModel for my actual spreadsheet.
  • Setting NPOI.HSSF.Record.Crypto.Biff8EncryptionKey.CurrentUserPassword = "pass1"; did not work.
  • The ProcessRecord( ) function catches a PasswordRecord, but I can't find any documentation on how to properly handle it.
  • Perhaps, the EncryptionInfo or Decryptor classes may be of some use.

Note:
I'm using NPOI. However, I can translate any java examples to C#.

Code:
I use the following code to capture Record events. My Book1-unprotected.xls (without protection) shows all Record events (including cell values). My Book1-protected.xls displays some records and throws an exception.

I just view processedEvents in the debugger.

using System;
using System.Collections.Generic;
using System.IO;

using NPOI.HSSF.Record;
using NPOI.HSSF.Model;
using NPOI.HSSF.UserModel;
using NPOI.HSSF.EventUserModel;
using NPOI.POIFS;
using NPOI.POIFS.FileSystem;

namespace NPOI_small {
    class myListener : IHSSFListener {
        List<Record> processedRecords;

        private Stream fs;

        public myListener(Stream fs) {
            processedRecords = new List<Record>();
            this.fs = fs;

            HSSFEventFactory factory = new HSSFEventFactory();
            HSSFRequest request = new HSSFRequest();

            MissingRecordAwareHSSFListener mraListener;
            FormatTrackingHSSFListener fmtListener;
            EventWorkbookBuilder.SheetRecordCollectingListener recListener;
            mraListener = new MissingRecordAwareHSSFListener(this);
            fmtListener = new FormatTrackingHSSFListener(mraListener);
            recListener = new EventWorkbookBuilder.SheetRecordCollectingListener(fmtListener);
            request.AddListenerForAllRecords(recListener);

            POIFSFileSystem poifs = new POIFSFileSystem(this.fs);

            factory.ProcessWorkbookEvents(request, poifs);
        }

        public void ProcessRecord(Record record) {
            processedRecords.Add(record);
        }
    }
    class Program {
        static void Main(string[] args) {
            Stream fs = File.OpenRead(@"c:\users\me\desktop\xx\Book1-protected.xls");

            myListener testListener = new myListener(fs); // Use EventModel 
            //HSSFWorkbook book = new HSSFWorkbook(fs); // Use UserModel

            Console.Read();
        }
    }
}

UPDATE (for Juan Mellado): Below is the exception. My best guess right now (in the answer by Victor Petrykin) is that the HSSFEventFactory uses RecordInputStream which cannot natively decrypt protected records. Upon receiving the exception, processedRecords contains 22 records including the following potentially significant ones:

  • processedRecords[5] is a WriteAccessRecord with a garbled (probably encrypted) value for .name
  • processedRecords[22] is a RefreshAllRecord and is the last Record in the list

Exception:

NPOI.Util.RecordFormatException was unhandled
  HResult=-2146233088
  Message=Unable to construct record instance
  Source=NPOI
  StackTrace:
       at NPOI.HSSF.Record.RecordFactory.ReflectionConstructorRecordCreator.Create(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateSingleRecord(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateRecord(RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.GetNextRecord()
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.NextRecord()
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.GenericProcessEvents(HSSFRequest req, RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessEvents(HSSFRequest req, Stream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessWorkbookEvents(HSSFRequest req, POIFSFileSystem fs)
       at NPOI_small.myListener..ctor(Stream fs) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 35
       at NPOI_small.Program.Main(String[] args) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 80
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: NPOI.Util.RecordFormatException
       HResult=-2146233088
       Message=Expected to find a ContinueRecord in order to read remaining 137 of 144 chars
       Source=NPOI
       StackTrace:
            at NPOI.HSSF.Record.RecordInputStream.ReadStringCommon(Int32 requestedLength, Boolean pIsCompressedEncoding)
            at NPOI.HSSF.Record.RecordInputStream.ReadUnicodeLEString(Int32 requestedLength)
            at NPOI.HSSF.Record.FontRecord..ctor(RecordInputStream in1)
like image 452
Steven Avatar asked Apr 03 '13 18:04

Steven


1 Answers

I think it's the bug in the NPOI library code. As far as I understood they use incorrect stream type for HSSFEventFactory: it uses RecordInputStream instead of RecordFactoryInputStream with decryption function like in the original POI library or in the UserModel (that's why HSSFWorkbook is working)

This code is working too but it's not a event logic:

POIFSFileSystem poifs = new POIFSFileSystem(fs);
Entry document = poifs.Root.GetEntry("Workbook");
DocumentInputStream docStream = new DocumentInputStream((DocumentEntry)document);
//RecordFactory factory = new RecordFactory();
//List<Record> records = RecordFactory.CreateRecords(docStream);
RecordFactoryInputStream recFacStream = new RecordFactoryInputStream(docStream, true);
Record currRecord;
while ((currRecord = recFacStream.NextRecord()) != null) 
   ProcessRecord(currRecord);
like image 68
Victor Petrykin Avatar answered Sep 21 '22 14:09

Victor Petrykin