Apache POI - reading modifies excel file

Whenever I open an Excel file using Apache POI, the file gets modified, even though I'm only reading it and not making any changes.

Take, for instance, the following test code.

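import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.junit.Assert;
import org.junit.Test;

public class ApachePoiTest {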

    @Test
    public void readingShouldNotModifyFile() throws Exception {
        final File testFile = new File("C:/work/src/test/resources/Book2.xlsx");
        final byte[] originalChecksum = calculateChecksum(testFile);
        Assert.assertTrue("Calculating checksum modified file",
            MessageDigest.isEqual(originalChecksum, calculateChecksum(testFile)));
        try (Workbook wb = WorkbookFactory.create(testFile)) {
            Assert.assertNotNull("Reading file with Apache POI", wb);
        }
        Assert.assertTrue("Reading file with Apache POI modified file",
            MessageDigest.isEqual(originalChecksum, calculateChecksum(testFile)));
    }

    @Test
    public void readingInputStreamShouldNotModifyFile() throws Exception {
        final File testFile = new File("C:/work/src/test/resources/Book2.xlsx");
        final byte[] originalChecksum = calculateChecksum(testFile);
        Assert.assertTrue("Calculating checksum modified file",
            MessageDigest.isEqual(originalChecksum, calculateChecksum(testFile)));
        try (InputStream is = new FileInputStream(testFile); Workbook wb = WorkbookFactory.create(is)) {
            Assert.assertNotNull("Reading file with Apache POI", wb);
        }
        Assert.assertTrue("Reading file with Apache POI modified file",
            MessageDigest.isEqual(originalChecksum, calculateChecksum(testFile)));
    }

    private byte[] calculateChecksum(final File file) throws Exception {
        final MessageDigest md = MessageDigest.getInstance("MD5");
        md.reset();
        try (InputStream is = new FileInputStream(file)) {
            final byte[] bytes = new byte[2048];
            int numBytes;
            while ((numBytes = is.read(bytes)) != -1) {
                md.update(bytes, 0, numBytes);
            }
            return md.digest();
        }
    }
}

The test readingShouldNotModifyFile always fails, because Apache POI modifies the file every time. What's more, when testing on a blank Excel file freshly created with MS Office, Apache POI shrinks the file from 8.1 KB to 6.2 KB and corrupts it.

Tested with:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.15</version>
</dependency>

and also with version 3.12.

Can I prevent Apache POI from modifying my files by some means other than passing an InputStream instead of a File? I don't want to pass an InputStream, because I'm concerned about Apache's warning that it takes more memory and places some specific requirements on the InputStream.

MJar asked Oct 07 '16 14:10


1 Answer

Your problem is that you're not passing in the read-only flag, so Apache POI defaults to opening the file read/write.

You need to use the overloaded WorkbookFactory.create method that takes a read-only flag, and set that flag to true.

Change the line

try (InputStream is = new FileInputStream(testFile); Workbook wb = WorkbookFactory.create(is)) {

to

try (Workbook wb = WorkbookFactory.create(testFile, null, true)) {

and your file will be opened read-only, with no changes made to it.
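
To confirm that a read-only open really leaves the workbook untouched, the before/after digest comparison from the question can be factored into a small helper. The following is a minimal sketch using only the JDK. ReadOnlyCheck, FileAction and unchangedAfter are hypothetical names, not part of Apache POI, and SHA-256 is used in place of the question's MD5 (either works for this purpose).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of Apache POI): checks that an action leaves a file byte-for-byte unchanged. */
final class ReadOnlyCheck {

    /** Hypothetical callback type; the action receives the file under test. */
    @FunctionalInterface
    interface FileAction {
        void run(Path file) throws Exception;
    }

    /** Streams the file through SHA-256 so a large workbook is never held in memory all at once. */
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return md.digest();
    }

    /** Returns true if the file's digest is identical before and after the action runs. */
    static boolean unchangedAfter(Path file, FileAction action) throws Exception {
        byte[] before = sha256(file);
        action.run(file);
        return MessageDigest.isEqual(before, sha256(file));
    }

    public static void main(String[] args) throws Exception {
        Path sample = Files.createTempFile("sample", ".bin");
        try {
            Files.write(sample, new byte[] {1, 2, 3});
            // Stand-in action: a plain read. In the question's test, the action would
            // instead open the workbook with WorkbookFactory.create(file, null, true).
            boolean unchanged = unchangedAfter(sample, Files::readAllBytes);
            System.out.println("unchanged after read: " + unchanged);
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

The action is passed in as a callback so the same check can wrap either the read/write or the read-only call, which makes it easy to see which of the two changes the file on disk.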

Gagravarr answered Oct 18 '22 09:10