
Writing a large resultset to an Excel file using POI

This is along the same lines as "Writing a large ResultSet to a File", but the file in question is an Excel file.

I'm using the Apache POI library to write an Excel file with a large data set retrieved from a ResultSet object. The data could range from a few thousand records to about 1 million; I'm not sure how that translates into file size in Excel format.

The following is test code I wrote to measure the time taken to write such a large result set, and to check the performance implications for CPU and memory.

protected void writeResultsetToExcelFile(ResultSet rs, int numSheets, String fileNameAndPath) throws Exception {

    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(fileNameAndPath));
    int numColumns = rs.getMetaData().getColumnCount();

    Workbook wb = ExcelFileUtil.createExcelWorkBook(true, numSheets);
    Row heading = wb.getSheetAt(0).createRow(1);

    ResultSetMetaData rsmd = rs.getMetaData();

    for(int x = 0; x < numColumns; x++) {
        Cell cell = heading.createCell(x+1);
        cell.setCellValue(rsmd.getColumnLabel(x+1));
    }

    int rowNumber = 2;
    int sheetNumber = 0;

    while(rs.next()) {

        if(rowNumber == 65001) {
            log("Sheet " + sheetNumber + "written; moving onto to sheet " + (sheetNumber + 1));
            sheetNumber++;
            rowNumber = 2;
        }

        Row row = wb.getSheetAt(sheetNumber).createRow(rowNumber);
        for(int y = 0; y < numColumns; y++) {
            row.createCell(y+1).setCellValue(rs.getString(y+1));
            wb.write(bos);
        }

        rowNumber++;
    }

    //wb.write(bos);

    bos.close();
}

Not much luck with the above code. The file it creates grows rapidly (~70 MB per second), so I stopped the execution after about 10 minutes (killed the JVM when the file reached 7 GB) and tried to open the file in Excel 2007. The moment I open it, the file size drops to 8 KB(!) and only the header and the first row are there. Not sure what I'm missing here.

Any ideas?

Ranga asked Sep 01 '11 17:09



2 Answers

Use SXSSF, the streaming API in POI 3.8:

package example;

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SXSSFexample {

    public static void main(String[] args) throws Throwable {
        FileInputStream inputStream = new FileInputStream("mytemplate.xlsx");
        XSSFWorkbook wb_template = new XSSFWorkbook(inputStream);
        inputStream.close();

        SXSSFWorkbook wb = new SXSSFWorkbook(wb_template);
        wb.setCompressTempFiles(true);

        SXSSFSheet sh = (SXSSFSheet) wb.getSheetAt(0);
        sh.setRandomAccessWindowSize(100); // keep 100 rows in memory, exceeding rows will be flushed to disk

        for (int rownum = 4; rownum < 100000; rownum++) {
            Row row = sh.createRow(rownum);
            for (int cellnum = 0; cellnum < 10; cellnum++) {
                Cell cell = row.createCell(cellnum);
                String address = new CellReference(cell).formatAsString();
                cell.setCellValue(address);
            }
        }

        FileOutputStream out = new FileOutputStream("tempsxssf.xlsx");
        wb.write(out);
        out.close();
    }

}

It requires:

  • poi-ooxml-3.8.jar,
  • poi-3.8.jar,
  • poi-ooxml-schemas-3.8.jar,
  • stax-api-1.0.1.jar,
  • xml-apis-1.0.b2.jar,
  • xmlbeans-2.3.0.jar,
  • commons-codec-1.5.jar,
  • dom4j-1.6.1.jar

Useful link

Juan Rojas answered Sep 21 '22 21:09


Oh. I think you're writing the workbook out 944,000 times. Your wb.write(bos) call is in the inner loop, and I'm not sure that is consistent with the semantics of the Workbook class. From what I can tell from the Javadocs of that class, that method writes the entire workbook to the specified output stream. So each call writes out every row you've added so far, over and over as the thing grows.

This explains why you're seeing exactly 1 row, too. The first workbook written to the file (with one row) is all that is being displayed, and the 7 GB that follows is junk.
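To make the growth concrete, here is a minimal sketch that uses only the JDK, not POI. FakeWorkbook and both method names are hypothetical stand-ins invented for illustration; the only assumption carried over from the explanation above is that write() serializes everything added so far. It compares the bytes produced when write() is called inside the loop with the bytes produced when it is called once after the loop.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RepeatedWriteDemo {

    // Hypothetical stand-in for a workbook (NOT the POI class):
    // write() serializes ALL rows added so far, not just the newest one.
    static class FakeWorkbook {
        private final List<String> rows = new ArrayList<>();

        void addRow(String row) {
            rows.add(row);
        }

        void write(ByteArrayOutputStream out) {
            for (String row : rows) {
                byte[] bytes = (row + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
        }
    }

    // Mirrors the question's code: write() is called after every addition,
    // so the stream receives 1 + 2 + ... + n rows in total.
    static int bytesWhenWritingInsideLoop(int numRows) {
        FakeWorkbook wb = new FakeWorkbook();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < numRows; i++) {
            wb.addRow("row");
            wb.write(out);
        }
        return out.size();
    }

    // Mirrors the fix: build everything first, then write exactly once.
    static int bytesWhenWritingOnce(int numRows) {
        FakeWorkbook wb = new FakeWorkbook();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < numRows; i++) {
            wb.addRow("row");
        }
        wb.write(out);
        return out.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("inside loop: " + bytesWhenWritingInsideLoop(n) + " bytes");
        System.out.println("once at end: " + bytesWhenWritingOnce(n) + " bytes");
    }
}
```

With 1000 four-byte rows, the single write produces 4,000 bytes while the in-loop variant produces 2,002,000, so the output grows quadratically with the row count. In the question's method, the corresponding change would be to delete the wb.write(bos) call inside the loop and restore the commented-out wb.write(bos) just before bos.close().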

Gian answered Sep 23 '22 21:09