
Writing many records into an Excel file gets very slow

I have an array of 1,300,000 records. Each record is itself an array. I read each record and put each of its elements into a cell of one row of an Excel sheet, and at the end I write that sheet to an Excel file. After about 100k records the writing gets slower and slower, and eventually it breaks. I am using Apache POI, and my code is below. I am not sure what slows the writing down so much. Any hints?

try {
  //save to excel file
  FileOutputStream out = new FileOutputStream(new File(path));
  XSSFWorkbook resultWorkBook = new XSSFWorkbook();
  XSSFSheet sheet = resultWorkBook.createSheet("Comparison_result");
  int sizeOfOriginalTermMain = 0;
  int sizeOfOriginalTermMatch = 0;
  //blue cell style
    CellStyle blueStyle = resultWorkBook.createCellStyle();
    XSSFFont cellFont = resultWorkBook.createFont();
    cellFont.setColor(IndexedColors.BLUE.getIndex());
    blueStyle.setFont(cellFont);

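  //green fill cell style
    CellStyle GreenStyle = resultWorkBook.createCellStyle();
    // a visible fill needs a foreground color plus a fill pattern
    GreenStyle.setFillForegroundColor(IndexedColors.GREEN.getIndex());
    GreenStyle.setFillPattern(FillPatternType.SOLID_FOREGROUND);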


  //create heading 
  Row heading = sheet.createRow(0);
  heading.createCell(0).setCellValue("Main List ID");
  heading.createCell(1).setCellValue("Match number > 0");
  heading.createCell(2).setCellValue("Found Match ID");
  heading.createCell(3).setCellValue("Source list: 2");
  heading.createCell(4).setCellValue("Matched Terms");

  // one bold style shared by all heading cells
  CellStyle styleRowHeading = resultWorkBook.createCellStyle();
  XSSFFont font = resultWorkBook.createFont();
  font.setBold(true);
  font.setFontName(XSSFFont.DEFAULT_FONT_NAME);
  font.setFontHeightInPoints((short)11);
  styleRowHeading.setFont(font);
  for(int i=0; i<5;i++) {
      heading.getCell(i).setCellStyle(styleRowHeading);
  }


  ArrayList<Object> currentList = new ArrayList<Object>();
  RecordId mainRecordId = new RecordId();
  String mainRecordIdValue = "";
  LinkedHashSet<String> commonStrings = new LinkedHashSet<String>();
  int numberOfMatch=0;
  RecordId matchRecordId = new RecordId();
  String matchRecordIdValue = "";
  int size = processResult.size();
  int matchRecordIdListNumber = 0;
  String concatenatedMatchTerms = "";
  ArrayList<String> OrininalTemrsInMainList = new ArrayList<String>();
  ArrayList<String> OrininalTemrsInMatchList = new ArrayList<String>();
  //adding value to each row of the excel sheet

  int q= 0;
  for (int i = 0; i < size; i++) {
    currentList = processResult.get(i);
    Row row = sheet.createRow(i+1);                   
    //object ppmsID column
    Cell mainIdCell = row.createCell(0);
    mainRecordId = (RecordId)(currentList.get(0));
    mainRecordIdValue = mainRecordId.getIdValue();
    mainIdCell.setCellValue(mainRecordIdValue);
    mainIdCell.setCellStyle(blueStyle);

    //productDB column
    Cell matchNumberCell = row.createCell(1);
    commonStrings = (LinkedHashSet<String>)(currentList.get(2));
    numberOfMatch = commonStrings.size();
    matchNumberCell.setCellValue(Integer.toString(numberOfMatch));

    //match record Id
    Cell matchIdCell = row.createCell(2);
    matchRecordId = (RecordId)(currentList.get(1));
    matchRecordIdValue = matchRecordId.getIdValue();
    matchRecordIdListNumber = matchRecordId.getListNumber();
    matchIdCell.setCellValue(matchRecordIdValue);


    Cell sourceListNumber = row.createCell(3);
    sourceListNumber.setCellValue(Integer.toString(matchRecordIdListNumber));

    //terms of match
    Cell matchTerms = row.createCell(4);
    concatenatedMatchTerms = getConcatenatedStringFromList(commonStrings);
    matchTerms.setCellValue(concatenatedMatchTerms);

    OrininalTemrsInMainList = (ArrayList<String>) currentList.get(3);
    sizeOfOriginalTermMain = OrininalTemrsInMainList.size();
    OrininalTemrsInMatchList = (ArrayList<String>) currentList.get(4);
    sizeOfOriginalTermMatch = OrininalTemrsInMatchList.size();
    for (int k = 0; k<sizeOfOriginalTermMain;k++) {
        Cell newCell = row.createCell(5+k);
        newCell.setCellValue(OrininalTemrsInMainList.get(k));
        newCell.setCellStyle(blueStyle);

    }
    Cell emptyCell = row.createCell(5+sizeOfOriginalTermMain);
    emptyCell.setCellValue("emptyCell");
    emptyCell.setCellStyle(GreenStyle);
    for (int n = 0; n<OrininalTemrsInMatchList.size();n++) {
        Cell newCell = row.createCell(5+sizeOfOriginalTermMain+1+n);
        newCell.setCellValue(OrininalTemrsInMatchList.get(n));
    }

  }

  resultWorkBook.write(out);
  out.close();
  resultWorkBook.close();


}catch(Exception e) {
  System.out.println(e.getMessage());
}
user1836957 asked Dec 14 '22 19:12



1 Answer

Don't use XSSF to create spreadsheets with this many cells.
XSSF keeps every row and cell of the workbook as objects in memory, which consumes a lot of heap.

Instead, use SXSSF, the streaming usermodel API.

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

Converting code that uses XSSF to SXSSF is fairly easy.

Two things matter:

The window size (the number of rows accessible in memory): use the default, or configure it explicitly if that suits your case

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.
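To make the window behaviour described above concrete, here is a toy model in plain Java. It is not POI code and does not reflect how POI is implemented: the class RowWindowSketch and its methods are hypothetical names made up for illustration, and a list of indices stands in for rows written to disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a row window (hypothetical, not part of POI). */
final class RowWindowSketch {
    private final int windowSize;
    // rows still accessible in memory, ordered by row index
    private final TreeMap<Integer, String> inMemory = new TreeMap<>();
    // indices of rows that were flushed; stands in for "written to disk"
    private final List<Integer> flushed = new ArrayList<>();

    RowWindowSketch(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
    }

    /** Adds a row; if the window is full, the lowest-index row is flushed first. */
    void createRow(int index, String content) {
        while (inMemory.size() >= windowSize) {
            flushed.add(inMemory.pollFirstEntry().getKey());
        }
        inMemory.put(index, content);
    }

    /** Returns the row content, or null once the row has left the window. */
    String getRow(int index) {
        return inMemory.get(index);
    }

    List<Integer> flushedRows() {
        return flushed;
    }
}
```

With a window of 3 and rows 0 to 4 created in order, rows 0 and 1 are flushed and can no longer be read back, while rows 2 to 4 are still available. This is why code that revisits earlier rows (for example to restyle them at the end) needs either a larger window or a different structure.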

Clean up requirement

SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.

dispose() is an instance method of SXSSFWorkbook, so call it on your workbook object once you have finished writing.

So you should write something like:

SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory; exceeding rows will be flushed to disk
// write rows ...
...
// dispose of temporary files backing this workbook on disk
wb.dispose();
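The snippet above does not reach dispose() if writing a row throws. One common way to guard against that is try/finally. The sketch below shows only the shape of that pattern using the JDK: TempBackedWriterSketch is a hypothetical stand-in for a workbook that spills to a temporary file, not a POI class, and its dispose() merely plays the role of SXSSFWorkbook's.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a workbook backed by a temporary file. */
final class TempBackedWriterSketch {
    private final Path tempFile;
    private final BufferedWriter writer;

    TempBackedWriterSketch() throws IOException {
        tempFile = Files.createTempFile("rows-", ".tmp");
        writer = Files.newBufferedWriter(tempFile);
    }

    void writeRow(String row) throws IOException {
        writer.write(row);
        writer.newLine();
    }

    /** Plays the role of dispose(): closes the stream and deletes the temp file. */
    void dispose() throws IOException {
        writer.close();
        Files.deleteIfExists(tempFile);
    }

    Path tempFile() {
        return tempFile;
    }

    /** Writes all rows and always disposes, even if writing fails midway. */
    static Path writeAll(Iterable<String> rows) throws IOException {
        TempBackedWriterSketch w = new TempBackedWriterSketch();
        try {
            for (String row : rows) {
                w.writeRow(row);
            }
        } finally {
            w.dispose(); // runs on success and on exception
        }
        return w.tempFile();
    }
}
```

In real POI code the same structure would wrap the row-writing loop and the write to the output stream, with wb.dispose() in the finally block.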

About SXSSF limitations:

Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:

  • Only a limited number of rows are accessible at a point in time.

  • Sheet.clone() is not supported.

  • Formula evaluation is not supported.

About your corrupted file:

According to the official SXSSF limitations, if you don't rely on formula evaluation, the corrupted Excel file is probably not caused by the SXSSF model.

Before trying anything else, update to the latest stable POI version.

Beyond that, it is hard to give specific pointers, but as general advice, isolate things to understand what exactly happens.
Start by reducing the number of rows produced and writing only a few specific columns, and see whether that fixes the issue.
If it still fails, also test with the default styles only.
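One way to run such a reduced test is to cut the input down before it reaches the export code. The helper below is a minimal sketch in plain Java; SampleInput and firstRowsAndColumns are hypothetical names, and it assumes the records are held as a list of lists, as processResult appears to be in the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for building a smaller test input. */
final class SampleInput {
    /** Returns a copy containing at most maxRows records, each cut to at most maxCols entries. */
    static <T> List<List<T>> firstRowsAndColumns(
            List<? extends List<T>> records, int maxRows, int maxCols) {
        if (maxRows < 0 || maxCols < 0) {
            throw new IllegalArgumentException("maxRows and maxCols must be >= 0");
        }
        List<List<T>> sample = new ArrayList<>();
        int rows = Math.min(maxRows, records.size());
        for (int i = 0; i < rows; i++) {
            List<T> record = records.get(i);
            int cols = Math.min(maxCols, record.size());
            // copy so the sample does not keep a view onto the full record
            sample.add(new ArrayList<>(record.subList(0, cols)));
        }
        return sample;
    }
}
```

Feeding the export loop with, say, the first 1,000 records and the first 5 entries of each, then growing those numbers, helps narrow down whether the failure depends on volume, on particular columns, or on the styles.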

davidxxx answered Dec 18 '22 00:12
