I have an array of 1,300,000 records. Each record is an array itself. I read each record of the array and insert each bucket of that record in a cell of a row of an excel sheet and at the end, I write that excell sheet into an excel file. After writing 100k of records it becomes slower and slower and then breaks at the end. I used POI apache to do it and here is my code, I am not sure what causes the writing process slows down that much. Any hint?
try {
//save to excel file
FileOutputStream out = new FileOutputStream(new File(path));
XSSFWorkbook resultWorkBook = new XSSFWorkbook();
XSSFSheet sheet = resultWorkBook.createSheet("Comparison_result");
int sizeOfOriginalTermMain = 0;
int sizeOfOriginalTermMatch = 0;
//blue cell style
CellStyle blueStyle = resultWorkBook.createCellStyle();
XSSFFont cellFont = resultWorkBook.createFont();
cellFont.setColor(IndexedColors.BLUE.getIndex());
blueStyle.setFont(cellFont);
//yellow bg cell style
CellStyle GreenStyle = resultWorkBook.createCellStyle();
GreenStyle.setFillBackgroundColor(IndexedColors.GREEN.getIndex());
//create heading
Row heading = sheet.createRow(0);
heading.createCell(0).setCellValue("Main List ID");
heading.createCell(1).setCellValue("Match number > 0");
heading.createCell(2).setCellValue("Found Match ID");
heading.createCell(3).setCellValue("Source list: 2");
heading.createCell(4).setCellValue("Matched Trems");
for(int i=0; i<5;i++) {
CellStyle styleRowHeading = resultWorkBook.createCellStyle();
XSSFFont font = resultWorkBook.createFont();
font.setBold(true);
font.setFontName(XSSFFont.DEFAULT_FONT_NAME);
font.setFontHeightInPoints((short)11);
styleRowHeading.setFont(font);
heading.getCell(i).setCellStyle(styleRowHeading);
}
ArrayList<Object> currentList = new ArrayList<Object>();
RecordId mainRecordId = new RecordId();
String mainRecordIdValue = "";
LinkedHashSet<String> commonStrings = new LinkedHashSet<String>();
int numberOfMatch=0;
RecordId matchRecordId = new RecordId();
String matchRecordIdValue = "";
int size = processResult.size();
int matchRecordIdListNumber = 0;
String concatenatedMatchTerms = "";
ArrayList<String> OrininalTemrsInMainList = new ArrayList<String>();
ArrayList<String> OrininalTemrsInMatchList = new ArrayList<String>();
//adding value to each row of the excel sheet
int q= 0;
for (int i = 0; i < size; i++) {
currentList = processResult.get(i);
Row row = sheet.createRow(i+1);
//object ppmsID column
Cell mainIdCell = row.createCell(0);
mainRecordId = (RecordId)(currentList.get(0));
mainRecordIdValue = mainRecordId.getIdValue();
mainIdCell.setCellValue(mainRecordIdValue);
mainIdCell.setCellStyle(blueStyle);
//productDB column
Cell matchNumberCell = row.createCell(1);
commonStrings = (LinkedHashSet<String>)(currentList.get(2));
numberOfMatch = commonStrings.size();
matchNumberCell.setCellValue(Integer.toString(numberOfMatch));
//match record Id
Cell matchIdCell = row.createCell(2);
matchRecordId = (RecordId)(currentList.get(1));
matchRecordIdValue = matchRecordId.getIdValue();
matchRecordIdListNumber = matchRecordId.getListNumber();
matchIdCell.setCellValue(matchRecordIdValue);
Cell sourceListNumber = row.createCell(3);
sourceListNumber.setCellValue(Integer.toString(matchRecordIdListNumber));
//terms of match
Cell matchTerms = row.createCell(4);
concatenatedMatchTerms = getConcatenatedStringFromList(commonStrings);
matchTerms.setCellValue(concatenatedMatchTerms);
OrininalTemrsInMainList = (ArrayList<String>) currentList.get(3);
sizeOfOriginalTermMain = OrininalTemrsInMainList.size();
OrininalTemrsInMatchList = (ArrayList<String>) currentList.get(4);
sizeOfOriginalTermMatch = OrininalTemrsInMatchList.size();
for (int k = 0; k<sizeOfOriginalTermMain;k++) {
Cell newCell = row.createCell(5+k);
newCell.setCellValue(OrininalTemrsInMainList.get(k));
newCell.setCellStyle(blueStyle);
}
Cell emptyCell = row.createCell(5+sizeOfOriginalTermMain);
emptyCell.setCellValue("emptyCell");
emptyCell.setCellStyle(GreenStyle);
for (int n = 0; n<OrininalTemrsInMatchList.size();n++) {
Cell newCell = row.createCell(5+sizeOfOriginalTermMain+1+n);
newCell.setCellValue(OrininalTemrsInMatchList.get(n));
}
}
resultWorkBook.write(out);
out.close();
resultWorkBook.close();
}catch(Exception e) {
System.out.println(e.getMessage());
}
Don't use XSSF
to create spreadsheets with so many cells.XSSF
relies on objects consuming a lot of memory.
Instead use SXSSF
that is a Streaming Usermodel API.
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
Updating a code that uses XSSF
to use SXSSF
is rather a piece of cake.
Two important things :
The window size (number of rows accessible in memory) : using the default or configuring it explicitly if suitable
You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)
When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.
The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.
Clean up requirement
SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.
It should be invoked :
SXSSFWorkbook.dispose();
So you should write something as :
SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory, exceeding rows will be flushed to disk
// write rows ...
...
// dispose of temporary files backing this workbook on disk
wb.dispose();
About SXSSF
limitations :
Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:
Only a limited number of rows are accessible at a point in time.
Sheet.clone() is not supported.
Formula evaluation is not supported
About your corrupted file :
According to official SXSSF
limitations, if you don't rely on Formula evaluation, the cause of the corrupted excel file is probably not related to the SXSSF
model.
Before trying anything, you could update to the last stable POI version.
Then, it is hard to give specific pointers but as a general advise, isolate things to try to understand what exactly happens.
You could start by reducing the number of produced rows and processing only some specific cols to see whether that fixes the issue.
If it doesn't work, you could also test by using default styles.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With