Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Write large text file data into excel

I am reading a text file separated with some delimiters.

Example of my text file content

Avc def efg jksjd
1 2 3 5
3 4 6 0

line by line and holding it in memory using hashmap having line numbers as key of integer type and each line of text file as List object

Consider, my map would store information like this

Integer List

1 [Avc def efg jksjd]

I am using Apache POI to write into excel. When writing into excel using Apache POI, I am following this approach, here is my code snippet

HSSFWorkbook workbook = new HSSFWorkbook();
HSSFSheet sheet = workbook.createSheet("Sample sheet");
Map<Integer, List<Object>> excelDataHolder = new LinkedHashMap<Integer, List<Object>>();
int rownum = 0;
for (Integer key : keyset) {
            Row row = sheet.createRow(rownum++);
            List<Object> objList = excelHolder.get(key);//excelHolder is my map
            int cellnum = 0;
            for (Object obj : objList) {
                Cell cell = row.createCell(cellnum++);
                    cell.setCellValue((Date) obj);
            }
}

This works quite well if the number of lines/records to be written into excel are less. Imagine, if the records are in billion number or if the text file has more lines assume in 100 000. I think, my approach fails, because createRow and createCell creates more than 100 000 objects in heap. Whatever the java to excel api, I think writing into it(excel) is based on the same approach i.e.., iteration of collection as shown above. I did some examples with aspose as well, as a result aspose also have the same problem I guess.

  • Does createRow and createCell create new objects each time they are called?
  • If yes, what is the alternative?. How would I write large data to excel with better performance?
like image 877
srk Avatar asked May 14 '13 08:05

srk


2 Answers

A recent version of apache-poi has sxssf. Shameless copy from website

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

I had used it for creating spreadsheet with 1.5 million rows.

like image 114
Jayan Avatar answered Sep 28 '22 02:09

Jayan


I will answer with respect to Aspose.Cells for Java, since you tried it too.

Creating or loading a very large Excel file almost always requires large memory. Even if you read single line or multiple lines at a time, still you will be writing the contents to instance of Workbook, which is loaded into memory.

Solution 1 (Not good and very limited): Increase the heap size, if the max heap size allowed works for your largest file, opt for it.

Solution 2 (Complex with some manual work): Excel 2007 and later allows around 1 million rows per sheet. I would suggest you to create one workbook with only one sheet for 1 million rows. That is, if you have 10 million lines in text file, create 10 separate Excel workbooks.

Later on, combine them in a single Excel workbook manually. Aspose.Cells will give out of memory exception when copying sheets with such a huge data.

Below is the code snippet that creates 10 separate Excel files, each having 1 million rows.

import com.aspose.cells.*;
import java.util.*;

public class ExcelLargeTextImport
{
    private static String excelFile = Common.dataDir + "largedata.xlsx";

    public static void main(String args[])
    {
        try
        {
            Common.setLicenses();
            importToExcel();
        }
        catch(Exception ex)
        {
            System.out.println(ex.getMessage());
        }
    }

    private static void importToExcel() throws Exception
    {
        // Process each workbook in a method
        for (int sheetCounter=0 ; sheetCounter<10 ; sheetCounter++)
        {
            saveWorkbook(sheetCounter);
        }
    }

    private static void saveWorkbook(int sheetCounter) throws Exception
    {
        Workbook workbook = new Workbook();
        // Get the first sheet 
        Worksheet worksheet = workbook.getWorksheets().get(0);
        Cells cells = worksheet.getCells();

        // Initialize array list with 1 million records
        ArrayList<String> lines = new ArrayList<String>();
        int rowCount = 1000000;
        for (int i=0 ; i<rowCount ; i++)
        {
            lines.add(i + ";value1;value2;value3");
        }

        long lineNo = 1;
        for (String line : lines)
        {
            // Split the line by delimeter
            String[] values = line.split(";");

            // First cell
            Cell cell = cells.get("A" + lineNo);
            cell.setValue(values[0]);

            // Second cell
            cell = cells.get("B" + lineNo);
            cell.setValue(values[1]);

            // Third cell
            cell = cells.get("C" + lineNo);
            cell.setValue(values[2]);

            // Fourth cell
            cell = cells.get("D" + lineNo);
            cell.setValue(values[2]);

            lineNo++;
        }
        System.out.print(sheetCounter + " ");

        // Saving the Excel file
        workbook.save(excelFile.replace(".xlsx", sheetCounter + ".xlsx"));

        System.out.println("\nExcel file created");
    }
}

PS. I am a Developer Evangelist at Aspose.

like image 25
Saqib Razzaq Avatar answered Sep 28 '22 03:09

Saqib Razzaq