Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best language to parse extremely large Excel 2007 files [closed]

My boss has a habit of performing queries on our databases that return tens of thousands of rows and saving them into excel files. I, being the intern, constantly have to write scripts that work with the information from these files. Thus far I've tried VBScript and Powershell for my scripting needs. Both of these can take several minutes to perform even the simplest of tasks, which would mean that the script when finished would take most of an 8 hour day.

My workaround right now is simply to write a PowerShell script that removes all of the commas and newline characters from an xlsx file, saves the .xlsx files to .csv, and then have a Java program handle the data gathering and output, and have my script clean up the .csv files when finished. This runs in a matter of seconds for my current project, but I can't help but wonder if there's a more elegant alternative for my next one. Any suggestions?

like image 215
arcdrag Avatar asked Aug 24 '10 20:08

arcdrag


1 Answers

I kept getting all kinds of weird errors when working with .xlsx files.

Here's a simple example of using Apache POI to traverse an .xlsx file. See also Upgrading to POI 3.5, including converting existing HSSF Usermodel code to SS Usermodel (for XSSF and HSSF).

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XlsxReader {

    public static void main(String[] args) throws IOException {
        InputStream myxls = new FileInputStream("test.xlsx");
        Workbook book = new XSSFWorkbook(myxls);
        FormulaEvaluator eval =
            book.getCreationHelper().createFormulaEvaluator();
        Sheet sheet = book.getSheetAt(0);
        for (Row row : sheet) {
            for (Cell cell : row) {
                printCell(cell, eval);
                System.out.print("; ");
            }
            System.out.println();
        }
        myxls.close();
    }

    private static void printCell(Cell cell, FormulaEvaluator eval) {
        switch (cell.getCellType()) {
            case Cell.CELL_TYPE_BLANK:
                System.out.print("EMPTY");
                break;
            case Cell.CELL_TYPE_STRING:
                System.out.print(cell.getStringCellValue());
                break;
            case Cell.CELL_TYPE_NUMERIC:
                if (DateUtil.isCellDateFormatted(cell)) {
                    System.out.print(cell.getDateCellValue());
                } else {
                    System.out.print(cell.getNumericCellValue());
                }
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                System.out.print(cell.getBooleanCellValue());
                break;
            case Cell.CELL_TYPE_FORMULA:
                System.out.print(cell.getCellFormula());
                break;
            default:
                System.out.print("DEFAULT");
        }
    }
}
like image 59
trashgod Avatar answered Sep 23 '22 23:09

trashgod