Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Date format issue with POI XSSF and SAX (Event API)

I am using POI's Event API to process large volume of records without any memory foot print issues. Here is the refernce for it.

When i processing XLSX sheet, i am getting different format of Date value than specified format in excel sheet. Date format for a column in excel sheet is 'dd-mm-yyyy' where as I am getting the value in 'mm/dd/yy' format.

Can some one tell me how to get the actual format given in excel sheet. Reference of code snippet is given below.

ContentHandler handler = new XSSFSheetXMLHandler(styles, strings,
          new SheetContentsHandler() {
            public void startRow(int rowNum) {
            }
            public void endRow() {
            }
            public void cell(String cellReference, String formattedValue) {
                  System.out.println(formattedValue);
                } catch (IOException e) {
                    System.out.println(
                      "Exception during file writing");
                }
              }

Getting formmatedValue in cell method for date column is like 'mm/dd/yy' and hence i cant able to do the validations properly in my pl/sql program.

like image 610
janasainik Avatar asked Dec 12 '25 22:12

janasainik


2 Answers

Two points to keep in mind:

  1. The original Excel cell may have a format that doesn't work for you or may be formatted as general text.
  2. You may want to control exactly how dates, times or numeric values are formatted.

Another way to control the formatting of date, and other numeric values is to provide your own custom DataFormatter extending org.apache.poi.ss.usermodel.DataFormatter.

You simply override the formatRawCellContents() method (or other methods depending on your needs):

Sample code constructing the parser / handler:

public void processSheet(Styles styles, SharedStrings strings,
        SheetContentsHandler sheetHandler, InputStream sheetInputStream)
        throws IOException, SAXException {
    DataFormatter formatter = new CustomDataFormatter();
    InputSource sheetSource = new InputSource(sheetInputStream);
    try {
        XMLReader sheetParser = SAXHelper.newXMLReader();
        ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler,
                formatter, false);
        sheetParser.setContentHandler(handler);
        sheetParser.parse(sheetSource);
    } catch (ParserConfigurationException e) {
        throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
    }
}

private class CustomDataFormatter extends DataFormatter {

    @Override
    public String formatRawCellContents(double value, int formatIndex, String formatString,
            boolean use1904Windowing) {

        // Is it a date?
        if (DateUtil.isADateFormat(formatIndex, formatString)) {
            if (DateUtil.isValidExcelDate(value)) {
                Date d = DateUtil.getJavaDate(value, use1904Windowing);
                try {
                    return new SimpleDateFormat("yyyyMMdd").format(d);
                } catch (Exception e) {
                    logger.log(Level.SEVERE, "Bad date value in Excel: " + d, e);
                }
            }
        }
        return new DecimalFormat("##0.#####").format(value);
    }
}
like image 63
David Rappoport Avatar answered Dec 15 '25 12:12

David Rappoport


I had the very same problem. After a few days googling and research, I came up with a solution. Unfortunately, it isn't nice, but it works:

  1. Make a copy of org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler class in your project.
  2. Find the interface SheetContentsHandler in the class.
  3. Add a new method definition: String overriddenFormat(String cellRef, int formatIndex, String formatString);
  4. Find this method in the class: public void endElement(String uri, String localName, String name) throws SAXException.
  5. It has a long switch over the cell types.
  6. In the case NUMBER there is an if statement like this: if (this.formatString != null) {...
  7. Before that, paste this code:

    String overriddenFormat = output.overriddenFormat(cellRef, formatIndex, formatString);
    if (overriddenFormat != null) {
        this.formatIndex = -1;
        this.formatString = overriddenFormat;
    }
    
  8. Follow this article/answer: https://stackoverflow.com/a/11345859 but use your new class and interface.

  9. Now you can use unique date formats if it is needed.

My use case was: In a given sheet I have date values in G, H, and I columns, so my implementation of SheetContentsHandler.overriddenFormat is:

@Override
public String overriddenFormat(String cellRef, int formatIndex, String formatString) {
    if (cellRef.matches("(G|H|I)\\d+")) { //matches all cells in G, H, and I columns
        return "yyyy-mm-dd;@"; //this is the hungarian date format in excel
    }
    return null;
}

As you can see, in the endElement method I have overridden the formatIndex and formatString. The possible values of the formatIndex are described in org.apache.poi.ss.usermodel.DateUtil.isInternalDateFormat(int format). If the given value doesn't fit on these (and -1 does not fit), the formatString will be used through formatting the timestamp values. (The timestamp values are counted from about 1900.01.01 and have day-resolution.)

like image 38
Futó Tamás Avatar answered Dec 15 '25 10:12

Futó Tamás



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!