I searched how to detect that file is .xls
and I've found a solution like this (but not deprecated):
POIFSFileSystem:
@Deprecated
@Removal(version="4.0")
public static boolean hasPOIFSHeader(InputStream inp) throws IOException {
return FileMagic.valueOf(inp) == FileMagic.OLE2;
}
But this one returns true for all microsoft word documents for example for .doc
Is there a way to detect .xls
document?
Both .doc/.xls documents can are stored in the OLE2 storage format. The org.apache.poi.poifs.filesystem.FileMagic
helps you to detect the file storage format only and not sufficient alone to distinguish between .doc/.xls files.
Also it does not appear that there is any direct API available in POI library to determine the document type (excel or document) for given inputstream/file.
Below example my be helpful to determine if given stream is a valid .xls (or .xlsx)file with the caveat that it read the given inputstram and close it.
// slurp content from given input and close it
public static boolean isExcelFile(InputStream in) throws IOException {
try {
// it slurp the input stream
Workbook workbook = org.apache.poi.ss.usermodel.WorkbookFactory.create(in);
workbook.close();
return true;
} catch (java.lang.IllegalArgumentException | org.apache.poi.openxml4j.exceptions.InvalidFormatException e) {
return false;
}
}
You may found more information on excel file format on this link
Update Solution based on Apache Tika as suggested by gagravarr:
public class TikaBasedFileTypeDetector {
private Tika tika;
private TemporaryResources temporaryResources;
public void init() {
this.tika = new Tika();
this.temporaryResources = new TemporaryResources();
}
// clean up all the temporary resources
public void destroy() throws IOException {
temporaryResources.close();
}
// return content mime type
public String detectType(InputStream in) throws IOException {
TikaInputStream tikaInputStream = TikaInputStream.get(in, temporaryResources);
return tika.detect(tikaInputStream);
}
public boolean isExcelFile(InputStream in) throws IOException{
// see https://stackoverflow.com/a/4212908/1700467 for information on mimetypes
String type = detectType(in);
return type.startsWith("application/vnd.ms-excel") || //for Micorsoft document
type.startsWith("application/vnd.openxmlformats-officedocument.spreadsheetml"); // for OpenOffice xml format
}
}
See this answer on mime types.
You can work with Apache POI's - HSSF module.
That model (library) is written to read and write xls files (and latest for xlsx as well - although these are different languages).
With this code...
InputStream ExcelFileToRead = new FileInputStream("FileNameWithLink.xls");
HSSFWorkbook wb = new HSSFWorkbook(ExcelFileToRead);
HSSFSheet sheet = wb.getSheetAt(0);
...you can detect if it is readable xls file.
Going deeper you can use this code to try reading it etc. Actually that module is really easy to use.
There can be situations that it technically is .xls
file, but it may not be readable (there can be various problems with it).
Extra - XSSF is for .xlsx
and HSSF is for .xls
.
I haven't used other techniques as I always want to be sure that I will be able read that file later.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With