I am trying to create a simple java program which reads and extracts the content from the file(s) inside zip file. Zip file contains 3 files (txt, pdf, docx). I need to read the contents of all these files and I am using Apache Tika for this purpose.
Can somebody help me out here to achieve the functionality. I have tried this so far but no success
Code Snippet
public class SampleZipExtract { public static void main(String[] args) { List<String> tempString = new ArrayList<String>(); StringBuffer sbf = new StringBuffer(); File file = new File("C:\\Users\\xxx\\Desktop\\abc.zip"); InputStream input; try { input = new FileInputStream(file); ZipInputStream zip = new ZipInputStream(input); ZipEntry entry = zip.getNextEntry(); BodyContentHandler textHandler = new BodyContentHandler(); Metadata metadata = new Metadata(); Parser parser = new AutoDetectParser(); while (entry!= null){ if(entry.getName().endsWith(".txt") || entry.getName().endsWith(".pdf")|| entry.getName().endsWith(".docx")){ System.out.println("entry=" + entry.getName() + " " + entry.getSize()); parser.parse(input, textHandler, metadata, new ParseContext()); tempString.add(textHandler.toString()); } } zip.close(); input.close(); for (String text : tempString) { System.out.println("Apache Tika - Converted input string : " + text); sbf.append(text); System.out.println("Final text from all the three files " + sbf.toString()); } catch (FileNotFoundException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (SAXException e) { // TODO Auto-generated catch block e.printStackTrace(); } catch (TikaException e) { // TODO Auto-generated catch block e.printStackTrace(); } } }
Also, you can use the zip command with the -sf option to view the contents of the . zip file. Additionally, you can view the list of files in the . zip archive using the unzip command with the -l option.
To unzip filesOpen File Explorer and find the zipped folder. To unzip the entire folder, right-click to select Extract All, and then follow the instructions. To unzip a single file or folder, double-click the zipped folder to open it. Then, drag or copy the item from the zipped folder to a new location.
FME can read file or folder datasets stored in a compressed (zip) folder. Upon being read, the data will be extracted and used just as if it were a normal dataset.
Methods. getComment(): String – returns the zip file comment, or null if none. getEntry(String name): ZipEntry – returns the zip file entry for the specified name, or null if not found. getInputStream(ZipEntry entry) : InputStream – Returns an input stream for reading the contents of the specified zip file entry.
If you're wondering how to get the file content from each ZipEntry
it's actually quite simple. Here's a sample code:
public static void main(String[] args) throws IOException { ZipFile zipFile = new ZipFile("C:/test.zip"); Enumeration<? extends ZipEntry> entries = zipFile.entries(); while(entries.hasMoreElements()){ ZipEntry entry = entries.nextElement(); InputStream stream = zipFile.getInputStream(entry); } }
Once you have the InputStream you can read it however you want.
As of Java 7, the NIO Api provides a better and more generic way of accessing the contents of Zip or Jar files. Actually, it is now a unified API which allows you to treat Zip files exactly like normal files.
In order to extract all of the files contained inside of a zip file in this API, you'd do this:
In Java 8:
private void extractAll(URI fromZip, Path toDirectory) throws IOException{ FileSystems.newFileSystem(fromZip, Collections.emptyMap()) .getRootDirectories() .forEach(root -> { // in a full implementation, you'd have to // handle directories Files.walk(root).forEach(path -> Files.copy(path, toDirectory)); }); }
In java 7:
private void extractAll(URI fromZip, Path toDirectory) throws IOException{ FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.emptyMap()); for(Path root : zipFs.getRootDirectories()) { Files.walkFileTree(root, new SimpleFileVisitor<Path>() { @Override public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException { // You can do anything you want with the path here Files.copy(file, toDirectory); return FileVisitResult.CONTINUE; } @Override public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException { // In a full implementation, you'd need to create each // sub-directory of the destination directory before // copying files into it return super.preVisitDirectory(dir, attrs); } }); } }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With