 

Read Content from Files Inside a Zip File

I am trying to create a simple Java program that reads and extracts the content of the files inside a zip file. The zip file contains three files (txt, pdf, docx). I need to read the contents of all of these files, and I am using Apache Tika for this purpose.

Can somebody help me achieve this? Here is what I have tried so far, without success:

Code Snippet

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class SampleZipExtract {

    public static void main(String[] args) {
        List<String> tempString = new ArrayList<String>();
        StringBuilder sbf = new StringBuilder();
        Parser parser = new AutoDetectParser();

        // try-with-resources closes the ZipInputStream and the FileInputStream under it
        try (InputStream input = new FileInputStream("C:\\Users\\xxx\\Desktop\\abc.zip");
             ZipInputStream zip = new ZipInputStream(input)) {

            ZipEntry entry;
            // Advance to the next entry on every pass (the original loop never did, so it never ended)
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.endsWith(".txt") || name.endsWith(".pdf") || name.endsWith(".docx")) {
                    System.out.println("entry=" + name + " " + entry.getSize());
                    // New handler and metadata per entry so texts don't accumulate; -1 disables the write limit
                    BodyContentHandler textHandler = new BodyContentHandler(-1);
                    Metadata metadata = new Metadata();
                    // Parse the ZipInputStream (positioned at the current entry), not the raw file stream
                    parser.parse(zip, textHandler, metadata, new ParseContext());
                    tempString.add(textHandler.toString());
                }
                zip.closeEntry();
            }
        } catch (IOException | SAXException | TikaException e) {
            e.printStackTrace();
        }

        for (String text : tempString) {
            System.out.println("Apache Tika - Converted input string : " + text);
            sbf.append(text);
        }
        System.out.println("Final text from all the three files " + sbf.toString());
    }
}
asked Mar 27 '13 18:03 by S Jagdeesh





2 Answers

If you're wondering how to get the file content from each ZipEntry, it's actually quite simple. Here's some sample code:

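import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public static void main(String[] args) throws IOException {
    // try-with-resources closes the zip file and each entry stream
    try (ZipFile zipFile = new ZipFile("C:/test.zip")) {
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            try (InputStream stream = zipFile.getInputStream(entry)) {
                // read the entry's content from stream here
            }
        }
    }
}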

Once you have the InputStream, you can read it however you want.
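As one possible way to "read it however you want", the minimal sketch below collects each entry's text into a map keyed by entry name. It assumes the entries are UTF-8 text, and the names `ZipEntryText`, `readAll` and `readFully` are hypothetical. For the PDF and DOCX files in the question, each entry's stream would instead be handed to a parser such as Tika's rather than decoded directly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipEntryText {

    // Reads every non-directory entry of the zip and decodes it as UTF-8.
    // Assumption: the entries are plain-text files (binary formats need a real parser).
    public static Map<String, String> readAll(Path zipPath) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no content of their own
                }
                try (InputStream stream = zipFile.getInputStream(entry)) {
                    contents.put(entry.getName(), new String(readFully(stream), StandardCharsets.UTF_8));
                }
            }
        }
        return contents;
    }

    // Copies the stream into a byte array in 8 KiB chunks (Java 7 compatible;
    // on Java 9+ InputStream.readAllBytes() does the same job).
    private static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

The `LinkedHashMap` keeps entries in the order `ZipFile.entries()` returns them, which makes the output easier to compare against the archive listing.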

answered Oct 05 '22 19:10 by Rodrigo Sasaki



As of Java 7, the NIO API provides a better and more generic way of accessing the contents of zip or JAR files: the zip file system provider. It lets you open a zip file as a FileSystem and work with its entries much like ordinary files, through the same Path and Files APIs you use on disk.
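To make the "treat zip files like normal files" point concrete, here is a minimal sketch that opens a zip with `FileSystems.newFileSystem(Path, ClassLoader)` and reads a single entry with `Files.readAllBytes`, exactly as it would read a file on disk. `ZipFsRead` and `readEntry` are hypothetical names, and the entry is assumed to contain UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class ZipFsRead {

    // Opens the zip through the built-in zip file system provider and reads one
    // entry with the ordinary Files API. Assumption: the entry holds UTF-8 text.
    public static String readEntry(Path zipPath, String entryName) throws IOException {
        // The (ClassLoader) cast keeps the call unambiguous on Java 13+, which added
        // a newFileSystem(Path, Map) overload that a bare null would also match
        try (FileSystem zipFs = FileSystems.newFileSystem(zipPath, (ClassLoader) null)) {
            // Relative names such as "docs/readme.txt" resolve against the zip's root "/"
            Path entry = zipFs.getPath(entryName);
            return new String(Files.readAllBytes(entry), StandardCharsets.UTF_8);
        }
    }
}
```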

To extract all of the files contained in a zip file with this API, you could do this:

In Java 8:

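import java.io.IOException;
import java.net.URI;
import java.nio.file.*;
import java.util.Collections;
import java.util.stream.Stream;

// fromZip must be a "jar:" URI, e.g. URI.create("jar:" + zipPath.toUri())
private void extractAll(URI fromZip, Path toDirectory) throws IOException {
    try (FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.emptyMap())) {
        for (Path root : zipFs.getRootDirectories()) {
            try (Stream<Path> paths = Files.walk(root)) {
                // Files.copy throws IOException, which a forEach lambda cannot propagate,
                // so iterate with a plain loop instead
                for (Path path : (Iterable<Path>) paths::iterator) {
                    // Parents are visited before children, so directories exist before their files;
                    // resolve by String because the two paths belong to different file systems
                    Path target = toDirectory.resolve(root.relativize(path).toString());
                    if (Files.isDirectory(path)) {
                        Files.createDirectories(target);
                    } else {
                        Files.copy(path, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }
}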

In Java 7:

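import java.io.IOException;
import java.net.URI;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Collections;

// fromZip must be a "jar:" URI; parameters used in the anonymous class must be final in Java 7
private void extractAll(URI fromZip, final Path toDirectory) throws IOException {
    // Java 7 cannot infer the map type in argument position, so give it explicitly
    try (FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.<String, Object>emptyMap())) {
        for (final Path root : zipFs.getRootDirectories()) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    // You can do anything you want with the path here; this copies it
                    // to the matching location under toDirectory
                    Path target = toDirectory.resolve(root.relativize(file).toString());
                    Files.copy(file, target, StandardCopyOption.REPLACE_EXISTING);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                        throws IOException {
                    // Create each sub-directory of the destination before copying files into it
                    Files.createDirectories(toDirectory.resolve(root.relativize(dir).toString()));
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }
}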
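Both extractAll versions take the zip as a URI, and `FileSystems.newFileSystem(URI, Map)` picks the zip provider by the `jar:` scheme, so a plain `file:` URI will not open the archive. The sketch below (hypothetical class `ZipFsList`) shows one way to build that URI from a Path, and reuses the same walkFileTree pattern to list the files in the archive instead of copying them.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ZipFsList {

    // Produces e.g. "jar:file:///path/to/abc.zip"; a bare file: URI would select
    // the default provider, which cannot open a zip archive.
    public static URI toZipUri(Path zipPath) {
        return URI.create("jar:" + zipPath.toUri());
    }

    // Returns the sorted absolute paths (e.g. "/dir/b.txt") of all regular files in the zip.
    public static List<String> listFiles(Path zipPath) throws IOException {
        final List<String> names = new ArrayList<>();
        try (FileSystem zipFs = FileSystems.newFileSystem(toZipUri(zipPath),
                Collections.<String, Object>emptyMap())) {
            for (Path root : zipFs.getRootDirectories()) {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        names.add(file.toString());
                        return FileVisitResult.CONTINUE;
                    }
                });
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

The URI returned by `toZipUri` has the form either extractAll version expects for its `fromZip` parameter.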
answered Oct 05 '22 20:10 by LordOfThePigs
