In Python, I want to browse all the subdirectories inside a 7z file and selectively extract files only after checking their content. I do not want to extract everything, but I should be able to peek into the contents iteratively or recursively.
The main concern is that the .7z file is 15 GB, but unpacked it is 225 GB, and my hard disk is only 160 GB. Of those 225 GB, I might need only about 60 GB of valid data. I can find that data only if I can look inside the individual files. Is there an os.walk-like function for .7z files?
The file I am exploring is https://dumps.wikimedia.org/other/static_html_dumps/current/en/*.7z
7z l *.7z
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (406E3),ASM,AES-NI)
Scanning the drive for archives:
1 file, 15363543213 bytes (15 GiB)
Listing archive: wikipedia-en-html.tar.7z
--
Path = wikipedia-en-html.tar.7z
Type = 7z
Physical Size = 15363543213
Headers Size = 100
Method = LZMA:22
Solid = -
Blocks = 1
   Date      Time    Attr         Size   Compressed Name
------------------- ----- ------------ ------------ ------------------------
2008-06-18 23:32:15 ..... 223674511360  15363543113 wikipedia-en-html.tar
------------------- ----- ------------ ------------ ------------------------
2008-06-18 23:32:15       223674511360  15363543113 1 files
import lzma
f7file = r"C:\Users\padmaraj.bhat\OneDrive - Accenture\Downloads\wiki-html\wikipedia-en-html.tar.7z"
f = lzma.open(f7file, 'rb')
for line in f:
    lzma.decompress(line)
    break
Traceback (most recent call last)
<ipython-input-5-d1a496a0c194> in <module>()
4
5 f = lzma.open(f7file, 'rb')
----> 6 for line in f:
7 lzma.decompress(line)
8 break
~\AppData\Local\Continuum\anaconda3\lib\lzma.py in readline(self, size)
220 """
221 self._check_can_read()
--> 222 return self._buffer.readline(size)
223
224 def write(self, data):
~\AppData\Local\Continuum\anaconda3\lib\_compression.py in readinto(self, b)
66 def readinto(self, b):
67 with memoryview(b) as view, view.cast("B") as byte_view:
---> 68 data = self.read(len(byte_view))
69 byte_view[:len(data)] = data
70 return len(data)
~\AppData\Local\Continuum\anaconda3\lib\_compression.py in read(self, size)
101 else:
102 rawblock = b""
--> 103 data = self._decompressor.decompress(rawblock, size)
104 if data:
105 break
LZMAError: Input format not supported by decoder
When I had to do something like that, I had to call the 7z CLI via the subprocess module. This way, you can retrieve file lists as well as file contents from the archive.
For example, to extract files directly to stdout, you can use the -so option.
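As a rough illustration of that approach, the sketch below pipes the output of `7z x -so` into Python's tarfile module in streaming mode, so the 225 GB tar is never written to disk. It assumes the `7z` executable is on the PATH and that the archive wraps a single tar file, as the listing in the question shows. The names `iter_tar_members` and `wanted` are made up for this example, and each selected member is read fully into memory, which is only reasonable for small files such as HTML pages.

```python
import subprocess
import tarfile


def iter_tar_members(cmd, wanted):
    """Yield (name, data) for each regular file in the tar that `cmd` writes
    to stdout and for which wanted(name) is true.

    `cmd` is an argument list, e.g. ["7z", "x", "-so", "archive.tar.7z"].
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # Mode "r|" reads the tar strictly front to back from a non-seekable
        # stream, so members must be read before moving on to the next one.
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            for member in tar:
                if member.isfile() and wanted(member.name):
                    yield member.name, tar.extractfile(member).read()
    finally:
        # Stop the child process if the caller quits early.
        proc.stdout.close()
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
```

A call might look like `iter_tar_members(["7z", "x", "-so", "wikipedia-en-html.tar.7z"], lambda name: name.endswith(".html"))`, writing each returned `data` to disk only when its content passes your check. Because the archive is one compressed stream, every pass has to decompress it from the start; there is no random access, so it pays to do all filtering in a single pass.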