To unzip a .zip file in Python, use the ZipFile.extractall() method. extractall() accepts optional path, members, and pwd arguments and extracts the archive's contents (all members by default). To work with zip files in Python, we use the built-in zipfile module.
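A minimal sketch of that call. The helper name unzip_all and the demo archive contents are hypothetical. path defaults to the current directory, members restricts extraction to a subset of names, and pwd (bytes) is only needed for encrypted archives.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_all(archive_path, dest_dir, members=None, password=None):
    """Extract a .zip archive into dest_dir and return the archive's member names.

    members=None extracts everything; password, if given, must be bytes.
    """
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(path=dest_dir, members=members, pwd=password)
        return zf.namelist()


# Usage: build a small archive in a scratch directory, then extract it.
work = Path(tempfile.mkdtemp())
archive = work / "example.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hello from the archive\n")
    zf.writestr("docs/readme.txt", "nested file\n")

names = unzip_all(archive, work / "out")
print(names)
```

extractall() creates any missing directories under dest_dir, so nested members such as docs/readme.txt land in the right place.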
Run the Command in the Terminal With os
Using the os.system() function, we execute the gunzip command to decompress the files. That's it: you have now decompressed multiple gzip files at once using Python and gunzip.
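The command itself isn't shown here, so this is a minimal sketch, assuming a POSIX shell with gunzip on the PATH. The helper name gunzip_all and the scratch files are hypothetical. If you'd rather avoid the shell entirely, subprocess.run(["gunzip", path], check=True) does the same per file.

```python
import glob
import gzip
import os
import shlex
import tempfile


def gunzip_all(pattern):
    """Run the external gunzip command on every file matching a glob pattern.

    gunzip replaces each foo.gz with foo in place. Returns a dict mapping each
    path to the status returned by os.system() (0 means success).
    """
    statuses = {}
    for path in sorted(glob.glob(pattern)):
        # shlex.quote keeps spaces and shell metacharacters in names harmless.
        statuses[path] = os.system("gunzip " + shlex.quote(path))
    return statuses


# Usage: create two small .gz files in a scratch directory, then gunzip them.
work = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with gzip.open(os.path.join(work, name + ".gz"), "wb") as f:
        f.write(f"contents of {name}\n".encode())

statuses = gunzip_all(os.path.join(work, "*.gz"))
print(statuses)
```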
import gzip
import shutil

with gzip.open('file.txt.gz', 'rb') as f_in:
    with open('file.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
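The same pattern extends to several files without shelling out to gunzip. A sketch, assuming each input name ends in .gz. gunzip_file and the demo filenames are hypothetical. Unlike gunzip, it keeps the original .gz files.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_file(src, dst=None):
    """Decompress src to dst; by default dst is src with the '.gz' suffix removed."""
    src = Path(src)
    if dst is None:
        if src.suffix != ".gz":
            raise ValueError(f"cannot derive an output name from {src}")
        dst = src.with_suffix("")  # data.csv.gz -> data.csv
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # copyfileobj copies in chunks, so the whole file is never held in memory.
        shutil.copyfileobj(f_in, f_out)
    return Path(dst)


# Usage: decompress every .gz file in a scratch directory.
work = Path(tempfile.mkdtemp())
for name in ("one.txt", "two.txt"):
    with gzip.open(work / (name + ".gz"), "wb") as f:
        f.write(f"contents of {name}\n".encode())

outputs = [gunzip_file(p) for p in sorted(work.glob("*.gz"))]
print([p.name for p in outputs])
```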
From the documentation:
import gzip
with gzip.open('file.txt.gz', 'rb') as f:
    file_content = f.read()
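When the compressed bytes are already in memory (say, a downloaded response body), gzip.decompress() does the same job without a file. A small sketch; the payload is built with gzip.compress() purely for the demo.

```python
import gzip

# Bytes that are already in memory, e.g. the body of an HTTP response.
payload = gzip.compress(b"line one\nline two\n")

# gzip.decompress() is the in-memory counterpart of gzip.open(...).read().
raw = gzip.decompress(payload)
text = raw.decode("utf-8")
print(text.splitlines())
```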
import gzip
import pandas as pd

with gzip.open('features_train.csv.gz') as f:
    features_train = pd.read_csv(f)

features_train.head()
# Requires the third-party sh package and the gunzip program on the PATH.
from sh import gunzip

gunzip('/tmp/file1.gz')
This is not an exact answer, because you're using XML data and there was no pd.read_xml() function as of v0.23.4 (it was added later, in pandas 1.3.0), but pandas (starting with v0.21.0) can decompress the file for you. Thanks, Wes!
import pandas as pd
import os
fn = '../data/file_to_load.json.gz'
print(os.path.isfile(fn))
df = pd.read_json(fn, lines=True, compression='gzip')
df.tail()
If you parse the file after decompressing it, don't forget to call decode(); it is necessary when you open the file in binary mode.
import gzip

with gzip.open('file.gz', 'rb') as f:
    for line in f:
        # Each line is bytes in binary mode, so decode it to str first.
        print(line.decode().strip())
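Alternatively, open the file in text mode ("rt") and gzip decodes each line for you, so no decode() call is needed. A sketch; the scratch file and its contents are hypothetical, and UTF-8 is an assumption about the data.

```python
import gzip
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "file.gz"
# Write a small sample file in text mode ("wt"), which encodes for us.
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# "rt" yields str lines, so there is nothing to decode by hand.
lines = []
with gzip.open(path, "rt", encoding="utf-8") as f:
    for line in f:
        lines.append(line.strip())
print(lines)
```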
It is very simple. Here you go:
import gzip

# path to the file to be extracted
ip = "sample.gzip"
# open the compressed input and the output file to be filled; both close automatically
with gzip.open(ip, "rb") as ip_byte, open("output_file", "w", encoding="utf-8") as op:
    op.write(ip_byte.read().decode("utf-8"))
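Calling read() loads the whole decompressed file into memory, and a non-gzip input is only detected at read(), after the output file has already been created. A sketch of a more defensive variant, assuming UTF-8 text: it checks the two-byte gzip magic number (0x1f 0x8b, from RFC 1952) first and then streams line by line. is_gzip, gunzip_to_text, and the file names are hypothetical.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def gunzip_to_text(src, dst, encoding="utf-8"):
    """Decompress a gzip file of text into dst; refuse files that are not gzip."""
    if not is_gzip(src):
        raise ValueError(f"{src} is not a gzip file")
    with gzip.open(src, "rt", encoding=encoding) as f_in:
        with open(dst, "w", encoding=encoding) as f_out:
            for line in f_in:  # stream line by line instead of read()-ing it all
                f_out.write(line)


# Usage: one real gzip file and one plain-text file.
work = Path(tempfile.mkdtemp())
good = work / "sample.gz"
with gzip.open(good, "wt", encoding="utf-8") as f:
    f.write("héllo\nwörld\n")
gunzip_to_text(good, work / "output_file")

bad = work / "plain.txt"
bad.write_text("not compressed")
try:
    gunzip_to_text(bad, work / "never_written")
except ValueError as exc:
    print(exc)
```

Because the check runs before the output is opened, a bad input never leaves an empty output file behind.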