
Julia: how to read a bz2-compressed text file

In R, I can read a whole compressed text file into a character vector as

readLines("file.txt.bz2")

readLines transparently decompresses .gz and .bz2 files, and it also works with uncompressed files. Is there something analogous in Julia? I can do

text = open(f -> read(f, String), "file.txt")

but this cannot open compressed files. What is the preferred way to read bzip2 files? Is there any approach (besides manually checking the filename extension) that can detect the compression format automatically?

Ott Toomet asked Mar 07 '20 01:03


1 Answer

I don't know of anything automatic, but this is how you could create and read a bz2-compressed file:

using CodecBzip2 # after ] add CodecBzip2

# Creating a dummy bz2 file
mystring = "Hello StackOverflow!"
mystring_compressed = transcode(Bzip2Compressor, mystring)
write("testfile.bz2", mystring_compressed)

# Reading and uncompressing it
compressed = read("testfile.bz2")
plain = transcode(Bzip2Decompressor, compressed)
String(plain) # "Hello StackOverflow!"

There are also streaming variants available (Bzip2CompressorStream and Bzip2DecompressorStream). For more, see CodecBzip2.jl.
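As a rough sketch of the streaming route: the code below wraps the file in a Bzip2DecompressorStream and calls readlines on it, which is close to what R's readLines does. Instead of trusting the filename extension, it peeks at the first three bytes for the bzip2 magic number "BZh". The helper names is_bzip2 and read_lines are made up for this illustration and are not part of CodecBzip2. Only bzip2 versus uncompressed input is handled.

```julia
using CodecBzip2  # assumes the package is installed: ] add CodecBzip2

# bzip2 streams begin with the three bytes "BZh"
const BZ2_MAGIC = UInt8[0x42, 0x5a, 0x68]

# Hypothetical helper: true if the file starts with the bzip2 magic bytes.
function is_bzip2(path::AbstractString)
    open(path) do io
        header = read(io, 3)          # reads at most 3 bytes
        return header == BZ2_MAGIC
    end
end

# Hypothetical helper: read all lines, decompressing on the fly if needed.
function read_lines(path::AbstractString)
    if is_bzip2(path)
        stream = Bzip2DecompressorStream(open(path))
        try
            return readlines(stream)
        finally
            close(stream)             # also closes the wrapped file
        end
    else
        return readlines(path)
    end
end

# Usage (assuming such a file exists):
# lines = read_lines("file.txt.bz2")
```

The same idea should extend to gzip, whose files start with the bytes 0x1f 0x8b, by adding a branch that uses GzipDecompressorStream from CodecZlib. I have left that out to keep the sketch short.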

carstenbauer answered Oct 02 '22 17:10
