I've found myself having to use a Python script to access a web archive.
What I have is a 'megawarc' web archive file from http://archive.org/details/archiveteam-fanfiction-warc-11. I need to un-megawarc it using the Python script found at https://github.com/alard/megawarc.
I'm trying to run the 'restore' command, and I have the three files needed (FILE.warc.gz, FILE.tar, and FILE.json.gz) from the first link.
I have both Python 2.7 and 3.3 installed.
--------------update--------------
I've run both this method:
python megawarc restore FILE
and this method:
Make sure you have the files megawarc and ordereddict.py in the same directory as the files you want to convert. Rename the file megawarc to megawarc.py. Open a Python console in this directory.
Type the following code (line by line) :
import sys
sys.argv = ['megawarc','restore','FILE']
import megawarc
megawarc.main()
using Python 2.7, and this is what I get:
c:\Python27>python megawarc restore FILE
Traceback (most recent call last):
  File "megawarc", line 563, in <module>
    main()
  File "megawarc", line 552, in main
    mwr.process()
  File "megawarc", line 460, in process
    self.process_entry(entry, tar_out)
  File "megawarc", line 478, in process_entry
    entry["target"]["offset"], entry["target"]["size"])
  File "megawarc", line 128, in copy_to_stream
    raise Exception("End of file: %d bytes expected, but %d bytes read." % (buf_size, l))
Exception: End of file: 4096 bytes expected, but 236 bytes read.
Is there something else I'm missing?
I have the following files, all in c:\python27:
FILE.megawarc.json.gz
FILE.megawarc.tar
FILE.megawarc.warc.gz
megawarc
ordereddict.py
Is this some kind of corrupt-file error?
On the second link you provided, there are two important files:
megawarc
ordereddict.py
The executable script is megawarc. To run it, launch it in a shell with
python megawarc restore FILE
Alternatively, if you're using a UNIX-based system, you can run
chmod +x megawarc
to make the megawarc script executable, and then run it with
./megawarc restore FILE
Here, FILE is the actual name you should type if the 3 files you have are FILE.warc.gz, FILE.tar, and FILE.json.gz. If your files are named differently, replace this parameter with the common prefix of your 3 input files.
EDIT :
Okay, I found an alternative that works if you don't have a standard shell to start the script from the command line. What you have to do is:
Make sure you have the files megawarc and ordereddict.py in the same directory, with the files you want to convert.
Rename the file megawarc to megawarc.py
Open a Python console in this directory
Type the following code (line by line):
import sys
sys.argv = ['megawarc','restore','FILE']
import megawarc
megawarc.main()
This should work, I've just tried it. Hope it helps.
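About the traceback in the update: the exception says that a read hit the end of a file sooner than expected, which may point to an incomplete or truncated input file rather than to a problem with the command itself. That is only a guess from the message. A minimal sketch for checking it, assuming Python 3 and the three file suffixes mentioned above; the function names check_gzip and report are made up for illustration and are not part of megawarc:

```python
import gzip
import os
import zlib


def check_gzip(path, chunk_size=1024 * 1024):
    """Return True if the gzip file decompresses to its end without error."""
    try:
        with gzip.open(path, "rb") as f:
            # Read and discard the data; only the absence of errors matters.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Python 3 raises EOFError when a gzip stream is cut short.
        return False


def report(prefix):
    """Print the size of each input file and test the gzip ones."""
    for suffix in (".warc.gz", ".tar", ".json.gz"):
        path = prefix + suffix
        if not os.path.exists(path):
            print("%s: missing" % path)
            continue
        line = "%s: %d bytes" % (path, os.path.getsize(path))
        if suffix.endswith(".gz"):
            ok = check_gzip(path)
            line += ", gzip ok" if ok else ", gzip truncated or corrupt"
        print(line)
```

Call report("FILE") with the same prefix you pass to the restore command. If a file is reported as truncated, or its size differs from the one shown on the archive.org download page, downloading it again is the first thing to try.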