What I want to do is download a .tar file containing multiple directories with 2 files each. The problem is I can't find a way to read the tar file without actually extracting the files (using tar).
The perfect solution would be something like:
#include <easytar>
Tarfile tar("somefile.tar");
std::string currentFile, currentFileName;
for(int i=0; i<tar.size(); i++){
currentFile = tar.getFileText(i);
currentFileName = tar.getFileName(i);
// do stuff with it
}
I'm probably going to have to write this myself, but any ideas would be appreciated..
I figured this out myself after a bit of work. The tar file spec actually tells you everything you need to know.
First off, every file starts with a 512 byte header, so you can represent it with a char[512] or a char* pointing at somewhere in your larger char array (if you have the entire file loaded into one array for example).
The header looks like this:
location size field
0 100 File name
100 8 File mode
108 8 Owner's numeric user ID
116 8 Group's numeric user ID
124 12 File size in bytes
136 12 Last modification time in numeric Unix time format
148 8 Checksum for header block
156 1 Link indicator (file type)
157 100 Name of linked file
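So if you want the file name, you grab it right here with string filename(&buffer[0], 100); (note the &: the constructor needs a pointer to the start of the field, not its first character). The file name is null padded, so a string built with an explicit size of 100 keeps the trailing nulls. If you first check that the field contains at least one null, you can leave off the size and let the constructor stop at the first null instead.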
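A minimal sketch of that name extraction, assuming the header is available as a 512-byte char buffer. The function name tar_entry_name and the two constants are made up for illustration; only the offset and size come from the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Offset and size of the name field, taken from the header table.
constexpr std::size_t kNameOffset = 0;
constexpr std::size_t kNameSize = 100;

// Hypothetical helper: returns the entry name stored in a 512-byte header.
// The field is NUL-padded, but a name that fills all 100 bytes has no
// terminator, so the length is capped at kNameSize.
std::string tar_entry_name(const char* header) {
    assert(header != nullptr);
    const char* name = header + kNameOffset;
    const void* nul = std::memchr(name, '\0', kNameSize);
    std::size_t length = kNameSize;
    if (nul != nullptr) {
        length = static_cast<std::size_t>(static_cast<const char*>(nul) - name);
    }
    return std::string(name, length);
}
```

Searching for the null with memchr covers both cases in one pass: short names stop at the padding, and a full 100-byte name is read without running into the file mode field.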
Now we want to know if it's a file or a folder. The "link indicator" field has this information, so:
// Note that we're comparing to ascii numbers, not ints
switch(buffer[156]){
case '0': // intentionally dropping through
case '\0':
// normal file
break;
case '1':
// hard link
break;
case '2':
// symbolic link
break;
case '3':
// character device (character special file)
break;
case '4':
// block device
break;
case '5':
// directory
break;
case '6':
// named pipe
break;
}
At this point, we already have all of the information we need about directories, but we need one more thing from normal files: the actual file contents.
The length of the file can be stored in two different ways, either as a 0-or-space-padded null-terminated octal string, or "a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field".
Numeric values are encoded in octal numbers using ASCII digits, with leading zeroes. For historical reasons, a final NUL or space character should be used. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes on archived files. To overcome this limitation, star in 2001 introduced a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field. GNU-tar and BSD-tar followed this idea. Additionally, versions of tar from before the first POSIX standard from 1988 pad the values with spaces instead of zeroes.
Here's how you would read the octal format, but I haven't written code for the base-256 version:
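// in one function
// 11 octal digits can hold up to 33 bits, so a plain int is too small
unsigned long long size_of_file = octal_string_to_int(&buffer[124], 11);
// elsewhere
// Parses up to `size` characters of an octal field. Leading spaces (the
// padding used by pre-POSIX tars) are skipped, and parsing stops at the first
// character that is not an octal digit, such as the trailing NUL or space.
unsigned long long octal_string_to_int(const char *current_char, unsigned int size){
    unsigned long long output = 0;
    while(size > 0 && *current_char == ' '){
        current_char++;
        size--;
    }
    while(size > 0 && *current_char >= '0' && *current_char <= '7'){
        output = output * 8 + (*current_char - '0');
        current_char++;
        size--;
    }
    return output;
}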
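For the base-256 variant, here is a rough sketch based on the description quoted above: if the high-order bit of the first byte is set, the rest of the field is read as a big-endian binary number. The name parse_tar_number is hypothetical. The sketch assumes the value is non-negative and fits in 64 bits, and it falls back to octal parsing when the marker bit is not set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes a numeric header field in either encoding.
// Assumption: values are non-negative and fit in 64 bits; any higher-order
// bytes of a base-256 field are silently shifted out.
std::uint64_t parse_tar_number(const char* field, std::size_t size) {
    assert(field != nullptr && size > 0);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(field);
    std::uint64_t value = 0;

    if (bytes[0] & 0x80) {
        // Base-256: drop the marker bit, then read the bytes big-endian.
        value = bytes[0] & 0x7F;
        for (std::size_t i = 1; i < size; ++i) {
            value = (value << 8) | bytes[i];
        }
        return value;
    }

    // Octal: skip space padding, stop at the first non-octal character.
    std::size_t i = 0;
    while (i < size && bytes[i] == ' ') {
        ++i;
    }
    for (; i < size && bytes[i] >= '0' && bytes[i] <= '7'; ++i) {
        value = value * 8 + static_cast<std::uint64_t>(bytes[i] - '0');
    }
    return value;
}
```

For the size field this would be called as parse_tar_number(&buffer[124], 12), passing the full 12 bytes so the base-256 form is read completely.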
Ok, so now we have everything except the actual file contents. All we have to do is grab the next size bytes of data from the tar file and we'll have our file contents:
// Get to the next block after the header ends
location += 512;
file_contents = new char[size];
memcpy(file_contents, &buffer[location], size);
// Go to the next header by rounding the size up to a multiple of 512.
// This isn't necessarily the most efficient way to do this,
// but it's the most obvious.
location += (int)ceil(size / 512.0) * 512;
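Putting the pieces together, a loop over a whole archive held in memory might look roughly like the sketch below. TarEntry, read_tar and parse_octal are illustrative names, not part of any library. The sketch assumes octal size fields only, keeps contents for normal files only, and treats a header whose name starts with a null as the zero-filled blocks that mark the end of the archive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one archive member.
struct TarEntry {
    std::string name;
    char type;             // the link indicator byte at offset 156
    std::string contents;  // filled in for normal files only
};

constexpr std::size_t kBlockSize = 512;

// Octal field parser: skips space padding, stops at the first non-digit.
static std::size_t parse_octal(const char* field, std::size_t size) {
    std::size_t value = 0;
    std::size_t i = 0;
    while (i < size && field[i] == ' ') ++i;
    for (; i < size && field[i] >= '0' && field[i] <= '7'; ++i) {
        value = value * 8 + static_cast<std::size_t>(field[i] - '0');
    }
    return value;
}

// Walks every header in an archive that is already loaded into memory.
std::vector<TarEntry> read_tar(const std::string& archive) {
    std::vector<TarEntry> entries;
    std::size_t location = 0;
    while (location + kBlockSize <= archive.size()) {
        assert(location % kBlockSize == 0);  // headers sit on block boundaries
        const char* header = archive.data() + location;
        if (header[0] == '\0') break;  // zero-filled block: end of archive

        TarEntry entry;
        std::size_t name_length = 0;
        while (name_length < 100 && header[name_length] != '\0') ++name_length;
        entry.name.assign(header, name_length);
        entry.type = header[156];
        const std::size_t size = parse_octal(header + 124, 12);

        location += kBlockSize;                       // skip the header
        if (size > archive.size() - location) break;  // truncated archive
        if (entry.type == '0' || entry.type == '\0') {
            entry.contents.assign(archive, location, size);
        }
        // Round the data length up to a whole number of 512-byte blocks.
        location += (size + kBlockSize - 1) / kBlockSize * kBlockSize;
        entries.push_back(std::move(entry));
    }
    return entries;
}
```

The integer expression (size + 511) / 512 * 512 does the same rounding as the ceil version without going through floating point.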
Have you looked at libtar?
From the fink package info:
libtar-1.2-1: Tar file manipulation API. libtar is a C library for manipulating POSIX tar files. It handles adding and extracting files to/from a tar archive. libtar offers the following features:
* Flexible API - you can manipulate individual files or just extract a whole archive at once.
* Allows user-specified read() and write() functions, such as zlib's gzread() and gzwrite().
* Supports both POSIX 1003.1-1990 and GNU tar file formats.
Not C++ per se, but you can link to C pretty easily...