How to parse a tar file in C++

Tags:

c++

tar

What I want to do is download a .tar file with multiple directories with 2 files each. The problem is I can't find a way to read the tar file without actually extracting the files (using tar).

The perfect solution would be something like:

#include <easytar>

Tarfile tar("somefile.tar");
std::string currentFile, currentFileName;
for(int i=0; i<tar.size(); i++){
  currentFile = tar.getFileText(i);
  currentFileName = tar.getFileName(i);
  // do stuff with it
}

I'm probably going to have to write this myself, but any ideas would be appreciated.

asked Mar 24 '10 02:03

Brendan Long

2 Answers

I figured this out myself after a bit of work. The tar file spec actually tells you everything you need to know.

First off, every file in the archive starts with a 512-byte header, so you can represent it with a char[512] or a char* pointing somewhere into a larger char array (if you have the entire tar file loaded into one array, for example).

The header looks like this:

location  size  field
0         100   File name
100       8     File mode
108       8     Owner's numeric user ID
116       8     Group's numeric user ID
124       12    File size in bytes
136       12    Last modification time in numeric Unix time format
148       8     Checksum for header block
156       1     Link indicator (file type)
157       100   Name of linked file
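One way to make the table concrete is a plain struct whose members sit at the same offsets. This is only a sketch: the struct and member names are made up for illustration, and the bytes past offset 257, which the table does not cover, are kept as an opaque array.

```cpp
#include <cstddef>  // offsetof

// One 512-byte header block, mirroring the table above.
// All members are char arrays, so the compiler inserts no padding.
struct TarHeader {
    char name[100];      // offset 0:   file name
    char mode[8];        // offset 100: file mode
    char uid[8];         // offset 108: owner's numeric user ID
    char gid[8];         // offset 116: group's numeric user ID
    char size[12];       // offset 124: file size in bytes
    char mtime[12];      // offset 136: last modification time
    char checksum[8];    // offset 148: checksum for header block
    char typeflag;       // offset 156: link indicator (file type)
    char linkname[100];  // offset 157: name of linked file
    char rest[255];      // offset 257: remainder of the block, not covered by the table
};

// Compile-time checks that the layout matches the table.
static_assert(sizeof(TarHeader) == 512, "header must be exactly one block");
static_assert(offsetof(TarHeader, size) == 124, "size field offset");
static_assert(offsetof(TarHeader, typeflag) == 156, "link indicator offset");
static_assert(offsetof(TarHeader, linkname) == 157, "linked name offset");
```

With a layout like this, a block can be copied into a TarHeader with memcpy and the fields read by name instead of by raw offset.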

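So if you want the file name, you can grab it with string filename(&buffer[0], 100);. The file name is NUL-padded, so a string built this way carries trailing NUL characters whenever the name is shorter than 100 bytes. If you first check that the field contains at least one NUL, you can leave off the length and use string filename(&buffer[0]);, which stops at the first NUL.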
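A small helper can do the NUL check and the copy in one step. This is a minimal sketch; read_padded_field is a hypothetical name, not something from a library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copy a fixed-width, NUL-padded header field into a
// std::string. Stops at the first NUL, or after `width` bytes if the field
// is completely full and has no terminator.
std::string read_padded_field(const char *field, std::size_t width) {
    assert(field != nullptr);
    const void *nul = std::memchr(field, '\0', width);
    std::size_t length = width;
    if (nul != nullptr) {
        length = static_cast<std::size_t>(static_cast<const char *>(nul) - field);
    }
    return std::string(field, length);
}
```

For the file name this would be called as read_padded_field(&buffer[0], 100), and the same helper works for the linked-file name at offset 157.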

Now we want to know if it's a file or a folder. The "link indicator" field has this information, so:

// Note that we're comparing to ASCII characters, not ints
switch(buffer[156]){
    case '0': // intentionally dropping through
    case '\0':
        // normal file
        break;
    case '1':
        // hard link
        break;
    case '2':
        // symbolic link
        break;
    case '3':
        // character device (special file)
        break;
    case '4':
        // block device
        break;
    case '5':
        // directory
        break;
    case '6':
        // named pipe
        break;
}

At this point, we already have all of the information we need about directories, but we need one more thing from normal files: the actual file contents.

The length of the file can be stored in two different ways, either as a 0-or-space-padded null-terminated octal string, or "a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field".

Numeric values are encoded in octal numbers using ASCII digits, with leading zeroes. For historical reasons, a final NUL or space character should be used. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes on archived files. To overcome this limitation, star in 2001 introduced a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field. GNU-tar and BSD-tar followed this idea. Additionally, versions of tar from before the first POSIX standard from 1988 pad the values with spaces instead of zeroes.

Here's how you would read the octal format, but I haven't written code for the base-256 version:

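// in one function
unsigned long long size_of_file = octal_string_to_int(&buffer[124], 11);

// elsewhere
// Reads at most `size` ASCII octal digits. Leading spaces (used as padding
// by old tar versions) are skipped, and parsing stops at the first NUL,
// space, or other non-octal character. The result is 64-bit because 11
// octal digits can encode values up to 8 GB, which overflows a 32-bit int.
unsigned long long octal_string_to_int(const char *current_char, unsigned int size){
    unsigned long long output = 0;
    while(size > 0 && *current_char == ' '){
        current_char++;
        size--;
    }
    while(size > 0 && *current_char >= '0' && *current_char <= '7'){
        output = output * 8 + (*current_char - '0');
        current_char++;
        size--;
    }
    return output;
}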
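For the base-256 form, a decoder might look like the sketch below. It follows the description quoted above: the high-order bit of the leftmost byte is the marker, and the remaining bits are read as one big-endian binary number. The function names are hypothetical, negative values are not handled (a file size is never negative), and anything that does not fit in 64 bits is silently truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if the numeric field uses base-256 instead of octal text.
bool is_base256(const char *field) {
    return (static_cast<unsigned char>(field[0]) & 0x80) != 0;
}

// Decode a non-negative base-256 field of `width` bytes (12 for the size
// field). Bytes are big-endian; the marker bit in the first byte is dropped.
// Assumption: the value fits in 64 bits, so higher-order bytes are zero.
std::uint64_t base256_to_uint(const char *field, std::size_t width) {
    assert(width > 0 && is_base256(field));
    std::uint64_t output = static_cast<unsigned char>(field[0]) & 0x7F;
    for (std::size_t i = 1; i < width; i++) {
        output = (output << 8) | static_cast<unsigned char>(field[i]);
    }
    return output;
}
```

A reader would check is_base256(&buffer[124]) first and fall back to the octal parser when it returns false.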

Ok, so now we have everything except the actual file contents. All we have to do is grab the next size bytes of data from the tar file and we'll have our file contents:

// Get to the next block after the header ends
location += 512;
file_contents = new char[size_of_file];
memcpy(file_contents, &buffer[location], size_of_file);
// Go to the next header by rounding the size up to a multiple of 512.
// This isn't necessarily the most efficient way to do this,
// but it's the most obvious.
location += 512 * (int)ceil(size_of_file / 512.0);
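Putting the steps together, a loop over a whole archive held in memory could look roughly like this. It is a sketch under several assumptions, not a complete reader: TarEntry, parse_octal and read_entries are names invented here, sizes are assumed to be in the octal form only, the walk stops at the first header whose name starts with a NUL byte (a tar archive ends with zero-filled blocks), and link indicators '1' through '6' are treated as having no data blocks, which is how POSIX describes them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

struct TarEntry {
    std::string name;
    char type;             // raw link indicator byte from offset 156
    std::string contents;  // filled in for normal files only
};

// Parse an octal field: skip leading spaces, stop at the first non-octal byte.
static std::size_t parse_octal(const char *field, std::size_t width) {
    std::size_t value = 0, i = 0;
    while (i < width && field[i] == ' ') i++;
    for (; i < width && field[i] >= '0' && field[i] <= '7'; i++) {
        value = value * 8 + static_cast<std::size_t>(field[i] - '0');
    }
    return value;
}

std::vector<TarEntry> read_entries(const std::vector<char> &archive) {
    std::vector<TarEntry> entries;
    std::size_t location = 0;
    while (location + 512 <= archive.size()) {
        assert(location % 512 == 0);  // headers always start on a block boundary
        const char *header = archive.data() + location;
        if (header[0] == '\0') break;  // zero-filled block: end of archive

        TarEntry entry;
        const void *nul = std::memchr(header, '\0', 100);
        std::size_t name_length = 100;
        if (nul != nullptr) {
            name_length = static_cast<std::size_t>(static_cast<const char *>(nul) - header);
        }
        entry.name.assign(header, name_length);
        entry.type = header[156];
        std::size_t size = parse_octal(header + 124, 12);

        location += 512;  // move past the header
        bool has_data = !(entry.type >= '1' && entry.type <= '6');
        if (has_data) {
            if (size > archive.size() - location) break;  // truncated archive
            if (entry.type == '0' || entry.type == '\0') {
                entry.contents.assign(archive.data() + location, size);
            }
            location += (size + 511) / 512 * 512;  // round up to whole blocks
        }
        entries.push_back(entry);
    }
    return entries;
}
```

The integer expression (size + 511) / 512 * 512 does the same rounding as the ceil() version without going through floating point.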

answered Sep 23 '22 06:09

Brendan Long

Have you looked at libtar?

From the fink package info:

libtar-1.2-1: Tar file manipulation API

libtar is a C library for manipulating POSIX tar files. It handles adding and extracting files to/from a tar archive. libtar offers the following features:
* Flexible API - you can manipulate individual files or just extract a whole archive at once.
* Allows user-specified read() and write() functions, such as zlib's gzread() and gzwrite().
* Supports both POSIX 1003.1-1990 and GNU tar file formats.

Not C++ per se, but you can link to C pretty easily...

answered Sep 26 '22 06:09

dmckee --- ex-moderator kitten
