Using the archive/tar package in Go, it doesn't seem possible to access the number of hardlinks a file has. However, I remember reading somewhere that tar'ing a directory or file can preserve the hardlinks.
Is there some package in Go that can help me do this?
tar does preserve the hardlinks.
Here's a sample directory with three hard-linked files and one file with a single link:
foo% vdir .
total 16
-rw-r--r-- 3 kostix kostix 5 Jul 12 19:37 bar.txt
-rw-r--r-- 3 kostix kostix 5 Jul 12 19:37 foo.txt
-rw-r--r-- 3 kostix kostix 5 Jul 12 19:37 test.txt
-rw-r--r-- 1 kostix kostix 9 Jul 12 19:49 xyzzy.txt
Now we archive it using GNU tar and verify that it indeed added the links (because we didn't pass it the --hard-dereference command-line option):
foo% tar -cf ../foo.tar .
foo% tar -tvf ../foo.tar
drwxr-xr-x kostix/kostix 0 2016-07-12 19:49 ./
-rw-r--r-- kostix/kostix 9 2016-07-12 19:49 ./xyzzy.txt
-rw-r--r-- kostix/kostix 5 2016-07-12 19:37 ./bar.txt
hrw-r--r-- kostix/kostix 0 2016-07-12 19:37 ./test.txt link to ./bar.txt
hrw-r--r-- kostix/kostix 0 2016-07-12 19:37 ./foo.txt link to ./bar.txt
The documentation of archive/tar refers to a bunch of documents defining the standard on the tar archive format (and unfortunately, there is no single standard: for instance, GNU tar does not support POSIX extended attributes, while BSD tar (which relies on libarchive) does, and so does pax).
To cite its bit on the hardlinks:
LNKTYPE
This flag represents a file linked to another file, of any type, previously archived. Such files are identified in Unix by each file having the same device and inode number. The linked-to name is specified in the linkname field with a trailing null.
So, a hardlink is an entry of a special type ('1') which refers to some preceding (already archived) file by its name.
So let's create a playground example.
We base64-encode our archive:
foo% base64 <../foo.tar | xclip -selection clipboard
…and write the code. The archive contains a single directory, one file (type '0'), then another file (type '0') followed by two hardlinks (type '1') to it.
The output from the playground example:
Archive entry '5': ./
Archive entry '0': ./xyzzy.txt
Archive entry '0': ./bar.txt
Archive entry '1': ./test.txt link to ./bar.txt
Archive entry '1': ./foo.txt link to ./bar.txt
So your link-counting code should:
1. Scan the entire archive record-by-record.
2. Remember any regular file (type archive/tar.TypeReg or type archive/tar.TypeRegA) already processed, and have a counter associated with it, which starts at 1.
   Well, in reality, you'd better be exclusive and record entries of all types except symbolic links and directories, because tar archives can contain nodes for character and block devices, and FIFOs (named pipes).
3. When you encounter a hard link (type archive/tar.TypeLink), read the Linkname field of its header and increment the counter of the already recorded entry with that name.
As the OP actually wanted to know how to manage hardlinks on the source filesystem, here's the update.
The chief idea is that on a filesystem with POSIX semantics:
A directory entry designating a file actually points to a special filesystem metadata block called "inode". The inode contains the number of directory entries pointing to it.
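Creating a hardlink is actually just adding one more directory entry that points to the inode of an existing file (the "source" file, in ln's terms) and incrementing the link counter in that inode.
Hence any file is uniquely identified by two integer numbers: the "device number" identifying the physical device hosting the filesystem on which the file is located, and the "inode number" identifying the file's data.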
It follows that if two files have the same (device, inode) pair, they represent the same content. Or, to put it differently, one is a hardlink to the other.
So, adding files to a tar archive while preserving the hardlinks works this way:
1. Having added a file, save its (device, inode) pair to some lookup table.
2. When adding another file, figure out its (device, inode) pair and look it up in that table.
3. If a matching entry is found, the file's data was already streamed, and we should add a hardlink.
4. Otherwise, behave as in step (1).
So here's the code:
package main

import (
	"archive/tar"
	"io"
	"log"
	"os"
	"path/filepath"
	"syscall"
)

// devino identifies a file by its (device, inode) pair.
type devino struct {
	Dev uint64
	Ino uint64
}

func main() {
	log.SetFlags(0)

	if len(os.Args) != 2 {
		log.Fatalf("Usage: %s DIR\n", os.Args[0])
	}

	// Maps each (device, inode) pair already archived
	// to the name it was stored under.
	seen := make(map[devino]string)

	tw := tar.NewWriter(os.Stdout)

	err := filepath.Walk(os.Args[1],
		func(fn string, fi os.FileInfo, we error) (err error) {
			if we != nil {
				log.Fatal("Error processing directory: ", we)
			}

			// FileInfoHeader fills in only the base name of the file.
			hdr, err := tar.FileInfoHeader(fi, "")
			if err != nil {
				return
			}

			if fi.IsDir() {
				err = tw.WriteHeader(hdr)
				return
			}

			// POSIX only: this assertion panics where Sys()
			// returns something other than *syscall.Stat_t.
			st := fi.Sys().(*syscall.Stat_t)
			di := devino{
				Dev: uint64(st.Dev), // field widths vary between platforms
				Ino: uint64(st.Ino),
			}

			orig, ok := seen[di]
			if ok {
				// Data already streamed: write a hardlink entry with no body.
				hdr.Typeflag = tar.TypeLink
				hdr.Linkname = orig
				hdr.Size = 0
				err = tw.WriteHeader(hdr)
				return
			}

			fd, err := os.Open(fn)
			if err != nil {
				return
			}
			defer fd.Close() // Ignoring error for a file opened R/O

			err = tw.WriteHeader(hdr)
			if err != nil {
				return
			}

			_, err = io.Copy(tw, fd)
			if err == nil {
				// Same (base) name as in hdr.Name, so later links match it.
				seen[di] = fi.Name()
			}
			return err
		})
	if err != nil {
		log.Fatal(err)
	}

	err = tw.Close()
	if err != nil {
		log.Fatal(err)
	}
}
Note that it's quite inadequate:
It improperly deals with file and directory names.
It does not attempt to properly work with symlinks and FIFOs, and skip Unix-domain sockets etc.
It assumes it works in a POSIX environment.
On non-POSIX systems, the Sys() method called on a value of type os.FileInfo might return something other than the POSIX'y syscall.Stat_t.
Say, on Windows, there are multiple filesystems hosted by different "disks" or "drives". I have no idea how Go handles that. Maybe the "device number" would have to be emulated somehow for this case.
On the other hand, it shows how to handle hardlinks.
You might also want to use another approach to maintain the lookup table: if most of your files are expected to be located on the same physical filesystem, each entry wastes a uint64 on a device number that is the same almost everywhere. So a hierarchy of maps might be a sensible thing to do: the first maps device numbers to another map, which maps inode numbers to file names.
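A minimal sketch of such a two-level table follows. The type name seenFiles, its methods and the sample numbers are all made up for illustration, and whether this actually saves a noticeable amount of memory in your case is something you'd have to measure.

```go
package main

import "fmt"

// seenFiles maps device number -> inode number -> first archived name.
type seenFiles map[uint64]map[uint64]string

// lookup returns the name recorded for (dev, ino), if any.
func (s seenFiles) lookup(dev, ino uint64) (string, bool) {
	// Reading from a missing (nil) inner map is safe and yields ok == false.
	name, ok := s[dev][ino]
	return name, ok
}

// record stores name for (dev, ino), creating the per-device map on demand.
func (s seenFiles) record(dev, ino uint64, name string) {
	inodes, ok := s[dev]
	if !ok {
		inodes = make(map[uint64]string)
		s[dev] = inodes
	}
	inodes[ino] = name
}

func main() {
	seen := make(seenFiles)
	seen.record(2049, 131, "bar.txt") // arbitrary sample numbers

	if orig, ok := seen.lookup(2049, 131); ok {
		fmt.Println("already archived as", orig)
	}
	if _, ok := seen.lookup(2050, 131); !ok {
		fmt.Println("same inode number on another device: a different file")
	}
}
```

In the program above, seen.lookup and seen.record would take the place of reading from and writing to the map keyed by devino.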