I have the following Python code snippet:
import zlib

def object_read(repo, sha):
    path = repo + "/objects/" + sha[0:2] + "/" + sha[2:]
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
        # len() on a bytes object counts bytes, not characters
        return len(raw)

print(object_read(".git", "1372c654fd9bd85617f0f8b949f1405b0bd71ee9"))
and one of its P6 counterparts:
#!/usr/bin/env perl6
use Compress::Zlib;

sub object-read( $repo, $sha ) {
    my $path = $repo ~ "/objects/" ~ $sha.substr(0, 2) ~ "/" ~
               $sha.substr(2, *);
    given slurp($path, :bin) -> $f {
        my $raw = uncompress($f).decode('utf8-c8'); # Probable error here?!
        return $raw.chars;
    }
}

put object-read(".git", "1372c654fd9bd85617f0f8b949f1405b0bd71ee9")
However, when I run them, they give me back off-by-one results:
$ python bin.py
75
$ perl6 bin.p6
74
@melpomene has hit the spot. You are not decoding in Python, so len(raw) counts bytes, while the P6 version decodes first and then counts characters. The number of bytes in the raw data may be a bit higher; insert
    say uncompress($f).elems;
before decoding into $raw, and you will see that it reports (for the file on my system) 2 bytes more than the decoded character count. Decoding via utf8-c8 can merge a couple of bytes (or more) into a single codepoint. In general, the number of codepoints is at most the number of bytes in an IO stream, never more.
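The same gap can be reproduced on the Python side by decoding before counting. This is only a minimal sketch: the helper name byte_and_char_counts and the sample payload are made up for illustration and are not the actual git object from the question. Note also that Python's len() on a str counts codepoints, while .chars in P6 counts graphemes, so the exact numbers need not match between the two languages.

```python
import zlib


def byte_and_char_counts(payload):
    """Round-trip payload through zlib, then count bytes and decoded codepoints."""
    raw = zlib.decompress(zlib.compress(payload))
    # len(raw) counts bytes; len(raw.decode(...)) counts codepoints
    return len(raw), len(raw.decode("utf-8"))


# Hypothetical payload: "é" takes two bytes in UTF-8 but is a single codepoint.
sample = "blob 5\x00café".encode("utf-8")
n_bytes, n_chars = byte_and_char_counts(sample)
print(n_bytes, n_chars)
```

For pure ASCII input the two counts are equal; they diverge only when some character needs more than one byte.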