Assuming I'm adding data to IPFS like this:
$ echo Hello World | ipfs add
This gives me QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u, a CID which is a Base58-encoded multihash.
Converting it to Base16 tells me that the hash digest for what IPFS has added is a SHA2-256 hash:
12 - 20 - 74410577111096cd817a3faed78630f2245636beded412d3b212a2e09ba593ca
<hash-type> - <hash-length> - <hash-digest>
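To make the Base58-to-Base16 step concrete, here is a minimal Python sketch that decodes the CID and splits it into the three multihash parts. The helpers `b58decode` and `split_multihash` are illustrative names written for this example, not part of any IPFS library, and the split assumes the hash type and length each fit in a single byte (true for SHA2-256).

```python
# Bitcoin-style Base58 alphabet, which CIDv0 ("Qm...") strings use.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a Base58 string into raw bytes."""
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def split_multihash(mh: bytes):
    """Split <hash-type><hash-length><hash-digest>, assuming 1-byte type and length."""
    code, length, digest = mh[0], mh[1], mh[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length byte")
    return code, length, digest


cid = "QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u"
code, length, digest = split_multihash(b58decode(cid))
print(f"{code:02x} - {length:02x} - {digest.hex()}")
```

The printed line has the same `<hash-type> - <hash-length> - <hash-digest>` layout as above, where 0x12 is the multihash code for SHA2-256 and 0x20 is 32 bytes.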
I know that IPFS doesn't just hash the data, but actually serializes it as a UnixFS protobuf first and then puts that in a DAG node.
I'd like to demystify how to get to the digest 74410577111096cd817a3faed78630f2245636beded412d3b212a2e09ba593ca,
but I'm not really sure how to get hold of the created DAG node that holds the UnixFS protobuf with the data.
For example I can write the serialized raw data to disk and inspect it with a protobuf decoder:
$ ipfs block get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u > /tmp/block.raw
$ protoc --decode_raw < /tmp/block.raw
This will give me the serialized data in a readable format:
1 {
1: 2
2: "Hello World\n"
3: 12
}
However, piping that through SHA-256 still gives me a different hash, which makes sense because IPFS puts the protobuf in a dag and multihashes that one.
$ protoc --decode_raw < /tmp/block.raw | shasum -a 256
So I decided to figure out how to get hold of that DAG node and hash it myself to get to the hash I'm looking for.
I was hoping that using ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
would give me a multihash that can then be decoded, but it turns out it returns some other data hash that I don't know how to inspect:
$ ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
{"data":"CAISDEhlbGxvIFdvcmxkChgM","links":[]}
Any ideas on how to decode data from here?
UPDATE
data is a Base64 representation of the serialized UnixFS protobuf that wraps the original data: https://github.com/ipfs/go-ipfs/issues/4115
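As a quick check of what the Base64 string holds, the following Python sketch decodes it and picks the fields out by position. The fixed offsets are an assumption that only holds for this tiny message, where every tag, length and value fits in one byte; it is not a general protobuf parser.

```python
import base64

# The "data" value printed by `ipfs dag get` in the question.
data_b64 = "CAISDEhlbGxvIFdvcmxkChgM"
raw = base64.b64decode(data_b64)
print(raw.hex())

# Byte layout for this specific message (single-byte tags and lengths):
#   08 02          field 1 (varint)           -> 2
#   12 0c <12 B>   field 2 (length-delimited) -> the file content
#   18 0c          field 3 (varint)           -> 12
field_type = raw[1]
payload_len = raw[3]
payload = raw[4:4 + payload_len]
filesize = raw[-1]

print(field_type, payload, filesize)
```

So the Base64 value is not a hash at all: it is the same three-field message that `protoc --decode_raw` showed inside `1 { ... }`.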
If you want to see what the hash of a file would be, without actually uploading it to IPFS, you can run ipfs add --only-hash, or ipfs add -n for short.
Multihash is a protocol for differentiating outputs from various well-established cryptographic hash functions, addressing size + encoding considerations. It is useful to write applications that future-proof their use of hashes, and allow multiple hash functions to coexist.
IPFS currently uses SHA-256 by default, which produces a 256-bit (32-byte) output, and that output is encoded with Base58.
The hash you're looking for is the hash of the output of ipfs block get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u. IPFS hashes the encoded value.
Instead of running:
protoc --decode_raw < /tmp/block.raw | shasum -a 256
Just run:
shasum -a 256 < /tmp/block.raw
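The same check can be sketched in Python without a running node. This example rebuilds the block bytes instead of reading /tmp/block.raw, under the assumption that the block has no links and therefore consists only of the outer Data field (tag 0x0a, a one-byte length, then the bytes from the Base64 `data` value). `b58encode` is a helper written for this example.

```python
import base64
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode raw bytes as a Base58 string."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


# Assumption: with no links, the block is just field 1 (Data) of the outer message.
inner = base64.b64decode("CAISDEhlbGxvIFdvcmxkChgM")
block = bytes([0x0A, len(inner)]) + inner

# Hash the encoded block, then prepend <hash-type=0x12><hash-length=0x20>.
digest = hashlib.sha256(block).digest()
multihash = bytes([0x12, 0x20]) + digest
cid = b58encode(multihash)

print(digest.hex())
print(cid)
print(cid == "QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u")
```

To hash the real block instead of the reconstruction, replace `block` with `open("/tmp/block.raw", "rb").read()`.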
"but it turns out it returns some other data hash that I don't know how to inspect"
That's because we currently use a protobuf inside of a protobuf. The outer protobuf has the structure {Data: DATA, Links: [{Name: ..., Size: ..., Hash: ...}]}.
In:
1 {
1: 2
2: "Hello World\n"
3: 12
}
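The 1 { ... } part is the Data field of the outer protobuf. However, protoc --decode_raw recursively decodes this object, so it decodes the Data field to:
- First field (Type): 2 (File)
- Second field (Data): "Hello World\n"
- Third field (filesize): 12 (bytes)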
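The nesting can be reproduced by hand with a small wire-format reader. This is a sketch that handles only the two wire types that appear here (varint and length-delimited); `read_varint` and `read_fields` are illustrative helpers, and the block bytes are written out literally on the assumption that the block has no links.

```python
def read_varint(buf: bytes, pos: int):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(buf: bytes):
    """Return [(field_number, value)], supporting wire types 0 and 2 only."""
    pos = 0
    fields = []
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited bytes
            size, pos = read_varint(buf, pos)
            value = buf[pos:pos + size]
            pos += size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, value))
    return fields


# Outer message: field 1 holds 18 opaque bytes.
block = b"\x0a\x12\x08\x02\x12\x0cHello World\n\x18\x0c"
outer = read_fields(block)
# Those opaque bytes happen to parse as a message too, which is why
# protoc --decode_raw prints them as a nested 1 { ... } group.
inner = read_fields(outer[0][1])

print(outer)
print(inner)
```

Decoding one level gives a single bytes field; decoding that field again gives the three inner values. Hashing has to happen on `block`, the outermost encoded bytes.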
For context, the relevant protobuf definitions are:
Outer:
// An IPFS MerkleDAG Link
message PBLink {
  // multihash of the target object
  optional bytes Hash = 1;
  // utf string name. should be unique per object
  optional string Name = 2;
  // cumulative size of target object
  optional uint64 Tsize = 3;
}
// An IPFS MerkleDAG Node
message PBNode {
  // refs to other objects
  repeated PBLink Links = 2;
  // opaque user data
  optional bytes Data = 1;
}
Inner:
message Data {
  enum DataType {
    Raw = 0;
    Directory = 1;
    File = 2;
    Metadata = 3;
    Symlink = 4;
    HAMTShard = 5;
  }
  required DataType Type = 1;
  optional bytes Data = 2;
  optional uint64 filesize = 3;
  repeated uint64 blocksizes = 4;
  optional uint64 hashType = 5;
  optional uint64 fanout = 6;
}
message Metadata {
  optional string MimeType = 1;
}
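Going the other way, the two definitions are enough to build the block bytes for a small file by hand. The sketch below is an assumption-laden illustration, not go-ipfs's actual encoder: it covers only a File node that fits in a single block with no links, and the function names (`encode_varint`, `unixfs_file`, `pb_node`) are made up for this example.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_varint(number: int, value: int) -> bytes:
    """Encode a field with wire type 0 (varint)."""
    return encode_varint((number << 3) | 0) + encode_varint(value)


def field_bytes(number: int, value: bytes) -> bytes:
    """Encode a field with wire type 2 (length-delimited)."""
    return encode_varint((number << 3) | 2) + encode_varint(len(value)) + value


def unixfs_file(payload: bytes) -> bytes:
    """Inner message: Type = File (2), Data = payload, filesize = len(payload)."""
    return field_varint(1, 2) + field_bytes(2, payload) + field_varint(3, len(payload))


def pb_node(data: bytes) -> bytes:
    """Outer message: only the Data field (1); assumes there are no Links."""
    return field_bytes(1, data)


block = pb_node(unixfs_file(b"Hello World\n"))
print(block.hex())
```

For "Hello World\n" this yields 20 bytes: 2 bytes of outer framing around the 18-byte inner message. Larger files that get chunked into several blocks also carry Links and blocksizes, which this sketch does not attempt.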