
Decoding base64 while using GitHub API to Download a File

I am using the GitHub API to download a file from GitHub. I can authenticate successfully and get a response from GitHub, which includes a base64-encoded string representing the file contents.

Unfortunately, I get an unusual error (string length is not a multiple of 4) when decoding the base64 string.

The HTTP request is illustrated below:

GET /repos/:owner/:repo/contents/:path

The (partial) response is illustrated below:

{
    "name": ...,
    "download_url": ...,
    "type": "file",
    "content": "ewogICAgInN3YWdnZXIiOiAiM..."
}

The string is 15263 characters long, which is not a multiple of 4, and decoding it fails with that error. I am using Node.js and the 'base64-js' npm module to decode the string:

var base64 = require('base64-js');
// fileContent holds the "content" field from the JSON response above
var contents = base64.toByteArray(fileContent);

The decoding causes an exception:

Error: Invalid string. Length must be a multiple of 4
    at placeHoldersCount (.../node_modules/base64-js/index.js:23:11)
    at Object.toByteArray (...node_modules/base64-js/index.js:42:18)
    :
    :

I assume the GitHub API is sending correct data, so I doubt that is the problem.

Am I performing the decoding improperly or is there another problem I am overlooking?

Any help is appreciated.

asked Nov 23 '16 at 15:11 by Eric Broda


2 Answers

I experimented a bit and found a solution by using a different base64 decoding library as follows:

var base64 = require('js-base64').Base64;
// res is the parsed JSON response; decode() returns the file as a UTF-8 string
var contents = base64.decode(res.content);

I am not sure whether a base64 string's length must be divisible by 4 (my 15263-character string clearly is not), but the alternate library decoded it properly.
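
A library-free variant of the same idea, sketched under the assumption that you are on Node.js and have already parsed the JSON response: Node's built-in Buffer skips whitespace (including newlines) when decoding base64, so it should cope with a line-wrapped content field without an extra dependency. The decodeContentField helper and the sample response are hypothetical.

```javascript
// Minimal sketch: decode the "content" field of a GitHub contents API response
// with Node's built-in Buffer instead of a third-party library.
// Assumption: `response` is the parsed JSON body of
// GET /repos/:owner/:repo/contents/:path (hypothetical helper below).
function decodeContentField(response) {
  // Node's base64 decoder ignores whitespace such as "\n", so line breaks
  // inside the content field do not need to be removed first.
  const bytes = Buffer.from(response.content, 'base64');
  // Interpret the bytes as UTF-8 text; keep `bytes` instead for binary files.
  return bytes.toString('utf8');
}

// Usage with a hypothetical response whose content is wrapped across lines.
const sample = { content: 'SGVsbG8s\nIHdvcmxk\nIQ==\n', encoding: 'base64' };
console.log(decodeContentField(sample)); // prints "Hello, world!"
```

For binary files (images, archives), returning the Buffer itself rather than a UTF-8 string avoids corrupting the data.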

A second solution that also worked is specific to the GitHub API. By adding the following header to the request, I was able to get the decoded file contents directly:

'accept': 'application/vnd.github.VERSION.raw'
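
The header approach can be sketched as a small request builder. This assumes VERSION is filled in as v3 (check GitHub's media-type documentation for the API version you target); rawContentsRequest, its parameters and the User-Agent value are hypothetical. With this Accept header the response body should be the file itself, so no base64 decoding is needed.

```javascript
// Hypothetical helper: builds the URL and headers for
// GET /repos/:owner/:repo/contents/:path using the raw media type,
// so GitHub returns the file body instead of base64-encoded JSON.
function rawContentsRequest(owner, repo, path, token) {
  // Encode each path segment but keep the "/" separators between them.
  const encodedPath = path.split('/').map(encodeURIComponent).join('/');
  const url = `https://api.github.com/repos/${encodeURIComponent(owner)}/` +
    `${encodeURIComponent(repo)}/contents/${encodedPath}`;
  const headers = {
    // Assumption: VERSION = v3 in application/vnd.github.VERSION.raw
    Accept: 'application/vnd.github.v3.raw',
    // GitHub's REST API requires a User-Agent header; any descriptive value works.
    'User-Agent': 'contents-downloader',
  };
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }
  return { url, headers };
}

// Usage (Node.js 18+ provides a global fetch), inside an async function:
// const { url, headers } = rawContentsRequest('owner', 'repo', 'docs/README.md', process.env.GITHUB_TOKEN);
// const text = await (await fetch(url, { headers })).text();
```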

answered Nov 09 '22 at 20:11 by Eric Broda


After much experimenting, I think I nailed down the difference between the working and broken base64 decoding.

It appears GitHub Base64-encodes file contents with:

  • the UTF-8 charset
  • a Base64 MIME encoder (RFC 2045)

This is in contrast to a "basic" (RFC 4648) Base64 encoder, which several languages seem to use by default (including Java, which I was using). When I switched to a MIME decoder, I got the full contents of the file, ungarbled. This would explain why switching libraries fixed the issue in some cases.
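
To see how a correctly encoded string can still fail a "multiple of 4" check, the sketch below wraps a basic (RFC 4648) encoding into MIME-style lines. The 60-character line length with a trailing "\n" is an assumption about GitHub's output (RFC 2045 itself allows lines of up to 76 characters, separated by CRLF), and wrapBase64 is a hypothetical helper. The added newline characters are what push the total length off a multiple of 4, which could explain the 15263-character string in the question.

```javascript
// Hypothetical helper: wrap a basic Base64 string into MIME-style lines.
// Assumption: "\n" after every 60 characters, including after the last line.
function wrapBase64(encoded, lineLength = 60) {
  const lines = [];
  for (let i = 0; i < encoded.length; i += lineLength) {
    lines.push(encoded.slice(i, i + lineLength));
  }
  return lines.join('\n') + '\n';
}

const original = 'x'.repeat(100);                       // 100 bytes of sample data
const basic = Buffer.from(original).toString('base64');  // RFC 4648 style, no line breaks
const mime = wrapBase64(basic);                          // same data, line-wrapped

console.log(basic.length, basic.length % 4); // 136 0
console.log(mime.length, mime.length % 4);   // 139 3
```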

Note that the content field contained newline characters. MIME decoders are supposed to ignore them, but not every decoder does, so if you still get errors, try removing them before decoding.
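
If a decoder rejects the newlines, stripping them first should be enough. The sketch below removes all whitespace and then checks the result against the rules a strict RFC 4648 decoder such as base64-js enforces (length a multiple of 4, standard alphabet, at most two "=" padding characters). stripBase64Whitespace, isStrictBase64 and the sample string are hypothetical; the check is an illustrative stand-in, not base64-js's actual code.

```javascript
// Hypothetical helper: remove line breaks and any other whitespace.
function stripBase64Whitespace(content) {
  return content.replace(/\s+/g, '');
}

// Hypothetical stand-in for a strict decoder's validation: the length must be a
// multiple of 4 and only the standard alphabet plus up to two "=" may appear.
function isStrictBase64(s) {
  return s.length % 4 === 0 && /^[A-Za-z0-9+\/]*={0,2}$/.test(s);
}

const fileContent = 'SGVsbG8s\nIHdvcmxk\nIQ==\n'; // hypothetical wrapped "content" field
console.log(isStrictBase64(fileContent)); // false (23 characters, contains "\n")

const cleaned = stripBase64Whitespace(fileContent);
console.log(isStrictBase64(cleaned));     // true (20 characters)
```

Applied to the question's code, that would mean passing fileContent.replace(/\s+/g, '') to base64.toByteArray instead of the raw field.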

The media-type header does the job better. In my case, however, I am using the API through a GitHub App; at the time of writing, GitHub requires a specific media type for that, and it returns the JSON response.

answered Nov 09 '22 at 20:11 by romeara