
How can I unzip a base64 encoded string in R?

Goal

The goal is to make configuration and code readable after they have been exported from an application that stores this data gzip-compressed and base64-encoded.

Test in Linux-shell

Example of a string with code

"H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDuHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="

Decoded and gunzip-ped in a Linux shell with the command:

echo "$1" | base64 -d | gunzip -c

Which results in:

plugin_applies_if_config<split>plugin_config=<?xml version="1.0" encoding="UTF-8"?>
<BusinessRule>
  <BusinessPlugin BusinessRulePluginID="JavaScriptBusinessConditionWithBinds">
    <Parameters>
      <Parameter ID="Binds" Type="java.lang.String">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;BindMap/&gt;
</Parameter>
      <Parameter ID="ErrorMessages" Type="java.lang.String"></Parameter>
      <Parameter ID="JavaScript" Type="java.lang.String">return false;</Parameter>
    </Parameters>
  </BusinessPlugin>
</BusinessRule>
<split>

Task accomplished. ...almost.

Turn into R-script

As I have several hundred of these strings, I want to run the same steps as in the Linux shell from a script. And because I only know some R, I tried using R. I successfully extracted the strings from the XML document that was exported from the application and turned them into a data frame with columns id, name and code.

The following is a simplified example where I try to reproduce the Linux commands step by step.

encoded = "H4sIAAAAAAAAAIWSS0vEMBSF9/0VIYvubHUnNGlhfIDCwOCMuCyhTeOVTBLzGPTfmzY60yKju+Tc8N1z7o2RQYBqmTESuGthaDutBhDERcHXJpWTRknzsZfowK0DrSi+Ki4x4qrTPShB8fPu/uIaN3VGVsGB4s49BcnrDKGjsJlwaF5P0sMtxY/swLadBeN/6jda9eBjrxfwrytQvcMjLgI3zLI999FJEuYSGmHpNdp9Gk7xWyQXkilRbL2NXnGdS18twuTvQfsqJkqHU6x0n7KlY5MLX2UjYOyxZqacBFIeDZyxdGettusYiwn+h7X/QadBnadY7oNVaGDS8eoXciZMAyTlckNxh+Vyid//4Qv+y3JeLwIAAA=="

decoded = base64enc::base64decode(what=encoded)
# decoded = openssl::base64_decode(encoded)
# decoded = jsonlite::base64_dec(encoded)
# 3 times the same result

str(decoded)
# a raw vector. Maybe I need to convert it to a string?
paste(decoded, collapse = "")

Doesn't look like the base64 decoded data in the Linux shell, but let's try to unzip...

decompressed <- 
  tryCatch({  
    memDecompress(from = paste(decoded, collapse = ""),
                  type = "gzip",
                  asChar = TRUE)
  },
  error = function(cond) {
    message(cond)
    return(NA)
  })
# fails with "internal error -3 in memDecompress(2)" 
(decompressed)

Clearly the input is not what 'gzip' expects. It must be some sort of binary string.

But how do I get there? What am I doing wrong? Thanks for your advice!

Uden VH asked Apr 09 '19 20:04



1 Answer

The memDecompress function was improved in R version 4.0.0 to handle this kind of input properly. You should now be able to do:

memDecompress(base64enc::base64decode(what=encoded), "gzip", asChar=TRUE)
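Note that the raw vector is passed to memDecompress directly. As a small base R illustration of why pasting the decoded bytes together does not help (the variable names here are made up for the example): paste() converts each raw byte to its two-character hex representation, so the result is text describing the bytes, not the compressed data itself.

```r
# The first three bytes of a gzip stream (what "H4sI" decodes to).
raw_bytes <- as.raw(c(0x1f, 0x8b, 0x08))

# paste() turns every byte into hex text, producing an ordinary character string.
hex_text <- paste(raw_bytes, collapse = "")

print(hex_text)                # the hex digits as text
print(is.raw(raw_bytes))       # still binary data
print(is.character(hex_text))  # no longer binary data
```

Decompression needs the bytes themselves, so the raw vector should be kept as it is.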

Previous versions were troublesome because they ignored standard headers. Here's a workaround for older versions of R. Basically, we create a raw connection over the decoded bytes and then use gzcon to decompress them:

con <- rawConnection(base64enc::base64decode(what=encoded))
readLines(gzcon(con))
close(con)

You will get a warning about an "incomplete final line", but that's just because it looks like there is no newline at the end of the data. The data seems fine otherwise.
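Since the question mentions a data frame with several hundred strings, here is a minimal sketch of how the gzcon approach might be wrapped in a function and applied to a whole column. The names gunzip_base64 and make_sample are hypothetical, and the sample data frame is built on the spot (using the id, name and code columns described in the question) only so the example is self-contained. Passing warn = FALSE to readLines suppresses the "incomplete final line" warning.

```r
# Hypothetical helper (not from the original post): base64 -> gunzip -> text.
gunzip_base64 <- function(encoded) {
  raw_bytes <- base64enc::base64decode(what = encoded)
  con <- gzcon(rawConnection(raw_bytes))
  on.exit(close(con))
  # warn = FALSE silences the "incomplete final line" warning
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

# Hypothetical helper that builds a gzip-compressed, base64-encoded sample string.
make_sample <- function(text) {
  tmp <- tempfile(fileext = ".gz")
  on.exit(unlink(tmp))
  con <- gzfile(tmp, "wb")
  writeBin(charToRaw(text), con)
  close(con)
  base64enc::base64encode(readBin(tmp, "raw", n = file.size(tmp)))
}

# Stand-in for the data frame described in the question.
df <- data.frame(
  id   = 1:2,
  name = c("rule_a", "rule_b"),
  code = c(make_sample("return false;"), make_sample("line one\nline two")),
  stringsAsFactors = FALSE
)

# Decode every row; vapply guarantees one string per input.
df$decoded <- vapply(df$code, gunzip_base64, character(1), USE.NAMES = FALSE)
print(df$decoded)
```

On R 4.0.0 or later, the body of gunzip_base64 could presumably be replaced by the memDecompress call shown earlier in this answer.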

MrFlick answered Oct 17 '22 23:10