Different MD5 hash for same string (CryptoJS.MD5)

Using CryptoJS, I calculate the MD5 of the string at the bottom of this post and send it to Amazon Web Services. However, the MD5 value that I calculate differs from the one Amazon calculates.

So I ran some online tests and found that the result also differs between MD5 calculator websites. For example, md5hashgenerator gives the same value as I get, while onlinemd5 gives the same value as Amazon.

What I need is to get the same MD5 value as Amazon using CryptoJS.

- CryptoJS.MD5: ec20007986ee9e1a5152c35d07e87fcc

- Amazon Scratchpad MD5: ee288aa4858481d7b1d7422c6fc4b3af

- md5hashgenerator.com: ec20007986ee9e1a5152c35d07e87fcc

- onlinemd5.com: ee288aa4858481d7b1d7422c6fc4b3af


String to calculate MD5:

<?xml version="1.0" encoding="iso-8859-1"?>
<AmazonEnvelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="amzn-envelope.xsd">
  <Header>
    <DocumentVersion>1.01</DocumentVersion>
    <MerchantIdentifier>M_EXAMPLE_123456</MerchantIdentifier>
  </Header>
  <MessageType>Product</MessageType>
  <PurgeAndReplace>false</PurgeAndReplace>
  <Message>
    <MessageID>1</MessageID>
    <OperationType>Update</OperationType>
    <Product>
      <SKU>56789</SKU>
      <StandardProductID>
        <Type>ASIN</Type>
        <Value>B0EXAMPLEG</Value>
      </StandardProductID>
      <ProductTaxCode>A_GEN_NOTAX</ProductTaxCode>
      <DescriptionData>
        <Title>Example Product Title</Title>
        <Brand>Example Product Brand</Brand>
        <Description>This is an example product description.</Description>
        <BulletPoint>Example Bullet Point 1</BulletPoint>
        <BulletPoint>Example Bullet Point 2</BulletPoint>
        <MSRP currency="USD">25.19</MSRP>
        <Manufacturer>Example Product Manufacturer</Manufacturer>
        <ItemType>example-item-type</ItemType>
      </DescriptionData>
      <ProductData>
        <Health>
          <ProductType>
            <HealthMisc>
              <Ingredients>Example Ingredients</Ingredients>
              <Directions>Example Directions</Directions>
            </HealthMisc>
          </ProductType>
        </Health>
      </ProductData>
    </Product>
  </Message>
</AmazonEnvelope>

Edit: After some testing, I realised that the difference is caused by the newline character. So the question is: why are newlines treated differently in those tools, and how can I get the same result as Amazon using CryptoJS?

HOY asked Jun 03 '20 15:06



2 Answers

MD5 (like other hash functions such as the SHA family or Murmur) works on binary data. Hence, how you convert your text to bytes changes the resulting hash. The same text encoded as UTF-8, UTF-16 or UTF-32 will have different hashes.
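To make the encoding point concrete, here is a minimal sketch. It uses Node's built-in crypto module rather than CryptoJS (an assumption, chosen only to keep the example free of dependencies); the md5Hex helper and the sample string are illustrative and not part of the question.

```javascript
const crypto = require("crypto");

// Hypothetical helper: MD5 of a buffer of raw bytes, as a lowercase hex string.
function md5Hex(bytes) {
  return crypto.createHash("md5").update(bytes).digest("hex");
}

// The same five characters ("h", e-acute, "l", "l", "o") in two encodings.
const text = "h\u00e9llo";
const utf8 = Buffer.from(text, "utf8");     // 6 bytes
const utf16 = Buffer.from(text, "utf16le"); // 10 bytes

// Different byte sequences, so the two digests differ.
console.log(utf8.length, md5Hex(utf8));
console.log(utf16.length, md5Hex(utf16));
```

The text is identical in both cases; only the bytes handed to the hash function change.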

The case of newlines is a bit trickier. In ancient times, people had to make two keystrokes on their typewriter to get a new line: Carriage Return, which put the printhead back at the beginning of the line while staying at the same vertical position, and Line Feed, which moved the printhead down one line while staying at the same horizontal position.

In the early days of computing, people mimicked this, and US-ASCII has two code points for line termination: CR (0x0D) and LF (0x0A). A newline was made with the famous CRLF sequence. The HTTP/1.0 standard, for example, required CRLF as a separator between headers (I didn't check HTTP/1.1 or HTTP/2).

Then people started thinking that two characters for a single concept was a waste, so Unix systems started to use only LF, while Mac systems (before OS X) used only CR (and Windows, well, kept CRLF and thought you had enough memory for all those superfluous bytes).

So I stored your text in a file called "tmp" on my Ubuntu computer, using LF as the line separator, and:

$ md5sum tmp 
ee288aa4858481d7b1d7422c6fc4b3af  tmp
$ unix2dos tmp 
unix2dos: converting file tmp to DOS format...
$ md5sum tmp 
ec20007986ee9e1a5152c35d07e87fcc  tmp

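Voilà!! The LF version matches Amazon's value and the CRLF version matches yours, so the string you pass to CryptoJS contains CRLF line endings.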

(unix2dos is a tool for converting LF to CRLF).
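In JavaScript, the reverse conversion (CRLF to LF) can be done on the string before hashing. The following is a minimal sketch, not a definitive implementation: it uses Node's built-in crypto module so that it runs without dependencies, the helper names md5Hex and toLf are made up for illustration, and the short XML fragment stands in for the full document. With CryptoJS the equivalent call would be CryptoJS.MD5(toLf(xml)).toString(). That Amazon expects LF is inferred only from the md5sum experiment above.

```javascript
const crypto = require("crypto");

// Hypothetical helper: MD5 of a string's UTF-8 bytes, as lowercase hex.
function md5Hex(text) {
  return crypto.createHash("md5").update(text, "utf8").digest("hex");
}

// Hypothetical helper: convert CRLF and lone CR line endings to LF.
function toLf(text) {
  return text.replace(/\r\n?/g, "\n");
}

// The same fragment with Windows (CRLF) and Unix (LF) line endings.
const crlf = "<Header>\r\n  <DocumentVersion>1.01</DocumentVersion>\r\n</Header>\r\n";
const lf = "<Header>\n  <DocumentVersion>1.01</DocumentVersion>\n</Header>\n";

console.log(md5Hex(crlf) === md5Hex(lf));       // false: the bytes differ
console.log(md5Hex(toLf(crlf)) === md5Hex(lf)); // true: same bytes after normalizing
```

Whichever convention you pick, hash exactly the string you send, so that the receiver computes the digest over the same bytes.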

Étienne Miret answered Sep 22 '22 19:09


The hashes differ because of the newline characters. You can remove all whitespace from the string before applying the MD5 hash; that way the result should be the same. Here is an implementation with CryptoJS:

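const CryptoJS = require("crypto-js");

// xmlString holds the XML document to hash (short placeholder value here)
const xmlString = "<Header>\r\n  <DocumentVersion>1.01</DocumentVersion>\r\n</Header>";

// Remove every whitespace character, including CR and LF, before hashing
const stripped = xmlString.replace(/\s+/g, "");
const hash = CryptoJS.MD5(stripped).toString();
console.log(hash);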

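I don't know why the tools you used treat newlines differently, but after removing the whitespace I got the same result. Note that this also strips the spaces inside element text (for example in the title), so the hash is that of the stripped string, not of the original document.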

guizo answered Sep 19 '22 19:09