Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert hex string to string of bytes

Tags:

c

I need to convert a (potentially very long) string like char * s = "2f0a3f" into the actual bytes it represents, when decoded from the hex representation. Currently I'm doing this, but it feels clunky and wrong.

  size_t hexlength = strlen(s);
  size_t binlength = hexlength / 2;

  unsigned char * buffer = malloc(binlength);
  long i = 0;
  char a, b;

  for (; i < hexlength; i += 2) {
    a = s[i + 0]; b = s[i + 1];
    buffer[i / 2] =
      ((a < '9' ? a - '0' : a - 'a' + 10) << 4) + (b < '9' ? b - '0' : b - 'a' + 10);
  }

Two things strike me as ugly about this:

  1. The way I'm dividing by two each time I push into the buffer
  2. The conditional logic to figure out the decimal value of the hex digits

Is there a better way? Preferably not using something I'd have to add a dependency on (since I want to ship this code with minimal cross-platform issues). My bitwise math is awful ;)

NOTE: The data has been pre-validated to all be lowercase and to be a correct string of hex pairs.

like image 755
d11wtq Avatar asked Sep 21 '12 17:09

d11wtq


2 Answers

/* allocate the buffer */
char * buffer = malloc((strlen(s) / 2) + 1);

char *h = s; /* this will walk through the hex string */
char *b = buffer; /* point inside the buffer */

/* offset into this string is the numeric value */
char xlate[] = "0123456789abcdef";

for ( ; *h; h += 2, ++b) /* go by twos through the hex string */
   *b = ((strchr(xlate, *h) - xlate) * 16) /* multiply leading digit by 16 */
       + ((strchr(xlate, *(h+1)) - xlate));

Edited to add

In 80x86 assembly lanugage, the heart of strchr() is basically one instruction - it doesn't loop.

Also: this does no bounds checking, won't work with Unicode console input, and will crash if passed an invalid character.

Also: thanks to those who pointed out some serious typos.

like image 64
egrunin Avatar answered Oct 18 '22 01:10

egrunin


Not that it'd make much difference, but I'd go with a multiplication over a division. Also it's worth splitting out the digit code, as you might want to port it to a platform where a-f are not adjacent in the character set (only joking!)

  inline int digittoint(char d) {
    return ((d) <= '9' ? (d) - '0' : (d) - 'a' + 10);
  }
  #define digittoint(d) ((d) <= '9' ? (d) - '0' : (d) - 'a' + 10)

  size_t hexlength = strlen(s);
  size_t binlength = hexlength / 2;

  unsigned char * buffer = malloc(binlength);
  long i = 0;
  char a, b;

  for (; i < binlength; ++i) {
    a = s[2 * i + 0]; b = s[2 * i + 1];
    buffer[i] = (digittoint(a) << 4) | digittoint(b);
  }

I've fixed a bug in your digit-to-int implementation, and replaced the + with bitwise or on the grounds that it better expresses your intent.

You can then experiment to find the best implementation of digittoint - conditional arithmetic as above, strspn, or a lookup table.

Here's a possible branchless implementation that - bonus! - works on uppercase letters:

inline int digittoint(char d) {
    return (d & 0x1f) + ((d >> 6) * 0x19) - 0x10;
}
like image 22
ecatmur Avatar answered Oct 18 '22 03:10

ecatmur