Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to escape string for generated C++?

I am writing python script which is generating C++ code based on the data.

I have python variable string which contains a string which can be composed of characters like " or newlines.

What is the best way to escape this string for code generation?

like image 388
Krizz Avatar asked Feb 18 '13 20:02

Krizz


Video Answer


3 Answers

The way I use is based on the observation that C++ strings basically obey the same rules regarding charactes and escaping as Javascript/JSON string.

Python since version 2.6 has a built-in JSON library which can serialize Python data into JSON. Therefore, the code is, assuming we don't need enclosing quotes, just as follows:

import json
string_for_printing = json.dumps(original_string).strip('"')
like image 57
Krizz Avatar answered Oct 14 '22 13:10

Krizz


I use this code, so far without problems:

def string(s, encoding='ascii'):
   if isinstance(s, unicode):
      s = s.encode(encoding)
   result = ''
   for c in s:
      if not (32 <= ord(c) < 127) or c in ('\\', '"'):
         result += '\\%03o' % ord(c)
      else:
         result += c
   return '"' + result + '"'

It uses octal escapes to avoid all potentially problematic characters.

like image 44
sth Avatar answered Oct 14 '22 13:10

sth


We can do better using specifics of C found here (https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Character-Constants) and Python's built-in printable function:

def c_escape():
  import string
  mp = []
  for c in range(256):
    if c == ord('\\'): mp.append("\\\\")
    elif c == ord('?'): mp.append("\\?")
    elif c == ord('\''): mp.append("\\'")
    elif c == ord('"'): mp.append("\\\"")
    elif c == ord('\a'): mp.append("\\a")
    elif c == ord('\b'): mp.append("\\b")
    elif c == ord('\f'): mp.append("\\f")
    elif c == ord('\n'): mp.append("\\n")
    elif c == ord('\r'): mp.append("\\r")
    elif c == ord('\t'): mp.append("\\t")
    elif c == ord('\v'): mp.append("\\v")
    elif chr(c) in string.printable: mp.append(chr(c))
    else:
      x = "\\%03o" % c
      mp.append(x if c>=64 else (("\\%%0%do" % (1+c>=8)) % c, x))
  return mp

This has the advantage of now being a mapping from ordinal value of a character ord(c) to the exact string. Using += for strings is slow and bad practice, so this allows for "".join(...) which is far more performant in Python. Not to mention, indexing a list/table is much faster than doing computations on characters each time through. Also do not waste octal characters either by checking if less characters are needed. However, to use this, you must verify the next character is not a 0 through 7 otherwise you must use the 3 digit octal format.

The table looks like:

[('\\0', '\\000'), ('\\1', '\\001'), ('\\2', '\\002'), ('\\3', '\\003'), ('\\4', '\\004'), ('\\5', '\\005'), ('\\6', '\\006'), '\\a', '\\b', '\\t', '\\n', '\\v', '\\f', '\\r', ('\\16', '\\016'), ('\\17', '\\017'), ('\\20', '\\020'), ('\\21', '\\021'), ('\\22', '\\022'), ('\\23', '\\023'), ('\\24', '\\024'), ('\\25', '\\025'), ('\\26', '\\026'), ('\\27', '\\027'), ('\\30', '\\030'), ('\\31', '\\031'), ('\\32', '\\032'), ('\\33', '\\033'), ('\\34', '\\034'), ('\\35', '\\035'), ('\\36', '\\036'), ('\\37', '\\037'), ' ', '!', '\\"', '#', '$', '%', '&', "\\'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '\\?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', '\\177', '\\200', '\\201', '\\202', '\\203', '\\204', '\\205', '\\206', '\\207', '\\210', '\\211', '\\212', '\\213', '\\214', '\\215', '\\216', '\\217', '\\220', '\\221', '\\222', '\\223', '\\224', '\\225', '\\226', '\\227', '\\230', '\\231', '\\232', '\\233', '\\234', '\\235', '\\236', '\\237', '\\240', '\\241', '\\242', '\\243', '\\244', '\\245', '\\246', '\\247', '\\250', '\\251', '\\252', '\\253', '\\254', '\\255', '\\256', '\\257', '\\260', '\\261', '\\262', '\\263', '\\264', '\\265', '\\266', '\\267', '\\270', '\\271', '\\272', '\\273', '\\274', '\\275', '\\276', '\\277', '\\300', '\\301', '\\302', '\\303', '\\304', '\\305', '\\306', '\\307', '\\310', '\\311', '\\312', '\\313', '\\314', '\\315', '\\316', '\\317', '\\320', '\\321', '\\322', '\\323', '\\324', '\\325', '\\326', '\\327', '\\330', '\\331', '\\332', '\\333', '\\334', '\\335', '\\336', '\\337', '\\340', '\\341', '\\342', '\\343', '\\344', '\\345', '\\346', '\\347', '\\350', '\\351', '\\352', '\\353', '\\354', '\\355', '\\356', '\\357', '\\360', '\\361', '\\362', '\\363', '\\364', '\\365', '\\366', '\\367', '\\370', '\\371', '\\372', '\\373', '\\374', '\\375', '\\376', '\\377']

Example usage encoding some 4-byte integers as C-strings in little-endian byte order with new lines inserted every 50 characters: v

mp = c_escape()
vals = [30,50,100]
bytearr = [z for i, x in enumerate(vals) for z in x.to_bytes(4, 'little', signed=x<0)]
"".join(mp[x] if not type(mp[x]) is tuple else mp[x][1 if not i == len(bytearr)-1 and bytearr[i+1] in list(range(ord('0'), ord('7')+1)) else 0] + ("\"\n\t\"" if (i % 50) == 49 else "") for i, x in enumerate(bytearr))

#output: '\\36\\0\\0\\0002\\0\\0\\0d\\0\\0\\0'
like image 28
Gregory Morse Avatar answered Oct 14 '22 14:10

Gregory Morse