How to escape string for generated C++?

Question

I am writing python script which is generating C++ code based on the data.

I have python variable string which contains a string which can be composed of characters like " or newlines.

What is the best way to escape this string for code generation?

Krizz · Accepted Answer

The way I use is based on the observation that C++ strings basically obey the same rules regarding charactes and escaping as Javascript/JSON string.

Python since version 2.6 has a built-in JSON library which can serialize Python data into JSON. Therefore, the code is, assuming we don't need enclosing quotes, just as follows:

import json
string_for_printing = json.dumps(original_string).strip('"')

sth · Answer

I use this code, so far without problems:

def string(s, encoding='ascii'):
   if isinstance(s, unicode):
      s = s.encode(encoding)
   result = ''
   for c in s:
      if not (32 <= ord(c) < 127) or c in ('\', '"'):
         result += '\%03o' % ord(c)
      else:
         result += c
   return '"' + result + '"'

It uses octal escapes to avoid all potentially problematic characters.

Gregory Morse · Answer

We can do better using specifics of C found here (https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Character-Constants) and Python's built-in printable function:

def c_escape():
  import string
  mp = []
  for c in range(256):
    if c == ord('\'): mp.append("\\")
    elif c == ord('?'): mp.append("\?")
    elif c == ord('\''): mp.append("\'")
    elif c == ord('"'): mp.append("\\"")
    elif c == ord('\a'): mp.append("\a")
    elif c == ord('\b'): mp.append("\b")
    elif c == ord('\f'): mp.append("\f")
    elif c == ord('
'): mp.append("\n")
    elif c == ord('
'): mp.append("\r")
    elif c == ord('	'): mp.append("\t")
    elif c == ord('\v'): mp.append("\v")
    elif chr(c) in string.printable: mp.append(chr(c))
    else:
      x = "\%03o" % c
      mp.append(x if c>=64 else (("\%%0%do" % (1+c>=8)) % c, x))
  return mp

This has the advantage of now being a mapping from ordinal value of a character ord(c) to the exact string. Using += for strings is slow and bad practice, so this allows for "".join(...) which is far more performant in Python. Not to mention, indexing a list/table is much faster than doing computations on characters each time through. Also do not waste octal characters either by checking if less characters are needed. However, to use this, you must verify the next character is not a 0 through 7 otherwise you must use the 3 digit octal format.

The table looks like:

[('\0', '\000'), ('\1', '\001'), ('\2', '\002'), ('\3', '\003'), ('\4', '\004'), ('\5', '\005'), ('\6', '\006'), '\a', '\b', '\t', '\n', '\v', '\f', '\r', ('\16', '\016'), ('\17', '\017'), ('\20', '\020'), ('\21', '\021'), ('\22', '\022'), ('\23', '\023'), ('\24', '\024'), ('\25', '\025'), ('\26', '\026'), ('\27', '\027'), ('\30', '\030'), ('\31', '\031'), ('\32', '\032'), ('\33', '\033'), ('\34', '\034'), ('\35', '\035'), ('\36', '\036'), ('\37', '\037'), ' ', '!', '\"', '#', '$', '%', '&', "\'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '\?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', '\177', '\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207', '\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217', '\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227', '\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237', '\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247', '\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257', '\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267', '\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277', '\300', '\301', '\302', '\303', '\304', '\305', '\306', '\307', '\310', '\311', '\312', '\313', '\314', '\315', '\316', '\317', '\320', '\321', '\322', '\323', '\324', '\325', '\326', '\327', '\330', '\331', '\332', '\333', '\334', '\335', '\336', '\337', '\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347', '\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357', '\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367', '\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377']

Example usage encoding some 4-byte integers as C-strings in little-endian byte order with new lines inserted every 50 characters: v

mp = c_escape()
vals = [30,50,100]
bytearr = [z for i, x in enumerate(vals) for z in x.to_bytes(4, 'little', signed=x<0)]
"".join(mp[x] if not type(mp[x]) is tuple else mp[x][1 if not i == len(bytearr)-1 and bytearr[i+1] in list(range(ord('0'), ord('7')+1)) else 0] + ("\"
	\"" if (i % 50) == 49 else "") for i, x in enumerate(bytearr))

#output: '\36\0\0\0002\0\0\0d\0\0\0'

How to escape string for generated C++?

Tags:

python

escaping

code-generation

Krizz

Video Answer

3 Answers

Krizz

sth

Gregory Morse

Recent Activity

Donate For Us

How to escape string for generated C++?

Tags:

python

escaping

code-generation

Krizz

Video Answer

3 Answers

Krizz

sth

Gregory Morse

Related questions

Recent Activity

Donate For Us