I have this code in my C file:
printf("Worker name is %s and id is %d", worker.name, worker.id);
I want, with Python, to be able to parse the format string and locate the "%s" and "%d".
So I want to have a function:
>>> my_function("Worker name is %s and id is %d")
[Out1]: ((15, "%s"), (28, "%d"))
I've tried to achieve this using libclang's Python bindings and with pycparser, but I couldn't see how it can be done with these tools.
I've also tried solving this with a regex, but it is not simple at all: think about cases where the printf format string contains "%%s" and the like.
Both gcc and clang obviously do this as part of compiling. Has no one exported this logic to Python?
You can certainly find properly formatted candidates with a regex.
Take a look at the definition of the C format specification. (I'm using Microsoft's, but use whichever one you want.)
It is:
%[flags] [width] [.precision] [{h | l | ll | w | I | I32 | I64}] type
You also have the special case of %%, which becomes % in printf.
You can translate that pattern into a regex:
(                                  # start of capture group 1
%                                  # literal "%"
(?:                                # first option
(?:[-+0 #]{0,5})                   # optional flags
(?:\d+|\*)?                        # width
(?:\.(?:\d+|\*))?                  # precision
(?:h|l|ll|w|I|I32|I64)?            # size
[cCdiouxXeEfgGaAnpsSZ]             # type
) |                                # OR
%%)                                # literal "%%"
And then into a Python regex:
import re

# Sample format strings, one per line
lines = '''\
Worker name is %s and id is %d
That is %i%%
%c
Decimal: %d Justified: %.6d
%10c%5hc%5C%5lc
The temp is %.*f
%ss%lii
%*.*s | %.3d | %lC | %s%%%02d'''

# Raw string so that \d and \. reach the regex engine unchanged
cfmt = r'''
(                                  # start of capture group 1
%                                  # literal "%"
(?:                                # first option
(?:[-+0 #]{0,5})                   # optional flags
(?:\d+|\*)?                        # width
(?:\.(?:\d+|\*))?                  # precision
(?:h|l|ll|w|I|I32|I64)?            # size
[cCdiouxXeEfgGaAnpsSZ]             # type
) |                                # OR
%%)                                # literal "%%"
'''

# Print each line followed by a tuple of (offset, specifier) pairs
for line in lines.splitlines():
    print('"{}"\n\t{}\n'.format(
        line,
        tuple((m.start(1), m.group(1))
              for m in re.finditer(cfmt, line, flags=re.X))))
Prints:
"Worker name is %s and id is %d"
((15, '%s'), (28, '%d'))
"That is %i%%"
((8, '%i'), (10, '%%'))
"%c"
((0, '%c'),)
"Decimal: %d Justified: %.6d"
((9, '%d'), (23, '%.6d'))
"%10c%5hc%5C%5lc"
((0, '%10c'), (4, '%5hc'), (8, '%5C'), (11, '%5lc'))
"The temp is %.*f"
((12, '%.*f'),)
"%ss%lii"
((0, '%s'), (3, '%li'))
"%*.*s | %.3d | %lC | %s%%%02d"
((0, '%*.*s'), (8, '%.3d'), (15, '%lC'), (21, '%s'), (23, '%%'), (25, '%02d'))
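To get the interface the question asks for, the pattern can be compiled once and wrapped in a function. The following is only a sketch under a few assumptions that are not part of the answer above: the name `find_printf_specs` and its `include_escapes` parameter are hypothetical, the size list is swapped for the standard C length modifiers (hh, h, l, ll, j, z, t, L) instead of the Microsoft-specific ones, and "%%" is skipped by default because it consumes no printf argument.

```python
import re

# Hypothetical helper, not from the original answer. The pattern follows the
# standard C layout %[flags][width][.precision][length]type.
PRINTF_SPEC = re.compile(r"""
    %                           # literal "%"
    (?:
        [-+0 #]*                # optional flags
        (?:\d+|\*)?             # optional width
        (?:\.(?:\d+|\*)?)?      # optional precision (digits may be omitted)
        (?:hh|h|ll|l|j|z|t|L)?  # optional length modifier
        [diouxXeEfFgGaAcspn]    # conversion type
      |
        %                       # second "%" of an escaped "%%"
    )
""", re.VERBOSE)


def find_printf_specs(fmt, include_escapes=False):
    """Return a tuple of (offset, text) pairs for each specifier in fmt."""
    return tuple(
        (m.start(), m.group())
        for m in PRINTF_SPEC.finditer(fmt)
        if include_escapes or m.group() != "%%"
    )


print(find_printf_specs("Worker name is %s and id is %d"))
# -> ((15, '%s'), (28, '%d'))
print(find_printf_specs("100%% done, %5.2f%s left, %%s is literal"))
# -> ((12, '%5.2f'), (17, '%s'))
```

Matching "%%" as its own alternative is what keeps "%%s" from being reported as "%s": the escaped pair is consumed first, so the scan resumes after it.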
A simple implementation might be the following generator:
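def find_format_specifiers(s):
    last_percent = False  # True when the previous character was an unpaired "%"
    for i in range(len(s)):
        if s[i] == "%" and not last_percent:
            # Bounds check avoids an IndexError when the string ends with "%"
            if i + 1 < len(s) and s[i+1] != "%":
                yield (i, s[i:i+2])
            last_percent = True
        else:
            last_percent = False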
>>> list(find_format_specifiers("Worker name is %s and id is %d but %%q"))
[(15, '%s'), (28, '%d')]
This can be fairly easily extended to handle additional format specifier information like width and precision, if needed.
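One way that extension could look is sketched below. It is an illustration, not the answerer's code: the function name `find_full_specifiers`, the helper `_skip_number`, and the choice of the standard C flags, length modifiers, and conversion types are all assumptions.

```python
FLAGS = "-+ #0"
LENGTHS = ("hh", "ll", "h", "l", "j", "z", "t", "L")  # longer prefixes first
TYPES = "diouxXeEfFgGaAcspn"


def _skip_number(s, j):
    """Advance past "*" or a run of ASCII digits starting at index j."""
    if j < len(s) and s[j] == "*":
        return j + 1
    while j < len(s) and s[j] in "0123456789":
        j += 1
    return j


def find_full_specifiers(s):
    """Yield (offset, text) for each printf-style specifier in s."""
    i, n = 0, len(s)
    while i < n:
        if s[i] != "%":
            i += 1
            continue
        if s.startswith("%%", i):       # escaped percent: skip both characters
            i += 2
            continue
        j = i + 1
        while j < n and s[j] in FLAGS:  # flags
            j += 1
        j = _skip_number(s, j)          # width
        if j < n and s[j] == ".":       # precision
            j = _skip_number(s, j + 1)
        for length in LENGTHS:          # length modifier
            if s.startswith(length, j):
                j += len(length)
                break
        if j < n and s[j] in TYPES:     # conversion type closes the specifier
            yield (i, s[i:j + 1])
            i = j + 1
        else:                           # malformed: resume right after the "%"
            i += 1


print(list(find_full_specifiers("Worker name is %s and id is %d but %%q")))
# -> [(15, '%s'), (28, '%d')]
print(list(find_full_specifiers("%-10.3lf|%*d|%%")))
# -> [(0, '%-10.3lf'), (9, '%*d')]
```

Because the scanner walks the string with an explicit index, a lone "%" at the end of the input is treated as malformed and skipped.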