
Within a python script, check syntactic correctness of C code in str format

Tags: python, c, llvm, clang

From within a Python program, given a str variable that contains C code, I want to check quickly whether that code is syntactically correct. Essentially, I only need to pass it through the compiler's front end.

My current implementation dumps the string to a temp file and calls a clang process with subprocess (non-working code below, just to illustrate the approach). This is too slow for my needs.

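src = 'int main(){printf("This is a C program\\n"); return 0;}'
with open(temp_file, 'w') as f:
  f.write(src)
# run clang only after the file is closed, so the contents are flushed to disk
cmd = ["clang", temp_file, *flags]  # temp_file and flags are defined elsewhere
proc = subprocess.Popen(cmd)
## etc..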

After looking around, I found the clang.cindex module (pip install clang) and tried it out. Reading through the main module, lines 2763-2837 (specifically line 2828) led me to conclude that the following snippet would do what I need:

import clang.cindex
......
try:
  unit = clang.cindex.TranslationUnit.from_source(temp_code_file)  # args, etc. omitted
  print("Compiled!")
except clang.cindex.TranslationUnitLoadError:
  print("Did not compile!")

However, it seems that no exception is raised even when the source file contains obvious syntax errors. Does anyone know what I am missing to make this work?

More generally, any suggestions on how to do this as fast as possible would be more than welcome. Even with clang.cindex, I cannot get away from writing my string-represented code to a temp file, which may add overhead. Writing a parser in Python could solve this, but that is overkill at the moment, no matter how much I need the speed.

Asked by Foivos Ts, Feb 02 '26 01:02


1 Answer

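Parsing itself succeeds even if the file has syntax errors: from_source still returns a translation unit, and the errors are reported through its diagnostics rather than as an exception. Consider the following example: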

import clang.cindex

with open('broken.c', 'w') as f:
    f.write('foo bar baz')

unit = clang.cindex.TranslationUnit.from_source('broken.c')
for d in unit.diagnostics:
    print(d.severity, d)

Run it and you will get

3 broken.c:1:1: error: unknown type name 'foo'
3 broken.c:1:8: error: expected ';' after top level declarator

The severity member of each diagnostic is an int, taking its value from the enum CXDiagnosticSeverity:

  • CXDiagnostic_Ignored = 0
  • CXDiagnostic_Note = 1
  • CXDiagnostic_Warning = 2
  • CXDiagnostic_Error = 3
  • CXDiagnostic_Fatal = 4
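
To turn those diagnostics into a pass/fail answer, one option is to treat any diagnostic with severity 3 or higher as a failure. The sketch below is only an illustration, not a tested recipe: has_errors and c_source_is_valid are hypothetical helper names, and "snippet.c" is a made-up file name. It also assumes the unsaved_files parameter of from_source, which takes (name, contents) pairs, can stand in for the temp file from the question. Note that clang's diagnostics include semantic errors (such as the unknown type name above), not only purely syntactic ones.

```python
from types import SimpleNamespace

# Numeric value of CXDiagnostic_Error, taken from the list above.
CXDIAGNOSTIC_ERROR = 3


def has_errors(diagnostics, threshold=CXDIAGNOSTIC_ERROR):
    """Return True if any diagnostic is at least as severe as `threshold`."""
    return any(d.severity >= threshold for d in diagnostics)


def c_source_is_valid(src, name="snippet.c"):
    """Parse `src` in memory with libclang and report whether it is error-free.

    `name` is a hypothetical file name; the contents are supplied through
    `unsaved_files`, so nothing has to be written to disk.
    """
    import clang.cindex  # imported lazily so has_errors() works without libclang

    unit = clang.cindex.TranslationUnit.from_source(
        name, unsaved_files=[(name, src)]
    )
    return not has_errors(unit.diagnostics)


if __name__ == "__main__":
    # Stand-in diagnostics to exercise has_errors() without libclang installed.
    warning = SimpleNamespace(severity=2)
    error = SimpleNamespace(severity=3)
    print(has_errors([warning]))         # False
    print(has_errors([warning, error]))  # True
```

With libclang available, c_source_is_valid('foo bar baz') would be expected to return False for the broken example above. Passing threshold=2 to has_errors would make warnings count as failures too.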

