Parsing Python function calls to get argument positions

I want code that can analyze a function call like this:

whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)

And return the positions of each and every argument, in this case foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs.

I tried using the _ast module, and it seems to be just the thing for the job, but unfortunately there were problems. For example, in an argument like baz() which is a function call itself, I couldn't find a simple way to get its length. (And even if I found one, I don't want a bunch of special cases for every different kind of argument.)

I also looked at the tokenize module but couldn't see how to use it to get the arguments.

Any idea how to solve this?

Ram Rachum asked May 19 '13 13:05



1 Answer

This code (written for Python 2) combines ast, to find the initial offset of each argument, with regular expressions, to identify the argument boundaries:

import ast
import re

def collect_offsets(call_string):
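    def _abs_offset(lineno, col_offset):
        # Convert a 1-based (lineno, col_offset) pair into an offset into call_string
        total = 0
        # splitlines(True) keeps the line endings, so newlines count towards the total
        for current_lineno, line in enumerate(call_string.splitlines(True), 1):
            if current_lineno == lineno:
                return col_offset + total
            total += len(line)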
    # parse call_string with ast
    call = ast.parse(call_string).body[0].value
    # collect offsets provided by ast
    offsets = []
    for arg in call.args:
        a = arg
        while isinstance(a, ast.BinOp):
            a = a.left
        offsets.append(_abs_offset(a.lineno, a.col_offset))
    for kw in call.keywords:
        offsets.append(_abs_offset(kw.value.lineno, kw.value.col_offset))
    # call.starargs and call.kwargs are only available before Python 3.5
    if call.starargs:
        offsets.append(_abs_offset(call.starargs.lineno, call.starargs.col_offset))
    if call.kwargs:
        offsets.append(_abs_offset(call.kwargs.lineno, call.kwargs.col_offset))
    offsets.append(len(call_string))
    return offsets

def argpos(call_string):
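    def _find_start(prev_end, offset):
        # Skip the opening parenthesis or comma and any whitespace before the argument
        s = call_string[prev_end:offset]
        m = re.search(r'(\(|,)(\s*)(.*?)$', s)
        return prev_end + m.start(3)
    def _find_end(start, next_offset):
        # Cut at the last comma or closing parenthesis, then drop trailing whitespace
        s = call_string[start:next_offset]
        m = re.search(r'(\s*)$', s[:max(s.rfind(','), s.rfind(')'))])
        return start + m.start()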

    offsets = collect_offsets(call_string)   

    result = []
    # previous end
    end = 0
    # given offsets = [9, 14, 21, ...],
    # zip(offsets, offsets[1:]) returns [(9, 14), (14, 21), ...]
    for offset, next_offset in zip(offsets, offsets[1:]):
        #print 'I:', offset, next_offset
        start = _find_start(end, offset)
        end = _find_end(start, next_offset)
        #print 'R:', start, end
        result.append((start, end))
    return result

if __name__ == '__main__':
    try:
        while True:
            call_string = raw_input()
            positions = argpos(call_string)
            for p in positions:
                print ' ' * p[0] + '^' + ((' ' * (p[1] - p[0] - 2) + '^') if p[1] - p[0] > 1 else '')
            print positions
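    # A tuple is required here; "except EOFError, KeyboardInterrupt" would only
    # catch EOFError and bind it to the name KeyboardInterrupt
    except (EOFError, KeyboardInterrupt):
        pass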

Output:

whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)
         ^ ^
              ^   ^
                     ^      ^
                               ^  ^
                                     ^    ^
                                             ^       ^
                                                        ^        ^
[(9, 12), (14, 19), (21, 29), (31, 35), (37, 43), (45, 54), (56, 66)]
f(1, len(document_text) - 1 - position)
  ^
     ^                               ^
[(2, 3), (5, 38)]
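
On newer interpreters the regular expressions may not be needed at all. Since Python 3.8, expression nodes carry end_col_offset, and since Python 3.9 keyword nodes carry position attributes as well, so the spans can probably be read straight from the tree. The following is only a minimal sketch under these assumptions: Python 3.9 or later, a call written on a single line, and ASCII source (the offsets reported by ast are UTF-8 byte offsets). The name argpos_py3 is made up for this illustration.

```python
import ast

def argpos_py3(call_string):
    """Return (start, end) offsets for every argument of a single-line call.

    Assumes Python 3.9+, where both expression nodes and keyword nodes
    expose col_offset and end_col_offset.
    """
    call = ast.parse(call_string, mode='eval').body
    if not isinstance(call, ast.Call):
        raise ValueError('expected a single function call')
    # Positional arguments (including *args) live in call.args;
    # name=value pairs and **kwargs live in call.keywords.
    nodes = list(call.args) + list(call.keywords)
    if any(node.lineno != 1 or node.end_lineno != 1 for node in nodes):
        raise ValueError('multi-line calls are not handled by this sketch')
    # Sort by start offset, because *args may follow a keyword in the source.
    return sorted((node.col_offset, node.end_col_offset) for node in nodes)

if __name__ == '__main__':
    source = "whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)"
    for start, end in argpos_py3(source):
        print((start, end), source[start:end])
```

One caveat: a parenthesized argument such as (a + b) is reported without its outer parentheses, because the tree does not record them.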
utapyngo answered Sep 26 '22 12:09