Splitting list of possibly nested function expressions in Python

Tags: python, list, split

PostgreSQL allows indexes to be created on expressions, e.g., CREATE INDEX ON films ((lower(title))). It also has a pg_get_expr() information function that translates the internal format of the expression into text, i.e., lower(title) in the former example. The expressions can get quite hairy at times. Here are some examples (in Python):

sample_exprs = [
    'lower(c2)',
    'lower(c2), lower(c3)',
    "btrim(c3, 'x'::text), lower(c2)",
    "date_part('month'::text, dt), date_part('day'::text, dt)",
    '"substring"(c2, "position"(c2, \'_begin\'::text)), "substring"(c2, "position"(c2, \'_end\'::text))',
    "(((c2)::text || ', '::text) || c3), ((c3 || ' '::text) || (c2)::text)",
    'f1(f2(arga, f3()), arg1), f4(arg2, f5(argb, argc)), f6(arg3)']

The last item isn't really from Postgres but is just an extreme example of what my code ought to handle.

I wrote a Python function to split the textual lists into the component expressions. For example, that last item is broken down into:

 f1(f2(arga, f3()), arg1)
 f4(arg2, f5(argb, argc))
 f6(arg3)

I experimented with str methods like find() and count() and also considered regexes, but in the end I wrote a function that is what I would have written in C (essentially counting open and close parens to find where to break the text). Here's the function:

def split_exprs(idx_exprs):
    keyexprs = []
    nopen = nclose = beg = curr = 0
    for c in idx_exprs:
        curr += 1
        if c == '(':
            nopen += 1
        elif c == ')':
            nclose += 1
            if nopen > 0 and nopen == nclose:
                if idx_exprs[beg] == ',':
                    beg += 1
                if idx_exprs[beg] == ' ':
                    beg += 1
                keyexprs.append(idx_exprs[beg:curr])
                beg = curr
                nopen = nclose = 0
    return keyexprs

Is there a more Pythonic or elegant way to do this, or a way to solve it with regexes?

Joe Abbate asked Nov 04 '22 14:11


2 Answers

Here is my version. I think it is more Pythonic and less cluttered, and it also works on a stream of characters, though I don't see any advantage in that :)

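def split_fns(fns):
    level = 0      # current parenthesis nesting depth
    stack = [[]]   # one list of characters per expression
    for ch in fns:
        # Skip the separators between top-level expressions.
        if level == 0 and ch in (' ', ','):
            continue
        stack[-1].append(ch)

        if ch == "(":
            level += 1
        elif ch == ")":
            level -= 1
            if level == 0:
                # Back at the top level: start collecting the next expression.
                stack.append([])

    # Drop the empty list left over after the final expression.
    return ["".join(t) for t in stack if t]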
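
To illustrate the point about working on a stream of characters, here is a minimal sketch of the same loop written as a generator. It is an adaptation for illustration, and the name iter_exprs is hypothetical. It assumes the input is any iterable that yields one character at a time, and it hands back each expression as soon as its closing parenthesis arrives.

```python
def iter_exprs(chars):
    """Yield each top-level expression from an iterable of single characters."""
    level = 0       # current parenthesis nesting depth
    current = []    # characters of the expression being collected
    for ch in chars:
        # Skip separators between top-level expressions.
        if level == 0 and ch in (" ", ","):
            continue
        current.append(ch)
        if ch == "(":
            level += 1
        elif ch == ")":
            level -= 1
            if level == 0:
                yield "".join(current)
                current = []
    # Emit any trailing text that never reached a closing parenthesis.
    if current:
        yield "".join(current)


# iter() stands in for a real character stream here.
stream = iter("f1(f2(arga, f3()), arg1), f4(arg2, f5(argb, argc)), f6(arg3)")
for expr in iter_exprs(stream):
    print(expr)
```

The only practical difference from the list-building version is that nothing is held in memory beyond the expression currently being read.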

Anurag Uniyal answered Nov 15 '22 06:11


If you want to make it more Pythonic, the main thing I can suggest is improving readability.

I also avoid one branch and one variable by tracking a single nesting level instead of separate open and close counts. The other Pythonic suggestion I can give is to get the index from the enumerate(...) function rather than incrementing it by hand. As in

for i, j in enumerate(<iterable>):

This yields pairs in which i is the zero-based position in the loop and j is the item you would normally iterate over.
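
As a small illustration of that pairing (the string text below is just an example value, not one of the sample expressions):

```python
text = "f1(a), f2(b)"

# enumerate() supplies the index i alongside each character j,
# so no manual counter is needed to record positions.
close_positions = [i for i, j in enumerate(text) if j == ")"]
print(close_positions)  # [4, 11]
```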

def split_fns(fns):
    paren_stack_level = 0
    last_pos = 0
    output = []
    for curr_pos, curr_char in enumerate(fns):
        if curr_char == "(":
            paren_stack_level += 1
        elif curr_char == ")":
            paren_stack_level -= 1
            if not paren_stack_level:
                output.append(fns[last_pos:curr_pos + 1].lstrip(" ,"))
                last_pos = curr_pos + 1
    return output

for i in sample_exprs:
    print(split_fns(i))
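
One limitation worth noting: the functions above count every parenthesis they see, so a parenthesis inside a quoted literal such as ')'::text would end an expression too early. Below is a minimal sketch of one way around that. It splits on commas that sit outside both parentheses and quotes, and it assumes single quotes delimit string literals and double quotes delimit identifiers, as in the samples. The name split_top_level is hypothetical and not taken from the question or the answers.

```python
def split_top_level(exprs):
    """Split on commas that are outside parentheses and outside quotes."""
    parts = []
    depth = 0      # current parenthesis nesting depth
    quote = None   # the quote character currently open, or None
    start = 0      # index where the current expression begins
    for pos, ch in enumerate(exprs):
        if quote:
            # Inside a quoted run: only the matching quote is significant.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(exprs[start:pos].strip())
            start = pos + 1
    # Whatever follows the last top-level comma is the final expression.
    tail = exprs[start:].strip()
    if tail:
        parts.append(tail)
    return parts


print(split_top_level("btrim(c3, '),'::text), lower(c2)"))
```

Because it splits on separators rather than on closing parentheses, this variant also keeps an expression that does not end in ")", at the cost of a little more state.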

jsvk answered Nov 15 '22 06:11