PostgreSQL allows indexes to be created on expressions, e.g., CREATE INDEX ON films ((lower(title))). It also has a pg_get_expr() information function that translates the internal format of the expression into text, i.e., lower(title) in the former example. The expressions can get quite hairy at times. Here are some examples (in Python):
sample_exprs = [
    'lower(c2)',
    'lower(c2), lower(c3)',
    "btrim(c3, 'x'::text), lower(c2)",
    "date_part('month'::text, dt), date_part('day'::text, dt)",
    '"substring"(c2, "position"(c2, \'_begin\'::text)), "substring"(c2, "position"(c2, \'_end\'::text))',
    "(((c2)::text || ', '::text) || c3), ((c3 || ' '::text) || (c2)::text)",
    'f1(f2(arga, f3()), arg1), f4(arg2, f5(argb, argc)), f6(arg3)']
The last item isn't really from Postgres but is just an extreme example of what my code ought to handle.
I wrote a Python function to split the textual lists into the component expressions. For example, that last item is broken down into:
f1(f2(arga, f3()), arg1)
f4(arg2, f5(argb, argc))
f6(arg3)
I experimented with str methods like find() and count() and also considered regexes, but in the end I wrote a function that is what I would have written in C (essentially counting open and close parens to find where to break the text). Here's the function:
def split_exprs(idx_exprs):
    keyexprs = []
    nopen = nclose = beg = curr = 0
    for c in idx_exprs:
        curr += 1
        if c == '(':
            nopen += 1
        elif c == ')':
            nclose += 1
            if nopen > 0 and nopen == nclose:
                if idx_exprs[beg] == ',':
                    beg += 1
                if idx_exprs[beg] == ' ':
                    beg += 1
                keyexprs.append(idx_exprs[beg:curr])
                beg = curr
                nopen = nclose = 0
    return keyexprs
The question is whether there is a more Pythonic or elegant way to do this or to use regexes to solve this.
Here is my version. I think it is more Pythonic and less cluttered, and it works on a stream of characters, though I don't see any advantage in that :)
def split_fns(fns):
    level = 0
    stack = [[]]
    for ch in fns:
        if level == 0 and ch in [' ', ',']:
            continue
        stack[-1].append(ch)
        if ch == "(":
            level += 1
        elif ch == ")":
            level -= 1
            if level == 0:
                stack.append([])
    return ["".join(t) for t in stack if t]
If you're looking to make it more Pythonic, the main thing I can think of is readability. Additionally, I avoid one branch and one variable by tracking a single nesting level instead of separate open and close counts. The only other Pythonic suggestion I can give is to get the index variable from the enumerate(...) function, as in

    for i, j in enumerate(<iterable>)

This creates a variable i that holds the zero-based index of the current iteration, while j is the usual iteration variable.
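As a small illustration of enumerate() on its own, here is a sketch that records where the parentheses sit in a short string. The names expr and positions are made up for this example and are not part of the original code.

```python
expr = "f(a)"  # hypothetical input, just for illustration

# enumerate() yields (index, item) pairs, with the index starting at 0.
for pos, ch in enumerate(expr):
    print(pos, ch)

# Collect the indexes of every parenthesis in the string.
positions = [pos for pos, ch in enumerate(expr) if ch in "()"]
```

Having the index available this way is what lets the function slice the original string directly instead of maintaining a separate counter.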
def split_fns(fns):
    paren_stack_level = 0
    last_pos = 0
    output = []
    for curr_pos, curr_char in enumerate(fns):
        if curr_char == "(":
            paren_stack_level += 1
        elif curr_char == ")":
            paren_stack_level -= 1
            if not paren_stack_level:
                output.append(fns[last_pos:curr_pos + 1].lstrip(" ,"))
                last_pos = curr_pos + 1
    return output

for i in sample_exprs:
    print(split_fns(i))
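One caveat with all of the paren-counting versions above: they count every parenthesis, including any that appear inside a quoted string such as '('::text, and they only emit an expression when a closing parenthesis brings the count back to zero. Below is a rough sketch of a variant that splits on top-level commas instead and skips over quoted text. The function name split_top_level is my own, and it assumes that single quotes delimit string literals, double quotes delimit identifiers, and an embedded quote is written doubled (as in 'it''s').

```python
def split_top_level(exprs):
    """Split a comma-separated expression list at top-level commas only."""
    parts = []
    depth = 0      # current parenthesis nesting depth
    quote = None   # quote character we are currently inside, or None
    start = 0      # index where the current expression begins
    for pos, ch in enumerate(exprs):
        if quote:
            # Inside quoted text only the matching quote matters. A doubled
            # quote closes and immediately reopens, which works out the same.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A comma outside all parentheses and quotes ends an expression.
            parts.append(exprs[start:pos].strip())
            start = pos + 1
    tail = exprs[start:].strip()
    if tail:
        parts.append(tail)
    return parts
```

For the inputs in the question this should give the same pieces as the functions above; the difference only shows up when a literal contains a parenthesis or an expression does not end with one.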