Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently replace multi-line list strings with single_line list string

Tags:

python

regex

I'm trying to parse output from OS X's mdls command. For some keys, the value is a list of values. I need to capture these key, value pairs correctly. All lists of values start with a ( and then end with a ).

I need to be able to iterate over all the key, value pairs so that I can properly parse multiple outputs (i.e. mdls run on multiple files to produce a single output, where there is no distinction between where one file's metadata ends and the other's begins). I have some sample code below.

Is there a more efficient way to do this?

import re

mdls_output = """kMDItemAuthors                 = (
    margheim
)
kMDItemContentCreationDate     = 2015-07-10 14:41:01 +0000
kMDItemContentModificationDate = 2015-07-10 14:41:01 +0000
kMDItemContentType             = "com.adobe.pdf"
kMDItemContentTypeTree         = (
    "com.adobe.pdf",
    "public.data",
    "public.item",
    "public.composite-content",
    "public.content"
)
kMDItemCreator                 = "Safari"
kMDItemDateAdded               = 2015-07-10 14:41:01 +0000
"""

mdls_lists = re.findall(r"^\w+\s+=\s\(\n.*?\n\)$", mdls_output, re.S | re.M)
single_line_lists = [re.sub(r'\s+', ' ', x.strip()) for x in mdls_lists]
for i, mdls_list in enumerate(mdls_lists):
    mdls_output = mdls_output.replace(mdls_list, single_line_lists[i])
print(mdls_output)
like image 325
smargh Avatar asked Oct 31 '22 00:10

smargh


1 Answers

You can take advantage of python's regex substitute which can take a function as replacement string. The function is called for each match with the match object. The returned string replaces the match.

def myfn(m):
    return re.sub(r'\s+', ' ', m.group().strip())

pat = re.compile(r"^\w+\s+=\s\(\n.*?\n\)$", re.S | re.M)
mdls_output = pat.sub(myfn, mdls_output)
like image 157
meuh Avatar answered Nov 15 '22 03:11

meuh