convert substrings to dict

Looking for an elegant way to convert a list of substrings and the text between them to key, value pairs in a dict. Example:

s = 'k1:some text k2:more text k3:and still more'
key_list = ['k1','k2','k3']
(missing code)
# s_dict = {'k1':'some text', 'k2':'more text', 'k3':'and still more'}

This is solvable using str.find(), etc., but I know there's a better solution than what I've hacked together.

asked Feb 15 '18 07:02

anon01

3 Answers

Option 1
If the keys don't have spaces or colons, you can simplify your solution with dict + re.findall (import re, first):

>>> dict(re.findall(r'(\S+):(.*?)(?=\s\S+:|$)', s))
{'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}

Only the placement of the colon (:) determines how keys/values are matched.

Details

(\S+)   # capture the key (one or more non-space characters)
:       # literal colon (matched, but not captured)
(.*?)   # non-greedy match - zero or more characters - this captures the value
(?=     # use lookahead to determine when to stop matching the value
\s      # space
\S+:    # one or more non-space characters followed by a colon
|       # regex OR
$)      # end of string

Note that this code assumes the structure as presented in the question. It will fail on strings with invalid structures.
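To make that caveat concrete, here is a minimal sketch of one way the pattern can go wrong. The input string `bad` is a made-up example (not from the question) in which a value contains a URL, so an extra colon appears inside the text:

```python
import re

# Same pattern as in Option 1, compiled once and written as a raw string.
PAIR = re.compile(r'(\S+):(.*?)(?=\s\S+:|$)')

# Hypothetical input: the value of k1 contains "http:", which looks like a key.
bad = 'k1:visit http://example.com k2:more text'

parsed = dict(PAIR.findall(bad))
print(parsed)
# {'k1': 'visit', 'http': '//example.com', 'k2': 'more text'}
```

No exception is raised here; the value is silently cut short and a spurious 'http' key appears, so it is worth validating the result if the input is not trusted.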

Option 2
Look ma, no regex...
This operates on the same assumption as the one above.

Split on colon (:)
All elements but the first and last will need to be split again, on space (to separate keys and values)
zip adjacent elements, and convert to dictionary

v = s.split(':')
v[1:-1] = [j for i in v[1:-1] for j in i.rsplit(None, 1)]

dict(zip(v[::2], v[1::2]))
{'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}
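If the slice assignment is hard to follow, this sketch runs the same steps on the string from the question and keeps the intermediate lists. The names `after_split` and `after_rsplit` are only there for illustration:

```python
s = 'k1:some text k2:more text k3:and still more'

# Step 1: split on the colon. The middle elements hold "<value> <next key>".
v = s.split(':')
after_split = list(v)
print(after_split)
# ['k1', 'some text k2', 'more text k3', 'and still more']

# Step 2: split each middle element once, from the right, on whitespace,
# and flatten the resulting pairs back into the list.
v[1:-1] = [j for i in v[1:-1] for j in i.rsplit(None, 1)]
after_rsplit = list(v)
print(after_rsplit)
# ['k1', 'some text', 'k2', 'more text', 'k3', 'and still more']

# Step 3: even positions are keys, odd positions are values.
s_dict = dict(zip(v[::2], v[1::2]))
print(s_dict)
# {'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}
```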

answered Oct 26 '22 02:10

cs95

If the keys don't contain spaces or colons, you could:

split on word characters followed by a colon to get the tokens
zip two interleaved slices (even and odd positions) in a dict comprehension to rebuild the dict

like this:

import itertools
import re

s = 'k1:some text k2:more text k3:and still more'
# the capturing group keeps the keys in the split result; filter off empty tokens
toks = [x for x in re.split(r"(\w+):", s) if x]
# toks => ['k1', 'some text ', 'k2', 'more text ', 'k3', 'and still more']
# even positions are keys, odd positions are values
d = {k: v for k, v in zip(itertools.islice(toks, None, None, 2), itertools.islice(toks, 1, None, 2))}
print(d)

result:

{'k2': 'more text ', 'k1': 'some text ', 'k3': 'and still more'}

Using itertools.islice avoids creating the sub-lists that slicing with toks[::2] would.
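Note that the values in the result above keep the space that preceded the next key ('some text '). If that matters, one possible variation (a sketch, not part of the original answer) is to strip each value while building the dict:

```python
import itertools
import re

s = 'k1:some text k2:more text k3:and still more'

# Same tokenization as above: keys are captured, empty tokens are dropped.
toks = [x for x in re.split(r"(\w+):", s) if x]

keys = itertools.islice(toks, None, None, 2)   # tokens at even positions
values = itertools.islice(toks, 1, None, 2)    # tokens at odd positions

# str.strip() removes the whitespace left over before the next key.
d = {k: v.strip() for k, v in zip(keys, values)}
print(d)
# {'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}
```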

answered Oct 26 '22 01:10

Jean-François Fabre

Another bit of regex magic, this time splitting the input string into key/value pairs:

import re

s = 'k1:some text k2:more text k3:and still more'
pat = re.compile(r'\s+(?=\w+:)')
# split each pair on the first colon only, so a colon inside a value is kept
result = dict(i.split(':', 1) for i in pat.split(s))

print(result)

The output:

{'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}

Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
\s+(?=\w+:) is the crucial pattern: it splits the input string on whitespace character(s) \s+ only when they are followed by a "key" (a word \w+ followed by a colon :).
(?=...) is a positive lookahead assertion.
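As a rough illustration of the reuse point, the compiled pattern can live at module level and be shared by a small helper. The names `PAIR_SEP` and `parse_pairs` and the second input line are assumptions made for this sketch; `maxsplit=1` is passed to str.split so that a colon inside a value does not break the pair:

```python
import re

# Compiled once, reused for every call (hypothetical module-level name).
PAIR_SEP = re.compile(r'\s+(?=\w+:)')

def parse_pairs(text):
    """Split text into 'key:value' chunks, then split each chunk on its first colon."""
    return dict(item.split(':', 1) for item in PAIR_SEP.split(text))

lines = [
    'k1:some text k2:more text k3:and still more',
    'url:http://example.com k2:y',   # made-up input with a colon inside a value
]
results = [parse_pairs(line) for line in lines]
for r in results:
    print(r)
# {'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}
# {'url': 'http://example.com', 'k2': 'y'}
```

This still treats any whitespace followed by a word and a colon as the start of a new key, so a value such as 'at 10:30' would be split into a separate pair.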

answered Oct 26 '22 02:10

RomanPerekhrest

Tags:

python

string

dictionary