
convert substrings to dict

I'm looking for an elegant way to convert a list of substrings, and the text between them, into key/value pairs in a dict. Example:

s = 'k1:some text k2:more text k3:and still more'
key_list = ['k1','k2','k3']
(missing code)
# s_dict = {'k1':'some text', 'k2':'more text', 'k3':'and still more'}  

This is solvable using str.find(), etc, but I know there's a better solution than what I've hacked together.

anon01 asked Feb 15 '18 07:02


3 Answers

Option 1
If the keys don't have spaces or colons, you can simplify your solution with dict + re.findall (import re, first):

>>> dict(re.findall(r'(\S+):(.*?)(?=\s\S+:|$)', s))
{'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}

Only the placement of the colon (:) determines how keys/values are matched.

Details

(\S+)   # capture the key (one or more non-whitespace characters)
:       # literal colon (matched, but not captured)
(.*?)   # non-greedy match - zero or more characters - this captures the value
(?=     # use lookahead to determine when to stop matching the value
\s      # a whitespace character
\S+:    # one or more non-whitespace characters followed by a colon
|       # regex OR
$)      # end of string

Note that this code assumes the structure as presented in the question. It will fail on strings with invalid structures.
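If the keys are known in advance, as with key_list in the question, one possible variation is to build the pattern from those keys so that a stray colon inside a value is not mistaken for a new key. This is only a sketch under that assumption; the names keys, pattern and s_dict are illustrative and not part of the original answer.

```python
import re

s = 'k1:some text k2:more text k3:and still more'
key_list = ['k1', 'k2', 'k3']

# Alternation of the known keys, escaped so they are matched literally.
keys = '|'.join(re.escape(k) for k in key_list)

# A known key, a colon, then the shortest value that is followed either by
# whitespace + another known key + colon, or by the end of the string.
pattern = re.compile(r'({0}):(.*?)(?=\s+(?:{0}):|$)'.format(keys))

s_dict = dict(pattern.findall(s))
print(s_dict)
```

With this pattern, a value such as 'see http://x' stays in one piece, because 'http' is not one of the listed keys.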


Option 2
Look ma, no regex...
This operates on the same assumption as the one above.

  1. Split on colon (:)
  2. All elements but the first and last will need to be split again, on space (to separate keys and values)
  3. zip adjacent elements, and convert to dictionary

>>> v = s.split(':')
>>> v[1:-1] = [j for i in v[1:-1] for j in i.rsplit(None, 1)]
>>> dict(zip(v[::2], v[1::2]))
{'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}
cs95 answered Oct 26 '22 02:10


If the keys don't contain spaces or colons, you could:

  • split on a word (\w+) followed by a colon to get the tokens
  • zip the two interleaved slices (keys at even positions, values at odd positions) in a dict comprehension to rebuild the dict

like this:

import itertools
import re

s = 'k1:some text k2:more text k3:and still more'
# the capturing group keeps the keys in the split output; filter out empty tokens
toks = [x for x in re.split(r"(\w+):", s) if x]
# toks => ['k1', 'some text ', 'k2', 'more text ', 'k3', 'and still more']
# pair even-indexed tokens (keys) with odd-indexed tokens (values); strip the space left before the next key
d = {k: v.strip() for k, v in zip(itertools.islice(toks, None, None, 2), itertools.islice(toks, 1, None, 2))}
print(d)

result:

{'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}

Using itertools.islice avoids creating the intermediate sub-lists that toks[::2] would.
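For comparison, here is a sketch of the slice-based variant that the note above refers to. For a token list this short the extra copies are unlikely to matter, so this form may be easier to read. The .strip() call and the name d_sliced are additions for this illustration; stripping drops the whitespace left in front of each following key.

```python
import re

s = 'k1:some text k2:more text k3:and still more'
toks = [x for x in re.split(r'(\w+):', s) if x]

# toks[::2] and toks[1::2] each build a new list holding half of the tokens,
# whereas itertools.islice would walk over toks lazily without copying.
d_sliced = {k: v.strip() for k, v in zip(toks[::2], toks[1::2])}
print(d_sliced)
```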

Jean-François Fabre answered Oct 26 '22 01:10


Another bit of regex magic: split the input string into key/value pairs first, then split each pair on its colon:

import re

s = 'k1:some text k2:more text k3:and still more'
pat = re.compile(r'\s+(?=\w+:)')
result = dict(i.split(':', 1) for i in pat.split(s))  # maxsplit=1: a colon inside a value stays in the value

print(result)

The output:

{'k1': 'some text', 'k2': 'more text', 'k3': 'and still more'}

  • Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
  • \s+(?=\w+:) is the crucial pattern: it splits the input string on whitespace character(s) \s+, but only where that whitespace is followed by a "key" (a word \w+ and a colon :).
    (?=...) is a positive lookahead assertion; it checks what follows without consuming it.
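To see what the lookahead buys you, it may help to look at the intermediate list produced by the split. The sketch below reuses the pattern from this answer; the names pairs and result are only for illustration.

```python
import re

s = 'k1:some text k2:more text k3:and still more'
pat = re.compile(r'\s+(?=\w+:)')

# The lookahead is zero-width, so only the whitespace is consumed by the split
# and every key stays attached to its value.
pairs = pat.split(s)
print(pairs)   # ['k1:some text', 'k2:more text', 'k3:and still more']

# Splitting each pair once on the first colon yields (key, value) items for dict().
result = dict(p.split(':', 1) for p in pairs)
print(result)
```

The whitespace inside 'some text' is left alone because the word after it is not followed by a colon.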
RomanPerekhrest answered Oct 26 '22 02:10