Split a string and make key-value pairs

I have the following string in Python:

Date: 07/14/1995 Time: 11:31:50 Subject text: Something-cool

I want to build a dict() from it in the form key: [value]:

{"Date":["07/14/1995"], "Time": ["11:31:50"], "Subject text":["Something-cool"]}

If I split the string on ": ", I get the following. How can I get the desired result above?

>>> text.split(": ")
['Date', '07/14/1995 Time', '11:31:50 Subject text', 'Something-cool']
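The split above is closer than it looks: every chunk between the first and the last holds a value followed by the next key. A minimal sketch that rebuilds the pairs from it, assuming (as the regex answer also does) that values never contain spaces, so each middle chunk splits cleanly at its first space. `pairs_from_split` is a hypothetical helper name, not part of any library.

```python
def pairs_from_split(s):
    """Rebuild {key: [value]} from s.split(": ").

    Assumption: values never contain spaces, so every middle chunk
    looks like "<value> <next key>".
    """
    chunks = s.split(": ")
    if len(chunks) < 2:  # no ": " separator at all, so no pairs
        return {}
    keys = [chunks[0]]  # the first chunk is a bare key
    values = []
    for chunk in chunks[1:-1]:
        # The first space ends the value; the rest is the next key,
        # which may itself contain spaces ("Subject text").
        value, next_key = chunk.split(" ", 1)
        values.append(value)
        keys.append(next_key)
    values.append(chunks[-1])  # the last chunk is a bare value
    return {k: [v] for k, v in zip(keys, values)}


text = "Date: 07/14/1995 Time: 11:31:50 Subject text: Something-cool"
print(pairs_from_split(text))
```

A middle chunk with no space at all would raise ValueError on the unpacking, which is a loud failure rather than a silently wrong dict.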
asked May 27 '18 03:05 by Anthony



1 Answer

Let's use re.findall here:

>>> import re
>>> dict(re.findall(r'(?=\S|^)(.+?): (\S+)', text))
{'Date': '07/14/1995', 'Subject text': 'Something-cool', 'Time': '11:31:50'}

Or, if you need exactly the format from the question (each value wrapped in a list):

>>> {k : [v] for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', text)}
{
   'Date'        : ['07/14/1995'],
   'Subject text': ['Something-cool'],
   'Time'        : ['11:31:50']
}

Details

(?=   # lookahead (consumes nothing):
\S    # a non-whitespace character
|     # OR
^     # the start of the string
)
(.+?) # 1st capture group (key): one or more characters, lazily, until...
:     # ...a colon
' '   # a single literal space
(\S+) # 2nd capture group (value): one or more non-whitespace characters

Semantically, this regex means: find every run of "some characters, a colon, a space, then one or more non-whitespace characters", and return the key and value of each as a pair. The lookahead at the start stops a key from beginning with whitespace, so the space left over after the previous value is not captured into the next key. (A lookbehind such as (?<=\s|^) won't work here, because Python's re module only supports fixed-width lookbehinds, and \s and ^ have different widths.)
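To check the breakdown, here is a sketch that writes the same pattern in re.VERBOSE mode, then compares it with a version that drops the lookahead, and tries the lookbehind variant. The sample `text` is the one from the question; `PAIR_RE` and the other variable names are only illustrative.

```python
import re

text = "Date: 07/14/1995 Time: 11:31:50 Subject text: Something-cool"

# The answer's pattern, spelled out with re.VERBOSE so each piece carries its comment.
PAIR_RE = re.compile(r"""
    (?=\S|^)   # lookahead: the key starts at a non-whitespace char (or at the start)
    (.+?)      # group 1, the key: one or more characters, matched lazily...
    :          # ...up to a colon
    [ ]        # then one literal space (bare spaces are ignored in VERBOSE mode)
    (\S+)      # group 2, the value: one or more non-whitespace characters
""", re.VERBOSE)

with_lookahead = PAIR_RE.findall(text)

# Same pattern without the lookahead: matching resumes on the space left
# after each value, so later keys pick up a leading space (' Time').
without_lookahead = re.findall(r'(.+?): (\S+)', text)

# A lookbehind such as (?<=\s|^) is rejected at compile time, because
# its branches have different widths (1 and 0).
try:
    re.compile(r'(?<=\s|^)(.+?): (\S+)')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

print(with_lookahead)
print(without_lookahead)
print(lookbehind_ok)  # False
```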

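Note: This breaks if your values contain spaces. Because (\S+) stops at the first whitespace, a value such as "Something cool" would be captured as just "Something", and the rest would be silently dropped.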
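If the set of keys is known in advance (an assumption; the question doesn't say so), one way around this is to let each value run lazily until the next known key or the end of the string. A sketch, with `parse_known_keys` as a hypothetical helper and a made-up value that contains a space. A value that itself contains something like " Time: " would still be cut short.

```python
import re


def parse_known_keys(text, keys):
    """Return {key: [value]} for a line whose keys are known in advance.

    Values may contain spaces: each value runs lazily until the next
    " <known key>: " or the end of the string.
    """
    alt = "|".join(re.escape(k) for k in keys)  # e.g. Date|Time|Subject\ text
    pattern = rf"({alt}): (.*?)(?= (?:{alt}): |$)"
    return {k: [v] for k, v in re.findall(pattern, text)}


text = "Date: 07/14/1995 Time: 11:31:50 Subject text: Something cool"
print(parse_known_keys(text, ["Date", "Time", "Subject text"]))
```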


If you're doing this for every line of a text file, build on the same regex and collect the values in a defaultdict:

import re
from collections import defaultdict

d = defaultdict(list)

with open(file) as f:  # file: path to your text file
    for text in f:     # iterate over the open file object, not the path
        for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', text.rstrip()):
            d[k].append(v)

This gives one list per key, and every occurrence of that key on any line of the file appends one more value to it.
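A self-contained run of the same loop, using io.StringIO in place of a real file (it iterates line by line the same way an open file does). The two input lines are made up for illustration:

```python
import io
import re
from collections import defaultdict

# Made-up sample lines standing in for a real file.
sample = io.StringIO(
    "Date: 07/14/1995 Time: 11:31:50 Subject text: Something-cool\n"
    "Date: 07/15/1995 Time: 09:02:10 Subject text: Something-else\n"
)

d = defaultdict(list)
for line in sample:
    for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', line.rstrip()):
        d[k].append(v)  # a missing key starts out as an empty list

print(d["Date"])  # ['07/14/1995', '07/15/1995']
```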

answered Sep 29 '22 03:09 by cs95