
Python iterate over list and join lines without a special character to the previous item

I'm wondering if anyone has a sort of hacky / cool solution to this problem. I have a text file like so:

NAME:name
ID:id
PERSON:person
LOCATION:location

NAME:name
morenamestuff
ID:id
PERSON:person
LOCATION:location

JUNK

So I have some blocks whose lines can all be split into a dict, and some that cannot. How can I take lines without the : character and join them to the previous line? Here's what I'm currently doing:

# loop through chunk
    # the first element of dat is a Title, so skip that
    key_map = dict(x.split(':') for x in dat[1:])

But of course I get an error, because the second chunk has a line without the : character. I want my dict to look something like this once it is split correctly:

# there will be a key_map for each chunk of data
key_map['NAME'] == 'name morenamestuff' # 3rd line appended to previous
key_map['ID'] == 'id'
key_map['PERSON'] == 'person'
key_map['LOCATION'] == 'location'
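
For reference, the error described above can be reproduced with a short sketch. The list `chunk` is a made-up stand-in for one block of the file, not the asker's real data; a line without `:` splits into a single element, and `dict()` needs pairs.

```python
# hypothetical stand-in for one block of the text file
chunk = ['NAME:name', 'morenamestuff', 'ID:id']

error = None
try:
    # 'morenamestuff'.split(':') gives a one-element list, not a key/value pair
    key_map = dict(x.split(':') for x in chunk)
except ValueError as exc:
    error = exc
    print(type(exc).__name__, exc)
```

Filtering with `if ':' in x` avoids the exception but silently drops the continuation line, which is why the line has to be merged into the previous one first.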

Solution

EDIT: Here's my final solution on GitHub, and the full code is here:

parseScript.py

import re

bad_chars = '(){}"<>[] '     # characters we want to strip from the string
key_map = []

# parse file
with open("dat.txt") as f:
    data = f.read()
    data = data.strip('\n')
    data = re.split(r'}|\[{', data)

# format file
with open("format.dat") as f:
    formatData = [x.strip('\n') for x in f.readlines()]

# drop empty pieces left over from the split
data = [d for d in data if d]

# strip and split each station
for dat in data[1:-1]:
    # delete every character in bad_chars, then split the station into fields
    dat = dat.translate(str.maketrans("", "", bad_chars)).split(',')
    station = dict(x.split(':', 1) for x in dat if ':' in x)
    # assumed intent: a second field without ':' continues the NAME value
    if len(dat) > 1 and ':' not in dat[1]:
        station['NAME'] += dat[1]
    key_map.append(station)


for station in key_map:
    for opt in formatData:
        print(opt, ":", station[opt])
    print()

dat.txt

View raw here

format.dat

NAME
STID
LONGITUDE
LATITUDE
ELEVATION
STATE
ID

out.dat

View raw here

Syntactic Fructose asked Jun 11 '15 02:06




1 Answer

When in doubt, write your own generator.

Add in itertools.groupby to split the generator's output into groups of text delimited by blank lines.

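def chunker(s):
    # join each line that lacks ':' onto the line before it
    it = iter(s)
    out = [next(it)]
    for line in it:
        if ':' in line or not line:
            # a key:value line or a blank line ends the pending item
            yield ' '.join(out)
            out = []
        out.append(line)
    if out:
        yield ' '.join(out)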

usage:

from itertools import groupby

[dict(x.split(':') for x in g) for k,g in groupby(chunker(lines), bool) if k]
Out[65]: 
[{'ID': 'id', 'LOCATION': 'location', 'NAME': 'name', 'PERSON': 'person'},
 {'ID': 'id',
  'LOCATION': 'location',
  'NAME': 'name morenamestuff',
  'PERSON': 'person'}]
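
The question's sample also ends with a block (JUNK) that has no `:` line at all. As a rough alternative sketch, not the generator above, a plain loop can merge continuation lines and skip such blocks in one pass. The names `SAMPLE` and `parse_blocks` are hypothetical, and dropping key-less blocks is an assumption about what the asker wants.

```python
# the sample text from the question, used here as illustrative input
SAMPLE = """\
NAME:name
ID:id
PERSON:person
LOCATION:location

NAME:name
morenamestuff
ID:id
PERSON:person
LOCATION:location

JUNK
"""


def parse_blocks(lines):
    """Yield one dict per blank-line-delimited block (hypothetical helper)."""
    record, last_key = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            # blank line: the current block is finished
            if record:
                yield record
            record, last_key = {}, None
        elif ':' in line:
            # split on the first ':' only, so values may contain colons
            last_key, value = line.split(':', 1)
            record[last_key] = value
        elif last_key is not None:
            # no ':' here, so treat the line as a continuation of the previous value
            record[last_key] += ' ' + line
        # assumption: a line with no ':' and no previous key (e.g. JUNK) is ignored
    if record:
        yield record


records = list(parse_blocks(SAMPLE.splitlines()))
print(records)
```

Keeping track of `last_key` means a continuation line is appended to whichever key came before it, not only to NAME.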

(if those fields are always the same, I'd go with something like creating some namedtuples instead of a bunch of dicts)

from collections import namedtuple

Thing = namedtuple('Thing', 'ID LOCATION NAME PERSON')

[Thing(**dict(x.split(':') for x in g)) for k,g in groupby(chunker(lines), bool) if k]
Out[76]: 
[Thing(ID='id', LOCATION='location', NAME='name', PERSON='person'),
 Thing(ID='id', LOCATION='location', NAME='name morenamestuff', PERSON='person')]
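
One caveat with the namedtuple route: `Thing(**d)` raises a TypeError if a block is missing a field. If that can happen, a possible sketch (assuming Python 3.7+, where `namedtuple` accepts `defaults`) is to give every field a default. The `partial` record below is invented for illustration.

```python
from collections import namedtuple

# defaults=(None,) * 4 makes all four fields optional (Python 3.7+)
Thing = namedtuple('Thing', 'ID LOCATION NAME PERSON', defaults=(None,) * 4)

complete = Thing(**{'ID': 'id', 'LOCATION': 'location',
                    'NAME': 'name', 'PERSON': 'person'})

# hypothetical block that lacks LOCATION and PERSON; they fall back to None
partial = Thing(**{'NAME': 'name morenamestuff', 'ID': 'id'})

print(complete)
print(partial)
```

An unexpected key in the dict still raises a TypeError, so this only helps with missing fields, not extra ones.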
roippi answered Oct 29 '22 22:10
