Best way to split every nth string element and merge into array?

Sorry for the vague title, but it's hard to explain concisely.

Basically, imagine I have a list (in Python) that looks like this:

['a', 'b', 'c\nd', 'e', 'f\ng', 'h', 'i']

From that, I want to get this:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

One way I was thinking of doing this was using reduce like so:

reduce(lambda x, y: x + y.split('\n'), lst, [])

But I don't think this is very efficient, since it doesn't take advantage of the fact that we know every nth element has the separator in it. Any suggestions?

Edit: for more background on how the array was constructed, which may be the problem.

I have text in the form:

Ignorable line
Field name 1|Field name 2|Field name 3|Field name 4
Value 1|Value 2|Value 3|Value 4
Value 1|Value 2|Value 3|Value 4
...

Where we can have an arbitrary number of field names, and there will always be an equal number of values and field names on each line. Note that the values can contain newlines. We only know that they will be separated by a '|'. So we could have:

Value 1|This is an long
value that extends over multiple
lines|Value 3|Value 4

Currently I do a s.split('\n', 2) so that we get the field names in their own string and the values in their own string. Then, when splitting the values by '|', we get a list of the form I originally mentioned.
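To make that concrete, here is a minimal sketch of those two steps. The variable names (text, header, body, values) are only for illustration, and the sample string is assembled from the examples above.

```python
# Sample input assembled from the examples in the question.
text = (
    "Ignorable line\n"
    "Field name 1|Field name 2|Field name 3|Field name 4\n"
    "Value 1|Value 2|Value 3|Value 4\n"
    "Value 1|This is an long\n"
    "value that extends over multiple\n"
    "lines|Value 3|Value 4"
)

# maxsplit=2 yields three pieces: the ignorable line, the header line, the rest.
_ignored, header, body = text.split("\n", 2)

field_names = header.split("|")
values = body.split("|")

for index, value in enumerate(values):
    print(index, repr(value))
```

The element at index 3, 'Value 4\nValue 1', is the kind of entry the question is about. The multi-line value at index 4 also contains newlines, so a blanket split on '\n' would break it apart.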

You can just do ('\n'.join(lst)).split() to get the 2nd list.

In [17]: %timeit reduce(lambda x, y: x + y.split('\n'), lst, [])
100000 loops, best of 3: 9.64 µs per loop

In [18]: %timeit ('\n'.join(lst)).split()
1000000 loops, best of 3: 1.09 µs per loop

Thanks to @Joran Beasley for suggesting split() over split('\n')!
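One caveat worth checking before adopting split() with no arguments: it splits on any run of whitespace, not only newlines. Here is a small sketch of the difference, using a made-up list (spaced) whose elements contain spaces:

```python
lst = ['a', 'b', 'c\nd', 'e', 'f\ng', 'h', 'i']
spaced = ['Value 3', 'Value 4\nValue 1', 'Value 2']  # hypothetical input with spaces

# For single-word elements both forms give the same result.
flat_any = '\n'.join(lst).split()
flat_nl = '\n'.join(lst).split('\n')

# With spaces inside the elements, only the explicit separator keeps values intact.
broken = '\n'.join(spaced).split()
intact = '\n'.join(spaced).split('\n')

print(broken)
print(intact)
```

So for the pipe-separated data in the updated question, split('\n') is the safer choice, at the cost of the small speed difference.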

Edit

Now that I've seen your updated question, I think we can avoid getting into this situation in the first place (using re):

In [71]: L = re.findall(r'([^|]+)\|',
                        ''.join(['|'+item+'|' if item.count('|')==3 else item for item in S.split('\n')[1:]])+'|')

In [72]: zip(*[L[i::4] for i in range(4)])  # 4 being the number of fields.
Out[72]:
[('Field name 1', 'Field name 2', 'Field name 3', 'Field name 4'),
 ('Value 1', 'Value 2', 'Value 3', 'Value 4'),
 ('Value 1',
  'This is an longvalue that extends over multiplelines',
  'Value 3',
  'Value 4')]
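If the embedded newlines need to be preserved, a regex-free alternative is to use the fact mentioned in the question: after splitting on '|', only every (n-1)th token straddles two rows. The following is a rough sketch rather than a drop-in replacement. The name split_rows and its parameters are hypothetical, and it assumes at least two fields per row and that the last field of a row never contains a newline.

```python
def split_rows(body, n_fields):
    """Split pipe-separated text into rows of n_fields values each.

    Assumes n_fields >= 2 and that the last field of a row holds no newline.
    """
    tokens = body.rstrip("\n").split("|")
    step = n_fields - 1
    last = len(tokens) - 1
    flat = []
    for i, token in enumerate(tokens):
        # Tokens at positive multiples of (n_fields - 1), other than the final
        # one, hold "<end of row>\n<start of next row>".
        if i > 0 and i % step == 0 and i != last:
            # Raises ValueError if the newline is missing (malformed input).
            end_of_row, start_of_next = token.split("\n", 1)
            flat.extend([end_of_row, start_of_next])
        else:
            flat.append(token)
    # Regroup the flat list of values into rows.
    return [flat[i:i + n_fields] for i in range(0, len(flat), n_fields)]


body = (
    "Value 1|Value 2|Value 3|Value 4\n"
    "Value 1|This is an long\n"
    "value that extends over multiple\n"
    "lines|Value 3|Value 4"
)
rows = split_rows(body, 4)
for row in rows:
    print(row)
```

Only the boundary tokens are split here, so the multi-line value keeps its line breaks instead of being glued together.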

This looks like it was originally a dataset for SAS, am I right?

Premature optimization is the root of all evil.

If you are actually experiencing performance issues because of this code, that's one thing, but I doubt you are.

When you optimize, you often sacrifice readability. Here is what I would do:

list(itertools.chain(*[item.split() for item in lst]))

This makes it very clear what you're doing.
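A slightly more defensive variant of that one-liner, offered as a sketch: chain.from_iterable avoids unpacking an intermediate list, and split('\n') is used instead of split() on the assumption that elements may contain spaces.

```python
from itertools import chain

lst = ['a', 'b', 'c\nd', 'e', 'f\ng', 'h', 'i']

# Split each element on newlines only, then flatten the resulting lists lazily.
flat = list(chain.from_iterable(item.split('\n') for item in lst))
print(flat)
```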
