
Splitting strings in Python

I have a string which is like this:

this is [bracket test] "and quotes test "

I'm trying to write something in Python to split it on spaces while ignoring spaces inside square brackets and quotes. The result I'm looking for is:

['this','is','bracket test','and quotes test ']

user31256 asked Oct 24 '08 17:10



2 Answers

Here's a simplistic solution that works with your test input:

import re
s = 'this is [bracket test] "and quotes test "'
re.findall(r'\[[^\]]*\]|"[^"]*"|\S+', s)

This returns every substring that matches one of the following:

  • an open bracket followed by zero or more non-close-bracket characters followed by a close bracket,
  • a double-quote followed by zero or more non-quote characters followed by a quote,
  • any group of non-whitespace characters
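As a quick check of the three alternatives, here is a minimal sketch that runs the same pattern on the question's input. The name TOKEN is only illustrative. Note that the matches still include the surrounding brackets and quotes.

```python
import re

# The same three alternatives as in the answer, written as a raw string:
#   \[[^\]]*\]   a bracketed group
#   "[^"]*"      a double-quoted group
#   \S+          any run of non-whitespace characters
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

s = 'this is [bracket test] "and quotes test "'
tokens = TOKEN.findall(s)
print(tokens)
# ['this', 'is', '[bracket test]', '"and quotes test "']
```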

This works with your example, but it may fail on many real-world strings. For example, you didn't say what you expect for unbalanced brackets or quotes, or how single quotes and escape characters should be handled. For simple cases, though, the above might be good enough.
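To make those limits concrete, the sketch below feeds the same pattern two inputs that are not in the question: one with an unclosed bracket and quote, and one where a bracket is glued to a preceding word. Both inputs and the names TOKEN, unbalanced, and glued are made up for illustration.

```python
import re

TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# Unclosed '[' and '"': the first two alternatives cannot match,
# so the pattern falls back to \S+ and splits on every space.
unbalanced = TOKEN.findall('a [b c "d e')
print(unbalanced)  # ['a', '[b', 'c', '"d', 'e']

# A bracket glued to a word: \S+ starts matching at 'f' and runs
# through '[bar', so the bracketed group is never seen as a unit.
glued = TOKEN.findall('foo[bar baz]')
print(glued)       # ['foo[bar', 'baz]']
```

Whether either result is acceptable depends on what the input is supposed to look like, which the question leaves open.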

Bryan Oakley answered Sep 21 '22 08:09


To complete Bryan's answer and match the expected output exactly:

>>> import re
>>> txt = 'this is [bracket test] "and quotes test "'
>>> [x[1:-1] if x[0] in '["' else x for x in re.findall(r'\[[^\]]*\]|"[^"]*"|\S+', txt)]
['this', 'is', 'bracket test', 'and quotes test ']

Don't be misled by the syntax: this is not several statements on one line but a single expression (a list comprehension), which leaves less room for bugs.
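One possible variation, offered only as a sketch: put a capture group inside each alternative so the regex itself drops the delimiters, instead of slicing afterwards. Slicing by first character also trims an unbalanced token such as "abc (it starts with a quote, so it becomes ab), which the capture-group form avoids. The names TOKEN and split_tokens are hypothetical, not from either answer.

```python
import re

# One capture group per alternative; only one of them takes part in any match.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

def split_tokens(text):
    """Split on whitespace, keeping [bracketed] and "quoted" spans whole."""
    result = []
    for m in TOKEN.finditer(text):
        # m.lastindex is the number of the group that matched,
        # so this picks the inner text without the delimiters.
        result.append(m.group(m.lastindex))
    return result

print(split_tokens('this is [bracket test] "and quotes test "'))
# ['this', 'is', 'bracket test', 'and quotes test ']
print(split_tokens('say "abc'))
# ['say', '"abc']
```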

PhE answered Sep 22 '22 08:09