 

Python split string on whitespace

I am trying to convert copy/pasted text to CSV, which I can then split. The problem is that the text contains runs of tab characters that I can't seem to get rid of.

Example Copy/Paste:

Amarr Hybrid Tech Decryptor 12  Decryptors - Hybrid         12 m3
Ancient Coordinates Database    23  Sleeper Components          2.30 m3
Caldari Hybrid Tech Decryptor   17  Decryptors - Hybrid         17 m3
Carbon  17  General         34 m3
Cartesian Temporal Coordinator  4   Ancient Salvage         0.04 m3
Central System Controller   2   Ancient Salvage         0.02 m3

Now I'm trying to get something like this:

Amarr Hybrid Tech Decryptor,12,Decryptors - Hybrid,12,m3,
Ancient Coordinates Database,23,Sleeper Components,2.30,m3,
Caldari Hybrid Tech Decryptor,17,Decryptors - Hybrid,17,m3,
Carbon,17,General,34,m3,
Cartesian Temporal Coordinator,4,Ancient Salvage,0.04,m3,
Central System Controller,2,Ancient Salvage,0.02,m3,

(There will always be those 5 fields per line.)

I have tried various approaches, such as the one in "Split by comma and strip whitespace in Python", but I can't seem to get it to work:

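from django.contrib.auth.decorators import login_required
from django.http import HttpResponseRedirect
from django.shortcuts import render
from django.urls import reverse  # django.core.urlresolvers on Django < 1.10

from .forms import SellListForm  # assumed: the form lives in forms.py


@login_required
def index(request):
    if request.method == "POST":
        form = SellListForm(request.POST)
        if form.is_valid():
            selllist = form.save(commit=False)
            selllist.user = request.user
            string = selllist.sell
            # split on commas and strip surrounding whitespace from each piece
            string = [x.strip() for x in string.split(',')]
            print(string)
            return HttpResponseRedirect(reverse('processed'))
    else:
        form = SellListForm()
    return render(request, 'index.html', {'form': form})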

which prints:

[u'<<<SULTS STUFF>>>\t\t\tVoucher\t\t\t0 m3\r\nAmarr Hybrid Tech Decryptor\t12\tDecryptors - Hybrid\t\t\t12 m3\r\nAncient Coordinates Database\t23\tSleeper Components\t\t\t2.30 m3\r\nCaldari Hybrid Tech Decryptor\t17\tDecryptors - Hybrid\t\t\t17 m3\r\nCarbon\t17\tGeneral\t\t\t34 m3\r\nCartesian Temporal Coordinator\t4\tAncient Salvage\t\t\t0.04 m3\r\nCentral System Controller\t2\tAncient Salvage\t\t\t0.02 m3']
Hans de Jong asked Jan 24 '14 11:01


People also ask

How do you split a string on whitespace in Python?

The Pythonic way to split a string is the str.split(sep) method, which splits the string on the delimiter sep. When sep is omitted (or is None), any run of consecutive whitespace is treated as a single separator, and leading and trailing whitespace is ignored.
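A minimal sketch, using one row of the question's sample data with the tabs written out as \t (the variable names are illustrative), shows the difference between split() with no argument and splitting on a single tab:

```python
# One row from the question's pasted data, tabs written explicitly.
line = "Carbon\t17\tGeneral\t\t\t34 m3"

# No argument: any run of whitespace (tabs or spaces) is one separator.
print(line.split())      # ['Carbon', '17', 'General', '34', 'm3']

# Explicit "\t": every single tab is a separator, so runs of tabs
# produce empty strings, and the space inside "34 m3" is not split.
print(line.split("\t"))  # ['Carbon', '17', 'General', '', '', '34 m3']

# Caveat for this data: split() also breaks multi-word names apart.
name_row = "Amarr Hybrid Tech Decryptor\t12"
print(name_row.split())  # ['Amarr', 'Hybrid', 'Tech', 'Decryptor', '12']
```

That last caveat is why plain split() alone cannot produce the CSV the question asks for.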

How do you split on whitespace while keeping the whitespace?

To split a string while keeping the whitespace, pass re.split() a regular expression with a capturing group, such as r'(\s+)'. Because the group captures each whitespace run, the separators are included in the resulting list.
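A short sketch of the capturing-group behaviour on a made-up string:

```python
import re

text = "a b\t\tc"  # illustrative string: one space, then two tabs

# The capturing group (...) makes re.split() return the separators too.
parts = re.split(r"(\s+)", text)
print(parts)  # ['a', ' ', 'b', '\t\t', 'c']

# Without the group, the whitespace is dropped.
print(re.split(r"\s+", text))  # ['a', 'b', 'c']

# Because nothing is lost, joining the pieces restores the original.
print("".join(parts) == text)  # True
```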

How to split a string by space in Python?

Use the str.split() method, which returns a list of chunks. If the chunks are separated by one or more space characters, split(' ') leaves an empty string wherever spaces repeat; re.split() with a pattern such as ' +' treats each run of spaces as one separator and also returns the chunks in a list.
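A minimal comparison on an illustrative string whose chunks are separated by runs of spaces:

```python
import re

text = "a  b   c"  # two spaces, then three spaces

print(text.split(" "))        # ['a', '', 'b', '', '', 'c']
print(re.split(r" +", text))  # ['a', 'b', 'c']
print(text.split())           # ['a', 'b', 'c']
```

For spaces alone, split() with no argument gives the same result as the regex without importing re.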

How to split a string using whitespace as a delimiter?

In summary, the simplest way to split a string using whitespace as a delimiter is the built-in str.split() method called without arguments. It is a method of the string object, it discards leading and trailing whitespace by default, and it requires no knowledge of regular expressions.
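To illustrate the point about leading and trailing whitespace, here is a sketch contrasting split() with the seemingly equivalent regex on a made-up string:

```python
import re

text = "  a b "  # illustrative string with leading and trailing spaces

print(text.split())                    # ['a', 'b']

# The regex version keeps empty strings at both ends...
print(re.split(r"\s+", text))          # ['', 'a', 'b', '']

# ...unless the string is stripped first.
print(re.split(r"\s+", text.strip()))  # ['a', 'b']
```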

What is the use of whitespace in Python?

Whitespace characters represent horizontal or vertical space, such as spaces, tabs, newlines and carriage returns. str.split() takes two optional arguments, sep and maxsplit. Called without sep, it treats each single whitespace character, or each unbroken run of them, as one separator.
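The second optional argument, maxsplit, limits how many splits are made. A sketch on a row from the question (sep is passed positionally as None so it works on older Python versions too):

```python
line = "Carbon\t17\tGeneral\t\t\t34 m3"

# sep=None, maxsplit=2: at most two splits; the rest stays in one piece.
print(line.split(None, 2))   # ['Carbon', '17', 'General\t\t\t34 m3']

# rsplit() counts its splits from the right instead.
print(line.rsplit(None, 2))  # ['Carbon\t17\tGeneral', '34', 'm3']
```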

How many whitespace characters are there in a string?

For example, if the only whitespace characters in a string are two single spaces between words, splitting that string on whitespace yields a list of three strings.
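A sketch that counts whitespace characters with str.isspace(); the helper count_whitespace and the string "one two three" are hypothetical. The second case, a row from the question, shows that the number of pieces is not always the whitespace count plus one, because split() collapses runs:

```python
def count_whitespace(text):
    """Count the characters for which str.isspace() is true."""
    return sum(1 for ch in text if ch.isspace())


simple = "one two three"  # hypothetical example string
print(count_whitespace(simple), simple.split())
# 2 ['one', 'two', 'three']

row = "Carbon\t17\tGeneral\t\t\t34 m3"  # row from the question
print(count_whitespace(row), len(row.split()))
# 6 5
```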


1 Answer

I see that there are sometimes several consecutive \t characters. I'd use the re module to split on runs of tabs:

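import re

for line in selllist.sell.splitlines():  # one row of the pasted text per iteration
    linedata = re.split(r'\t+', line)    # treat any run of tabs as one separator
    print(",".join(linedata))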
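Splitting on tabs alone leaves the volume and its unit together (e.g. "12 m3"), while the question asks for them as separate fields. A minimal sketch for Python 3, assuming every row ends with a "<number> <unit>" field as in the sample, splits that last field as well and writes the rows with the standard csv module; the function name paste_to_csv is hypothetical:

```python
import csv
import io
import re


def paste_to_csv(text):
    """Convert tab-separated pasted rows into CSV text.

    Assumes each row looks like "name<TAB>qty<TAB>group<TABs>volume unit",
    i.e. the last tab-separated field holds a volume and its unit
    separated by a space, as in the question's sample.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():  # handles the "\r\n" line endings
        if not line.strip():
            continue  # skip blank lines
        fields = re.split(r"\t+", line.strip())
        # Split the trailing "12 m3" into "12" and "m3".
        fields = fields[:-1] + fields[-1].split()
        writer.writerow(fields)
    return out.getvalue()


sample = ("Amarr Hybrid Tech Decryptor\t12\tDecryptors - Hybrid\t\t\t12 m3\r\n"
          "Carbon\t17\tGeneral\t\t\t34 m3")
print(paste_to_csv(sample))
# Amarr Hybrid Tech Decryptor,12,Decryptors - Hybrid,12,m3
# Carbon,17,General,34,m3
```

Unlike the sample output in the question, the rows have no trailing comma; CSV readers do not need one. Using csv.writer rather than ",".join() also quotes any field that happens to contain a comma.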
Maxime Lorant answered Sep 28 '22 08:09