I am trying to convert copy/pasted text to CSV, which I can then split afterwards. The problem is that the text contains tab characters that I can't seem to get rid of.
Example Copy/Paste:
Amarr Hybrid Tech Decryptor 12 Decryptors - Hybrid 12 m3
Ancient Coordinates Database 23 Sleeper Components 2.30 m3
Caldari Hybrid Tech Decryptor 17 Decryptors - Hybrid 17 m3
Carbon 17 General 34 m3
Cartesian Temporal Coordinator 4 Ancient Salvage 0.04 m3
Central System Controller 2 Ancient Salvage 0.02 m3
Now I'm trying to get something like this:
Amarr Hybrid Tech Decryptor,12,Decryptors - Hybrid,12,m3,
Ancient Coordinates Database,23,Sleeper Components,2.30,m3,
Caldari Hybrid Tech Decryptor,17,Decryptors - Hybrid,17,m3,
Carbon,17,General,34,m3,
Cartesian Temporal Coordinator,4,Ancient Salvage,0.04,m3,
Central System Controller,2,Ancient Salvage,0.02,m3,
(There will always be those 5 separations per line.)
I have tried various approaches, such as the one in "Split by comma and strip whitespace in Python", but I can't seem to get it to work.
@login_required
def index(request):
    if request.method == "POST":
        form = SellListForm(request.POST)
        if form.is_valid():
            selllist = form.save(commit=False)
            selllist.user = request.user
            string = selllist.sell
            string = [x.strip() for x in string.split(',')]
            print string
            return HttpResponseRedirect(reverse('processed'))
    else:
        form = SellListForm()
    return render(request, 'index.html', {'form': form})
This prints:
[u'<<<SULTS STUFF>>>\t\t\tVoucher\t\t\t0 m3\r\nAmarr Hybrid Tech Decryptor\t12\tDecryptors - Hybrid\t\t\t12 m3\r\nAncient Coordinates Database\t23\tSleeper Components\t\t\t2.30 m3\r\nCaldari Hybrid Tech Decryptor\t17\tDecryptors - Hybrid\t\t\t17 m3\r\nCarbon\t17\tGeneral\t\t\t34 m3\r\nCartesian Temporal Coordinator\t4\tAncient Salvage\t\t\t0.04 m3\r\nCentral System Controller\t2\tAncient Salvage\t\t\t0.02 m3']
The Pythonic way to split a string is the str.split(sep) method. It splits the string on the specified delimiter sep. When no delimiter is provided, any run of consecutive whitespace is treated as a single separator.
To split a string while keeping the whitespace, call re.split() with the regular expression (\s+), for example re.split(r'(\s+)', text). The regular expression uses a capturing group, so the matched whitespace is kept in the resulting list instead of being discarded.
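In Python, the capturing-group idea can be sketched as follows. The sample string is made up for illustration and is not taken from the question.

```python
import re

# Hypothetical sample: a tab between the first two words, two spaces before the third.
text = "Carbon\t17  General"

# The parentheses form a capturing group, so the separators are kept in the result.
parts = re.split(r'(\s+)', text)
print(parts)  # ['Carbon', '\t', '17', '  ', 'General']
```

Without the parentheses (r'\s+'), the whitespace runs would be dropped and only the three words would remain.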
When the chunks are separated by single spaces, the str.split() method is enough: it returns the chunks as a list. When the chunks are separated by one or more space characters and you want to match the separator with a pattern, use the re.split() function instead, which also returns the chunks as a list.
In summary, the simplest way to split a string on whitespace is the built-in split() method. It is a method of the string object and, when called without arguments, ignores leading and trailing whitespace. It also requires no knowledge of regular expressions.
Whitespace is a character or set of characters that represents vertical or horizontal space. The split method takes an optional separator argument. If you call it without a separator, it splits on any run of one or more whitespace characters, as long as there is no other character between them.
Here, the only two whitespace characters are the two spaces. As a result, splitting this string by whitespace would result in a list of three strings:
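As a small illustration, assuming a hypothetical string such as "one two three" (not from the original text), which contains exactly two spaces:

```python
# Hypothetical example string with exactly two whitespace characters (two spaces).
text = "one two three"

# With no separator argument, split() breaks the string on runs of whitespace.
words = text.split()
print(words)  # ['one', 'two', 'three']
```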
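I see that you sometimes have several \t characters in a row. I'd use the re module to split on them correctly (here lines is assumed to be your pasted text split into lines, e.g. with string.splitlines()):

import re

for line in lines:
    # Split on runs of one or more tabs so repeated tabs give no empty fields
    linedata = re.split(r'\t+', line)
    print(",".join(linedata))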
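To get exactly the output shown in the question (with "12 m3" split into two fields and a trailing comma), the tab split could be extended roughly like this. This is only a sketch: the helper names to_csv_line and to_csv are made up for illustration, and it assumes the last column always has the form "<number> m3".

```python
import re

def to_csv_line(line):
    """Turn one tab-separated line into the comma-separated form from the question."""
    # Runs of tabs count as one separator, so empty columns disappear.
    fields = re.split(r'\t+', line.strip())
    # Assumption: the last field looks like "12 m3"; split it into number and unit.
    fields[-1:] = fields[-1].split()
    # The desired output in the question ends every line with a comma.
    return ",".join(fields) + ","

def to_csv(text):
    """Convert the whole pasted text, skipping blank lines."""
    return "\n".join(to_csv_line(l) for l in text.splitlines() if l.strip())

# Two lines taken from the output shown in the question.
sample = ("Amarr Hybrid Tech Decryptor\t12\tDecryptors - Hybrid\t\t\t12 m3\r\n"
          "Carbon\t17\tGeneral\t\t\t34 m3")
print(to_csv(sample))
# Amarr Hybrid Tech Decryptor,12,Decryptors - Hybrid,12,m3,
# Carbon,17,General,34,m3,
```

splitlines() is used so that the \r\n line endings from the pasted text are handled without leaving stray \r characters in the last field.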