
`re.split()` in Python working strangely

Tags: python, split

Having a bit of a predicament in Python. I'd like to take a .txt file containing many comments and split it into a list of words, splitting on all punctuation, spaces, and \n. When I run the following code, it splits my text file in odd places. NOTE: for now I am only trying to split on periods and newlines as a test, but it still often drops the last letter of words.

import regex as re
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile:
    nf = infile.read()
    wList = re.split('. | \n, nf)

print(wList)
John W asked Jan 22 '26 02:01


2 Answers

You need to close the quote around the pattern and change the regular expression to \W+, which splits on any run of non-word characters (punctuation, spaces, and newlines):

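import regex as re  # third-party module; the standard-library re behaves the same for this pattern
with open('G:/My Documents/AHRQUnstructuredComments2.txt','r') as infile:
    nf = infile.read()
    # Raw string so \W reaches the regex engine unchanged; note the comma before nf
    wList = re.split(r'\W+', nf)

print(wList)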
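To see what \W+ does without the original file, here is a minimal sketch on a made-up string. The `sample` text and the variable names are assumptions for illustration, and it uses the standard-library `re` module rather than `regex`. One thing to watch for: if the text starts or ends with punctuation, `re.split` returns an empty string at that end, so you may want to filter those out.

```python
import re

# Hypothetical sample text standing in for the contents of the .txt file
sample = "Great service. Staff was kind,\nwould return!"

# Split on every run of non-word characters (spaces, punctuation, newlines)
tokens = re.split(r"\W+", sample)

# The trailing "!" leaves an empty string at the end; drop empty pieces
words = [t for t in tokens if t]

print(tokens)
print(words)
```

If you only want the words, `re.findall(r"\w+", sample)` gives the same list without the empty strings.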
Ajax1234 answered Jan 23 '26 15:01


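In a regular expression, an unescaped . matches any character (except a newline, by default), so the pattern '. ' matches any character followed by a space. That is why the last letter of a word gets consumed along with the space after it. Escape the period as \. to match a literal period.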
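A small sketch of the difference, using a made-up string (the `text` value and variable names are assumptions, and this uses the standard-library `re` module):

```python
import re

# Hypothetical one-line sample
text = "one two. three"

# Unescaped dot: "any character followed by a space" is the separator,
# so the "e" of "one" is swallowed together with the space after it
unescaped = re.split(r". ", text)

# Escaped dot: only a literal period followed by a space is a separator
escaped = re.split(r"\. ", text)

# Literal period plus space, or a newline, as in the original test
with_newline = re.split(r"\. |\n", "one two. three\nfour")

print(unescaped)     # ['on', 'two', 'three']
print(escaped)       # ['one two', 'three']
print(with_newline)  # ['one two', 'three', 'four']
```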

Jared Goguen answered Jan 23 '26 15:01


