
Converting a String to a List of Words?

I'm trying to convert a string to a list of words using Python. I want to take something like the following:

string = 'This is a string, with words!' 

Then convert it to something like this:

list = ['This', 'is', 'a', 'string', 'with', 'words'] 

Notice the omission of punctuation and spaces. What would be the fastest way of going about this?

rectangletangle asked May 31 '11 00:05




1 Answer

Try this:

import re

mystr = 'This is a string, with words!'
wordList = re.sub(r"[^\w]", " ", mystr).split()

How it works:

From the docs:

re.sub(pattern, repl, string, count=0, flags=0) 

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function.

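So in our case:

The pattern [^\w] matches any single character that is not a word character.

\w means any word character. With the re.ASCII flag it is equivalent to the character set [a-zA-Z0-9_], that is, a to z, A to Z, 0 to 9, and underscore. For str patterns on Python 3 it also matches Unicode letters and digits by default.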
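To see what \w actually covers, a small check like the one below may help. It is only a sketch, assuming Python 3; the sample string and the variable names are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical sample containing accented letters, digits, and an underscore.
sample = "caf\u00e9_42-na\u00efve!"

# Default matching on a str pattern: \w includes Unicode letters and digits.
unicode_words = re.findall(r"\w+", sample)

# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

With re.ASCII the accented letters count as non-word characters, so they split the words apart. That matters if the input is not plain English text.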

So we match every non-word character and replace it with a space.

Then we call split(), which breaks the result on runs of whitespace and returns the pieces as a list.

So 'hello-world' becomes 'hello world' after re.sub, and then ['hello', 'world'] after split().
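The two steps can be traced on the string from the question roughly as follows. This is a sketch, assuming Python 3. The intermediate names (spaced, word_list, direct) are illustrative, and the re.findall variant is an alternative that is not part of the original answer.

```python
import re

mystr = 'This is a string, with words!'

# Step 1: replace every non-word character (spaces and punctuation) with a space.
spaced = re.sub(r"[^\w]", " ", mystr)

# Step 2: split on runs of whitespace; leading/trailing whitespace is ignored.
word_list = spaced.split()

# Alternative (not from the answer): collect runs of word characters directly.
direct = re.findall(r"\w+", mystr)

print(repr(spaced))
print(word_list)
print(direct)
```

Both approaches give the same list here. The findall form skips the intermediate string, while the sub/split form makes each step easier to inspect.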

Let me know if any doubts come up.

Bryan answered Sep 21 '22 00:09