Filtering out strings that only contain digits and/or punctuation - python

I need to keep only the strings that consist entirely of digits and/or a fixed set of punctuation.

I've tried checking each character and summing the Boolean results to see whether the sum equals len(str). Is there a more Pythonic way to do this?

>>> import string
>>> x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
>>> [i for i in x if [True if j.isdigit() else False for j in i] ]
['12,523', '3.46', 'this is not', 'foo bar 42', '23fa']
>>> [i for i in x if sum([True if j.isdigit() or j in string.punctuation else False for j in i]) == len(i)]
['12,523', '3.46']
asked Feb 11 '14 at 08:02 by alvas


1 Answer

Using all() with a generator expression, you don't need to count matches and compare the count against the length:

>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)]
['12,523', '3.46']
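
As a minimal sketch, the same check can be pulled into a small helper. The name is_digits_or_punct is only illustrative and not part of the question. Unlike the sum() version, all() stops at the first character that fails the test. Note that all() over an empty string is True, so an empty string would pass this check as well.

```python
import string


def is_digits_or_punct(s):
    """Return True if every character in s is a digit or ASCII punctuation."""
    # all() short-circuits: it stops at the first character that fails.
    return all(ch.isdigit() or ch in string.punctuation for ch in s)


x = ['12,523', '3.46', "this is not", "foo bar 42", "23fa"]
filtered = [s for s in x if is_digits_or_punct(s)]
print(filtered)  # ['12,523', '3.46']
```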

BTW, both the all() version and the OP's code also keep strings that contain only punctuation.

>>> x = [',,,', '...', '123', 'not number']
>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i)]
[',,,', '...', '123']

To handle that, add another condition requiring at least one digit:

>>> [i for i in x if all(j.isdigit() or j in string.punctuation for j in i) and any(j.isdigit() for j in i)]
['123']

You can make it a little faster by storing the characters of string.punctuation in a set, so each membership test is a hash lookup instead of a scan of the string.

>>> puncs = set(string.punctuation)
>>> [i for i in x if all(j.isdigit() or j in puncs for j in i) and any(j.isdigit() for j in i)]
['123']
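
Putting the pieces together, here is one possible single-pass sketch. The helper name looks_numeric and its allowed parameter are assumptions made for illustration; allowed lets you pass the "fixed set of punctuation" from the question instead of all of string.punctuation. Keep in mind that str.isdigit() also accepts some non-ASCII digit characters.

```python
import string

# Precomputed once so each membership test is a hash lookup.
PUNCT = frozenset(string.punctuation)


def looks_numeric(s, allowed=PUNCT):
    """True if s has at least one digit and otherwise only allowed characters."""
    has_digit = False
    for ch in s:
        if ch.isdigit():
            has_digit = True
        elif ch not in allowed:
            return False  # any other character disqualifies the string
    return has_digit  # rejects empty and punctuation-only strings


samples = [',,,', '...', '123', 'not number', '12,523', '3.46']
result_all = [s for s in samples if looks_numeric(s)]
result_comma = [s for s in samples if looks_numeric(s, allowed=',')]
print(result_all)    # ['123', '12,523', '3.46']
print(result_comma)  # ['123', '12,523']
```

The loop checks both conditions in one pass over each string, whereas the all(...) and any(...) combination may walk the string twice.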
answered Sep 28 '22 at 03:09 by falsetru