I have a list of tokenized text (list_of_words) that looks something like this:
list_of_words =
['08/20/2014',
'10:04:27',
'pm',
'complet',
'vendor',
'per',
'mfg/recommend',
'08/20/2014',
'10:04:27',
'pm',
'complet',
...]
and I'm trying to strip out all the instances of dates and times from this list. I've tried using the .remove() method, to no avail. I've tried adding wildcard patterns, such as '../../....', to a list of stopwords I was filtering with, but that didn't work. I finally tried writing the following code:
for line in list_of_words:
    if re.search('[0-9]{2}/[09]{2}/[0-9]{4}',line):
        list_of_words.remove(line)
but that doesn't work either. How can I strip out everything formatted like a date or time from my list?
How to Remove an Element from a List Using the remove() Method in Python. To remove an element from a list using the remove() method, specify the value of that element and pass it as an argument to the method. remove() will search the list to find it and remove it.
In this tutorial, we will learn about the Python List remove() method with the help of examples. The remove() method removes the first matching element (which is passed as an argument) from the list.
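Because remove() deletes only the first match and shifts the later elements left, calling it inside a loop over the same list can skip items. The sketch below illustrates this with a made-up list named `words` (the name and data are assumptions for illustration, not taken from the question):

```python
words = ['a', 'b', 'b', 'c']  # hypothetical sample data

# remove() deletes only the first matching element
first_only = list(words)
first_only.remove('b')

# Removing while iterating over the same list skips the element
# that slides into the position that was just freed
skipping = list(words)
for w in skipping:
    if w == 'b':
        skipping.remove(w)

# Iterating over a copy visits every original element
safe = list(words)
for w in list(safe):
    if w == 'b':
        safe.remove(w)
```

Here `skipping` still contains one 'b', while `safe` contains none. Building a new filtered list is usually simpler than removing in place.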
There are three ways to remove elements from a list: using the remove() method, using the list object's pop() method, or using the del statement.
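As a rough side-by-side of these three options, here is a minimal sketch on a made-up list named `items` (a hypothetical example, not data from the question):

```python
items = ['x', 'y', 'z', 'y']  # hypothetical sample data

by_value = list(items)
by_value.remove('y')       # removes the first 'y', looked up by value

by_index = list(items)
popped = by_index.pop(1)   # removes and returns the element at index 1

by_del = list(items)
del by_del[0]              # deletes the element at index 0, returns nothing
```

remove() works by value, while pop() and del work by position. None of them accepts a pattern, which is why a regular expression is needed for date-like strings.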
The date class is used to instantiate date objects in Python. When an object of this class is instantiated, it represents a date in the format YYYY-MM-DD. The constructor of this class needs three mandatory arguments: year, month, and day.
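For reference, a small sketch of the date class from the standard datetime module. The values are chosen to mirror the '08/20/2014' token from the question, and parsing with strptime is shown only as one possible way to recognise such a token:

```python
from datetime import date, datetime

d = date(2014, 8, 20)   # year, month, day
iso = d.isoformat()     # ISO 8601 text form, YYYY-MM-DD

# Parsing a token in the question's MM/DD/YYYY layout
parsed = datetime.strptime('08/20/2014', '%m/%d/%Y').date()
```

strptime raises ValueError for text that does not fit the format, so it could also serve as a stricter date test than a regular expression.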
^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$
This regular expression will do the following:
match strings that look like dates, such as 12/23/2016, and times, such as 12:34:56
match the strings am or pm, which are probably part of the preceding time in the source list
Sample List
08/20/2014
10:04:27
pm
complete
vendor
per
mfg/recommend
08/20/2014
10:04:27
pm
complete
List After Processing
complete
vendor
per
mfg/recommend
complete
Sample Python Script
import re
SourceList = ['08/20/2014',
'10:04:27',
'pm',
'complete',
'vendor',
'per',
'mfg/recommend',
'08/20/2014',
'10:04:27',
'pm',
'complete']
# Keep only the words that do not look like a date, a time, or am/pm
OutputList = filter(
    lambda ThisWord: not re.match(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$', ThisWord),
    SourceList)
for ThisValue in OutputList:
    print(ThisValue)
NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture (2 times):
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
      [:\/,]                   any character of: ':', '\/', ','
----------------------------------------------------------------------
    ){2}                     end of grouping
----------------------------------------------------------------------
    [0-9]{2,4}               any character of: '0' to '9' (between 2
                             and 4 times (matching the most amount
                             possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    am                       'am'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    pm                       'pm'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
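The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is a sketch only; the names `DATE_TIME_TOKEN`, `samples`, and `flags` are made up for illustration:

```python
import re

DATE_TIME_TOKEN = re.compile(r"""
    ^(?:
        (?:[0-9]{2}[:/,]){2}   # two digits plus a separator, twice
        [0-9]{2,4}             # final group of two to four digits
      | am                     # or the literal token 'am'
      | pm                     # or the literal token 'pm'
    )$
""", re.VERBOSE)

samples = ['08/20/2014', '10:04:27', 'pm', 'complete', 'mfg/recommend']
# True where the whole token is a date, a time, or am/pm
flags = [bool(DATE_TIME_TOKEN.match(s)) for s in samples]
```

Because the pattern is anchored with ^ and $, a token such as 'mfg/recommend' is left alone even though it contains a slash.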
If you want to match the time and date strings in your list, you can try the regex below:
[0-9]{2}[\/,:][0-9]{2}[\/,:][0-9]{2,4}
Here is the Python code:
import re
list_of_words = [
'08/20/2014',
'10:04:27',
'pm',
'complet',
'vendor',
'per',
'mfg/recommend',
'08/20/2014',
'10:04:27',
'pm',
'complet'
]
new_list = [item for item in list_of_words if not re.search(r'[0-9]{2}[\/,:][0-9]{2}[\/,:][0-9]{2,4}', item)]
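Note that this pattern removes the date and time tokens but leaves 'pm' in the list. A possible follow-up, sketched under the assumption that 'am' and 'pm' only ever appear as part of a time (the names `date_or_time`, `meridiem`, and `without_meridiem` are made up for illustration):

```python
import re

list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet',
                 'vendor', 'per', 'mfg/recommend']

date_or_time = re.compile(r'[0-9]{2}[/,:][0-9]{2}[/,:][0-9]{2,4}')
meridiem = {'am', 'pm'}  # assumption: these tokens always belong to a time

# First pass: drop anything containing a date-like or time-like run
new_list = [w for w in list_of_words if not date_or_time.search(w)]

# Second pass: drop the leftover am/pm markers
without_meridiem = [w for w in new_list if w not in meridiem]
```

If 'am' or 'pm' can also occur as ordinary words in the text, the second pass would remove those too, so it is worth checking against real data first.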