
How to remove dates from a list in Python

Tags: python, regex, nltk

I have a list of tokenized text (list_of_words) that looks something like this:

list_of_words = 
['08/20/2014',
 '10:04:27',
 'pm',
 'complet',
 'vendor',
 'per',
 'mfg/recommend',
 '08/20/2014',
 '10:04:27',
 'pm',
 'complet',
 ...]

and I'm trying to strip out all instances of dates and times from this list. I've tried using the .remove() method, to no avail. I've tried adding wildcard patterns, such as '../../....', to the list of stopwords I was filtering with, but that didn't work. I finally tried writing the following code:

for line in list_of_words:
    if re.search('[0-9]{2}/[09]{2}/[0-9]{4}',line):
        list_of_words.remove(line)

but that doesn't work either. How can I strip out everything formatted like a date or time from my list?

MrYuck asked May 27 '16 00:05



2 Answers

Description

^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$


This regular expression will do the following:

  • match strings that look like dates (12/23/2016) or times (12:34:56)
  • match the strings am and pm, which are probably part of the preceding time in the source list

Example

Live Demo

  • Regex: https://regex101.com/r/yE8oB9/2
  • Python: http://codepad.org/X9D3pd7s

Sample List

08/20/2014
10:04:27
pm
complete
vendor
per
mfg/recommend
08/20/2014
10:04:27
pm
complete

List After Processing

complete
vendor
per
mfg/recommend
complete

Sample Python Script

import re

SourceList = ['08/20/2014',
                 '10:04:27',
                 'pm',
                 'complete',
                 'vendor',
                 'per',
                 'mfg/recommend',
                 '08/20/2014',
                 '10:04:27',
                 'pm', 
                 'complete']

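# Raw string so the backslash in the pattern is passed to re unchanged
Pattern = re.compile(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$')

# Keep only the words that do not look like a date, a time, or am/pm
OutputList = [ThisWord for ThisWord in SourceList if not Pattern.match(ThisWord)]

for ThisValue in OutputList:
    print(ThisValue)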

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture (2 times):
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
      [:\/,]                   any character of: ':', '/', ','
----------------------------------------------------------------------
    ){2}                     end of grouping
----------------------------------------------------------------------
    [0-9]{2,4}               any character of: '0' to '9' (between 2
                             and 4 times (matching the most amount
                             possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    am                       'am'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    pm                       'pm'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
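
For readers who prefer the explanation next to the pattern itself, here is a minimal sketch of the same expression written with re.VERBOSE. The names DATE_TIME_TOKEN and is_date_or_time are made up for this illustration and are not part of the answer's script. It uses fullmatch instead of ^ and $, which behaves slightly differently from $ for a string ending in a newline.

```python
import re

# Same pattern as in the table above, spread out with re.VERBOSE so each
# part can carry its own comment. DATE_TIME_TOKEN is a hypothetical name.
DATE_TIME_TOKEN = re.compile(
    r"""
    (?:
        (?:[0-9]{2}[:/,]){2}   # two digits plus a separator, twice: "08/20/" or "10:04:"
        [0-9]{2,4}             # final two to four digits: "2014" or "27"
      | am                     # literal "am"
      | pm                     # literal "pm"
    )
    """,
    re.VERBOSE,
)


def is_date_or_time(token):
    # fullmatch requires the whole token to match, so ^ and $ are not needed
    return DATE_TIME_TOKEN.fullmatch(token) is not None


words = ['08/20/2014', '10:04:27', 'pm', 'complete', 'mfg/recommend']
kept = [word for word in words if not is_date_or_time(word)]
print(kept)
```

Note that the separators are not required to agree, so a token such as 08:20/2014 would also be treated as a date or time. Whether that matters depends on the data.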
Ro Yo Mi answered Sep 25 '22 03:09


If you want to match the date and time strings in your list, you can try the regex below:

[0-9]{2}[\/,:][0-9]{2}[\/,:][0-9]{2,4}


Here is the Python code:

import re

list_of_words = [
 '08/20/2014',
 '10:04:27',
 'pm',
 'complet',
 'vendor',
 'per',
 'mfg/recommend',
 '08/20/2014',
 '10:04:27',
 'pm',
 'complet'
]
new_list = [item for item in list_of_words if not re.search(r'[0-9]{2}[\/,:][0-9]{2}[\/,:][0-9]{2,4}', item)]
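
It may help to see why the loop in the question fails while a list comprehension works. Two things go wrong there: the pattern contains [09] where [0-9] was intended, so 08/20/2014 never matches, and calling remove() on a list while iterating over it makes the loop skip the element after each removal. The sketch below uses a shortened sample list and the hypothetical names words, mutated and filtered to show the second problem in isolation.

```python
import re

pattern = re.compile(r'[0-9]{2}[/,:][0-9]{2}[/,:][0-9]{2,4}')

# Shortened sample list, for illustration only
words = ['08/20/2014', '10:04:27', 'pm', 'complet']

# Removing while iterating: once '08/20/2014' is removed, the remaining
# items shift left and the loop never looks at '10:04:27'.
mutated = list(words)
for item in mutated:
    if pattern.search(item):
        mutated.remove(item)

# Building a new list visits every item exactly once.
filtered = [item for item in words if not pattern.search(item)]

print(mutated)   # ['10:04:27', 'pm', 'complet']
print(filtered)  # ['pm', 'complet']
```

This pattern does not match am or pm, so those tokens stay in the result. If they should go as well, they need to be filtered separately or added to the pattern as alternatives.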
BertramLAU answered Sep 26 '22 03:09