
Extracting year from string in python

Tags:

python

regex

How can I parse the following string in Python to extract the year:

'years since 1250-01-01 0:0:0'

The answer should be 1250

user308827 asked Oct 19 '16 03:10




1 Answer

There are all sorts of ways to do it; here are several options:

  • dateutil parser in a "fuzzy" mode:

    In [1]: s = 'years since 1250-01-01 0:0:0'
    
    In [2]: from dateutil.parser import parse
    
    In [3]: parse(s, fuzzy=True).year  # resulting year would be an integer
    Out[3]: 1250
    
  • regular expressions with a capturing group:

    In [2]: import re
    
    In [3]: re.search(r"years since (\d{4})", s).group(1)
    Out[3]: '1250'
    
  • splitting by "since" and then by a dash:

    In [2]: s.split("since", 1)[1].split("-", 1)[0].strip()
    Out[2]: '1250'
    
  • or maybe even splitting by the first dash and slicing the first substring:

    In [2]: s.split("-", 1)[0][-4:]
    Out[2]: '1250'
    

The last two involve more "moving parts" and may break when the input varies, for example if the year has fewer than four digits or the word "since" is missing.
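If the input can vary, one possible way to make the regex option a bit more defensive is to wrap it in a small function that returns an integer, or `None` when nothing matches. This is only a sketch: the helper name `extract_year` and the pattern (a 1 to 4 digit year right after "since", followed by `-month-day`) are assumptions for illustration, not something the question specifies.

```python
import re
from typing import Optional

# Assumed format: "<unit> since <year>-<month>-<day> ...".
# The pattern and the helper name are hypothetical, chosen for this sketch.
_YEAR_RE = re.compile(r"since\s+(\d{1,4})-\d{1,2}-\d{1,2}")


def extract_year(text: str) -> Optional[int]:
    """Return the year that follows 'since' as an int, or None if absent."""
    match = _YEAR_RE.search(text)
    # re.search returns None when there is no match, so guard before .group()
    return int(match.group(1)) if match else None


print(extract_year('years since 1250-01-01 0:0:0'))  # 1250
print(extract_year('days since 850-1-1'))            # 850
print(extract_year('no date here'))                  # None
```

For comparison, the slicing option applied to `'days since 850-1-1'` would return `' 850'` with a leading space, and calling `.group(1)` directly on a failed `re.search` would raise `AttributeError`.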

alecxe answered Sep 17 '22 03:09