
How to convert YouTube API duration to seconds?

For the sake of interest I want to convert video durations from YouTube's ISO 8601 format to seconds. To future-proof my solution, I picked a really long video to test it against.

The API provides this for its duration - "duration": "P1W2DT6H21M32S"

I tried parsing this duration with dateutil as suggested in stackoverflow.com/questions/969285.

import dateutil.parser
duration = dateutil.parser.parse('P1W2DT6H21M32S')

This throws an exception

TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

What am I missing?

Morgan Wilde asked May 24 '13 19:05


4 Answers

The dateutil module (a third-party package, not part of Python's standard library) only supports parsing ISO 8601 dates, not ISO 8601 durations. For that, you can use the "isodate" library (on PyPI at https://pypi.python.org/pypi/isodate; install it through pip or easy_install). This library has full support for ISO 8601 durations, converting them to datetime.timedelta objects. So once you've installed the library, it's as simple as:

import isodate
dur = isodate.parse_duration('P1W2DT6H21M32S')
print(dur.total_seconds())
jlmcdonald answered Nov 19 '22 17:11


Works on Python 2.7+. Adapted from a JavaScript one-liner posted for a YouTube v3 question.

import re

def YTDurationToSeconds(duration):
    # Raw string so that \d is not treated as a string escape.
    # Each group captures the digits plus the unit letter, e.g. '15M'.
    match = re.match(r'PT(\d+H)?(\d+M)?(\d+S)?', duration).groups()
    hours = _js_parseInt(match[0]) if match[0] else 0
    minutes = _js_parseInt(match[1]) if match[1] else 0
    seconds = _js_parseInt(match[2]) if match[2] else 0
    return hours * 3600 + minutes * 60 + seconds

# js-like parseInt
# https://gist.github.com/douglasmiranda/2174255
def _js_parseInt(string):
    return int(''.join([x for x in string if x.isdigit()]))

# example output 
YTDurationToSeconds(u'PT15M33S')
# 933

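Handles the ISO 8601 duration format only to the extent YouTube uses it up to hours: hours, minutes, and seconds. A duration with a day or week component, such as the P1W2DT6H21M32S from the question, does not match the pattern, so re.match returns None and the function raises AttributeError.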
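As a rough sketch, the pattern can be widened to accept an optional day component too, assuming longer videos come back in a P#DT#H#M#S shape. The name yt_duration_to_seconds and the choice to raise ValueError on anything else are illustrative assumptions, not part of the answer above. It relies on re.fullmatch, so it needs Python 3.4 or later.

```python
import re

# Hypothetical variant: optional days before the T, optional time part.
_YT_DURATION = re.compile(
    r'P(?:(\d+)D)?'                             # days
    r'(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?'  # hours, minutes, seconds
)

def yt_duration_to_seconds(duration):
    match = _YT_DURATION.fullmatch(duration)
    if match is None:
        # Weeks, months and years are deliberately not handled here.
        raise ValueError('unsupported duration: %r' % duration)
    # Components that are absent come back as None; count them as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

print(yt_duration_to_seconds('PT15M33S'))      # 933
print(yt_duration_to_seconds('P2DT6H21M32S'))  # 195692
```

Failing loudly on an unrecognised string is a design choice here: it avoids silently returning a wrong length when the input has units the pattern does not know about.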

StanleyZheng answered Nov 19 '22 15:11


Here's my answer which takes 9000's regex solution (thank you - amazing mastery of regex!) and finishes the job for the original poster's YouTube use case i.e. converting hours, minutes, and seconds to seconds. I used .groups() instead of .groupdict(), followed by a couple of lovingly constructed list comprehensions.

import re

def yt_time(duration="P1W2DT6H21M32S"):
    """
    Converts the time part (hours, minutes, seconds) of a
    YouTube duration (ISO 8601) into seconds

    see http://en.wikipedia.org/wiki/ISO_8601#Durations
    """
    ISO_8601 = re.compile(
        r'P'   # designates a period
        r'(?:(?P<years>\d+)Y)?'   # years
        r'(?:(?P<months>\d+)M)?'  # months
        r'(?:(?P<weeks>\d+)W)?'   # weeks
        r'(?:(?P<days>\d+)D)?'    # days
        r'(?:T' # time part must begin with a T
        r'(?:(?P<hours>\d+)H)?'   # hours
        r'(?:(?P<minutes>\d+)M)?' # minutes
        r'(?:(?P<seconds>\d+)S)?' # seconds
        r')?')   # end of time part
    # Convert regex matches into a short list of time units
    units = list(ISO_8601.match(duration).groups()[-3:])
    # Put list in ascending order (seconds, minutes, hours) & replace None with 0
    units = list(reversed([int(x) if x is not None else 0 for x in units]))
    # Do the maths: each unit's position is its power of 60
    # (units.index(x) would give the wrong position for repeated values)
    return sum(x * 60**i for i, x in enumerate(units))

Sorry for not posting higher up - still new here and not enough reputation points to add comments.

Peter F answered Nov 19 '22 16:11


Isn't the video 1 week, 2 days, 6 hours 21 minutes 32 seconds long?

YouTube shows it as 222 hours 21 minutes 17 seconds; 1 * 7 * 24 + 2 * 24 + 6 = 222. I don't know where the 17 seconds vs 32 seconds discrepancy comes from, though; it may well be a rounding error.
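The hour count can be double-checked with the standard library's datetime.timedelta, which accepts weeks, days, hours, minutes and seconds directly. This is only a sanity check of the arithmetic, with the values copied by hand from the question's duration string:

```python
from datetime import timedelta

# P1W2DT6H21M32S spelled out by hand
length = timedelta(weeks=1, days=2, hours=6, minutes=21, seconds=32)

total_seconds = int(length.total_seconds())
whole_hours = total_seconds // 3600

print(whole_hours)    # 222
print(total_seconds)  # 800492
```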

To my mind, writing a parser for that is not that hard. Unfortunately dateutil does not seem to parse intervals, only datetime points.

Update:

I see that there's a package for this, but just as an example of regexp power, brevity, and incomprehensible syntax, here's a parser for you:

import re

# see http://en.wikipedia.org/wiki/ISO_8601#Durations
ISO_8601_period_rx = re.compile(
    r'P'   # designates a period
    r'(?:(?P<years>\d+)Y)?'   # years
    r'(?:(?P<months>\d+)M)?'  # months
    r'(?:(?P<weeks>\d+)W)?'   # weeks
    r'(?:(?P<days>\d+)D)?'    # days
    r'(?:T' # time part must begin with a T
    r'(?:(?P<hours>\d+)H)?'   # hours
    r'(?:(?P<minutes>\d+)M)?' # minutes
    r'(?:(?P<seconds>\d+)S)?' # seconds
    r')?'   # end of time part
)


from pprint import pprint
pprint(ISO_8601_period_rx.match('P1W2DT6H21M32S').groupdict())

# {'days': '2',
#  'hours': '6',
#  'minutes': '21',
#  'months': None,
#  'seconds': '32',
#  'weeks': '1',
#  'years': None}

I am deliberately not calculating the exact number of seconds from this data here. It looks trivial (see above), but really isn't. For instance, the distance of 2 months from January 1st is 59 days (31+28) or 60 (31+29), depending on the year, while from March 1st it's always 61 days. A proper calendar implementation should take all this into account; for a YouTube clip length calculation, that would be excessive.
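For durations without years or months the conversion is unambiguous, since weeks, days, hours, minutes and seconds all have fixed lengths. A minimal sketch under that assumption follows. The helper name period_to_seconds and the decision to raise ValueError for calendar-dependent units are illustrative choices, and the pattern is restated in compact form only so that the block runs on its own.

```python
import re
from datetime import timedelta

# Same structure as the pattern above, restated compactly.
PERIOD_RX = re.compile(
    r'P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?'
    r'(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?'
    r'(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?'
)

def period_to_seconds(duration):
    match = PERIOD_RX.fullmatch(duration)
    if match is None:
        raise ValueError('not a recognised duration: %r' % duration)
    # Missing components come back as None; treat them as zero.
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    years = parts.pop('years')
    months = parts.pop('months')
    if years or months:
        # Years and months have no fixed length, so refuse to guess.
        raise ValueError('calendar-dependent duration: %r' % duration)
    # The remaining keys (weeks, days, hours, minutes, seconds) are
    # exactly the keyword arguments timedelta accepts.
    return int(timedelta(**parts).total_seconds())

print(period_to_seconds('P1W2DT6H21M32S'))  # 800492
```

Anything that does involve years or months would need a reference start date before it could be turned into seconds, which is the calendar problem described above.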

9000 answered Nov 19 '22 15:11