I have a list of URLs like this:

    l = ['bit.ly/1bdDlXc', 'bit.ly/1bdDlXc', ......., 'bit.ly/1bdDlXc']

I just want to get the full URL from the short one for every element in that list.
Here is my approach:

    import urllib2

    for i in l:
        print urllib2.urlopen(i).url
But when the list contains thousands of URLs, the program takes a long time.

My question: Is there any way to reduce the execution time, or another approach I should follow?
First method
As suggested, one way to accomplish the task would be to use the official bitly API, which, however, has limitations (e.g., no more than 15 shortUrl parameters per request).
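If you take that route, the 15-URL cap means the list has to be processed in chunks. Here is a minimal sketch, assuming the legacy v3 /v3/expand endpoint and its response layout (ACCESS_TOKEN is a placeholder; check all of this against the current bitly documentation):

    import requests

    # Assumed v3 endpoint; ACCESS_TOKEN is a placeholder for your own token.
    API = "https://api-ssl.bitly.com/v3/expand"
    ACCESS_TOKEN = "..."

    l = ['bit.ly/1bdDlXc', 'bit.ly/1bdDlXc']

    # The endpoint accepts at most 15 shortUrl parameters per request,
    # so walk the list in chunks of 15.
    for start in range(0, len(l), 15):
        params = [('access_token', ACCESS_TOKEN)]
        params += [('shortUrl', 'http://' + u) for u in l[start:start + 15]]
        reply = requests.get(API, params=params).json()
        for entry in reply['data']['expand']:
            print entry.get('long_url', entry.get('error'))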
Second method
As an alternative, one could avoid fetching the content altogether, e.g. by using the HEAD HTTP method instead of GET. Here is a sample, which makes use of the excellent requests package:
    import requests

    l = ['bit.ly/1bdDlXc', 'bit.ly/1bdDlXc', ......., 'bit.ly/1bdDlXc']

    for i in l:
        print requests.head("http://" + i).headers['location']
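The loop above still issues the requests one at a time, so most of the run is spent waiting on the network. Here is a minimal sketch of a concurrent variant, assuming a thread pool of 20 workers (an arbitrary size to tune) via the standard-library multiprocessing.dummy module:

    import requests
    from multiprocessing.dummy import Pool  # a thread pool with the Pool API

    l = ['bit.ly/1bdDlXc', 'bit.ly/1bdDlXc']

    def expand(short):
        try:
            # HEAD without following the redirect: the Location header
            # carries the full URL (None if the server did not redirect).
            return requests.head("http://" + short).headers.get('location')
        except requests.RequestException as e:
            return str(e)

    pool = Pool(20)  # 20 worker threads; tune to taste
    for full in pool.map(expand, l):
        print full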
I'd try Twisted's asynchronous web client. Be careful with this, though: it doesn't rate-limit at all.
    #!/usr/bin/python2.7
    from twisted.internet import reactor
    from twisted.internet.defer import DeferredList, DeferredLock, inlineCallbacks
    from twisted.web.client import Agent, HTTPConnectionPool
    from pprint import pprint
    from collections import defaultdict
    from urlparse import urlparse
    from random import randrange
    import fileinput

    pool = HTTPConnectionPool(reactor)
    pool.maxPersistentPerHost = 16

    agent = Agent(reactor, pool)

    locks = defaultdict(DeferredLock)
    locations = {}

    def getLock(url, simultaneous=1):
        return locks[urlparse(url).netloc, randrange(simultaneous)]

    @inlineCallbacks
    def getMapping(url):
        # Limit ourselves to 4 simultaneous connections per host.
        # Tweak this as desired, but make sure it is no larger than
        # pool.maxPersistentPerHost.
        lock = getLock(url, 4)
        yield lock.acquire()
        try:
            resp = yield agent.request('HEAD', url)
            locations[url] = resp.headers.getRawHeaders('location', [None])[0]
        except Exception as e:
            locations[url] = str(e)
        finally:
            lock.release()

    dl = DeferredList([getMapping(url.strip()) for url in fileinput.input()])
    dl.addCallback(lambda _: reactor.stop())

    reactor.run()
    pprint(locations)
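The script reads its input via fileinput, so you can run it as, say, python expand.py urls.txt, or pipe URLs in on stdin. Note that, unlike the requests example above, each line must be a full URL including the http:// scheme, since Agent.request expects an absolute URI.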