 

Python Requests getting ('Connection aborted.', BadStatusLine("''",)) error

def download_torrent(url):
    fname = os.getcwd() + '/' + url.split('title=')[-1] + '.torrent'
    try:
        schema = ('http:')
        r = requests.get(schema + url, stream=True)
        with open(fname, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
                    f.flush()
    except requests.exceptions.RequestException as e:
        print('\n' + OutColors.LR + str(e))
        sys.exit(1)

    return fname

In that block of code I am getting an error when I run the full script. When I go to actually download the torrent, I get:

('Connection aborted.', BadStatusLine("''",))

I've only posted the block of code I think is relevant above; the entire script is linked below. It's from pantuts, but I don't think it's maintained any longer, and I'm trying to get it running with Python 3. From my research, the error might mean I'm using http instead of https, but I have tried both.

Original script

asked Oct 16 '15 by eurabilis



3 Answers

The error you get indicates the host isn't responding in the expected manner. In this case, it's because the host detects that you're trying to scrape it and deliberately disconnects you.

If you try your requests code with this URL from a test website: http://mirror.internode.on.net/pub/test/5meg.test1, you'll see that it downloads normally.

To get around this, fake your user agent. Your user agent identifies your web browser, and web hosts commonly check it to detect bots.

Use the headers field to set your user agent. Here's an example that tells the web host you're Firefox:

headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0' }
r = requests.get(url, headers=headers)
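
Applied to the download_torrent function from the question, the fix might look like the sketch below (the OutColors class here is just a stand-in for the colour constants defined elsewhere in the original script):

import os
import sys

import requests

class OutColors:
    LR = '\033[91m'  # stand-in for the colour constant defined in the original script

def download_torrent(url):
    fname = os.getcwd() + '/' + url.split('title=')[-1] + '.torrent'
    # Pretend to be Firefox so the host doesn't drop the connection.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'}
    try:
        r = requests.get('http:' + url, headers=headers, stream=True)
        r.raise_for_status()  # turn 4xx/5xx responses into exceptions
        with open(fname, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
    except requests.exceptions.RequestException as e:
        print('\n' + OutColors.LR + str(e))
        sys.exit(1)

    return fname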

There are lots of other discrepancies[1] between bots and human-operated browsers that web hosts can check for, but the user agent is one of the easiest and most common ones.

If you want your scraper to be harder to detect, you'll want to use a headless browser like headless Chrome[2] (or ghost.py if you want to stick with Python), which you can trust will behave like a real browser (because it is!).
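
For instance, a minimal Selenium sketch driving headless Chrome might look like this (it assumes the selenium package and a matching ChromeDriver are installed; the URL is a placeholder):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')          # run Chrome without opening a window
driver = webdriver.Chrome(options=options)
driver.get('http://example.com/some-page')  # placeholder URL
html = driver.page_source                   # the fully rendered page, as a real browser sees it
driver.quit()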


Footnotes:

[1] Possible other checks include whether images are downloaded, whether page resources are fetched in the normal order, whether pages are requested faster than a human could read them, and whether cookies are set properly. Google even flags mouse movements it deems insufficiently human-like.

[2] Headless Chrome is the most capable headless browser as of 2018, but if its weight is a problem for you, its slightly outdated predecessors, PhantomJS and ghost.py, are lighter and still usable.

answered Oct 09 '22 by sorbet

Try this:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7',
    'Referer': 'https://www.google.com/'
}

r = requests.get("http://yourdomain.com/", headers=headers)
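
Continuing the snippet above, you could confirm the host actually accepted the request (a small sketch):

r.raise_for_status()                  # raises requests.exceptions.HTTPError for 4xx/5xx responses
print(r.status_code, len(r.content))  # e.g. 200 and the size of the body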
answered Oct 09 '22 by Mkurbanov


In my case, I had to remove the User-Agent field from the headers:

url = 'https://...'
headers = {}  # no custom User-Agent; requests falls back to its default
requests.get(url, headers=headers)

As soon as I set a 'User-Agent', I got ('Connection aborted.', BadStatusLine("''",)), and the error occurred only with that particular site. This is my first post; I've gotten a lot of help from this site and hope this helps others who end up here.
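
If you need to handle both kinds of site in one script, a small per-site fallback might look like this (a sketch; the URL is a placeholder):

import requests

url = 'https://example.com/page'  # placeholder for a site that rejects a spoofed User-Agent
browser_headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'}

try:
    r = requests.get(url, headers=browser_headers)
except requests.exceptions.ConnectionError:
    # Some hosts abort the connection when they see the spoofed User-Agent;
    # retry with requests' default headers instead.
    r = requests.get(url)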

answered Oct 09 '22 by M.ison