
How can I build a list of async tasks with argument for AsyncHTMLSession().run?

The documentation gives this example, which I've tested and it works:

from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_pythonorg():
    r = await asession.get('https://python.org/')

async def get_reddit():
    r = await asession.get('https://reddit.com/')

async def get_google():
    r = await asession.get('https://google.com/')

result = asession.run(get_pythonorg, get_reddit, get_google)

But what if my URLs are variable? I'd like to do this:

from requests_html import AsyncHTMLSession

urls = ('https://python.org/', 'https://reddit.com/', 'https://google.com/')

asession = AsyncHTMLSession()

async def get_url(url):
    r = await asession.get(url)

tasks = []
for url in urls:
    tasks.append(get_url(url=url))

result = asession.run(*tasks)

but I get:

Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    result = asession.run(*tasks)
  File "/home/deanresin/.local/lib/python3.7/site-packages/requests_html.py", line 772, in run
    asyncio.ensure_future(coro()) for coro in coros
  File "/home/deanresin/.local/lib/python3.7/site-packages/requests_html.py", line 772, in <listcomp>
    asyncio.ensure_future(coro()) for coro in coros
TypeError: 'coroutine' object is not callable
sys:1: RuntimeWarning: coroutine 'get_url' was never awaited

1 Answer

TL;DR:

You are passing coroutine objects to run(), but it expects coroutine functions.

You can do:

from requests_html import AsyncHTMLSession

urls = ('https://python.org/', 'https://reddit.com/', 'https://google.com/')

asession = AsyncHTMLSession()

async def get_url(url):
    r = await asession.get(url)
    # if you want async javascript rendered page:
    await r.html.arender() 
    return r

all_responses = asession.run(*[lambda url=url: get_url(url) for url in urls])
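To try the pattern without network access, here is a minimal sketch. The run() function below is a simplified stand-in written for this example, not the requests_html implementation; it only mimics the fact, visible in the traceback, that each argument is called with no arguments. The asyncio.sleep(0) call is a placeholder for asession.get(url).

```python
import asyncio

urls = ("https://python.org/", "https://reddit.com/", "https://google.com/")


def run(*coros):
    """Simplified stand-in for AsyncHTMLSession.run (an assumption, not library code)."""
    async def runner():
        # Call every argument with no arguments, then await the results in order.
        return await asyncio.gather(*(coro() for coro in coros))

    return asyncio.run(runner())


async def get_url(url):
    await asyncio.sleep(0)  # placeholder for `await asession.get(url)`
    return f"fetched {url}"


# Each lambda is a zero-argument callable that returns a fresh coroutine object.
all_responses = run(*[lambda url=url: get_url(url) for url in urls])
print(all_responses)
```

Passing get_url(url) directly instead of the lambda would make this stand-in fail the same way, because coro() would then be called on a coroutine object.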

Explanation:

The error comes from result = asession.run(*tasks), so let's look at the source code of AsyncHTMLSession.run():

def run(self, *coros):
    """ Pass in all the coroutines you want to run, it will wrap each one
        in a task, run it and wait for the result. Return a list with all
        results, this is returned in the same order coros are passed in. """
    tasks = [
        asyncio.ensure_future(coro()) for coro in coros
    ]
    done, _ = self.loop.run_until_complete(asyncio.wait(tasks))
    return [t.result() for t in done]

So the following list comprehension expects each coro to be a callable coroutine function, not a coroutine object, because it calls coro() itself:

tasks = [
    asyncio.ensure_future(coro()) for coro in coros
]

But your traceback ends with TypeError: 'coroutine' object is not callable, raised from that comprehension.
So you are passing a list of coroutine objects, not coroutine functions.

Indeed, when you do this:

tasks = []
for url in urls:
    tasks.append(get_url(url=url))

you are building a list of coroutine objects, because each call to your coroutine function returns one.
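The difference can be checked with the standard inspect module. This is a small sketch in which get_url is a stand-in that does not touch the network; only the distinction between the function and the object it returns matters here.

```python
import asyncio
import inspect


async def get_url(url):
    # Stand-in for `await asession.get(url)`; no network access needed.
    await asyncio.sleep(0)
    return url


# The function itself is a coroutine function: callable, and nothing has run yet.
is_function = inspect.iscoroutinefunction(get_url)

# Calling it creates a coroutine object, which is awaitable but not callable.
coro_obj = get_url("https://python.org/")
is_object = inspect.iscoroutine(coro_obj)

try:
    coro_obj()  # the same kind of call that run() makes on each argument
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Close the unused coroutine object to avoid the "never awaited" warning.
coro_obj.close()

print(is_function, is_object, error_message)
```

The "never awaited" warning in the original traceback has the same cause: the coroutine objects were created but never scheduled.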

So, to build a list of coroutine functions, you can wrap each call in a lambda like this:

[lambda url=url: get_url(url) for url in urls]

Note the url=url default argument: it binds the current value of url when the lambda is defined, rather than when it is called.
More information about this here.
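The effect of the default argument can be seen by comparing it with a plain lambda. The sketch below also shows functools.partial, an alternative that the answer does not mention but that binds the argument at creation time as well. The names call_all, late_bound, early_bound and with_partial are made up for this example, and get_url is again a network-free stand-in.

```python
import asyncio
from functools import partial

urls = ("https://python.org/", "https://reddit.com/", "https://google.com/")


async def get_url(url):
    await asyncio.sleep(0)  # stand-in for `await asession.get(url)`
    return url


async def call_all(factories):
    # Call each zero-argument factory, then await the coroutines in order.
    return await asyncio.gather(*(factory() for factory in factories))


# Without a default argument, every lambda looks up `url` when it is called,
# by which time the loop has finished and `url` holds the last element.
late_bound = [lambda: get_url(url) for url in urls]

# With url=url, each lambda stores the value it saw when it was created.
early_bound = [lambda url=url: get_url(url) for url in urls]

# functools.partial also stores the argument at creation time.
with_partial = [partial(get_url, url) for url in urls]

late_results = asyncio.run(call_all(late_bound))
early_results = asyncio.run(call_all(early_bound))
partial_results = asyncio.run(call_all(with_partial))

print(late_results)
print(early_results)
print(partial_results)
```

Only the first list repeats the last URL three times; the other two return one result per URL.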

Dorian Massoulier answered Oct 30 '25 13:10
