
Bonobo: How to use multiple data sources?

I am using Bonobo for the very first time, and I was able to get a basic example working. How can I feed two different kinds of data input into the extract step? Say I am scraping data from two different sites: how do I add both of them to the pipeline?

Thanks

Volatil3 asked Nov 07 '22 08:11


1 Answer

You can have two different extract steps (or n different ones) and route all of them into the same downstream node.

For example:

import bonobo


def extract_1():
    yield "x1", "a"
    yield "x1", "b"
    yield "x1", "c"


def extract_2():
    yield "x2", "a"
    yield "x2", "b"
    yield "x2", "c"


def extract_3():
    yield "x3", "a"
    yield "x3", "b"
    yield "x3", "c"


def normalize(name, value):
    yield name.upper(), value


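def get_graph(**options):
    graph = bonobo.Graph()
    # Shared downstream chain; _input=None leaves it detached from the graph's start.
    graph.add_chain(normalize, print, _input=None)
    # Each extractor is its own chain whose output feeds the shared normalize node.
    graph.add_chain(extract_1, _output=normalize)
    graph.add_chain(extract_2, _output=normalize)
    graph.add_chain(extract_3, _output=normalize)
    return graph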


if __name__ == "__main__":
    with bonobo.parse_args() as options:
        bonobo.run(get_graph(**options))
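
If the sources differ only in where the rows come from, the extract steps could also be generated instead of written by hand. The following is a minimal sketch under assumptions: make_extractor, fetch_site_a and fetch_site_b are hypothetical names invented for illustration (the two fetch functions stand in for real scrapers), and only the Bonobo calls already used above appear in get_graph.

```python
def make_extractor(source, fetch_rows):
    """Build an extract step that tags every value with its source name.

    fetch_rows is a hypothetical zero-argument callable returning an
    iterable of values, e.g. a function that scrapes one site.
    """
    def extract():
        for value in fetch_rows():
            yield source, value

    extract.__name__ = "extract_" + source
    return extract


def normalize(name, value):
    yield name.upper(), value


# Stand-ins for two real scrapers (hypothetical data).
def fetch_site_a():
    return ["a", "b", "c"]


def fetch_site_b():
    return ["d", "e"]


extractors = [
    make_extractor("site_a", fetch_site_a),
    make_extractor("site_b", fetch_site_b),
]


def get_graph(**options):
    import bonobo  # imported lazily so the helpers above work without Bonobo

    graph = bonobo.Graph()
    graph.add_chain(normalize, print, _input=None)
    # One chain per source, all feeding the shared normalize node.
    for extract in extractors:
        graph.add_chain(extract, _output=normalize)
    return graph
```

Tagging each row with its source name keeps the origin visible after the streams are merged, which matters because the merged rows arrive mixed together.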

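Note that each node processes its input in first-in-first-out order, so rows from a single extractor stay in order, but "normalize" will receive rows from the different extractors interleaved in no particular order, depending on when each extractor yields.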
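
The ordering behaviour can be illustrated without Bonobo at all. This is only a standard-library sketch, not Bonobo's actual implementation: plain threads and a single queue.Queue stand in for however Bonobo schedules its nodes, and run_fan_in is a hypothetical helper name. Each producer's rows keep their relative order, while the interleaving between producers can change from run to run.

```python
import queue
import threading


def extract(name, values):
    for value in values:
        yield name, value


def run_fan_in(sources):
    """Run each source in its own thread, feeding one shared FIFO queue."""
    shared = queue.Queue()

    def pump(rows):
        for row in rows:
            shared.put(row)

    threads = [threading.Thread(target=pump, args=(rows,)) for rows in sources]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # All producers have finished, so the queue can be drained safely.
    received = []
    while not shared.empty():
        received.append(shared.get())
    return received


received = run_fan_in([
    extract("x1", "abc"),
    extract("x2", "abc"),
    extract("x3", "abc"),
])

# Order across sources is not guaranteed; order within one source is.
for name in ("x1", "x2", "x3"):
    print(name, [value for source, value in received if source == name])
```

If a downstream step needs a global order, it would have to sort or group the rows itself, for example by the source name carried in each row.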

Romain answered Nov 11 '22 21:11