I am using Bonobo for the very first time. I was able to figure out the basic example. How can I feed two different kinds of data input into the extract step? Say I am scraping data from two different sites: how do I add both of them to the pipeline?
Thanks
You can have two different extract steps (or n different ones).
For example:
import bonobo


def extract_1():
    yield "x1", "a"
    yield "x1", "b"
    yield "x1", "c"


def extract_2():
    yield "x2", "a"
    yield "x2", "b"
    yield "x2", "c"


def extract_3():
    yield "x3", "a"
    yield "x3", "b"
    yield "x3", "c"


def normalize(name, value):
    yield name.upper(), value


def get_graph(**options):
    graph = bonobo.Graph()
    # Shared tail of the pipeline; _input=None means it has no source yet.
    graph.add_chain(normalize, print, _input=None)
    # Each extractor is its own chain whose output feeds the shared normalize node.
    graph.add_chain(extract_1, _output=normalize)
    graph.add_chain(extract_2, _output=normalize)
    graph.add_chain(extract_3, _output=normalize)
    return graph


if __name__ == "__main__":
    with bonobo.parse_args() as options:
        bonobo.run(get_graph(**options))
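Note that each node has a first-in-first-out constraint, so rows coming from a single extractor stay in order, but "normalize" will receive rows from the different extractors interleaved in an unpredictable order as the extractors yield data.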
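To make the ordering point concrete without needing Bonobo installed, here is a rough standard-library analogy. It is not Bonobo's actual implementation, only a sketch that assumes each extractor runs in its own thread and pushes rows into one FIFO queue read by the downstream step. The `run` helper and the `DONE` sentinel are made up for this illustration, and `normalize` returns instead of yielding to keep it short.

```python
import queue
import threading


def extract_1():
    yield "x1", "a"
    yield "x1", "b"
    yield "x1", "c"


def extract_2():
    yield "x2", "a"
    yield "x2", "b"
    yield "x2", "c"


def normalize(name, value):
    return name.upper(), value


def run(extractors):
    """Hypothetical helper: fan several extractors into one consumer."""
    inbox = queue.Queue()  # FIFO: rows come out in the order they were put in
    DONE = object()        # sentinel marking that one extractor has finished

    def produce(extractor):
        for row in extractor():
            inbox.put(row)
        inbox.put(DONE)

    threads = [threading.Thread(target=produce, args=(e,)) for e in extractors]
    for t in threads:
        t.start()

    results = []
    remaining = len(threads)
    while remaining:
        item = inbox.get()
        if item is DONE:
            remaining -= 1
        else:
            results.append(normalize(*item))

    for t in threads:
        t.join()
    return results


rows = run([extract_1, extract_2])
print(rows)
```

Within one source the values always arrive as a, b, c, but how the "X1" and "X2" rows are mixed together can differ from run to run. If you need to tell the sources apart downstream, carry a source tag in each row, as the "x1"/"x2" field does here.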