I'm trying to write an ETL pipeline using Luigi. As far as I understand from the documentation, a task in Luigi can generate a target that is either some type of file storage or a database. To decrease the processing time, I would like the output to be an in-memory list. Is this possible? Do I have to create a custom target?
A Luigi task is where the execution of your pipeline and the definition of each task's input and output dependencies take place. Tasks are the building blocks from which you create your pipeline. You define each one in a class, which contains: A run() method that holds the logic for executing the task.
Write the most common words to the target output file. Run the pipeline with the following command: Luigi will execute the remaining tasks needed to generate the summary of the top words: You can visualize the execution of the pipeline from the Luigi scheduler. Select the GetTopBooks task in the task list and press the View Graph button.
Luigi is a Python package that manages long-running batch processing, which is the automated running of data processing jobs on batches of items. Luigi allows you to define a data processing job as a set of dependent tasks.
To execute the task you created, run the following command: Here, you run the task using python -m instead of executing the luigi command directly; this is because Luigi can only execute code that is within the current PYTHONPATH. You can alternatively add PYTHONPATH='.' to the front of your Luigi command, like so:
I found out I can use a MockFile as a target. A good example:
http://gouthamanbalaraman.com/blog/building-luigi-task-pipeline.html