Passing Python objects between Tasks in Luigi?

I was coding my first project in Python 3.6 using Spotify's Luigi to arrange some Natural Language Processing Tasks in a pipeline.

I noticed that the output() method of a Task class always returns some kind of Target object, which is just a file somewhere, local or remote. Because my Tasks produce more complex data structures such as parse trees, it's pretty awkward to write them into files as strings and read them back afterwards.

Therefore I would like to ask if there is any possibility to pass Python objects between the tasks within a pipeline?

250

asked Feb 28 '17 17:02

Kaleidophon

1 Answer

Short answer: No.

Luigi's built-in parameter types cover simple values such as date/datetime objects, strings, ints and floats, not arbitrary Python objects. See docs for reference.

That means you need to serialize your complex data structure to a string (with json, msgpack or whatever serializer you like, optionally compressed) and pass it as a string parameter.
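As a rough sketch of that serialization step, assuming the data is JSON-compatible: the helper names `encode_tree` and `decode_tree` and the sample tree are made up for illustration, and only the standard library is used. The resulting string is what you would hand to a string parameter.

```python
import base64
import json
import zlib


def encode_tree(tree):
    """Serialize a JSON-compatible object to a compressed ASCII string."""
    raw = json.dumps(tree, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_tree(text):
    """Invert encode_tree: ASCII string back to the original object."""
    raw = zlib.decompress(base64.urlsafe_b64decode(text.encode("ascii")))
    return json.loads(raw.decode("utf-8"))


# Hypothetical parse tree for "dogs bark", stored as nested lists.
tree = ["S", ["NP", "dogs"], ["VP", "bark"]]
encoded = encode_tree(tree)
restored = decode_tree(encoded)
```

The base64 step is only there so the compressed bytes survive as plain text; for small structures the uncompressed JSON string is probably enough.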

Of course, you may write a custom Parameter subclass, but then you need to implement its serialize and parse methods yourself.

But take this into account: if you use parameters instead of saving your calculated data to a target, you lose one key advantage of Luigi. If the parent task in the tree fails more often than the retry count you specify, you'll need to run the task that calculates that complex data structure again. If your task calculates complex data, takes a considerable amount of time or consumes a lot of resources, you should save its output as a target so that you do not have to repeat all that expensive computation.

And looking further ahead: another task may need that data too, so why not save it?

Also note that targets are not only files: you may save your data to a database table, Redis, Hadoop, an Elasticsearch index, and many more: http://luigi.readthedocs.io/en/stable/api/luigi.contrib.html#submodules

132

answered Nov 15 '22 17:11

matagus

Tags: python, python-3.6, luigi
