Is there an easy way to have "checkpoints" in an extended python script?

To preface my question let me give a bit of context: I'm currently working on a data pipeline that has a number of different steps. Each step can go wrong and many take some time (not a huge amount, but on the order of minutes).

For this reason the pipeline is currently heavily supervised by humans. An analyst goes through each step, running the Python code in a Jupyter notebook, and when a step runs into problems, makes minor code adjustments inline and reruns that section.

In the long run, the goal here is to have zero human intervention. However, in the shorter term we'd like to make this process more seamless. The simplest way to do that seems to be to split each section into its own script and have a parent script that runs each one and verifies its output. We also need the ability to rerun a script with an identical setup if it fails.

For example:

run a --> ✅
run b --> ✅ (b relies on some data produced by a)
run c --> ❌ (c relies on data produced by a and b)
// make some changes to c
run c --> ✅ (c should run in an identical state to its original run)

The most obvious way to do this would be to write the output of each script to a file and load those files in the next script. This would work, but seems a bit inelegant. A database seems like another valid option, but a lot of the data doesn't fit cleanly into a database format.
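For reference, the file-per-step approach described above might look roughly like the minimal sketch below, assuming each step's output is a picklable Python object. The step functions (`step_a`, `step_b`, `step_c`), the `checkpoints/` directory, and the helpers `save_output`, `load_output`, and `run_c` are hypothetical names for illustration only.

```python
import pickle
from pathlib import Path

# Hypothetical directory where each step's output is saved.
CHECKPOINT_DIR = Path("checkpoints")


def save_output(step_name, data, directory=CHECKPOINT_DIR):
    """Pickle one step's output to <directory>/<step_name>.pkl."""
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / f"{step_name}.pkl", "wb") as f:
        pickle.dump(data, f)


def load_output(step_name, directory=CHECKPOINT_DIR):
    """Load a previously saved step output."""
    with open(directory / f"{step_name}.pkl", "rb") as f:
        return pickle.load(f)


# Hypothetical pipeline steps; replace these with the real work.
def step_a():
    return [1, 2, 3]


def step_b(a_out):
    return [x * 10 for x in a_out]


def step_c(a_out, b_out):
    return sum(a_out) + sum(b_out)


def run_c(directory=CHECKPOINT_DIR):
    # c reads a's and b's outputs from disk rather than from memory,
    # so rerunning it after a fix starts from the same inputs as before.
    a_out = load_output("a", directory)
    b_out = load_output("b", directory)
    result = step_c(a_out, b_out)
    save_output("c", result, directory)
    return result
```

Because `run_c` only depends on files written by earlier steps, editing `step_c` and calling `run_c()` again replays it against exactly the saved outputs of a and b.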

Does anyone have any suggestions for some ways to achieve what I'm looking for? If anything is unclear I'm also more than happy to clarify any points!

Peter Dolan asked Apr 11 '18 19:04


1 Answer

You could create an object that holds the pipeline's state after each step and use pickle to serialize that object to a file.

Your Python script can then unpickle that file and decide which step to start from based on the saved state.

https://wiki.python.org/moin/UsingPickle
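A minimal sketch of that idea, assuming each step can be written as a function that takes the outputs of earlier steps and returns a picklable value. The names `run_pipeline`, `load_state`, and `save_state`, and the choice of a plain dict (step name mapped to that step's output) as the state object, are hypothetical and not part of any library.

```python
import pickle
from pathlib import Path


def load_state(path):
    """Return the pickled state dict, or an empty one if no checkpoint exists yet."""
    path = Path(path)
    if not path.exists():
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_state(state, path):
    with open(path, "wb") as f:
        pickle.dump(state, f)


def run_pipeline(steps, path):
    """Run (name, func) steps in order, resuming from the last checkpoint.

    The state is a plain dict mapping step name -> that step's output.
    Each func receives the dict so it can read earlier steps' outputs.
    The state is re-pickled only after a step succeeds, so if a step
    raises, the file on disk still holds the state that step started from.
    """
    state = load_state(path)
    for name, func in steps:
        if name in state:
            continue  # finished on a previous run; don't recompute
        state[name] = func(state)
        save_state(state, path)
    return state
```

On a rerun after a failure, steps already recorded in the checkpoint are skipped and the fixed step starts from the pickled outputs of the earlier ones. Keep in mind that pickle files should only be loaded from trusted sources, and that objects of custom classes can only be unpickled if those classes are importable at load time.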

Eric Yang answered Oct 04 '22 03:10