I have 3 separate Jupyter notebooks, each dealing with its own data frame. I clean and manipulate the data for each df in its own notebook. Is there a way to reference the cleaned-up, final data from a separate notebook?
My concern is that if I work on all 3 dfs in one notebook and then do more with them afterwards (merge/join), the notebook will be a mile long. I also don't want to rewrite a bunch of code just to get the data ready for use in my new notebook.
If you are using pandas data frames, one approach is to use pandas.DataFrame.to_csv() and pandas.read_csv() to save and load the cleaned data between each step.
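A minimal sketch of that round trip, using a temporary directory and a made-up file name (cleaned.csv) purely for illustration. It also suggests why index_col=0 is passed when reading the file back: to_csv() writes the index as the first column by default, so reading without index_col=0 would bring it back as an extra column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned frame standing in for the result of one notebook.
cleaned = pd.DataFrame({'id': [10, 20, 30], 'name': ['foo', 'bar', 'baz']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'cleaned.csv')  # illustrative file name

    # Save: the index is written as the first, unnamed column by default.
    cleaned.to_csv(path)

    # Load: index_col=0 turns that first column back into the index.
    restored = pd.read_csv(path, index_col=0)

    # Without index_col, the old index comes back as an extra column.
    naive = pd.read_csv(path)

print(list(restored.columns))
print(list(naive.columns))
```

If the index carries no information, passing index=False to to_csv() and dropping index_col=0 on the way back in is another option.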
If this is your data:
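import pandas as pd
raw_data = {'id': [10, 20, 30],
            'name': ['foo', 'bar', 'baz']
           }
# named input_df so it does not shadow the built-in input()
input_df = pd.DataFrame(raw_data, columns=['id', 'name'])
# write the frame to disk so that notebook1.ipynb can read it
input_df.to_csv('input.csv')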
Then in notebook1.ipynb, process it like this:
import pandas as pd
# load
df = pd.read_csv('input.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result1.csv')
...and repeat that process for each stage in the chain.
import pandas as pd
# load
df = pd.read_csv('result1.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result2.csv')
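For the merge/join step mentioned in the question, a final notebook could load each cleaned file and combine them. The following is only a sketch under assumptions: the file names (result_a, result_b, result_c), the extra columns, and the shared 'id' key are all hypothetical, and the files are created in a temporary directory here so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names standing in for what the three
    # cleaning notebooks would have saved.
    paths = {name: os.path.join(tmp_dir, name + '.csv')
             for name in ('result_a', 'result_b', 'result_c')}

    # Stand-in data; in practice each cleaning notebook writes its own file.
    pd.DataFrame({'id': [10, 20, 30],
                  'name': ['foo', 'bar', 'baz']}).to_csv(paths['result_a'])
    pd.DataFrame({'id': [10, 20, 30],
                  'score': [1.5, 2.5, 3.5]}).to_csv(paths['result_b'])
    pd.DataFrame({'id': [10, 20],
                  'city': ['Oslo', 'Lima']}).to_csv(paths['result_c'])

    # In the combining notebook: load each cleaned frame...
    df_a = pd.read_csv(paths['result_a'], index_col=0)
    df_b = pd.read_csv(paths['result_b'], index_col=0)
    df_c = pd.read_csv(paths['result_c'], index_col=0)

# ...then join them on the assumed shared key column.
merged = df_a.merge(df_b, on='id', how='left').merge(df_c, on='id', how='left')
print(merged)
```

A left join keeps every row of the first frame, so ids missing from a later file show up as NaN instead of being dropped; how='inner' would keep only ids present in all three.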
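At the end, your notebook collection forms a chain in which each notebook reads the CSV file saved by the previous one: input.csv, then result1.csv, then result2.csv, and so on.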
Documentation: pandas.DataFrame.to_csv() and pandas.read_csv() in the pandas API reference.