Import data frame from one Jupyter Notebook file to another

I have 3 separate Jupyter notebook files, each dealing with its own data frame. In each notebook I clean and manipulate the data for that df. Is there a way to reference the cleaned-up, final data from a separate notebook?

My concern is that if I work on all 3 dfs in one notebook and then do more with it after (merge/join), it will be a mile long. I also don't want to re-write a bunch of code just to get data ready for use in my new notebook.

user3088202 asked Oct 10 '17 19:10



1 Answer

If you are using pandas data frames, one approach is to save the cleaned data with pandas.DataFrame.to_csv() at the end of each step and load it with pandas.read_csv() at the start of the next.

  1. Notebook1 loads input1 and saves result1.
  2. Notebook2 loads result1 and saves result2.
  3. Notebook3 loads result2 and saves result3.
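As a rough sketch of one link in that chain, the snippet below writes a frame to CSV and reads it back. The helper names save_stage and load_stage and the temporary directory are only illustrative assumptions; in real notebooks you would call to_csv() and read_csv() directly on files next to the notebooks.

```python
import os
import tempfile

import pandas as pd


def save_stage(df, path):
    """Hypothetical helper: write a stage's result, index included."""
    df.to_csv(path)


def load_stage(path):
    """Hypothetical helper: read a stage's result, restoring the index
    from the first CSV column (the one to_csv wrote)."""
    return pd.read_csv(path, index_col=0)


# A temporary directory stands in for the notebook folder (assumption).
workdir = tempfile.mkdtemp()
result1_path = os.path.join(workdir, 'result1.csv')

# "Notebook1": pretend this is the cleaned frame, then save it.
cleaned = pd.DataFrame({'id': [10, 20, 30], 'name': ['foo', 'bar', 'baz']})
save_stage(cleaned, result1_path)

# "Notebook2": start from the saved result instead of re-running notebook1.
reloaded = load_stage(result1_path)
print(reloaded)
```

Reading with index_col=0 matters here: to_csv() writes the index as the first column by default, and without index_col=0 it would come back as an extra unnamed data column.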

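If this is your data, saved as input.csv:

import pandas as pd
raw_data = {'id': [10, 20, 30],
            'name': ['foo', 'bar', 'baz']}
# named input_df so it does not shadow the built-in input()
input_df = pd.DataFrame(raw_data, columns=['id', 'name'])
# write the file that notebook1 loads
input_df.to_csv('input.csv')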

Then in notebook1.ipynb, process it like this:

import pandas as pd
# load
df = pd.read_csv('input.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result1.csv')

...and repeat that process for each stage in the chain. For example, in notebook2.ipynb:

import pandas as pd
# load
df = pd.read_csv('result1.csv', index_col=0)
# manipulate frame here
# ...
# save
df.to_csv('result2.csv')

At the end, your notebook collection will look like this:

  • input.csv
  • notebook1.ipynb
  • notebook2.ipynb
  • notebook3.ipynb
  • result1.csv
  • result2.csv
  • result3.csv
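The question also mentions merging the cleaned frames in a new notebook. If each of your notebooks saves its own cleaned frame independently rather than feeding the next one, a final notebook could load the saved files and join them. This is only a sketch under assumptions: the file names clean_a.csv and clean_b.csv, the score column, and the shared id key are all made up for illustration, and the first few lines merely create stand-in files so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Temporary directory standing in for the notebook folder (assumption).
workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, 'clean_a.csv')  # hypothetical file name
path_b = os.path.join(workdir, 'clean_b.csv')  # hypothetical file name

# Stand-ins for what two cleaning notebooks would have saved.
pd.DataFrame({'id': [10, 20, 30],
              'name': ['foo', 'bar', 'baz']}).to_csv(path_a)
pd.DataFrame({'id': [10, 20, 40],
              'score': [1.5, 2.5, 3.5]}).to_csv(path_b)

# The "merge" notebook: load each cleaned frame, then join on the shared key.
a = pd.read_csv(path_a, index_col=0)
b = pd.read_csv(path_b, index_col=0)

# Inner join keeps only ids present in both frames.
merged = a.merge(b, on='id', how='inner')
print(merged)
```

This keeps the merge notebook short: it contains no cleaning code, only loads and joins.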

Documentation:

  • pandas.read_csv
  • pandas.DataFrame.to_csv
JeremyDouglass answered Oct 21 '22 07:10
