How to save a sklearn model to S3 using joblib.dump?

I have a sklearn model and I want to save the pickle file to my S3 bucket using joblib.dump.

I used joblib.dump(model, 'model.pkl') to save the model locally, but I do not know how to save it to an S3 bucket. This is what I tried:

s3_resource = boto3.resource('s3')
s3_resource.Bucket('my-bucket').Object("model.pkl").put(Body=joblib.dump(model, 'model.pkl'))

I expect the pickled file to be in my S3 bucket.

the_dummy asked Jun 12 '19 23:06

1 Answer

Here's a way that worked for me; it's pretty straightforward. I'm using joblib (it handles sklearn models that carry large NumPy arrays more efficiently), but you could use pickle too.
Also, I'm using temporary files to transfer the model to and from S3, but you could store the file in a more permanent location if you prefer.

import tempfile
import boto3
import joblib

s3_client = boto3.client('s3')
bucket_name = "my-bucket"
key = "model.pkl"

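# WRITE: dump the model to a temporary file, then upload its bytes
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)  # rewind before reading the file back
    s3_client.put_object(Body=fp.read(), Bucket=bucket_name, Key=key)

# READ: download the object into a temporary file, then load it
with tempfile.TemporaryFile() as fp:
    s3_client.download_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)
    fp.seek(0)  # rewind so joblib reads from the start
    model = joblib.load(fp)

# DELETE: remove the object from the bucket once it is no longer needed
s3_client.delete_object(Bucket=bucket_name, Key=key)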
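
If you would rather not touch the disk at all, the same round trip can probably be done with an in-memory buffer. The following is only a sketch under a few assumptions: the helper names save_model_to_s3 and load_model_from_s3 are made up for illustration, the client is passed in as a parameter (normally boto3.client('s3')), and the model is assumed to fit comfortably in memory. Since pickle exposes the same dump/load calls, the sketch falls back to it when joblib is not installed.

```python
import io

try:
    import joblib as serializer  # preferred, as in the answer above
except ImportError:
    import pickle as serializer  # same dump(obj, file) / load(file) calls


def save_model_to_s3(model, s3_client, bucket, key):
    """Serialize `model` into memory and upload it to S3 (hypothetical helper)."""
    buffer = io.BytesIO()
    serializer.dump(model, buffer)
    buffer.seek(0)  # rewind so the upload starts at the first byte
    s3_client.upload_fileobj(Fileobj=buffer, Bucket=bucket, Key=key)


def load_model_from_s3(s3_client, bucket, key):
    """Download an object from S3 into memory and deserialize it (hypothetical helper)."""
    buffer = io.BytesIO()
    s3_client.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    buffer.seek(0)  # rewind so loading starts at the first byte
    return serializer.load(buffer)


# Example usage (assumes AWS credentials are configured and the bucket exists):
#   import boto3
#   s3_client = boto3.client("s3")
#   save_model_to_s3(model, s3_client, "my-bucket", "model.pkl")
#   model = load_model_from_s3(s3_client, "my-bucket", "model.pkl")
```

Passing the client in keeps the helpers easy to try out with a stub object before pointing them at a real bucket. As with any pickle-based format, only load objects from buckets you trust.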
Alexei Andreev answered Sep 19 '22 18:09