 

Metaflow data objects not storing in S3

I have the following Metaflow file that runs successfully with the following step:

@step
def scale(self):
    import redshift
    import pandas as pd
    self.event_matrix = self.jointable.pivot_table(index='user_name', columns='event_name', values='odds')
    self.event_matrix_t_scaled = self.event_matrix.T.apply(redshift.scale_user)
    self.tester = 1

    self.next(self.end)
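For reference, here is a minimal, self-contained pandas sketch of the reshape that scale() performs (the data is hypothetical, and redshift.scale_user is the question's own helper, stubbed out here):

```python
import pandas as pd

# Hypothetical rows standing in for self.jointable
jointable = pd.DataFrame({
    "user_name": ["alice", "alice", "bob"],
    "event_name": ["login", "click", "login"],
    "odds": [0.9, 0.4, 0.7],
})

# Same reshape as in scale(): one row per user, one column per event
event_matrix = jointable.pivot_table(index="user_name",
                                     columns="event_name",
                                     values="odds")

# Stand-in for redshift.scale_user: any per-column function applied
# to the transposed matrix (events as rows, users as columns)
event_matrix_t_scaled = event_matrix.T.apply(lambda col: col / col.max())

print(event_matrix)
print(event_matrix_t_scaled)
```

Missing user/event combinations (here bob/click) come out as NaN in the pivoted matrix.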

when I open up a notebook and run

from metaflow import Flow

run = Flow("Recommender").latest_successful_run
print(f'Using run: {run}')
print(run.data)

It outputs

<MetaflowData: event_user_scaled_matrix, tester, event_matrix, jointable, event_matrix_t_scaled>

When I run run.data.event_matrix, it returns a data frame; however, when I run run.data.event_user_scaled_matrix, run.data.event_matrix_t_scaled, or run.data.tester, each of these returns the error:

S3 datastore operation _get_s3_object failed (An error occurred (400) when calling the HeadObject operation: Bad Request). Retrying 7 more times..

which leads me to believe that these objects are not being written to the S3 bucket. But I don't understand what is different between the object that works and the ones that do not.

Can someone help me see what I am missing?

Asked Mar 23 '26 by ai.jennetta


1 Answer

You can see the path that Metaflow is trying to access in S3 by using, for example, run.end_task.artifacts.tester._object. This may help you debug where the file is supposed to be and why it may no longer be there. There should be no difference between the artifacts you mentioned.
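A sketch of that debugging step (this assumes a configured Metaflow deployment with the question's "Recommender" flow available; _object is an internal attribute whose layout may change between Metaflow versions):

```python
from metaflow import Flow

run = Flow("Recommender").latest_successful_run

# Print the internal datastore metadata for each artifact so the
# S3 keys of the working and failing artifacts can be compared.
for name in ("event_matrix", "event_matrix_t_scaled", "tester"):
    artifact = getattr(run.end_task.artifacts, name)
    print(name, artifact._object)  # internal: includes the storage location
```

Comparing the printed locations against what actually exists in the bucket (e.g. via aws s3 ls) should show whether the objects were never written or were written somewhere unexpected.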

Source: I am a Metaflow developer.

Answered Mar 26 '26 by Romain