
How do I print debugging info from SageMaker training?

I have a Jupyter notebook that just launches a training script, presumably in a Docker container.

I added some print statements to that training script, but their output isn't showing up in the notebook or in CloudWatch.

I'm using regular print() statements. How should I log debugging output from the training script?

kane asked Dec 11 '18 02:12





1 Answer

I've seen this when Python buffers stdout, which doesn't always play nicely with Docker. If that is the cause, adding ENV PYTHONUNBUFFERED=1 to your Dockerfile (and then rebuilding the image) should solve the problem.
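If rebuilding the image isn't convenient, a similar effect can probably be had from inside the training script by flushing on every write. Below is a minimal sketch, assuming stdout buffering really is the cause; make_logger and debug_print are hypothetical helper names for illustration, not part of SageMaker or any library.

```python
import logging
import sys


def make_logger(name="train", stream=None):
    """Return a logger that writes each record to `stream` (stdout by default).

    logging.StreamHandler flushes after every record, so messages are not
    held back in a buffer the way plain print() output can be.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    # Avoid stacking duplicate handlers if this is called more than once.
    logger.handlers.clear()
    target = stream if stream is not None else sys.stdout
    handler = logging.StreamHandler(target)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def debug_print(*args, **kwargs):
    """print() that flushes immediately unless the caller says otherwise."""
    kwargs.setdefault("flush", True)
    print(*args, **kwargs)


if __name__ == "__main__":
    log = make_logger()
    debug_print("starting training")
    # Placeholder values, only here to show the call shape.
    log.debug("epoch=%d loss=%.4f", 0, 1.2345)
```

Calling print(..., flush=True) directly, or starting the interpreter with python -u, targets the same buffering behavior as PYTHONUNBUFFERED=1. Whether any of these fixes the missing output depends on buffering being the actual cause.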

Andre answered Oct 21 '22 12:10