I have a Jupyter notebook that launches a training script, presumably in a Docker container.
I added some print statements to that training script, but their output isn't showing up in the notebook or in CloudWatch.
I'm using regular print() statements. How should I log debugging output from the training script?
If you use file mode, SageMaker downloads the training data from the storage location to a local directory in the Docker container. Training starts after the full dataset has been downloaded. In file mode, the training instance must have enough storage space to fit the entire dataset.
Amazon SageMaker algorithms produce Amazon CloudWatch logs, which provide detailed information on the training process. To see the logs, in the AWS Management Console, choose CloudWatch, choose Logs, and then choose the /aws/sagemaker/TrainingJobs log group.
To enable SageMaker Debugger in your training jobs, you need to specify additional parameters that configure Debugger. First, use debug_hook_config to select the tensor collections you want to save for analysis and to specify the frequency at which to save them.
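For reference, that hook configuration ultimately lands in the DebugHookConfig structure of the CreateTrainingJob API request. A minimal sketch of its shape, where the bucket path, save interval, and collection names are all placeholder values, not anything from the original post:

```python
# Sketch of a Debugger hook configuration as it appears in the
# CreateTrainingJob API request. The sagemaker Python SDK builds this
# structure for you from a sagemaker.debugger.DebuggerHookConfig object;
# every concrete value below is a placeholder.
debug_hook_config = {
    "S3OutputPath": "s3://my-bucket/debugger-output",  # where saved tensors go
    "HookParameters": {"save_interval": "100"},        # save tensors every 100 steps
    "CollectionConfigurations": [
        {"CollectionName": "losses"},                  # tensor groups to collect
        {"CollectionName": "gradients"},
    ],
}
```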
I've seen this happen when Python buffers stdout, which doesn't always play nice with Docker. Adding ENV PYTHONUNBUFFERED=1 to your Dockerfile (and then rebuilding the image) should fix it, if buffering is the cause.
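If rebuilding the image isn't convenient, you can also work around the buffering from inside the training script itself. A minimal sketch using only the standard library (the logger name and format string are just illustrative choices):

```python
import logging
import sys

# Option 1: flush each print explicitly so the line leaves Python's
# stdout buffer immediately instead of waiting for the buffer to fill.
print("starting training", flush=True)

# Option 2: use the logging module with a handler on stdout; SageMaker
# forwards the container's stdout/stderr to CloudWatch, so these records
# show up in the /aws/sagemaker/TrainingJobs log group.
logger = logging.getLogger("train")
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.debug("debug logging configured")
```

StreamHandler writes each record with a terminating newline and flushes after every emit, so you get the same effect as flush=True without sprinkling it through the script.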