I am using my own algorithm and loading data in JSON format from S3. Because of the huge data size, I need to set up Pipe mode. I followed the instructions at https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/pipe_bring_your_own/train.py and, as a result, I am able to set up the pipe and read data successfully. The only problem is that the FIFO pipe does not read the specified number of bytes. For example, given the path to the S3 FIFO channel:
number_of_bytes_to_read = 555444333

with open(fifo_path, "rb", buffering=0) as fifo:
    while True:
        data = fifo.read(number_of_bytes_to_read)
The length of data should be 555444333 bytes, but it is always much smaller, around 12,123,123 bytes or so. The data in S3 looks like this:
s3://s3-bucket/1122/part1.json
s3://s3-bucket/1122/part2.json
s3://s3-bucket/1133/part1.json
s3://s3-bucket/1133/part2.json
and so on. Is there any way to enforce the number of bytes to be read? Any suggestion would be helpful. Thanks.
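For context, with buffering=0 each read() performs at most one underlying system call, so it can legitimately return fewer bytes than requested; that is normal pipe behavior rather than data loss. A generic way to force an exact byte count is to accumulate chunks in a loop, roughly like the sketch below (read_exact is a hypothetical helper, not part of the SageMaker example):

def read_exact(fifo, n):
    """Accumulate chunks until exactly n bytes are read or the pipe is exhausted.
    Illustrative helper only."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = fifo.read(remaining)
        if not chunk:              # writer closed the pipe / end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

with open(fifo_path, "rb", buffering=0) as fifo:
    while True:
        data = read_exact(fifo, number_of_bytes_to_read)
        if not data:
            break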
We just needed to pass a positive value for buffering and the problem was solved. The code below buffers 555444333 bytes and then processes 111222333 bytes on each read. Since our files are JSON, we can easily convert the incoming bytes to a string and then clean the string by removing incomplete JSON parts. The final code looks like this:
number_of_bytes_to_read = 111222333
number_of_bytes_to_buffer = 555444333

with open(fifo_path, "rb", buffering=number_of_bytes_to_buffer) as fifo:
    while True:
        data = fifo.read(number_of_bytes_to_read)
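The incomplete-JSON cleanup step is not shown above. A minimal sketch of how it might look, assuming each record is a complete JSON object on its own line (the process_records helper is illustrative, not from the original code):

import json

def process_records(raw_bytes, leftover=b""):
    """Decode a chunk read from the pipe, parse only complete newline-delimited
    JSON records, and carry the trailing partial record over to the next chunk.
    Illustrative only; assumes one JSON object per line."""
    buffer = leftover + raw_bytes
    lines = buffer.split(b"\n")
    leftover = lines.pop()           # last piece may be an incomplete record
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line.decode("utf-8")))
    return records, leftover

Each call returns the parsed records from the current chunk plus the trailing bytes to prepend to the next read, so no partial record is lost between reads.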