
Is it possible to loop through an Amazon S3 bucket and count the number of lines in its files/keys using Python?

Is it possible to loop through the files/keys in an Amazon S3 bucket, read their contents, and count the number of lines using Python?

For example:

  1. My bucket: "my-bucket-name"
  2. File/Key: "test.txt"

I need to loop through the file "test.txt" and count the number of lines in the raw file.

Sample Code:

for bucket in conn.get_all_buckets():
    if bucket.name == "my-bucket-name":
        for file in bucket.list():
            # need to count the number of lines in each file and print it to a log

Renukadevi asked May 30 '16 06:05




2 Answers

Using boto3 you can do the following:

import boto3

# create the s3 resource
s3 = boto3.resource('s3')

# get the file object
obj = s3.Object('bucket_name', 'key')

# read the file contents into memory (bytes in Python 3)
file_contents = obj.get()["Body"].read()

# count the newline characters to get the number of lines
print(file_contents.count(b'\n'))

If you want to do this for all objects in a bucket, you can use the following code snippet:

bucket = s3.Bucket('bucket_name')
for obj in bucket.objects.all():
    file_contents = obj.get()["Body"].read()
    print(file_contents.count(b'\n'))

For more functionality, see the boto3 documentation: http://boto3.readthedocs.io/en/latest/reference/services/s3.html#object
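Counting `\n` characters undercounts by one when the last line has no trailing newline, and `read()` loads the whole object into memory. A minimal sketch of a chunked counter that handles both cases follows. The helper name `count_lines` and the 1 MiB default chunk size are illustrative assumptions, not part of boto3. It only relies on the stream having a `read(size)` method, which the `Body` returned by `obj.get()` provides.

```python
import io


def count_lines(stream, chunk_size=1024 * 1024):
    """Count lines in a binary file-like object, reading fixed-size chunks.

    `count_lines` is a hypothetical helper; `stream` only needs read(size).
    """
    count = 0
    last = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += chunk.count(b"\n")
        last = chunk[-1:]  # remember the final byte seen so far
    # A non-empty stream that does not end in a newline still has a last line.
    if last and last != b"\n":
        count += 1
    return count


# Local demonstration with an in-memory stream (no AWS access needed).
sample = io.BytesIO(b"first\nsecond\nthird")
print(count_lines(sample))  # 3
```

With boto3 this could be called as `count_lines(obj.get()["Body"])` for each object in the loop above.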

Update: (Using boto 2)

import boto
s3 = boto.connect_s3()  # establish connection
bucket = s3.get_bucket('bucket_name')  # get bucket

for key in bucket.list(prefix='key'):  # list objects under a given prefix
    file_contents = key.get_contents_as_string()  # get file contents
    # count the newline characters to get the number of lines
    print(file_contents.count(b'\n'))

tamjd1 answered Oct 26 '22 16:10


Reading large files into memory is often far from ideal. Instead, you may find the following more useful:

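import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucketname', Key=fileKey)  # fileKey holds the object's key

# stream the body line by line instead of loading it all into memory
nlines = 0
for _ in obj['Body'].iter_lines():
    nlines += 1

print(nlines)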
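To cover the original request (every file in the bucket, with the counts written to a log), the streaming approach can be combined with the `list_objects_v2` paginator. This is a rough sketch rather than a definitive implementation. The function name `log_line_counts` and the log message format are assumptions made for illustration, and the client is passed in so the function itself does not depend on credentials being configured.

```python
import logging


def log_line_counts(client, bucket_name, logger=None):
    """Log and return the line count of every object in a bucket.

    `log_line_counts` is a hypothetical helper; `client` is expected to be
    a boto3 S3 client, e.g. boto3.client('s3').
    """
    logger = logger or logging.getLogger(__name__)
    counts = {}
    # The paginator follows continuation tokens, so buckets with more than
    # one page of keys are listed completely.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # An empty bucket returns a page without a "Contents" entry.
        for item in page.get("Contents", []):
            key = item["Key"]
            body = client.get_object(Bucket=bucket_name, Key=key)["Body"]
            # Stream the object line by line rather than reading it whole.
            counts[key] = sum(1 for _ in body.iter_lines())
            logger.info("%s: %d lines", key, counts[key])
    return counts
```

A call might look like `log_line_counts(boto3.client('s3'), 'my-bucket-name')` after `logging.basicConfig(level=logging.INFO)`. Each object is still downloaded in full, since S3 cannot count lines server-side.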

user2589273 answered Oct 26 '22 16:10