Is it possible to loop through the file/key in Amazon S3 bucket, read the contents and count the number of lines using Python?
For Example:
1. My bucket: "my-bucket-name"
2. File/Key : "test.txt"
I need to loop through the file "test.txt" and count the number of lines in the raw file.
Sample Code:
import boto

conn = boto.connect_s3()
for bucket in conn.get_all_buckets():
    if bucket.name == "my-bucket-name":
        for file in bucket.list():
            # need to count the number of lines in each file and print to a log
Amazon S3 is only a storage service; you must fetch the file in order to perform actions on it (such as counting its lines). You can loop through a bucket using the boto3 list_objects_v2 API.
Invoke the list_objects_v2() method with the bucket name to list all the objects in the S3 bucket. It returns a dictionary with the object details; iterate over its Contents list and read each entry's Key to get the object names.
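As a minimal sketch combining list_objects_v2 with get_object to count lines per object (using the bucket name from the question; a paginator handles listings longer than 1,000 keys):

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# list_objects_v2 returns at most 1000 keys per call;
# the paginator follows the continuation tokens for us
for page in paginator.paginate(Bucket='my-bucket-name'):
    for entry in page.get('Contents', []):
        body = s3.get_object(Bucket='my-bucket-name', Key=entry['Key'])['Body']
        line_count = sum(1 for _ in body.iter_lines())
        print(entry['Key'], line_count)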
When you enable an S3 Bucket Key for your bucket, new objects that you upload to the bucket use an S3 Bucket Key for server-side encryption using AWS KMS. If you upload, modify, or copy an object in a bucket that has an S3 Bucket Key enabled, the S3 Bucket Key settings for that object might be updated to align with bucket configuration.
Using boto3, you can do the following:
import boto3
# create the s3 resource
s3 = boto3.resource('s3')
# get the file object
obj = s3.Object('bucket_name', 'key')
# read the file contents in memory
file_contents = obj.get()["Body"].read()
# count the occurrences of the newline byte to get the number of lines
# (read() returns bytes in Python 3, hence b'\n')
print(file_contents.count(b'\n'))
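One caveat: counting newline characters undercounts by one if the file's last line has no trailing newline; the iter_lines() approach shown further down counts that final line as well.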
If you want to do this for all objects in a bucket, you can use the following code snippet:
bucket = s3.Bucket('bucket_name')
for obj in bucket.objects.all():
    file_contents = obj.get()["Body"].read()
    print(file_contents.count(b'\n'))
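Note that bucket.objects.all() pages through the listing for you, so this also works for buckets holding more than 1,000 objects.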
Here is the boto3 documentation reference for more functionality: http://boto3.readthedocs.io/en/latest/reference/services/s3.html#object
Update: (Using boto 2)
import boto
s3 = boto.connect_s3() # establish connection
bucket = s3.get_bucket('bucket_name') # get bucket
for key in bucket.list(prefix='key'): # list objects at a given prefix
    file_contents = key.get_contents_as_string() # get file contents
    print(file_contents.count('\n')) # count newline occurrences to get the number of lines
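Note that boto 2 is no longer maintained; for new code, prefer the boto3 snippets above.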
Reading large files into memory is sometimes far from ideal. Instead, you may find the following streaming approach more useful:
import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucketname', Key=fileKey)
# stream the body line by line instead of loading it all into memory
nlines = 0
for _ in obj['Body'].iter_lines():
    nlines += 1
print(nlines)
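If all you need is the newline count, a rough sketch like the following tallies b'\n' in fixed-size chunks and skips the per-line splitting work; iter_chunks is part of botocore's StreamingBody, and bucketname/fileKey are the same placeholders as above:

import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucketname', Key=fileKey)
# count newline bytes chunk by chunk; memory use stays bounded
nlines = 0
for chunk in obj['Body'].iter_chunks(chunk_size=1024 * 1024):
    nlines += chunk.count(b'\n')
print(nlines)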