How do you full-text search an Amazon S3 bucket?

People also ask

Can you search in S3?

S3 doesn't have a native "search this bucket" feature, since the content of the stored objects is unknown to it. Also, because S3 is key/value based, there is no native way to access many nodes at once à la more traditional datastores that offer a (SELECT * FROM ...

Can you query an S3 bucket?

Amazon S3 Select and Amazon S3 Glacier Select enable customers to run Structured Query Language (SQL) queries directly on data stored in S3 and Amazon S3 Glacier. With S3 Select, you simply store your data on S3 and use SQL statements to filter the contents of S3 objects, retrieving only the data that you need.
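As a rough illustration of the S3 Select call described above, the sketch below uses boto3's `select_object_content`. The helper name `select_rows`, the bucket, the key, and the `drug` column are hypothetical, and it assumes a CSV object with a header row. Note that this filters the contents of a single object; it does not search across a bucket.

```python
def select_rows(s3_client, bucket, key, expression):
    """Run an S3 Select SQL expression against one CSV object.

    Returns the matching rows as a single CSV-formatted string.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is a CSV file whose first line is a header.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode once at the end so multi-byte characters split across
    # chunks are handled correctly.
    return b"".join(chunks).decode("utf-8")


# Usage (requires boto3 and AWS credentials; all names are placeholders):
# import boto3
# rows = select_rows(
#     boto3.client("s3"),
#     bucket="demo",
#     key="prescriptions.csv",
#     expression="SELECT * FROM S3Object s WHERE s.drug LIKE '%Heartgard%'",
# )
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without touching AWS.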

How do I read data on Amazon S3?

In the Amazon S3 console, choose your S3 bucket, choose the file that you want to open or download, choose Actions, and then choose Open or Download. If you are downloading an object, specify where you want to save it. The procedure for saving the object depends on the browser and operating system that you are using.

One way to do this is via CloudSearch, which can use S3 as a source. It works by retrieving the documents and building a search index from them. This should work very well, but check the pricing model thoroughly to make sure that it won't be too costly for you.

The alternative is as Jack said: you'd otherwise need to transfer the files out of S3 to an EC2 instance and build a search application there.
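To make the "build a search application" option a little more concrete, here is a minimal sketch of the core idea: an inverted index that maps each word to the object keys containing it. It assumes the objects have already been copied out of S3 and decoded to plain text. The names `tokenize`, `build_index`, and `search` are hypothetical, and a real deployment would more likely rely on an existing engine such as Lucene.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index from a dict of {object_key: text}."""
    index = defaultdict(set)
    for key, text in documents.items():
        for token in tokenize(text):
            index[token].add(key)
    return index


def search(index, query):
    """Return the sorted object keys that contain every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(matches)


# Example with two already-downloaded documents (contents are made up).
documents = {
    "prescription.pdf": "Heartgard Plus chewable tablet",
    "notes.txt": "Plus size packaging",
}
index = build_index(documents)
print(search(index, "heartgard plus"))
```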

Since October 1st, 2015, Amazon has offered another search service, Amazon Elasticsearch Service (Amazon ES). In much the same vein as CloudSearch, you can stream data to it from Amazon S3 buckets.

It works with a Lambda function: any new data sent to the S3 bucket triggers an event notification to this Lambda function, which then updates the ES index.

All the steps are well detailed in the Amazon documentation, with Java and JavaScript examples.

At a high level, setting up to stream data to Amazon ES requires the following steps:

Creating an Amazon S3 bucket and an Amazon ES domain
Creating a Lambda deployment package.
Configuring a Lambda function.
Granting authorization to stream data to Amazon ES.
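The Lambda function from these steps could look roughly like the sketch below. It relies on the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`). The names `make_handler`, `fetch_text`, and `index_document` are hypothetical, and both the S3 read and the authorized request to the Amazon ES domain are left to callables supplied by the caller.

```python
from urllib.parse import unquote_plus


def make_handler(fetch_text, index_document):
    """Build a Lambda handler that indexes every object named in an S3 event.

    fetch_text(bucket, key) -> str      e.g. a wrapper around boto3's get_object
    index_document(doc_id, document)    e.g. a signed request to the ES domain
    Both callables are assumptions provided by the caller.
    """

    def handler(event, context):
        indexed = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            doc_id = f"{bucket}/{key}"
            document = {
                "bucket": bucket,
                "key": key,
                "content": fetch_text(bucket, key),
            }
            index_document(doc_id, document)
            indexed.append(doc_id)
        return {"indexed": indexed}

    return handler


# Local dry run with in-memory stand-ins for S3 and the ES domain.
fake_index = {}
handler = make_handler(
    fetch_text=lambda bucket, key: f"contents of {key}",
    index_document=lambda doc_id, document: fake_index.update({doc_id: document}),
)
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "demo"}, "object": {"key": "prescription.pdf"}}}
    ]
}
print(handler(sample_event, None))
```

Injecting the two callables keeps the event-handling logic independent of any particular Elasticsearch client or request-signing approach.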

Although it is not an AWS-native service, there is Mixpeek, which runs text-extraction tools such as Tika, Tesseract, and ImageAI on your S3 files, then places the extracted text in a Lucene index to make it searchable.

You integrate it as follows:

Download the module: https://github.com/mixpeek/mixpeek-python

Import the module and your API keys:

    from mixpeek import Mixpeek, S3
    from config import mixpeek_api_key, aws

Instantiate the S3 class (which uses boto3 and requests):

    s3 = S3(
        aws_access_key_id=aws['aws_access_key_id'],
        aws_secret_access_key=aws['aws_secret_access_key'],
        region_name='us-east-2',
        mixpeek_api_key=mixpeek_api_key
    )

Upload one or more existing S3 files:

    # Upload all S3 files in bucket "demo"
    s3.upload_all(bucket_name="demo")

    # Upload a single file called "prescription.pdf" in bucket "demo"
    s3.upload_one(s3_file_name="prescription.pdf", bucket_name="demo")

Now simply search using the Mixpeek module:

    # Use the Mixpeek API directly
    mix = Mixpeek(
        api_key=mixpeek_api_key
    )
    # Search the indexed files
    result = mix.search(query="Heartgard")
    print(result)

The result looks something like this:

 [
     {
         "_id": "REDACTED",
         "api_key": "REDACTED",
         "highlights": [
             {
                 "path": "document_str",
                 "score": 0.8759502172470093,
                 "texts": [
                     {
                         "type": "text",
                         "value": "Vetco Prescription\nVetcoClinics.com\n\nCustomer:\n\nAddress: Canine\n\nPhone: Australian Shepherd\n\nDate of Service: 2 Years 8 Months\n\nPrescription\nExpiration Date:\n\nWeight: 41.75\n\nSex: Female\n\n℞  "
                     },
                     {
                         "type": "hit",
                         "value": "Heartgard"
                     },
                     {
                         "type": "text",
                         "value": " Plus Green 26-50 lbs (Ivermectin 135 mcg/Pyrantel 114 mg)\n\nInstructions: Give one chewable tablet by mouth once monthly for protection against heartworms, and the treatment and\ncontrol of roundworms, and hookworms. "
                     }
                 ]
             }
         ],
         "metadata": {
             "date_inserted": "2021-10-07 03:19:23.632000",
             "filename": "prescription.pdf"
         },
         "score": 0.13313256204128265
     }
 ]

Then parse the results as needed.
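As one way to parse a result shaped like the sample above, the sketch below joins each highlight's `texts` fragments into a snippet and wraps the `hit` fragments in asterisks. The helper name `summarize` and the asterisk marking are illustrative choices rather than part of the Mixpeek module, and the field names are taken from the sample output only.

```python
def summarize(results):
    """Return (filename, highlight_score, snippet) tuples from search results."""
    rows = []
    for item in results:
        filename = item.get("metadata", {}).get("filename")
        for highlight in item.get("highlights", []):
            parts = []
            for fragment in highlight.get("texts", []):
                value = fragment["value"]
                # Mark the matched term so it stands out in the snippet.
                parts.append(f"*{value}*" if fragment["type"] == "hit" else value)
            # Collapse newlines and repeated spaces into single spaces.
            snippet = " ".join("".join(parts).split())
            rows.append((filename, highlight["score"], snippet))
    return rows


# A trimmed-down result in the same shape as the sample output.
sample = [
    {
        "highlights": [
            {
                "path": "document_str",
                "score": 0.8759502172470093,
                "texts": [
                    {"type": "text", "value": "Sex: Female\n\n"},
                    {"type": "hit", "value": "Heartgard"},
                    {"type": "text", "value": " Plus Green 26-50 lbs"},
                ],
            }
        ],
        "metadata": {"filename": "prescription.pdf"},
    }
]
for filename, score, snippet in summarize(sample):
    print(filename, round(score, 3), snippet)
```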

Tags: php, amazon-web-services, amazon-s3
