sqlite3: Connect to a database in the cloud (S3)

I have a small SQLite database (110 KB) in an S3 bucket. I want to connect to that database every time I run my Python application.

One option is simply to download the database every time I run the Python application and connect to it as usual. But I want to know whether there is a way to connect to that SQLite database in memory, using S3FileSystem and open. I'm using the sqlite3 library and Python 3.6.
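For reference, the download-and-connect baseline I mean looks roughly like this (the bucket and key names are placeholders):

import sqlite3
import tempfile

import boto3

s3 = boto3.client("s3")

# Download the database to a local temporary file, then open it normally.
with tempfile.NamedTemporaryFile(suffix=".sqlite", delete=False) as tmp:
    s3.download_fileobj("my-bucket", "path/to/my.sqlite", tmp)  # placeholder names
    local_path = tmp.name

connection = sqlite3.connect(local_path)
cursor = connection.cursor()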

asked Jun 26 '19 by Joaquin



2 Answers

Yes, it's possible with EFS:

https://www.lambrospetrou.com/articles/aws-lambda-and-sqlite-over-efs/

AWS recently released an integration between AWS Lambda and Amazon EFS. It supports NFSv4 lock upgrading/downgrading, which SQLite needs. This means the SQLite engine can have read/write access to files stored on an EFS filesystem.
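For illustration, here is a minimal sketch of a Lambda handler using the standard sqlite3 module against an EFS mount. The mount path and database name are assumptions, since they depend on how the EFS access point is configured on the function:

import sqlite3

# Lambda mounts EFS access points under a path you configure on the
# function, so /mnt/efs/app.sqlite here is hypothetical.
DB_PATH = "/mnt/efs/app.sqlite"

def handler(event, context):
    connection = sqlite3.connect(DB_PATH)
    try:
        # Read/write works because EFS (NFSv4) supports the locking SQLite uses.
        connection.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        connection.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                           ("last_event", str(event)))
        connection.commit()
        return {"rows": connection.execute("SELECT COUNT(*) FROM kv").fetchone()[0]}
    finally:
        connection.close()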

answered Sep 20 '22 by Alex B


As other answers indicate, you probably don't want to use SQLite as a primary database in the cloud.

However, as part of a fun side project I wrote an Amazon Athena data source connector that allows you to query SQLite databases in S3 from Athena. In order to do that, I wrote a read-only SQLite interface for S3.

SQLite has a concept of an OS Interface or VFS. Using a Python SQLite wrapper called APSW, you can write a VFS implementation for arbitrary filesystems. This is what I did in my project and I've included the implementation below.

In order to use this, you would first register the VFS and then create a new SQLite connection with this implementation as the driver.

I should note this isn't optimized at all, so depending on your queries it may still end up reading most of the database from S3. But that doesn't sound like an issue in this specific case.

import apsw  # needed for apsw.Connection below

S3FS = S3VFS()  # S3VFS defined below

# This odd URI format is required by SQLite. S3_PREFIX, DATABASE_NAME and
# S3_BUCKET are placeholders for your own values; immutable=1 tells SQLite
# the file will never change, so it skips locking and change detection.
sqlite_uri = "file:/{}/{}.sqlite?bucket={}&immutable=1".format(
  S3_PREFIX,
  DATABASE_NAME,
  S3_BUCKET
)

connection = apsw.Connection(sqlite_uri,
  flags=apsw.SQLITE_OPEN_READONLY | apsw.SQLITE_OPEN_URI,
  vfs=S3FS.vfsname
)
cursor = connection.cursor()

Once you have the cursor, you can execute standard SQL statements like so:

for x, y, z in cursor.execute("select x, y, z from foo"):
    print(cursor.getdescription())  # shows column names and declared types
    print(x, y, z)

VFS Implementation (requires APSW library and boto3 for S3 connectivity)

import apsw
import boto3

# A single shared S3 client for all VFS file handles
VFS_S3_CLIENT = boto3.client('s3')


class S3VFS(apsw.VFS):
    def __init__(self, vfsname="s3", basevfs=""):
        self.vfsname = vfsname
        self.basevfs = basevfs
        apsw.VFS.__init__(self, self.vfsname, self.basevfs)

    def xOpen(self, name, flags):
        return S3VFSFile(self.basevfs, name, flags)


class S3VFSFile():
    def __init__(self, inheritfromvfsname, filename, flags):
        # The bucket name comes from the URI query string, the key from the path
        self.bucket = filename.uri_parameter("bucket")
        self.key = filename.filename().lstrip("/")
        print("Initiated S3 VFS for file: {}".format(self._get_s3_url()))

    def xRead(self, amount, offset):
        # HTTP range requests are inclusive on both ends, so the last byte
        # to fetch is offset + amount - 1
        response = VFS_S3_CLIENT.get_object(
            Bucket=self.bucket,
            Key=self.key,
            Range='bytes={}-{}'.format(offset, offset + amount - 1)
        )
        return response['Body'].read()

    def xFileSize(self):
        response = VFS_S3_CLIENT.head_object(Bucket=self.bucket, Key=self.key)
        return response['ContentLength']

    def xClose(self):
        pass

    def xFileControl(self, op, ptr):
        return False

    def _get_s3_url(self):
        return "s3://{}/{}".format(self.bucket, self.key)
answered Sep 22 '22 by dacort