sqlite3: Connect to a database in the cloud (S3)

I have a small SQLite database (110 KB) in an S3 bucket. I want to connect to that database every time I run my Python application.

One option is simply to download the database every time I run the Python application and connect to it as usual. But I want to know whether there is a way to connect to that SQLite database in memory, using S3FileSystem and open. I'm using the sqlite3 library and Python 3.6.
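For reference, the download-and-connect baseline I mean looks roughly like this (the bucket and key names are placeholders):

import sqlite3
import tempfile

import boto3

s3 = boto3.client("s3")

# Download the database to a local temporary file, then open it normally.
with tempfile.NamedTemporaryFile(suffix=".sqlite", delete=False) as tmp:
    s3.download_fileobj("my-bucket", "path/to/my.sqlite", tmp)  # placeholder names
    local_path = tmp.name

connection = sqlite3.connect(local_path)
cursor = connection.cursor()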

asked Jun 26 '19 by Joaquin



2 Answers

Yes, it's possible with EFS:

https://www.lambrospetrou.com/articles/aws-lambda-and-sqlite-over-efs/

AWS recently released an integration between AWS Lambda and Amazon EFS. It supports NFSv4 lock upgrading/downgrading, which SQLite needs. This means the SQLite engine can have read/write access to files stored on an EFS filesystem.
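For illustration, here is a minimal sketch of a Lambda handler using the standard sqlite3 module against an EFS mount. The mount path and database name are assumptions, since they depend on how the EFS access point is configured on the function:

import sqlite3

# Lambda mounts EFS access points under a path you configure on the
# function, so /mnt/efs/app.sqlite here is hypothetical.
DB_PATH = "/mnt/efs/app.sqlite"

def handler(event, context):
    connection = sqlite3.connect(DB_PATH)
    try:
        # Read/write works because EFS (NFSv4) supports the locking SQLite uses.
        connection.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        connection.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                           ("last_event", str(event)))
        connection.commit()
        return {"rows": connection.execute("SELECT COUNT(*) FROM kv").fetchone()[0]}
    finally:
        connection.close()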

answered Sep 20 '22 by Alex B


As other answers indicate, you probably don't want to use SQLite as a primary database in the cloud.

However, as part of a fun side project I wrote an Amazon Athena data source connector that allows you to query SQLite databases in S3 from Athena. In order to do that, I wrote a read-only SQLite interface for S3.

SQLite has a concept of an OS Interface or VFS. Using a Python SQLite wrapper called APSW, you can write a VFS implementation for arbitrary filesystems. This is what I did in my project and I've included the implementation below.

In order to use this, you would first register the VFS and then create a new SQLite connection with this implementation as the driver.

I should note this isn't optimized at all, so depending on your queries it may still end up reading most of the database from S3. But that doesn't sound like an issue in this specific case.

import apsw  # needed for apsw.Connection below

S3FS = S3VFS()  # S3VFS defined below

# This odd URI format is required by SQLite. S3_PREFIX, DATABASE_NAME and
# S3_BUCKET are placeholders for your own values; immutable=1 tells SQLite
# the file will never change, so it skips locking and change detection.
sqlite_uri = "file:/{}/{}.sqlite?bucket={}&immutable=1".format(
  S3_PREFIX,
  DATABASE_NAME,
  S3_BUCKET
)

connection = apsw.Connection(sqlite_uri,
  flags=apsw.SQLITE_OPEN_READONLY | apsw.SQLITE_OPEN_URI,
  vfs=S3FS.vfsname
)
cursor = connection.cursor()

Once you have the cursor, you can execute standard SQL statements like so:

for x, y, z in cursor.execute("select x, y, z from foo"):
    print(cursor.getdescription())  # shows column names and declared types
    print(x, y, z)

VFS Implementation (requires APSW library and boto3 for S3 connectivity)

import apsw
import boto3

# A single shared S3 client for all VFS file handles
VFS_S3_CLIENT = boto3.client('s3')


class S3VFS(apsw.VFS):
    def __init__(self, vfsname="s3", basevfs=""):
        self.vfsname = vfsname
        self.basevfs = basevfs
        apsw.VFS.__init__(self, self.vfsname, self.basevfs)

    def xOpen(self, name, flags):
        return S3VFSFile(self.basevfs, name, flags)


class S3VFSFile():
    def __init__(self, inheritfromvfsname, filename, flags):
        # The bucket name comes from the URI query string, the key from the path
        self.bucket = filename.uri_parameter("bucket")
        self.key = filename.filename().lstrip("/")
        print("Initiated S3 VFS for file: {}".format(self._get_s3_url()))

    def xRead(self, amount, offset):
        # HTTP range requests are inclusive on both ends, so the last byte
        # to fetch is offset + amount - 1
        response = VFS_S3_CLIENT.get_object(
            Bucket=self.bucket,
            Key=self.key,
            Range='bytes={}-{}'.format(offset, offset + amount - 1)
        )
        return response['Body'].read()

    def xFileSize(self):
        response = VFS_S3_CLIENT.head_object(Bucket=self.bucket, Key=self.key)
        return response['ContentLength']

    def xClose(self):
        pass

    def xFileControl(self, op, ptr):
        return False

    def _get_s3_url(self):
        return "s3://{}/{}".format(self.bucket, self.key)
answered Sep 22 '22 by dacort