Amazon EC2 & S3 when using Python / SQLite?

Suppose I have a huge SQLite file (say, 500[MB]) stored in Amazon S3. Can a Python script running on a small EC2 instance directly access and modify that SQLite file, or must I first copy the file to the EC2 instance, change it there, and then copy it back to S3?

Will the I/O be efficient?
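For reference, the copy-down / modify / copy-up route mentioned above would look roughly like this with boto (the classic Python AWS library); the bucket name, key name, local path and the `items` table are placeholders, not anything fixed by the question:

    # Rough sketch of copy-down / modify / copy-up with boto 2.
    import sqlite3
    import boto

    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my-bucket')            # placeholder bucket
    key = bucket.get_key('data/huge.sqlite')       # placeholder key

    # 1. Copy the whole file down to the EC2 instance's local disk.
    key.get_contents_to_filename('/tmp/huge.sqlite')

    # 2. Modify it locally with the standard sqlite3 module.
    db = sqlite3.connect('/tmp/huge.sqlite')
    db.execute("UPDATE items SET processed = 1 WHERE rowid <= 100000")  # placeholder table/column
    db.commit()
    db.close()

    # 3. Copy the modified file back up, overwriting the S3 object.
    key.set_contents_from_filename('/tmp/huge.sqlite')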

Here's what I am trying to do. As I wrote, I have a 500[MB] SQLite file in S3. I'd like to start, say, 10 different Amazon EC2 instances that will each read a subset of the file and do some processing (each instance will handle a different subset of the 500[MB] SQLite file). Once processing is done, each instance will update only the subset of the data it dealt with (as explained, there will be no overlap of data among instances).

For example, suppose the SQLite file has, say, 1M rows (a rough sketch of what each instance would run follows this list):

instance 1 will deal with (and update) rows 0 - 100000

instance 2 will deal with (and update) rows 100001 - 200000

.........................

instance 10 will deal with (and update) rows 900001 - 1000000
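Roughly, each instance would run something like this sketch (assuming the file has already been copied to local disk and that rows are addressed by SQLite's implicit rowid; the table name `items` and column `processed` are placeholders):

    # Sketch of the per-instance work; instance 1 shown, others change START/END.
    import sqlite3

    START, END = 0, 100000        # this instance's slice of the 1M rows

    db = sqlite3.connect('/tmp/huge.sqlite')

    # Read only this instance's subset.
    rows = db.execute(
        "SELECT rowid, * FROM items WHERE rowid BETWEEN ? AND ?",
        (START, END),
    ).fetchall()

    # ... process `rows` ...

    # Update only the same subset.
    db.execute(
        "UPDATE items SET processed = 1 WHERE rowid BETWEEN ? AND ?",
        (START, END),
    )
    db.commit()
    db.close()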


Is it at all possible? Does it sound OK? Any suggestions / ideas are welcome.

asked Dec 16 '22 by user3262424

2 Answers

I'd like to start say 10 different Amazon EC2 instances that will each read a subset of the file and do some processing (every instance will handle a different subset of the 500[MB] SQLite file)

You cannot do this with SQLite, on Amazon infrastructure or otherwise. SQLite performs database-level write locking; unless all ten nodes are performing reads exclusively, you will not get any kind of concurrency. Even the SQLite website says so.

Situations Where Another RDBMS May Work Better

  • Client/Server Applications
  • High-volume Websites
  • Very large datasets
  • High Concurrency

Have you considered PostgreSQL?
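For what it's worth, the write-locking behaviour described above is easy to demonstrate on a single machine with the standard sqlite3 module (just a sketch; the file path and table are throwaway):

    # While one connection holds the write lock, a second writer fails with
    # "database is locked" once its busy timeout expires.
    import sqlite3

    writer = sqlite3.connect('/tmp/demo.sqlite', isolation_level=None)
    writer.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")

    writer.execute("BEGIN IMMEDIATE")              # take the write lock
    writer.execute("INSERT INTO t VALUES (1)")

    other = sqlite3.connect('/tmp/demo.sqlite', timeout=1)
    try:
        other.execute("INSERT INTO t VALUES (2)")  # blocks, then raises
    except sqlite3.OperationalError as exc:
        print(exc)                                 # "database is locked"

    writer.execute("COMMIT")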

answered Dec 29 '22 by SingleNegationElimination


Since S3 cannot be mounted directly, your best bet is to create an EBS volume containing the SQLite file and work directly with that volume from another (controller) instance. You can then create snapshots of the volume and archive them into S3. Using a tool like boto (Python API), you can automate the creation of snapshots and the process of moving the backups into S3.
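As a rough sketch of the automation piece, snapshot creation with boto (the classic boto 2 API) looks like this; the region and volume ID below are placeholders, and note that EBS snapshots are persisted to S3 by AWS automatically:

    # Create a point-in-time snapshot of the EBS volume holding the SQLite file.
    import boto.ec2

    ec2 = boto.ec2.connect_to_region('us-east-1')          # placeholder region
    snap = ec2.create_snapshot('vol-12345678',              # placeholder volume ID
                               description='sqlite backup')
    print(snap.id, snap.status)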

answered Dec 29 '22 by serialworm