Is it possible to implement ArangoDB sharding by database (rather than collection or shardKey)?

I have a large Arango instance with lots of databases, one for each project. Each project's database has a bunch of collections and a lot of data. The databases look something like this:

project1
project2
project3
...
project500

I'd like to distribute query load by sharding the instance so that each project database runs on a separate server, or by spinning up multiple large hosts and having Arango set things up automatically. However, it seems that ArangoDB sharding only works at the collection level (for instance, by document _key within a collection).

Is there any way to set up sharding by database? If not, are there any best practices for running/orchestrating multiple Arango instances?

asked Dec 13 '18 19:12

ZECTBynmo

3 Answers

One option for running multiple instances is Docker Swarm. With the example below you can run several independent ArangoDB instances, one per stack.

You'll need:

Docker
an initialized Swarm: docker swarm init [OPTIONS]
optionally, more nodes added via docker swarm join [OPTIONS] HOST:PORT
group labels set on the nodes via docker node update --label-add group=group1 [node-name] (group1 on the first node, group2 on the second node, and so on)

Then save the code below as docker-stack-arango.yml:

version: '3.3'

services:
  arangodb:
    image: "${ARANGO_IMAGE}"
    environment:
      ARANGO_ROOT_PASSWORD: "${ARANGO_ROOT_PASSWORD}"
      ARANGO_STORAGE_ENGINE: "${ARANGO_STORAGE_ENGINE}"
    volumes:
      - arangodb:/var/lib/arangodb3
      - arangodb_apps:/var/lib/arangodb3-apps
    ports:
      - target: 8529
        published: $ARANGO_PUBLISHED_PORT
        protocol: tcp
        mode: ingress
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints:
          - node.labels.group==$INSTANCE_GROUP
      resources:
        limits:
          cpus: $LIMITS_CPU
          memory: $LIMITS_MEMORY
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 60s
      update_config:
        parallelism: 1
        delay: 30s
    stop_grace_period: 60s

volumes:
  arangodb:
    external:
      name: ${ARANGO_VOLUME}
  arangodb_apps:
    external:
      name: ${ARANGO_APPS_VOLUME}

Adjust the configuration below as needed and run it in your shell (bash):

export INSTANCE_GROUP="group1"
export INSTANCE_NAME="arango1"
export INSTANCE_PORT=8529
export INSTANCE_PASSWORD="do-not-use-this-password-in-production"

export ARANGO_IMAGE_TAG="3.4.0"
export ARANGO_IMAGE_REPO="arangodb/arangodb"
export ARANGO_IMAGE="${ARANGO_IMAGE_REPO}:${ARANGO_IMAGE_TAG}"
export ARANGO_VOLUME="arangodb-${INSTANCE_NAME}--3.4.0"
export ARANGO_APPS_VOLUME="arangodb-apps-${INSTANCE_NAME}--3.4.0"
export ARANGO_PUBLISHED_PORT=$INSTANCE_PORT
export ARANGO_STORAGE_ENGINE="rocksdb"
export ARANGO_ROOT_PASSWORD=$INSTANCE_PASSWORD
export LIMITS_CPU=1
export LIMITS_MEMORY=1024M

Then deploy the stack:

docker stack deploy -c ./docker-stack-arango.yml $INSTANCE_NAME

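To deploy a second instance, change INSTANCE_NAME, INSTANCE_PORT and INSTANCE_GROUP, re-run the exports so that the variables derived from them (ARANGO_VOLUME, ARANGO_APPS_VOLUME, ARANGO_PUBLISHED_PORT) pick up the new values, and run the deploy command again.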
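Repeating this for many instances can be scripted. A minimal sketch, assuming instances are numbered from 1, published ports count up from 8529, and node labels follow the group1, group2 pattern used above. The helper names instance_env and deploy_instance are hypothetical, and the image, password and limit variables are assumed to be exported already as shown earlier:

```shell
# Hypothetical helper: derive the per-instance settings from an index (1, 2, ...).
instance_env() {
  i="$1"
  export INSTANCE_NAME="arango${i}"
  export INSTANCE_GROUP="group${i}"
  export INSTANCE_PORT=$((8529 + i - 1))
  export ARANGO_PUBLISHED_PORT="$INSTANCE_PORT"
  export ARANGO_VOLUME="arangodb-${INSTANCE_NAME}--3.4.0"
  export ARANGO_APPS_VOLUME="arangodb-apps-${INSTANCE_NAME}--3.4.0"
}

# Hypothetical helper: deploy one stack, or only print the command when DRY_RUN=1.
deploy_instance() {
  instance_env "$1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker stack deploy -c ./docker-stack-arango.yml $INSTANCE_NAME"
  else
    docker stack deploy -c ./docker-stack-arango.yml "$INSTANCE_NAME"
  fi
}
```

For example, DRY_RUN=1 deploy_instance 2 only prints the deploy command for arango2, which makes it easy to check the derived names before touching the Swarm.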

You can then reach each instance through the IP of any Swarm node on that instance's published port.
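With several instances running, the application has to know which port serves which project database. A minimal sketch of one possible scheme, assuming unpadded names like project1 ... project500, consecutive ports starting at 8529, and a round-robin assignment of projects to instances. The function names and the SWARM_NODE and INSTANCE_COUNT variables are hypothetical; the /_db/<name> URL prefix is how ArangoDB's HTTP API selects a database:

```shell
# Assumed settings: address of any Swarm node and the number of deployed instances.
SWARM_NODE="${SWARM_NODE:-127.0.0.1}"
INSTANCE_COUNT="${INSTANCE_COUNT:-2}"

# Hypothetical mapping: projectN is served by instance ((N - 1) mod INSTANCE_COUNT) + 1.
# Expects unpadded numbers (project7, not project007).
project_port() {
  n="${1#project}"
  echo $((8529 + (n - 1) % INSTANCE_COUNT))
}

# Base URL for one project database on its assigned instance.
project_url() {
  echo "http://${SWARM_NODE}:$(project_port "$1")/_db/$1"
}

# Example call against a running instance (not executed here):
# curl -s -u "root:$ARANGO_ROOT_PASSWORD" "$(project_url project7)/_api/version"
```

Any stable mapping works here; round-robin is only the simplest choice and ignores how large or busy each project is.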

answered Oct 17 '22 15:10

sevcik.tk

No. Sharding is implemented solely to distribute the documents of a collection over multiple database servers. It is a means of balancing both memory and load across an ArangoDB cluster.
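To illustrate what collection-level sharding looks like in practice, here is a minimal sketch that builds the request body for creating a sharded collection through the HTTP API of a cluster coordinator. The collection name events, the host name coordinator and the helper collection_body are hypothetical; numberOfShards and shardKeys are collection properties:

```shell
# Hypothetical helper: JSON body for a sharded collection.
# Arguments: collection name, number of shards, shard key attribute.
collection_body() {
  printf '{"name":"%s","numberOfShards":%d,"shardKeys":["%s"]}' "$1" "$2" "$3"
}

# Example call against a cluster coordinator (not executed here):
# curl -s -u "root:$ARANGO_ROOT_PASSWORD" -X POST \
#   --data "$(collection_body events 3 _key)" \
#   "http://coordinator:8529/_db/project1/_api/collection"
```

The shard settings belong to the collection inside project1, not to the database itself, which is why there is no equivalent switch at the database level.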

answered Oct 17 '22 15:10

Kaveh Vahedipour

ArangoDB can also be deployed with Kubernetes instead of Docker Swarm (probably the better choice). You could even run multiple standalone server instances if you really wanted to. Whatever the deployment technology, I think what the other answers are getting at is that if you have multiple independent databases, you can run multiple instances of ArangoDB (or of any other DB, for that matter). The only time you would want to keep multiple DBs in one instance is when they are small enough not to compete for the server's resources.

Dividing your current instance should be fairly straightforward, as you can back up, restore and manipulate the different DBs independently. Sharding and related concepts such as partitioning are meant for cases where all the data has to stay within a single database. In that case, you need a way to split the data across multiple servers while keeping it a single logical unit. That does not appear to be the case here.

If you want to find out more about using ArangoDB with Kubernetes, the ArangoDB documentation covers it.
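Moving one database to another instance can be sketched with the arangodump and arangorestore tools that ship with ArangoDB. This is a minimal sketch: the helper names, the dump/ directory layout and the example endpoints are assumptions, authentication options are left out, and the helpers only print the commands so they can be reviewed before running:

```shell
# Hypothetical helper: command that dumps one database from the source endpoint.
dump_cmd() {
  echo "arangodump --server.endpoint $2 --server.database $1 --output-directory dump/$1"
}

# Hypothetical helper: command that restores it on the target endpoint,
# creating the database there if it does not exist yet.
restore_cmd() {
  echo "arangorestore --server.endpoint $2 --server.database $1 --create-database true --input-directory dump/$1"
}

# Print both steps for one project: name, source endpoint, target endpoint.
move_project() {
  dump_cmd "$1" "$2"
  restore_cmd "$1" "$3"
}
```

For example, move_project project1 tcp://old-host:8529 tcp://new-host:8529 prints the two commands for project1; once they look right, the output can be piped to sh.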

answered Oct 17 '22 13:10

camba1

Tags: database, sharding, arangodb