The optimum size of objects in Ceph Object Storage (RADOS)

It looks like RADOS is best suited to serve as the storage backend for Ceph Block Storage and the Ceph File System. But if I want to use the object store directly:

  • Is there an optimum object size that gives the best performance?
  • Is there a problem with a large number of small objects?
  • How large can objects get before they cause problems?

It would be great if you could share your experience.

Asked by Ali on Feb 12 '14 at 11:02

1 Answer

There is no optimal size for objects in the object store; in fact, this flexibility is one of the big advantages over fixed-size block stores. Typically, an application uses this flexibility to decompose its data model along convenient boundaries. That said, if you are storing very small or very large objects, there are some considerations to take into account.

Is there a problem with a large number of small objects?

There has never been a functional problem with small objects, though in the past storing them has been inefficient because of the way objects are stored. However, the next release of Ceph (Firefly) adds the option of using LevelDB as a backend, which makes small objects much more efficient.

How large can objects get before they cause problems?

Assuming you are using replication in RADOS (as opposed to the proposed object striping feature or the erasure-coding backend), an object is replicated in its entirety to a set of physical storage nodes. Object size is therefore inherently limited by the storage capacity of the physical nodes to which the object is replicated.
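To make the capacity argument concrete, here is a minimal arithmetic sketch, assuming a plain replicated pool where every replica stores the whole object. The `Node` class, node names, and free-space figures are hypothetical, and the model deliberately ignores filesystem overhead, journals, and any cluster-side limits or full ratios.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of one storage node holding a replica."""
    name: str
    free_bytes: int


def max_replicated_object_size(replica_set: list[Node]) -> int:
    # Every replica stores the entire object, so the node with the least
    # free space caps the object size for this replica set.
    if not replica_set:
        raise ValueError("replica set must not be empty")
    return min(node.free_bytes for node in replica_set)


def raw_bytes_consumed(object_size: int, replicas: int) -> int:
    # Full replication: each of the `replicas` copies holds all object_size bytes.
    return object_size * replicas


GiB = 1024 ** 3
replica_set = [
    Node("node-a", 2048 * GiB),
    Node("node-b", 500 * GiB),
    Node("node-c", 4096 * GiB),
]
limit = max_replicated_object_size(replica_set)
print(limit // GiB)                             # 500, capped by node-b
print(raw_bytes_consumed(10 * GiB, 3) // GiB)   # 30
```

The point is that adding more nodes to the cluster does not raise this ceiling for a single object; only the nodes in that object's replica set matter.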

This mode of operation also implies a practical limitation: per-object I/O performance is bounded by the performance of the underlying physical devices (data and journal drives). It is therefore often useful to think of an object as a unit of I/O parallelism, although in practice many objects will map to the same set of devices.
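The "object as a unit of I/O parallelism" idea can be illustrated with a toy placement model. This is a sketch under loud assumptions: SHA-256 stands in for Ceph's own object-name hash, and `osds_for_pg` is a made-up round-robin rule, not CRUSH. It only models the two properties the answer relies on: each object maps to one fixed device set, and many objects share the same devices.

```python
import hashlib
from collections import Counter


def pg_for_object(name: str, pg_num: int) -> int:
    # Stable hash of the object name into a placement group (illustrative only;
    # Ceph uses its own hash function, not SHA-256).
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % pg_num


def osds_for_pg(pg: int, num_osds: int, replicas: int) -> tuple[int, ...]:
    # Hypothetical placement rule: consecutive OSDs from a PG-derived offset.
    # Real Ceph uses CRUSH with failure domains; this is just a fixed mapping.
    if not 0 < replicas <= num_osds:
        raise ValueError("need 1 <= replicas <= num_osds")
    start = (pg * 7) % num_osds
    return tuple((start + i) % num_osds for i in range(replicas))


def primary_load(names, pg_num=64, num_osds=12, replicas=3) -> Counter:
    """Count how many objects land on each (toy) primary OSD."""
    load = Counter()
    for name in names:
        acting = osds_for_pg(pg_for_object(name, pg_num), num_osds, replicas)
        load[acting[0]] += 1
    return load


# One large object: all of its I/O goes through a single primary device.
print(primary_load(["big-object"]))
# 1,000 smaller objects: spread over several primaries, each shared by many objects.
print(primary_load([f"chunk.{i}" for i in range(1000)]))
```

In this toy model a single object can never be served by more than one primary, while splitting the same data into many objects lets requests fan out across devices, which is the intuition behind treating objects as the unit of parallelism.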

The answer will likely differ for the erasure-coded backend, and applications can always stripe large datasets across multiple smaller objects.
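Application-side striping can be sketched in a few lines. This is a minimal, storage-agnostic illustration that uses a plain dict in place of a RADOS pool; the `<name>.<16-hex-digit index>` naming scheme is a hypothetical convention chosen here. Ceph does ship a `libradosstriper` library for this purpose, but the sketch below neither uses nor mirrors its API.

```python
def stripe(name: str, data: bytes, object_size: int) -> dict[str, bytes]:
    """Split `data` into objects of at most `object_size` bytes.

    Object names follow a hypothetical '<name>.<16-hex-digit index>' scheme.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return {
        f"{name}.{offset // object_size:016x}": data[offset:offset + object_size]
        for offset in range(0, len(data), object_size)
    }


def unstripe(name: str, objects: dict[str, bytes]) -> bytes:
    """Reassemble the original bytes by reading the pieces in index order."""
    prefix = name + "."
    pieces = sorted(
        (int(key[len(prefix):], 16), chunk)
        for key, chunk in objects.items()
        if key.startswith(prefix)
    )
    return b"".join(chunk for _, chunk in pieces)


payload = bytes(range(256)) * 40          # 10,240 bytes
objects = stripe("dataset", payload, 4096)
print(sorted(objects))                    # three object names, indices 0..2
assert unstripe("dataset", objects) == payload
```

Choosing `object_size` is the trade-off the answer describes: smaller pieces spread I/O and capacity across more devices, while each extra object adds per-object overhead.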

Answered by Noah Watkins on Nov 10 '22 at 01:11