How to Optimize Ceph RGW with Dynamic Bucket Re-sharding

Sharding is the process of reorganizing and redistributing a dataset into multiple parts. The word “shard” means a small part of a whole. In Ceph, RGW can reshard a bucket index (an online process since the Luminous release) when the index grows beyond a certain point, in order to prevent performance degradation.

Each bucket index shard can handle its entries efficiently up to a certain threshold (100,000 entries per shard by default, according to the official Ceph docs; croit ships with the threshold raised to 200,000 entries via rgw_max_objs_per_shard = 200000). Beyond this point, bucket operations can suffer from performance degradation.
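
As a rough sketch of how this threshold can be inspected or adjusted at runtime (assuming the option is managed through the cluster configuration database under the client.rgw section; deployment tools such as croit may manage it for you), the standard ceph config commands can be used:

# Show the current per-shard object threshold for RGW daemons
ceph config get client.rgw rgw_max_objs_per_shard

# Raise the threshold to 200,000 entries per shard (croit's default)
ceph config set client.rgw rgw_max_objs_per_shard 200000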

The dynamic resharding process is invisible to the user. RGW automatically detects when a bucket needs to be resharded and increases the number of shards used by the bucket index (up to a default maximum of 1999 shards, controlled by rgw_max_dynamic_shards). During this process, write I/O to the target bucket is briefly blocked, while read I/O is not affected: the bucket index is not involved when reading an object, but it adds an extra operation when writing or modifying an RGW object.
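
To see how this behaves on a running cluster, a few read-only checks can help. This is only a sketch; exact output fields vary between Ceph releases:

# Check whether dynamic resharding is enabled and what its shard ceiling is
ceph config get client.rgw rgw_dynamic_resharding
ceph config get client.rgw rgw_max_dynamic_shards

# Report how full each bucket's index shards are relative to the threshold
radosgw-admin bucket limit check

# Show per-bucket statistics, including the current number of index shards
radosgw-admin bucket stats --bucket=<bucket_name>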

It’s important to note that dynamic resharding is not supported in multisite setups on Ceph versions prior to Reef. Resharding is a heavyweight operation that can impact your application’s performance, so it should be taken into account when you first plan your deployment.
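
On pre-Reef multisite clusters, a common approach is therefore to disable the automatic mechanism and reshard manually during a maintenance window, following the multisite resharding procedure in the Ceph documentation. One way to do this, assuming the option is managed via the config database:

# Disable automatic (dynamic) resharding on pre-Reef multisite deployments
ceph config set client.rgw rgw_dynamic_resharding false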

In Ceph RGW, there are several admin commands that can be used to manage the resharding process. These include:

Manually activate resharding for a specific bucket:

radosgw-admin reshard add --bucket=<bucket_name> --num-shards=<num_shards>

List scheduled bucket reshards:

radosgw-admin reshard list

Process the scheduled reshard queue immediately (instead of waiting for RGW’s periodic processing):

radosgw-admin reshard process

Check resharding status on a specific bucket:

radosgw-admin reshard status --bucket=<bucket_name>

It’s important to note that an ongoing reshard operation cannot be cancelled. If you prefer not to wait for the scheduler, you can also trigger an immediate manual reshard with the following command:

radosgw-admin bucket reshard --bucket=<bucket_name> --num-shards=<num_shards>
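
When choosing a value for --num-shards, the Ceph documentation suggests roughly one shard per 100,000 expected objects, with prime shard counts tending to distribute index entries more evenly. The following is only a sketch of that calculation, assuming jq is installed and that the bucket stats JSON exposes object counts under its usage section (field names can vary by release):

# Rough sketch: estimate a shard count from the bucket's current object count
bucket="<bucket_name>"   # replace with your bucket name
objects=$(radosgw-admin bucket stats --bucket="$bucket" | jq '[.usage[].num_objects] | add')
echo "objects: $objects, suggested shards: $(( objects / 100000 + 1 ))"

# Then reshard immediately with the chosen (ideally prime) shard count
radosgw-admin bucket reshard --bucket="$bucket" --num-shards=<num_shards>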

You can also list stale bucket instances left behind by resharding with the command:

radosgw-admin reshard stale-instances list

And clean up these instances (in a single-site cluster only) with the command:

radosgw-admin reshard stale-instances rm
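
Putting the queue-based commands above together, a typical flow for resharding a single bucket through the scheduler might look like this (a sketch only; RGW also processes the queue periodically on its own):

# Queue a reshard of the bucket to 101 shards (an example prime value)
radosgw-admin reshard add --bucket=<bucket_name> --num-shards=101

# Confirm it is queued, then process the queue immediately
radosgw-admin reshard list
radosgw-admin reshard process

# Follow progress on the bucket
radosgw-admin reshard status --bucket=<bucket_name>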

All these commands and more can be found in the official Ceph documentation: RGW Dynamic Bucket Index Resharding — Ceph Documentation

Preparation is key when it comes to managing the performance of Ceph RGW buckets.

If the Ceph administrator knows in advance roughly how many objects a bucket will hold, they can provision an appropriate number of index shards up front (for example via rgw_override_bucket_index_max_shards for newly created buckets) and avoid potential performance bottlenecks.

However, if the number of objects is not known, the dynamic bucket re-sharding feature of Ceph RGW can help to avoid degraded performance associated with overloaded buckets.