
Optimize Storage Allocation with Ceph S3

Introduction

One of the selling points of Amazon S3 is that it allows users to choose from various storage classes based on data access frequency. Frequently accessed data is stored in the "Standard" class, while less frequently accessed data goes into the "Infrequent Access" (IA) class, where it can be moved if not accessed over a set period. This setup aligns storage costs with retrieval needs. Ceph S3 implements a similar approach with its own storage classes, making it an intriguing point of reference for users familiar with Amazon S3’s structure.

The Ceph S3 lingo is largely the same. These different places to store data are called Storage Classes, and the default one is called STANDARD. However, STANDARD is the only storage class Ceph ships with; the others need to be created by the administrator. On top of that, Ceph’s storage classes are implemented by so-called "placement rules", which determine where data is stored. These placement rules also specify the set of storage classes available to users, and each bucket is configured with a placement rule.

Summary

In this blog post, we are going to expand on the following points:

  • Each bucket in Ceph S3 is configured with a specific placement rule that dictates in which Ceph pools data is stored. Once a bucket is created, its placement rule is permanent and cannot be altered.
  • Placement rules are integral in defining storage classes, where each class specifies a designated Ceph pool for data storage. S3 users have the flexibility to select the storage class that best fits their needs, and administrators can manage access to different storage classes for each user.
  • To optimize storage efficiency, data can be dynamically transitioned between storage classes using Lifecycle Policies, allowing for automated management based on predefined criteria.

Test Setup

In the rest of the article, we will be using a ceph cluster and a client to test and verify our claims.

The ceph cluster is deployed using croit (v2405.1.quincy) with ceph 17.2.7 running the image: ID: 734d2072-f482-4314-afd1-76f86ba6518e, name: croit enterprise image for v2405 (quincy).

$ ceph -s
  cluster:
    id:     6ba5438b-a632-4c53-a25f-dacd6710396e
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum code01,code02,code03,code04,code05 (age 42h)
    mgr: code01(active, since 3d), standbys: code05, code04, code02, code03
    osd: 15 osds: 15 up (since 5d), 15 in (since 5d)
    rgw: 5 daemons active (5 hosts, 1 zones)

  data:
    pools:   16 pools, 273 pgs
    objects: 334 objects, 13 MiB
    usage:   8.7 GiB used, 459 GiB / 468 GiB avail
    pgs:     273 active+clean

The client is an Arch Linux VM with the following client versions:

$ aws --version
aws-cli/1.32.106 Python/3.12.3

$ s3cmd --version
s3cmd version 2.4.0

$ rclone version
rclone v1.66.0

Pools

The RADOS Gateway ships with the following list of default pools. Their names start with .rgw, and since we are not running a multi-site setup for our test cluster, the zonegroup is default. However, for this article, only the following are important:

default.rgw.buckets.data – can be used by the data_pool key of a placement rule (tied to the storage class) which stores the actual data in the pool.

default.rgw.buckets.non-ec – can be used by the data_extra_pool key of a placement rule which stores metadata for incomplete multi-part uploads.

default.rgw.buckets.index – can be used by the index_pool key of a placement rule which stores the bucket index information.
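The three pool roles above can be sketched as the mapping a placement rule carries in the zone configuration. This is an illustrative data structure only; the field names mirror what radosgw-admin zone get prints under placement_pools.

```python
# Hypothetical sketch of one placement rule's pool mapping; field names
# follow the `radosgw-admin zone get` placement_pools output.
default_placement = {
    "index_pool": "default.rgw.buckets.index",        # bucket index
    "data_extra_pool": "default.rgw.buckets.non-ec",  # incomplete multipart-upload metadata
    "storage_classes": {
        # the data pool is tied to a storage class, not to the rule itself
        "STANDARD": {"data_pool": "default.rgw.buckets.data"},
    },
}

print(default_placement["storage_classes"]["STANDARD"]["data_pool"])
# default.rgw.buckets.data
```

Note that only the data pool hangs off a storage class; index and extra pools belong to the placement rule as a whole, which matters later when new storage classes are added to an existing rule.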

Placement Rules

Placement rules in Ceph map a specific data pool, index pool, and extra pool to a particular bucket. They are assigned to buckets at the time of creation and cannot be changed thereafter. The default placement rule is called default-placement. It links to the default RADOS Gateway index pool (default.rgw.buckets.index) and extra pool (default.rgw.buckets.non-ec), and its default data pool (default.rgw.buckets.data) is tied to the STANDARD storage class, which every placement rule includes.

Placement rules can be used to tie multiple device classes to a single bucket. For example, a bucket can store some of its data on Ceph pools backed by SSDs and some on pools backed by HDDs. Although a placement rule can offer multiple storage classes, the default placement rule is best configured per user with radosgw-admin (or in the croit UI: S3 → Users → Edit User) rather than per bucket. This is because, as previously mentioned, a bucket’s placement rule is set when the bucket is created and cannot be changed afterwards. The same attribute can, however, be changed by the admin for a user with radosgw-admin, which then applies to new buckets only. The user can also choose a custom placement rule during bucket creation (not upload) using tools like s3cmd or aws-cli.

Creating a New Placement Rule

During the creation of a new placement rule, custom pools can be chosen that correspond to the STANDARD storage class of the new rule. Storage classes, explained further down in the article, can be specified by user-facing tools like s3cmd and aws-cli to influence data placement into different pools.

For demonstration, let’s create a few new pools with the keywords hot and cold, for frequently and infrequently accessed data respectively. After creating those pools, the pools page looks something like this.

As seen in the picture, the cold data pool corresponds to an hdd-only CRUSH rule that places all the objects residing in the pool on spinning drives, while the metadata and index still live on flash. Similarly, for all the hot pools the associated CRUSH rule is ssd-only, which means the actual data, metadata, and bucket index all live on flash.

From here on, we can proceed to create two new placement rules.

Once created, the placement targets look like this. Please note that these placement targets are not limited to a single bucket. Multiple buckets and multiple users can use the same placement rule. As seen in the diagram in the summary of the article, placement rules are responsible for selecting data, index, and extra pools wherein the data pool part is decided by or tied to the storage class.

Changing Default Placement Rule for New Buckets

The default placement rule can be changed so that it is auto-selected during creation of a new bucket or a new user.

On the command-line it would look like:

$ radosgw-admin zonegroup placement default \
    --rgw-zonegroup default \
    --placement-id hot-data-placement

The command does the following:

  • radosgw-admin zonegroup placement default – Choose a default placement rule for a particular zonegroup
  • --rgw-zonegroup default – This default placement rule is only applied to the zonegroup named default. The zone (the Ceph cluster) is responsible for mapping the placement rule to the pools.
  • --placement-id hot-data-placement – Name of the placement rule to be set as default.

Although a new storage class can be created for the same placement rule and set as its default, doing so has no effect whatsoever. This is also explained further down in the article.

Bucket Creation with Non-Default Placement Rule with croit

Placement rules can be easily selected during bucket creation:

Bucket Creation with Non-Default Placement Rule with CLI Tools

When creating a bucket, in both examples below, the value default:hot-data-placement is passed in the request (as the LocationConstraint), telling Ceph which zonegroup and placement rule to choose.


With AWS CLI, it would look like this:

$ aws --endpoint-url http://172.31.117.89 \
s3api create-bucket \
--bucket hot-bucket \
--create-bucket-configuration 'LocationConstraint=default:hot-data-placement' \
--debug
[...]
2024-05-30 23:41:59,739 - MainThread - botocore.endpoint - DEBUG - Making request for OperationModel(name=CreateBucket) with params: {'url_path': '', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'aws-cli/1.32.106 md/Botocore#1.34.106 ua/2.0 os/linux#6.7.12-orbstack-00202-g57474688ffbd md/arch#aarch64 lang/python#3.12.3 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.34.106'}, 'body': b'default:hot-data-placement', 'auth_path': '/hot-bucket/', 'url': 'http://172.31.117.89/hot-bucket', 'context': {'client_region': 'us-east-1', 'client_config': , 'has_streaming_input': False, 'auth_type': 'v4', 's3_redirect': {'redirected': False, 'bucket': 'hot-bucket', 'params': {'Bucket': 'hot-bucket', 'CreateBucketConfiguration': {'LocationConstraint': 'default:hot-data-placement'}}}, 'input_params': {'Bucket': 'hot-bucket'}, 'signing': {'region': 'us-east-1', 'signing_name': 's3', 'disableDoubleEncoding': True}, 'endpoint_properties': {'authSchemes': [{'disableDoubleEncoding': True, 'name': 'sigv4', 'signingName': 's3', 'signingRegion': 'us-east-1'}]}}}
[...]

  • aws --endpoint-url http://172.31.117.89 – The endpoint URL is the URL of the RGW service or a load balancer that reverse proxies to multiple RGW services
  • s3api create-bucket – create a bucket using the S3 API
  • --bucket hot-bucket – the name of the bucket is hot-bucket
  • --create-bucket-configuration 'LocationConstraint=default:hot-data-placement' – apply the given configuration: the LocationConstraint here with s3api corresponds to the zonegroup‘s api_name (here: default) and a custom placement rule (here: hot-data-placement), separated with a :.
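The LocationConstraint value is simply the zonegroup’s api_name and the placement rule joined by a colon. A minimal sketch (the helper name is ours, not part of any tool):

```python
def location_constraint(zonegroup_api_name, placement_rule):
    """Build the LocationConstraint value Ceph understands:
    '<zonegroup api_name>:<placement rule>'. Illustrative helper only."""
    return f"{zonegroup_api_name}:{placement_rule}"

print(location_constraint("default", "hot-data-placement"))
# default:hot-data-placement
```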

With S3CMD, it would look like:

$ s3cmd mb s3://cold-bucket \
--region default:cold-data-placement \
--debug
[...]
DEBUG: bucket_location: default:cold-data-placement
[...]
  • s3cmd mb s3://cold-bucket – make bucket with s3, name it cold-bucket
  • --region default:cold-data-placement – the region (similar to the LocationConstraint in aws-cli) is the zonegroup‘s api_name and the placement rule, separated with a :

Using --storage-class with s3cmd to set a non-default storage class during bucket creation also has no effect, and aws-cli does not accept a --storage-class flag while creating a bucket. Uploaded objects will end up in the default storage class of the placement rule. A --storage-class provided during upload of objects results in the expected behaviour.

Changing Default Placement Rule for the User

A default placement rule with the default storage class can also be configured for a specific user via the croit UI:

The same over the command-line:

$ radosgw-admin user modify \
    --uid lifecycle-user \
    --placement-id cold-data-placement \
    --storage-class STANDARD
{
[...]
    "default_placement": "cold-data-placement",
    "default_storage_class": "STANDARD",
[...]
}
  • radosgw-admin user modify – modifies some property of the user
  • --uid lifecycle-user – modify the user with the username lifecycle-user
  • --placement-id cold-data-placement – set the default placement rule of the user to cold-data-placement
  • --storage-class STANDARD – set the default storage class of the user to STANDARD for the placement rule cold-data-placement

Storage Classes

In Ceph, storage classes are components of placement rules that map to specific data pools. Storage classes are a native S3 concept that Ceph incorporates into its own nomenclature with the help of placement rules and pools. Applications, however, do not know what placement rules or pools are. Ceph exposes zonegroups and placement rules to applications as parts of the LocationConstraint attribute (analogous to region in some applications) that clients can include in their requests to choose a particular zonegroup and placement rule.

A placement rule can have multiple storage classes. Every placement rule is created with a storage class called STANDARD by default but this STANDARD storage class might correspond to the same or different data pools across different placement rules. This is because Ceph allows you to choose a specific data pool during placement rule creation, which then ends up in the default, STANDARD storage class.

In the picture, the same STANDARD storage class is tied in with 3 different ceph pools. The cold-data-placement rule has a STANDARD storage class which corresponds to the default.rgw.buckets.cold-data pool. The default-placement rule has a storage class which is also called STANDARD but it corresponds to the default.rgw.buckets.data pool. While the hot-data-placement rule has a storage class with the same name corresponding to yet another pool, called default.rgw.buckets.hot-data. Therefore, storage classes are unique to their pools but not as a whole.
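The resolution from (placement rule, storage class) to data pool can be sketched with a small stdlib-only snippet. The JSON below is a hypothetical, trimmed version of the zone placement configuration from this article, keeping only the data_pool fields:

```python
import json

# Trimmed, hypothetical zone placement configuration (pool and rule
# names are the ones used in this article).
zone_placements = json.loads("""
[
  {"key": "default-placement",
   "val": {"storage_classes": {"STANDARD": {"data_pool": "default.rgw.buckets.data"}}}},
  {"key": "hot-data-placement",
   "val": {"storage_classes": {"STANDARD": {"data_pool": "default.rgw.buckets.hot-data"}}}},
  {"key": "cold-data-placement",
   "val": {"storage_classes": {"STANDARD": {"data_pool": "default.rgw.buckets.cold-data"}}}}
]
""")

def data_pool(placement_id, storage_class="STANDARD"):
    """Return the data pool a (placement rule, storage class) pair maps to."""
    for entry in zone_placements:
        if entry["key"] == placement_id:
            return entry["val"]["storage_classes"][storage_class]["data_pool"]
    raise KeyError(placement_id)

# The same STANDARD class resolves to different pools per placement rule:
print(data_pool("hot-data-placement"))
# default.rgw.buckets.hot-data
```

This mirrors the point above: a storage class name is only unique within its placement rule, not across the zone.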

Creating a Storage Class

During the creation of a storage class, only the data pool is selected. The index and extra pools, responsible for bucket index and metadata, are selected during placement rule creation. When a new storage class is created for an existing placement rule, the index and extra pools which are associated with the placement rule stay the same.

A storage class can be created easily in the croit UI by visiting S3 → Placements, selecting a placement rule and clicking on, “Add Storage Class.”

As an example, a new storage class cold-data-new-storage-class is being added to the cold-data-placement rule that maps to the data pool default.rgw.buckets.cold-data-new-storage-pool.

The same can be achieved over the command line with radosgw-admin:

$ radosgw-admin zonegroup placement add \
    --rgw-zonegroup default \
    --placement-id cold-data-placement \
    --storage-class cold-data-new-storage-class
[
    {
        "key": "cold-data-placement",
        "val": {
            "name": "cold-data-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD",
                "cold-data-new-storage-class"
            ]
        }
    },
[...]

$ radosgw-admin zone placement add \
    --rgw-zone default \
    --placement-id cold-data-placement \
    --storage-class cold-data-new-storage-class \
    --data-pool default.rgw.buckets.cold-data-new-storage-pool \
    --compression lz4
[...]
"placement_pools": [
    {
        "key": "cold-data-placement",
        "val": {
            "index_pool": "default.rgw.buckets.cold-data-index",
            "storage_classes": {
                "STANDARD": {
                    "data_pool": "default.rgw.buckets.cold-data"
                },
                "cold-data-new-storage-class": {
                    "data_pool": "default.rgw.buckets.cold-data-new-storage-pool",
                    "compression_type": "lz4"
                }
            },
            "data_extra_pool": "default.rgw.buckets.cold-data-extra",
            "index_type": 0,
            "inline_data": "true"
        }
    },
[...]

Let’s break the commands down. The first command adds a new storage class to the placement rule in the zonegroup, while the second applies to the zone (the Ceph cluster), mapping a data pool and any other attributes to the newly added storage class.

First command:

  • radosgw-admin zonegroup placement add – add a placement to the zonegroup
  • --rgw-zonegroup default – apply to the zonegroup called default
  • --placement-id cold-data-placement – set the placement rule to cold-data-placement
  • --storage-class cold-data-new-storage-class – add a storage class cold-data-new-storage-class to the placement rule

Second command:

  • radosgw-admin zone placement add – add a placement to the zone
  • --rgw-zone default – apply to the zone called default
  • --placement-id cold-data-placement – set the placement rule to cold-data-placement
  • --storage-class cold-data-new-storage-class – set the storage class to the newly added, cold-data-new-storage-class
  • --data-pool default.rgw.buckets.cold-data-new-storage-pool – map the data pool default.rgw.buckets.cold-data-new-storage-pool to the newly added storage class
  • --compression lz4 – apply lz4 compression to the objects that will be uploaded

Changing Default Storage Class

There are two ways to change the default storage class. It can be done with an undocumented flag to radosgw-admin that applies it globally, or per user by setting the default_storage_class attribute for the user.

  • These changes apply to new buckets only, and if both are set then the user attribute takes precedence.
  • These changes are not relevant if the client sends the x-amz-storage-class header as STANDARD without the user specifying a storage class. In that case, the storage class included in the header will be chosen.
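The precedence just described can be written down as a tiny decision function. This is purely illustrative (the function is ours; Ceph implements this internally):

```python
def effective_storage_class(header=None, user_default=None, global_default="STANDARD"):
    """Illustrative sketch of the precedence described above: an explicit
    x-amz-storage-class header always wins, then the user's
    default_storage_class, then the zonegroup-wide default."""
    if header is not None:
        # The header wins even if it explicitly says STANDARD.
        return header
    if user_default:
        return user_default
    return global_default

# The header takes precedence even when it is STANDARD:
effective_storage_class("STANDARD", "cold-data-new-storage-class")  # -> "STANDARD"
```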

Globally

Currently, radosgw-admin accepts an undocumented --storage-class flag for changing the default storage class for a particular placement rule.

1. In the first command, we set a storage class (which is not STANDARD) as the default for an existing placement rule. The output simply shows that there are multiple storage classes for the placement rule; it does not yet tell us which one is the default.

$ radosgw-admin zonegroup placement default \
    --rgw-zonegroup default \
    --placement-id cold-data-placement \
    --storage-class cold-data-new-storage-class
[
    {
        "key": "cold-data-placement",
        "val": {
            "name": "cold-data-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD",
                "cold-data-new-storage-class"
            ]
        }
    },
[...]

  • radosgw-admin zonegroup placement default – set a default placement rule (and storage class) for a zonegroup
  • --rgw-zonegroup default – apply to the zonegroup named default
  • --placement-id cold-data-placement – set the placement rule cold-data-placement as the default
  • --storage-class cold-data-new-storage-class – set the storage class cold-data-new-storage-class (belonging to cold-data-placement) as the default

2. The second command confirms that the default placement rule and the default storage class for that placement rule are the ones we intended to set. Then we update the period to commit our changes.

$ radosgw-admin zonegroup get
[...]
    "default_placement": "cold-data-placement/cold-data-new-storage-class",
[...]

$ radosgw-admin period update --commit
[...]
    "default_placement": "cold-data-placement/cold-data-new-storage-class",
[...]

  • radosgw-admin zonegroup get – fetches attributes set for the current default zonegroup.
  • radosgw-admin period update --commit – update the period (increment the epoch) with our changes included.

3. Then, from a client, let’s create a new user (so there are no defaults set) and a new bucket, and inspect the bucket’s placement_rule.

# from the management host
$ radosgw-admin user create \
    --uid lifecycle-user-test \
    --display-name lifecycle-user-test
[...]
    "default_placement": "",
    "default_storage_class": "",
[...]

# from the client
$ s3cmd -c ~/.lc-user-test.cfg mb s3://cold-bucket-test
Bucket 's3://cold-bucket-test/' created

# from the management host
$ radosgw-admin bucket stats
[...]
    "bucket": "cold-bucket-test",
[...]
    "placement_rule": "cold-data-placement/cold-data-new-storage-class",
[...]

We create a new user here because it’s currently not possible to set the default_placement rule of a user back to empty once set.

First command:
  • radosgw-admin user create – create a new user
  • --uid lifecycle-user-test – set the uid to lifecycle-user-test
  • --display-name lifecycle-user-test – set the display name to the same

Second command:
  • s3cmd -c ~/.lc-user-test.cfg mb s3://cold-bucket-test – create a new bucket with s3cmd using an alternative configuration file for the newly created user

Third command:
  • radosgw-admin bucket stats – display attributes of all buckets (only our newly created bucket is relevant here)
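The placement_rule value in the bucket stats output encodes both the rule and a non-STANDARD storage class, separated by a slash. A small stdlib-only sketch of pulling that apart (the JSON sample is a hypothetical, trimmed version of the output above):

```python
import json

# Hypothetical, trimmed `radosgw-admin bucket stats` output keeping
# only the two fields we care about here.
bucket_stats = json.loads("""
[{"bucket": "cold-bucket-test",
  "placement_rule": "cold-data-placement/cold-data-new-storage-class"}]
""")

for b in bucket_stats:
    # "<rule>/<class>" means the bucket's default storage class differs
    # from STANDARD; a bare "<rule>" implies STANDARD.
    rule, _, storage_class = b["placement_rule"].partition("/")
    print(b["bucket"], rule, storage_class or "STANDARD")
```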

4. Let’s upload a file and inspect where it ends up with both s3cmd and ceph df. We use rclone to upload the file, as it does not send the x-amz-storage-class header by default; sending that header would invalidate our test, because the header’s value takes precedence.

$ ceph df
[...]
default.rgw.buckets.cold-data 12 2 25 MiB 11 75 MiB 0.02 116 GiB
default.rgw.buckets.cold-data-index 13 2 16 KiB 33 48 KiB 0 29 GiB
default.rgw.buckets.cold-data-extra 14 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.cold-data-new-storage-class-pool 15 2 16 MiB 4 48 MiB 0.01 116 GiB

$ rclone copy \
    --dump headers \
    test-2.png lc-user-test:/cold-bucket-test/
[...]
2024/05/30 22:24:17 DEBUG : HTTP REQUEST (req 0x40002b65a0)
2024/05/30 22:24:17 DEBUG : PUT /cold-bucket-test/test-2.png HTTP/1.1
Host: 172.31.117.89
User-Agent: rclone/v1.66.0
Content-Length: 12582912
Authorization: XXXX
Content-Md5: nE3pdbRa9+m/enDid8wDFw==
Content-Type: image/png
Expect: 100-continue
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240530T165417Z
X-Amz-Meta-Mtime: 1717077391.640079251
Accept-Encoding: gzip
[...]
2024/05/30 22:24:35 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/05/30 22:24:35 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/05/30 22:24:35 DEBUG : HTTP RESPONSE (req 0x4000892900)
2024/05/30 22:24:35 DEBUG : HTTP/1.1 200 OK
Content-Length: 12582912
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Type: image/png
Date: Thu, 30 May 2024 16:54:35 GMT
Etag: "9c4de975b45af7e9bf7a70e277cc0317"
Last-Modified: Thu, 30 May 2024 16:54:35 GMT
X-Amz-Meta-Mtime: 1717077391.640079251
X-Amz-Request-Id: tx000005fa8c56003cad81d-006658af4b-3f83ab-default
X-Amz-Storage-Class: cold-data-new-storage-class # <- what we wanted
X-Rgw-Object-Type: Normal
[...]

$ s3cmd -c ~/.lc-user-test.cfg info s3://cold-bucket-test/test-2.png
[...]
Storage: cold-data-new-storage-class
[...]

$ ceph df
[...]
default.rgw.buckets.cold-data 12 2 25 MiB 11 75 MiB 0.02 116 GiB
default.rgw.buckets.cold-data-index 13 2 16 KiB 33 48 KiB 0 29 GiB
default.rgw.buckets.cold-data-extra 14 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.cold-data-new-storage-class-pool 15 2 28 MiB 7 84 MiB 0.02 116 GiB

We inspect the usage of the pools first with ceph df.

Then we copy a test object into the newly created bucket with rclone. The response itself shows which storage class the object ends up in – X-Amz-Storage-Class: cold-data-new-storage-class.

Two additional confirmations are then obtained with s3cmd info and ceph df which inspect the object’s metadata and the cluster usage respectively.

Per User

The storage classes can also be changed for the user, either by the admin using radosgw-admin or by the user at the time of the upload using a CLI tool such as s3cmd.

By the Admin

If it’s desired that a certain user’s uploads go, by default, to a different storage class, this can be done by modifying the default_storage_class key for the specific user with radosgw-admin.

$ radosgw-admin user info \
    --uid lifecycle-user-test
[...]
    "default_placement": "",
    "default_storage_class": "",
[...]

# after adding a new storage class - hot-data-new-storage-class
# which maps to a new pool - default.rgw.buckets.hot-data-new-storage-class-pool
$ radosgw-admin user modify \
    --uid lifecycle-user-test \
    --placement-id hot-data-placement \
    --storage-class hot-data-new-storage-class
[...]
    "default_placement": "hot-data-placement",
    "default_storage_class": "hot-data-new-storage-class",
[...]

Now the user’s uploads will, by default, end up in the default.rgw.buckets.hot-data-new-storage-class-pool pool, as that is the pool associated with the storage class (hot-data-new-storage-class) we set for the user (unless the client modifies the x-amz-storage-class header, which rclone does not). Changing a user’s attributes does not require the period to be updated.

Let’s do the same experiment as before: create a new bucket and inspect its placement_rule attribute. The commands have the same explanation as above.

$ s3cmd -c ~/.lc-user-test.cfg mb s3://hot-bucket-test
Bucket 's3://hot-bucket-test/' created

$ radosgw-admin bucket stats
[...]
    "bucket": "hot-bucket-test",
[...]
    "placement_rule": "hot-data-placement/hot-data-new-storage-class",
[...]

Then let’s try to upload an object with rclone and see where it ends up, of course with multiple confirmations.

$ ceph df
[...]
default.rgw.buckets.hot-data 9 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-extra 10 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-index 11 2 0 B 22 0 B 0 29 GiB
[...]
default.rgw.buckets.hot-data-new-storage-class-pool 16 2 8 MiB 2 24 MiB 0.03 29 GiB

$ rclone copy --dump headers test-2.png lc-user-test:/hot-bucket-test/
[...]
2024/05/30 22:50:46 DEBUG : HTTP REQUEST (req 0x4000af26c0)
2024/05/30 22:50:46 DEBUG : PUT /hot-bucket-test/test-2.png HTTP/1.1
Host: 172.31.117.89
User-Agent: rclone/v1.66.0
Content-Length: 12582912
Authorization: XXXX
Content-Md5: nE3pdbRa9+m/enDid8wDFw==
Content-Type: image/png
Expect: 100-continue
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240530T172046Z
X-Amz-Meta-Mtime: 1717077391.640079251
Accept-Encoding: gzip
[...]
2024/05/30 22:50:54 DEBUG : HTTP RESPONSE (req 0x4000af2ea0)
2024/05/30 22:50:54 DEBUG : HTTP/1.1 200 OK
Content-Length: 12582912
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Type: image/png
Date: Thu, 30 May 2024 17:20:53 GMT
Etag: "9c4de975b45af7e9bf7a70e277cc0317"
Last-Modified: Thu, 30 May 2024 17:20:53 GMT
X-Amz-Meta-Mtime: 1717077391.640079251
X-Amz-Request-Id: tx000006c1a258120c1916a-006658b575-420d3e-default
X-Amz-Storage-Class: hot-data-new-storage-class
X-Rgw-Object-Type: Normal
[...]

$ s3cmd -c ~/.lc-user-test.cfg info s3://hot-bucket-test/test-2.png
[...]
Storage: hot-data-new-storage-class
[...]

$ ceph df
[...]
default.rgw.buckets.hot-data 9 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-extra 10 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-index 11 2 0 B 22 0 B 0 29 GiB
[...]
default.rgw.buckets.hot-data-new-storage-class-pool 16 2 20 MiB 5 60 MiB 0.07 29 GiB

Same as before, we have multiple confirmations from rclone, s3cmd, and ceph df that our object landed where we wanted it to.

By the User

This is by far the most straightforward and convenient method: it does not require administrator intervention, and the storage class can be included in the request header by the client.

For aws-cli and s3cmd, the flag is --storage-class and for rclone, it’s --s3-storage-class.

Let’s start from a clean state: remove the existing cold-bucket, create it again, check its placement-rule attribute and then start uploading objects.

# remove the bucket
$ s3cmd rb --force --recursive s3://cold-bucket

# create it again
$ s3cmd mb s3://cold-bucket
Bucket 's3://cold-bucket/' created

# check bucket attribute
$ radosgw-admin bucket stats
[...]
    "bucket": "cold-bucket",
[...]
    "placement_rule": "cold-data-placement",
[...]

Our placement rule now says cold-data-placement, which means our objects would normally end up in default.rgw.buckets.cold-data, the pool tied to the STANDARD storage class.

We can override that by providing a --storage-class flag with our requests.

# with aws cli
$ aws --endpoint-url http://172.31.117.89 \
    s3api put-object \
    --bucket cold-bucket \
    --key test-1.png \
    --body test-1.png \
    --storage-class cold-data-new-storage-class
{
    "ETag": "\"094a332b16cab2c1e26e891690654d07\""
}

# with s3cmd
$ s3cmd put \
    --storage-class cold-data-new-storage-class \
    test-2.png s3://cold-bucket
upload: 'test-2.png' -> 's3://cold-bucket/test-2.png' [1 of 1]
12582912 of 12582912 100% in 5s 2.38 MB/s done

# with rclone
$ rclone copy --s3-storage-class cold-data-new-storage-class test-3.png lc-user:/cold-bucket/

Now that our objects are uploaded, let’s confirm that they landed in the correct storage class with s3cmd.

$ s3cmd info s3://cold-bucket/test-1.png
[...]
Storage: cold-data-new-storage-class
[...]

$ s3cmd info s3://cold-bucket/test-2.png
[...]
Storage: cold-data-new-storage-class
[...]

$ s3cmd info s3://cold-bucket/test-3.png
[...]
Storage: cold-data-new-storage-class
[...]

Our objects did end up at the right place! As previously mentioned, this is the easiest and most recommended method of choosing a different storage class as this method works well with existing buckets and does not require creation of new ones.

Lifecycle Policies

Every bucket can be configured to have a lifecycle policy in place, which would decide actions based on the objects’ lifetime. For example, an object can be moved to another storage class or deleted altogether based on the number of days the object has been in the bucket. Lifecycle policies apply to both existing and new objects.

To emphasize our use case, below is a lifecycle policy that transitions objects to cold-data-new-storage-class once they have been in the bucket for a year.

{
    "Rules": [
        {
            "ID": "TransitionToNewStorageClass",
            "Filter": {
                "Prefix": ""
            },
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 365,
                    "StorageClass": "cold-data-new-storage-class"
                }
            ]
        }
    ]
}

  • ID – Description of the policy
  • Filter – Select objects based on the following conditions
    • Prefix – Set to empty as we want to select all objects in the bucket
  • Status – Enabled or Disabled
  • Transitions – Specifies the transition criteria and destination
    • Days – Transition objects once they are more than 365 days old. If the value is set to 0, objects aren’t transitioned immediately; rather, they wait until the next midnight.
    • StorageClass – Destination storage class to move the objects to.
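The policy above can also be generated programmatically. A minimal stdlib-only sketch that writes the same JSON to a lifecycle.json file, suitable for aws s3api put-bucket-lifecycle-configuration:

```python
import json

# Rebuild the lifecycle policy shown above and write it to the file
# consumed by `aws s3api put-bucket-lifecycle-configuration`.
policy = {
    "Rules": [
        {
            "ID": "TransitionToNewStorageClass",
            "Filter": {"Prefix": ""},  # empty prefix: match all objects
            "Status": "Enabled",
            "Transitions": [
                {"Days": 365, "StorageClass": "cold-data-new-storage-class"}
            ],
        }
    ]
}

with open("lifecycle.json", "w") as f:
    json.dump(policy, f, indent=2)
```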

To apply it to the cluster, navigate to S3 → Buckets, select the bucket → Edit (or right-click on the bucket) → Policy, like so:

The same can be done over the command-line with aws-cli which selects a bucket and applies the lifecycle policy stored in a json file.

$ aws s3api put-bucket-lifecycle-configuration \
    --bucket cold-bucket \
    --lifecycle-configuration file://lifecycle.json

Conclusion

As we saw, there are a few odd cases with Ceph S3 that might surprise Amazon S3 users and administrators alike. Here is a summary matrix for all the test cases above. The highlighted fields are the recommended methods as those provide the least resistance to the user.

|  | Changing Default Placement Rule | Changing Default Storage Class for the Selected Placement Rule |
| --- | --- | --- |
| Globally, for buckets | New buckets only (needs the rgw period to be committed) | New buckets only (needs the rgw period to be committed) |
| Globally, for a user | New buckets only | New buckets only |
| User during bucket creation | Possible with the LocationConstraint header | Not possible |
| User during upload | Buckets cannot change placement rules after creation | Possible with the --storage-class or equivalent option |

Overall, Ceph S3 offers a seamless transition for administrators accustomed to Amazon S3. Like its counterpart, it provides distinct storage classes tailored for both hot and cold data storage needs—utilizing faster devices for frequently accessed data and spinning drives for less active data. Additionally, Ceph S3 supports advanced tiering strategies suitable for enterprise data centers looking to optimize storage based on access patterns and data longevity. Whether it’s for long-term archiving or rapid data retrieval, Ceph S3 presents a robust, open-source solution that caters to a wide range of storage requirements.