Introduction
One of the selling points of Amazon S3 is that it allows users to choose from various storage classes based on data access frequency. Frequently accessed data is stored in the “Standard” class, while less frequently accessed data goes into the “Infrequent Access” (IA) class, to which it can be moved if not accessed over a set period. This setup aligns storage costs with retrieval needs. Ceph S3 implements a similar approach with its own storage classes, making it an interesting point of reference for users familiar with Amazon S3’s structure.
The Ceph S3 lingo is largely the same: these different places to store data are called storage classes, and the default one is called STANDARD. However, STANDARD is the only storage class Ceph includes out of the box; the others need to be created by the administrator. On top of that, Ceph’s storage classes are implemented by so-called “placement rules” which determine where data is stored. These placement rules also specify a set of storage classes available to users, and each bucket is configured with a placement rule.
Summary
In this blog post, we are going to expand on the following points:
- Each bucket in Ceph S3 is configured with a specific placement rule that dictates in which Ceph pools data is stored. Once a bucket is created, its placement rule is permanent and cannot be altered.
- Placement rules are integral in defining storage classes, where each class specifies a designated Ceph pool for data storage. S3 users have the flexibility to select the storage class that best fits their needs, and administrators can manage access to different storage classes for each user.
- To optimize storage efficiency, data can be transitioned between storage classes using lifecycle policies, allowing for automated management based on predefined criteria.
Test Setup
In the rest of the article, we will be using a Ceph cluster and a client to test and verify our claims.
The Ceph cluster is deployed using croit (v2405.1.quincy) with Ceph 17.2.7, running the image with ID 734d2072-f482-4314-afd1-76f86ba6518e and name “croit enterprise image for v2405 (quincy)”.
$ ceph -s
  cluster:
    id:     6ba5438b-a632-4c53-a25f-dacd6710396e
    health: HEALTH_OK
  services:
    mon: 5 daemons, quorum code01,code02,code03,code04,code05 (age 42h)
    mgr: code01(active, since 3d), standbys: code05, code04, code02, code03
    osd: 15 osds: 15 up (since 5d), 15 in (since 5d)
    rgw: 5 daemons active (5 hosts, 1 zones)
  data:
    pools:   16 pools, 273 pgs
    objects: 334 objects, 13 MiB
    usage:   8.7 GiB used, 459 GiB / 468 GiB avail
    pgs:     273 active+clean
The client is an Arch Linux VM with the following client versions:
$ aws --version
aws-cli/1.32.106 Python/3.12.3
$ s3cmd --version
s3cmd version 2.4.0
$ rclone version
rclone v1.66.0
Pools
The RADOS Gateway ships with a list of default pools. Their names contain rgw and are prefixed with the zone name, which is default since we are not running a multi-site setup for our test cluster. However, for this article, only the following are important:
- default.rgw.buckets.data – used by the data_pool key of a placement rule (tied to the storage class); stores the actual object data.
- default.rgw.buckets.non-ec – used by the data_extra_pool key of a placement rule; stores metadata for incomplete multi-part uploads.
- default.rgw.buckets.index – used by the index_pool key of a placement rule; stores the bucket index information.
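To make the relationship concrete, the three pool roles can be sketched as a small mapping. This is purely illustrative Python (the dict keys mirror the placement-rule keys described above; it is not a Ceph API):

```python
# Illustrative model of a placement rule's pool roles (not a Ceph API).
# The keys mirror the placement-rule keys described above: index_pool,
# data_extra_pool, and one data_pool per storage class.
default_placement = {
    "index_pool": "default.rgw.buckets.index",        # bucket index
    "data_extra_pool": "default.rgw.buckets.non-ec",  # multipart metadata
    "storage_classes": {
        # every placement rule ships with a STANDARD storage class
        "STANDARD": {"data_pool": "default.rgw.buckets.data"},
    },
}

# Data pools hang off a storage class; index/extra pools do not.
print(default_placement["storage_classes"]["STANDARD"]["data_pool"])
```

Note that only the data pool is tied to a storage class; the index and extra pools belong to the placement rule as a whole.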
Placement Rules
Placement rules in Ceph map a specific data pool, index pool, and extra pool to a particular bucket. They are assigned to buckets at the time of creation and cannot be changed thereafter. The default placement rule is called default-placement. It links to the default RADOS Gateway index pool (default.rgw.buckets.index) and extra pool (default.rgw.buckets.non-ec), and its default data pool (default.rgw.buckets.data) is tied to the STANDARD storage class, which every placement rule includes.
Placement rules can be used to tie multiple device classes to a single bucket. For example, a bucket can store some of its data on Ceph pools that are backed by SSDs and some on pools that are backed by HDDs. Although a bucket can have multiple placement rules available, the default placement rule is best configured per user with radosgw-admin (or in the croit UI: S3 → Users → Edit User) rather than per bucket. This is because, as previously mentioned, a bucket’s placement rule is set when the bucket is created and cannot be changed afterwards. The same attribute can, however, be changed by the admin for a user with radosgw-admin, which will apply to new buckets only. The user can also choose a custom placement rule during bucket creation (not upload) using tools like s3cmd or aws-cli.
Creating a New Placement Rule
During the creation of a new placement rule, custom pools can be chosen that correspond to the STANDARD storage class of the new placement rule. Storage classes, explained further down in the article, can be specified by user-facing tools like s3cmd and aws-cli to influence data placement into different pools.
For demonstration, let’s create a few new pools with the keywords hot and cold, meaning frequently and infrequently accessed data respectively. After creation of those pools, the same pools page would look something like this.
As seen from the picture, the cold data pool corresponds to an hdd-only CRUSH rule that places all the objects residing in the pool on spinning drives, while the metadata and index still live on flash. Similarly, for all the hot pools the associated CRUSH rule is ssd-only, which means the actual data, metadata and bucket index all live on flash.
From here on, we can proceed to create two new placement rules.
Once created, the placement targets look like this. Please note that these placement targets are not limited to a single bucket. Multiple buckets and multiple users can use the same placement rule. As seen in the diagram in the summary of the article, placement rules are responsible for selecting data, index, and extra pools wherein the data pool part is decided by or tied to the storage class.
Changing Default Placement Rule for New Buckets
The default placement rule can be changed so that it is auto-selected during creation of a new bucket or a new user.
On the command-line it would look like:
$ radosgw-admin zonegroup placement default \
--rgw-zonegroup default \
--placement-id hot-data-placement
The command does the following:
- radosgw-admin zonegroup placement default – choose a default placement rule for a particular zonegroup
- --rgw-zonegroup default – this default is only applied to the zonegroup named default. The zone (the Ceph cluster) is responsible for mapping the placement rule to the pools.
- --placement-id hot-data-placement – name of the placement rule to be set as default.
Although a new storage class can be created for the same placement rule and set as the default, doing so has no effect whatsoever; this is also explained further down in the article.
Bucket Creation with Non-Default Placement Rule with croit
Placement rules can be easily selected during bucket creation:
Bucket Creation with Non-Default Placement Rule with CLI Tools
When creating a bucket, in both instances, the default:hot-data-placement value is passed in the request, which tells Ceph which zonegroup and placement rule to choose.
With AWS CLI, it would look like this:
$ aws --endpoint-url http://172.31.117.89 \
s3api create-bucket \
--bucket hot-bucket \
--create-bucket-configuration 'LocationConstraint=default:hot-data-placement' \
--debug
[...]
2024-05-30 23:41:59,739 - MainThread - botocore.endpoint - DEBUG - Making request for OperationModel(name=CreateBucket) with params: {'url_path': '', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'aws-cli/1.32.106 md/Botocore#1.34.106 ua/2.0 os/linux#6.7.12-orbstack-00202-g57474688ffbd md/arch#aarch64 lang/python#3.12.3 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.34.106'}, 'body': b'default:hot-data-placement', 'auth_path': '/hot-bucket/', 'url': 'http://172.31.117.89/hot-bucket', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0xffffa9cd8f50>, 'has_streaming_input': False, 'auth_type': 'v4', 's3_redirect': {'redirected': False, 'bucket': 'hot-bucket', 'params': {'Bucket': 'hot-bucket', 'CreateBucketConfiguration': {'LocationConstraint': 'default:hot-data-placement'}}}, 'input_params': {'Bucket': 'hot-bucket'}, 'signing': {'region': 'us-east-1', 'signing_name': 's3', 'disableDoubleEncoding': True}, 'endpoint_properties': {'authSchemes': [{'disableDoubleEncoding': True, 'name': 'sigv4', 'signingName': 's3', 'signingRegion': 'us-east-1'}]}}}
[...]
- aws --endpoint-url http://172.31.117.89 – the endpoint URL is the URL of the RGW service, or of a load balancer that reverse proxies to multiple RGW services
- s3api create-bucket – create a bucket using the S3 API
- --bucket hot-bucket – the name of the bucket is hot-bucket
- --create-bucket-configuration 'LocationConstraint=default:hot-data-placement' – apply the given configuration: the LocationConstraint here corresponds to the zonegroup’s api_name (here: default) and a custom placement rule (here: hot-data-placement), separated with a colon.
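The two naming schemes that appear throughout this article can be confusing, so here is a short, hypothetical helper (ours, not part of any Ceph tooling) that shows how the strings decompose:

```python
# Illustration of the two naming schemes seen in this article:
#  - LocationConstraint sent by clients:  "<zonegroup>:<placement-rule>"
#  - placement_rule in bucket stats:      "<placement-rule>/<storage-class>"
def parse_location_constraint(value: str):
    """Split 'default:hot-data-placement' into (zonegroup, placement rule)."""
    zonegroup, _, placement = value.partition(":")
    return zonegroup, placement or None

def parse_placement_rule(value: str):
    """Split 'cold-data-placement/cold-data-new-storage-class' into
    (placement rule, storage class); the class defaults to STANDARD."""
    placement, _, storage_class = value.partition("/")
    return placement, storage_class or "STANDARD"

print(parse_location_constraint("default:hot-data-placement"))
# ('default', 'hot-data-placement')
print(parse_placement_rule("cold-data-placement"))
# ('cold-data-placement', 'STANDARD')
```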
With S3CMD, it would look like:
$ s3cmd mb s3://cold-bucket \
  --region default:cold-data-placement \
  --debug
[...]
DEBUG: bucket_location: default:cold-data-placement
[...]
- s3cmd mb s3://cold-bucket – make a bucket with S3, name it cold-bucket
- --region default:cold-data-placement – the region (analogous to the LocationConstraint with aws-cli) is the zonegroup and the placement rule, separated with a colon
Using --storage-class with s3cmd to set a non-default storage class during bucket creation also has no effect, and the AWS CLI does not accept a --storage-class flag while creating a bucket. Uploaded objects will end up in the default storage class of the placement rule. A --storage-class provided during upload of objects, however, results in the expected behaviour.
Changing Default Placement Rule for the User
A default placement rule with the default storage class can also be configured for a specific user via the croit UI:
The same over the command-line:
$ radosgw-admin user modify \
  --uid lifecycle-user \
  --placement-id cold-data-placement \
  --storage-class STANDARD
{
[...]
  "default_placement": "cold-data-placement",
  "default_storage_class": "STANDARD",
[...]
}
- radosgw-admin user modify – modifies some property of the user
- --uid lifecycle-user – modify the user with the username lifecycle-user
- --placement-id cold-data-placement – set the default placement rule of the user to cold-data-placement
- --storage-class STANDARD – set the default storage class of the user to STANDARD for the placement rule cold-data-placement
Storage Classes
In Ceph, storage classes are components of placement rules that associate them with specific data pools. Storage classes are a native S3 concept that Ceph incorporates into its own nomenclature with the help of placement rules and pools. Applications, however, do not know what placement rules or pools are. Ceph exposes zonegroups and placement rules to applications as parts of the LocationConstraint attribute (analogous to region in some applications) that clients can include in their requests to choose a particular zonegroup and placement rule.
A placement rule can have multiple storage classes. Every placement rule is created with a storage class called STANDARD by default, but this STANDARD storage class might correspond to the same or different data pools across different placement rules. This is because Ceph allows you to choose a specific data pool during placement rule creation, which then backs the default STANDARD storage class.
In the picture, the same STANDARD storage class is tied to three different Ceph pools. The cold-data-placement rule has a STANDARD storage class which corresponds to the default.rgw.buckets.cold-data pool. The default-placement rule has a storage class which is also called STANDARD but corresponds to the default.rgw.buckets.data pool. And the hot-data-placement rule has a storage class with the same name corresponding to yet another pool, called default.rgw.buckets.hot-data. Storage class names are therefore only unique within a placement rule, not across the whole cluster.
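The mapping in the picture can be written out as data. This is a plain illustration of the naming overlap, with the pool names taken from our test cluster:

```python
# The same STANDARD storage-class name resolves to a different data pool
# under each placement rule (pool names from our test cluster).
placement_rules = {
    "default-placement":   {"STANDARD": "default.rgw.buckets.data"},
    "hot-data-placement":  {"STANDARD": "default.rgw.buckets.hot-data"},
    "cold-data-placement": {"STANDARD": "default.rgw.buckets.cold-data"},
}

def data_pool(placement_rule: str, storage_class: str = "STANDARD") -> str:
    # A storage class is only meaningful relative to its placement rule.
    return placement_rules[placement_rule][storage_class]

print(data_pool("cold-data-placement"))  # default.rgw.buckets.cold-data
```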
Creating a Storage Class
During the creation of a storage class, only the data pool is selected. The index and extra pools, responsible for bucket index and metadata, are selected during placement rule creation. When a new storage class is created for an existing placement rule, the index and extra pools which are associated with the placement rule stay the same.
A storage class can be created easily in the croit UI by visiting S3 → Placements, selecting a placement rule and clicking on “Add Storage Class.”
As an example, a new storage class cold-data-new-storage-class is being added to the cold-data-placement rule, mapping to the data pool default.rgw.buckets.cold-data-new-storage-pool.
The same can be achieved over the command line with radosgw-admin:
$ radosgw-admin zonegroup placement add \
  --rgw-zonegroup default \
  --placement-id cold-data-placement \
  --storage-class cold-data-new-storage-class
[
  {
    "key": "cold-data-placement",
    "val": {
      "name": "cold-data-placement",
      "tags": [],
      "storage_classes": [
        "STANDARD",
        "cold-data-new-storage-class"
      ]
    }
  },
[...]
$ radosgw-admin zone placement add \
  --rgw-zone default \
  --placement-id cold-data-placement \
  --storage-class cold-data-new-storage-class \
  --data-pool default.rgw.buckets.cold-data-new-storage-pool \
  --compression lz4
[...]
"placement_pools": [
  {
    "key": "cold-data-placement",
    "val": {
      "index_pool": "default.rgw.buckets.cold-data-index",
      "storage_classes": {
        "STANDARD": {
          "data_pool": "default.rgw.buckets.cold-data"
        },
        "cold-data-new-storage-class": {
          "data_pool": "default.rgw.buckets.cold-data-new-storage-pool",
          "compression_type": "lz4"
        }
      },
      "data_extra_pool": "default.rgw.buckets.cold-data-extra",
      "index_type": 0,
      "inline_data": "true"
    }
  },
[...]
Let’s break the commands down. The first command adds a new storage class to the placement rule in the zonegroup, while the second applies to the zone (the Ceph cluster) and maps a data pool and any other attributes to the newly added storage class.
First command:
- radosgw-admin zonegroup placement add – add a placement target to the zonegroup
- --rgw-zonegroup default – apply to the zonegroup called default
- --placement-id cold-data-placement – select the placement rule cold-data-placement
- --storage-class cold-data-new-storage-class – add a storage class cold-data-new-storage-class to the placement rule
Second command:
- radosgw-admin zone placement add – add a placement target to the zone
- --rgw-zone default – apply to the zone called default
- --placement-id cold-data-placement – select the placement rule cold-data-placement
- --storage-class cold-data-new-storage-class – select the newly added storage class, cold-data-new-storage-class
- --data-pool default.rgw.buckets.cold-data-new-storage-pool – map the data pool default.rgw.buckets.cold-data-new-storage-pool to the newly added storage class
- --compression lz4 – apply lz4 compression to the objects that will be uploaded
Changing Default Storage Class
There are two ways to change the default storage class. It can be done with an undocumented flag to radosgw-admin that applies globally, or per user by setting the default_storage_class attribute for the user.
- These changes apply to new buckets only, and if both are set then the user attribute takes precedence.
- These changes are not relevant if the client sends the x-amz-storage-class header, even as STANDARD without the user explicitly specifying a storage class. In that case, the storage class included in the header will be chosen.
Globally
Currently, radosgw-admin accepts an undocumented --storage-class flag for changing the default storage class of a particular placement rule.
1. In the first command, we are setting a storage class (one that is not STANDARD) as the default for an already existing placement rule. The output of this command simply shows us that there are multiple storage classes for the placement rule; it does not yet tell us which one is the default.
$ radosgw-admin zonegroup placement default \
  --rgw-zonegroup default \
  --placement-id cold-data-placement \
  --storage-class cold-data-new-storage-class
[
  {
    "key": "cold-data-placement",
    "val": {
      "name": "cold-data-placement",
      "tags": [],
      "storage_classes": [
        "STANDARD",
        "cold-data-new-storage-class"
      ]
    }
  },
[...]
- radosgw-admin zonegroup placement default – set a default placement rule (and storage class) for a zonegroup
- --rgw-zonegroup default – apply to the zonegroup named default
- --placement-id cold-data-placement – set the placement rule cold-data-placement as the default
- --storage-class cold-data-new-storage-class – set the storage class cold-data-new-storage-class (belonging to cold-data-placement) as the default
2. The second command confirms that the default placement rule and the default storage class for that placement rule are the ones we intended to set. Then we update the period to commit our changes.
$ radosgw-admin zonegroup get
[...]
"default_placement": "cold-data-placement/cold-data-new-storage-class",
[...]
$ radosgw-admin period update --commit
[...]
"default_placement": "cold-data-placement/cold-data-new-storage-class",
[...]
- radosgw-admin zonegroup get – fetches the attributes set for the current default zonegroup
- radosgw-admin period update --commit – updates the period (increments the epoch) with our changes included
3. Then let’s create a new user (so no defaults are set) and a new bucket, and inspect the bucket’s placement_rule.
# from the management host
$ radosgw-admin user create \
  --uid lifecycle-user-test \
  --display-name lifecycle-user-test
[...]
"default_placement": "",
"default_storage_class": "",
[...]
# from the client
$ s3cmd -c ~/.lc-user-test.cfg mb s3://cold-bucket-test
Bucket 's3://cold-bucket-test/' created
# from the management host
$ radosgw-admin bucket stats
[...]
"bucket": "cold-bucket-test",
[...]
"placement_rule": "cold-data-placement/cold-data-new-storage-class",
[...]
- We create a new user here because it’s currently not possible to set the default_placement rule of a user back to empty once set.
First command:
- radosgw-admin user create – create a new user
- --uid lifecycle-user-test – set the uid to lifecycle-user-test
- --display-name lifecycle-user-test – set the display name to the same
Second command:
- s3cmd -c ~/.lc-user-test.cfg mb s3://cold-bucket-test – create a new bucket with s3cmd, using an alternative configuration file for the newly created user
Third command:
- radosgw-admin bucket stats – display attributes of all buckets (only our newly created bucket is relevant here)
4. Let’s upload a file and inspect where it ends up with both s3cmd and ceph df. We use rclone to upload the file here as it does not send the x-amz-storage-class header by default; sending the header would invalidate our testing, since its value takes precedence.
$ ceph df
[...]
default.rgw.buckets.cold-data                         12   2  25 MiB  11  75 MiB  0.02  116 GiB
default.rgw.buckets.cold-data-index                   13   2  16 KiB  33  48 KiB  0      29 GiB
default.rgw.buckets.cold-data-extra                   14   2     0 B   0     0 B  0      29 GiB
default.rgw.buckets.cold-data-new-storage-class-pool  15   2  16 MiB   4  48 MiB  0.01  116 GiB
$ rclone copy \
  --dump headers \
  test-2.png lc-user-test:/cold-bucket-test/
[...]
2024/05/30 22:24:17 DEBUG : HTTP REQUEST (req 0x40002b65a0)
2024/05/30 22:24:17 DEBUG : PUT /cold-bucket-test/test-2.png HTTP/1.1
Host: 172.31.117.89
User-Agent: rclone/v1.66.0
Content-Length: 12582912
Authorization: XXXX
Content-Md5: nE3pdbRa9+m/enDid8wDFw==
Content-Type: image/png
Expect: 100-continue
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240530T165417Z
X-Amz-Meta-Mtime: 1717077391.640079251
Accept-Encoding: gzip
[...]
2024/05/30 22:24:35 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/05/30 22:24:35 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/05/30 22:24:35 DEBUG : HTTP RESPONSE (req 0x4000892900)
2024/05/30 22:24:35 DEBUG : HTTP/1.1 200 OK
Content-Length: 12582912
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Type: image/png
Date: Thu, 30 May 2024 16:54:35 GMT
Etag: "9c4de975b45af7e9bf7a70e277cc0317"
Last-Modified: Thu, 30 May 2024 16:54:35 GMT
X-Amz-Meta-Mtime: 1717077391.640079251
X-Amz-Request-Id: tx000005fa8c56003cad81d-006658af4b-3f83ab-default
X-Amz-Storage-Class: cold-data-new-storage-class # <- what we wanted
X-Rgw-Object-Type: Normal
[...]
$ s3cmd -c ~/.lc-user-test.cfg info s3://cold-bucket-test/test-2.png
[...]
Storage: cold-data-new-storage-class
[...]
$ ceph df
[...]
default.rgw.buckets.cold-data                         12   2  25 MiB  11  75 MiB  0.02  116 GiB
default.rgw.buckets.cold-data-index                   13   2  16 KiB  33  48 KiB  0      29 GiB
default.rgw.buckets.cold-data-extra                   14   2     0 B   0     0 B  0      29 GiB
default.rgw.buckets.cold-data-new-storage-class-pool  15   2  28 MiB   7  84 MiB  0.02  116 GiB
We inspect the usage of the pools first with ceph df. Then we copy a test object into the newly created bucket with rclone; the response itself includes which storage class the object ends up in – X-Amz-Storage-Class: cold-data-new-storage-class. Two additional confirmations are then obtained with s3cmd info and ceph df, which inspect the object’s metadata and the cluster usage respectively.
Per User
The storage class can also be changed for a user, either by the admin using radosgw-admin or by the user at the time of upload using a CLI tool such as s3cmd.
By the Admin
If it’s desired that a certain user’s uploads go, by default, to a different storage class, this can be done by modifying the default_storage_class key for the specific user with radosgw-admin.
$ radosgw-admin user info \
--uid lifecycle-user-test
[...]
"default_placement": "",
"default_storage_class": "",
[...]
# after adding a new storage class - hot-data-new-storage-class
# which maps to a new pool - default.rgw.buckets.hot-data-new-storage-class-pool
$ radosgw-admin user modify \
--uid lifecycle-user-test \
--placement-id hot-data-placement \
--storage-class hot-data-new-storage-class
[...]
"default_placement": "hot-data-placement",
"default_storage_class": "hot-data-new-storage-class",
[...]
Now the user’s uploads will, by default, end up in the default.rgw.buckets.hot-data-new-storage-class-pool pool, as that is associated with the storage class (hot-data-new-storage-class) we set for the user (unless the client overrides the x-amz-storage-class header, which rclone does not). Changing attributes of a user does not require the period to be updated.
Let’s do the same experiment as before: create a new bucket and inspect its placement_rule attribute. The commands have the same explanation as above.
$ s3cmd -c ~/.lc-user-test.cfg mb s3://hot-bucket-test
Bucket 's3://hot-bucket-test/' created
$ radosgw-admin bucket stats
[...]
"bucket": "hot-bucket-test",
[...]
"placement_rule": "hot-data-placement/hot-data-new-storage-class",
[...]
Then let’s try to upload an object with rclone and see where it ends up, of course with multiple confirmations.
$ ceph df
[...]
default.rgw.buckets.hot-data 9 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-extra 10 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-index 11 2 0 B 22 0 B 0 29 GiB
[...]
default.rgw.buckets.hot-data-new-storage-class-pool 16 2 8 MiB 2 24 MiB 0.03 29 GiB
$ rclone copy --dump headers test-2.png lc-user-test:/hot-bucket-test/
[...]
2024/05/30 22:50:46 DEBUG : HTTP REQUEST (req 0x4000af26c0)
2024/05/30 22:50:46 DEBUG : PUT /hot-bucket-test/test-2.png HTTP/1.1
Host: 172.31.117.89
User-Agent: rclone/v1.66.0
Content-Length: 12582912
Authorization: XXXX
Content-Md5: nE3pdbRa9+m/enDid8wDFw==
Content-Type: image/png
Expect: 100-continue
X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
X-Amz-Date: 20240530T172046Z
X-Amz-Meta-Mtime: 1717077391.640079251
Accept-Encoding: gzip
[...]
2024/05/30 22:50:54 DEBUG : HTTP RESPONSE (req 0x4000af2ea0)
2024/05/30 22:50:54 DEBUG : HTTP/1.1 200 OK
Content-Length: 12582912
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Type: image/png
Date: Thu, 30 May 2024 17:20:53 GMT
Etag: "9c4de975b45af7e9bf7a70e277cc0317"
Last-Modified: Thu, 30 May 2024 17:20:53 GMT
X-Amz-Meta-Mtime: 1717077391.640079251
X-Amz-Request-Id: tx000006c1a258120c1916a-006658b575-420d3e-default
X-Amz-Storage-Class: hot-data-new-storage-class
X-Rgw-Object-Type: Normal
[...]
$ s3cmd -c ~/.lc-user-test.cfg info s3://hot-bucket-test/test-2.png
[...]
Storage: hot-data-new-storage-class
[...]
$ ceph df
[...]
default.rgw.buckets.hot-data 9 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-extra 10 2 0 B 0 0 B 0 29 GiB
default.rgw.buckets.hot-data-index 11 2 0 B 22 0 B 0 29 GiB
[...]
default.rgw.buckets.hot-data-new-storage-class-pool 16 2 20 MiB 5 60 MiB 0.07 29 GiB
Again, we confirm with rclone, s3cmd and ceph df that our object landed where we wanted it to.
By the User
This is by far the most straightforward and convenient method: it does not require administrator intervention, and the client simply includes the storage class in the request header.
For aws-cli and s3cmd, the flag is --storage-class, and for rclone it’s --s3-storage-class.
Let’s start from a clean state: remove the existing cold-bucket, create it again, check its placement_rule attribute and then start uploading objects.
# remove the bucket
$ s3cmd rb --force --recursive s3://cold-bucket
# create it again
$ s3cmd mb s3://cold-bucket
Bucket 's3://cold-bucket/' created
# check bucket attribute
$ radosgw-admin bucket stats
[...]
"bucket": "cold-bucket",
[...]
"placement_rule": "cold-data-placement",
[...]
Our placement rule now says cold-data-placement, which means our objects would normally end up in default.rgw.buckets.cold-data, the pool tied to its STANDARD storage class.
We can override that by providing a --storage-class flag with our requests.
# with aws cli
$ aws --endpoint-url http://172.31.117.89 \
s3api put-object \
--bucket cold-bucket \
--key test-1.png \
--body test-1.png \
--storage-class cold-data-new-storage-class
{
"ETag": "\"094a332b16cab2c1e26e891690654d07\""
}
# with s3cmd
$ s3cmd put \
--storage-class cold-data-new-storage-class \
test-2.png s3://cold-bucket
upload: 'test-2.png' -> 's3://cold-bucket/test-2.png' [1 of 1]
12582912 of 12582912 100% in 5s 2.38 MB/s done
# with rclone
$ rclone copy --s3-storage-class cold-data-new-storage-class test-3.png lc-user:/cold-bucket/
Now that our objects are uploaded, let’s confirm that they landed in the correct storage class with s3cmd.
$ s3cmd info s3://cold-bucket/test-1.png
[...]
Storage: cold-data-new-storage-class
[...]
$ s3cmd info s3://cold-bucket/test-2.png
[...]
Storage: cold-data-new-storage-class
[...]
$ s3cmd info s3://cold-bucket/test-3.png
[...]
Storage: cold-data-new-storage-class
[...]
Our objects did end up in the right place! As previously mentioned, this is the easiest and most recommended way of choosing a different storage class, as it works with existing buckets and does not require creating new ones.
Lifecycle Policies
Every bucket can be configured with a lifecycle policy, which decides actions based on an object’s lifetime. For example, an object can be moved to another storage class or deleted altogether based on the number of days it has been in the bucket. Lifecycle policies apply to both existing and new objects.
To emphasize our use case, below is a lifecycle policy that transitions objects to cold-data-new-storage-class once they have been in the bucket for a year.
{
  "Rules": [
    {
      "ID": "TransitionToNewStorageClass",
      "Filter": {
        "Prefix": ""
      },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 365,
          "StorageClass": "cold-data-new-storage-class"
        }
      ]
    }
  ]
}
- ID – description of the policy
- Filter – select objects based on the following conditions
  - Prefix – set to empty as we want to select all objects in the bucket
- Status – Enabled or Disabled
- Transitions – specifies the transition criteria and destination
  - Days – transition objects once they are more than 365 days old. If the value is set to 0, objects aren’t transitioned immediately; rather, they wait until the next midnight.
  - StorageClass – destination storage class to move the objects to
To apply it to the cluster, navigate to S3 → Buckets, select the bucket → Edit (or right-click on the bucket) → Policy, like so:
The same can be done over the command-line with aws-cli, which selects a bucket and applies the lifecycle policy stored in a JSON file:
$ aws s3api put-bucket-lifecycle-configuration \
--bucket cold-bucket \
--lifecycle-configuration file://lifecycle.json
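If you prefer generating the policy instead of writing the JSON by hand, a few lines of standard-library Python will do. The file name lifecycle.json matches the command above; nothing else here is Ceph-specific:

```python
# Generate the lifecycle.json used above programmatically (stdlib only;
# no S3 client required).
import json

lifecycle = {
    "Rules": [
        {
            "ID": "TransitionToNewStorageClass",
            "Filter": {"Prefix": ""},  # empty prefix: match every object
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 365,  # transition after one year in the bucket
                    "StorageClass": "cold-data-new-storage-class",
                }
            ],
        }
    ]
}

with open("lifecycle.json", "w") as fh:
    json.dump(lifecycle, fh, indent=2)
```

The generated file can then be passed to aws s3api put-bucket-lifecycle-configuration exactly as shown above.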
Conclusion
As we saw, there are a few odd cases with Ceph S3 that might surprise Amazon S3 users and administrators alike. Here is a summary matrix for all the test cases above. The highlighted fields are the recommended methods as those provide the least resistance to the user.
| | Changing Default Placement Rule | Changing Default Storage Class for the selected Placement Rule |
|---|---|---|
| Globally, for buckets | New buckets only (needs RGW period to be committed) | New buckets only (needs RGW period to be committed) |
| Globally, for a user | New buckets only | New buckets only |
| User during bucket creation | Possible with the LocationConstraint header | Not possible |
| User during upload | Buckets cannot change placement rules after creation | Possible with the --storage-class or equivalent option |
Overall, Ceph S3 offers a seamless transition for administrators accustomed to Amazon S3. Like its counterpart, it provides distinct storage classes tailored for both hot and cold data storage needs, utilizing faster devices for frequently accessed data and spinning drives for less active data. Additionally, Ceph S3 supports advanced tiering strategies suitable for enterprise data centers looking to optimize storage based on access patterns and data longevity. Whether it’s for long-term archiving or rapid data retrieval, Ceph S3 presents a robust, open-source solution that caters to a wide range of storage requirements.