Software Release v1710

Our latest release brings a major improvement: an automated upgrade to Ceph Luminous!

Until now, we’ve been using Ceph Kraken (11.2) to deploy new clusters. With release v1710, we move forward to the latest long-term stable release, Ceph Luminous (12.2). And here comes the best part: to minimize the trouble of upgrading, we’ve designed an easy way to migrate your existing cluster from Kraken to Luminous!

With the new command /script/cluster.upgrade.luminous.py, you can upgrade a single server or a thousand while enjoying a nice hot cup of coffee. Yay! (And yes, we will offer an appropriate script for all future Ceph upgrades.)

Keep in mind, though, that while upgrading a Ceph cluster may be easy on the croit side, you still need to make sure your clients are able to work with the new version as well. We also strongly suggest doing this only in a planned maintenance window with reduced client I/O to avoid complications.
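Before opening the maintenance window, it can help to confirm that the cluster is healthy in the first place. The following is a minimal shell sketch, not something shipped with croit: the function names are hypothetical, and the only thing it relies on is the standard `ceph health` command.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (illustration only, not part of croit).

# Succeeds only if `ceph health` reports HEALTH_OK.
cluster_is_healthy() {
    case "$(ceph health 2>/dev/null)" in
        HEALTH_OK*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Gate for the maintenance window: prints a verdict and
# returns non-zero when the cluster is not healthy.
preflight_check() {
    if cluster_is_healthy; then
        echo "cluster healthy: ok to start the upgrade"
    else
        echo "cluster not healthy: fix this before upgrading" >&2
        return 1
    fi
}
```

Run `preflight_check` on a node that has an admin keyring. Once the monitors run Luminous, `ceph features` additionally lists the feature release reported by connected clients, which is one way to spot clients that may be too old.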

Of course, release v1710 comes along with a number of further optimizations focused on stability and better user experience, including the following:

GET ENHANCED INFORMATION FROM SYSTEM LOGS FOR EARLY FAILURE DETECTION

As we make use of the systemd journal, all of your configured servers send their system logs to our management node. These logs even include boot messages and very detailed service messages.

To access those logs, we provide you with a fully featured frontend with a live log view at a global, per-host or per-service level.

With our new logging feature, you will gain easy access to all the necessary information in order to detect and prevent system or service failures even faster.

USE ADVANCED PLACEMENT GROUP (PG) MANAGEMENT TO MINIMIZE DOWNTIMES

With the new Ceph Luminous, we put the PG force-backfill and force-recovery options in your hands. These features help you manage critical cluster states after hardware failures.

With forced backfill, you can advise Ceph which PGs to backfill first, so that the data you care about most returns to full redundancy sooner. Forced recovery, in turn, offers a way to get critical PGs recovered earlier.

Both commands can drastically reduce production downtimes after critical hardware failures or after administrative errors.
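On the command line, these options correspond to the `ceph pg force-recovery` and `ceph pg force-backfill` commands in Luminous, together with their `cancel-` counterparts. The sketch below wraps them in small shell functions; the function names are our own illustration, and any PG IDs you pass are yours to look up.

```shell
#!/bin/sh
# Hypothetical wrappers around the Luminous `ceph pg force-*` commands.

# Ask Ceph to recover the given PGs ahead of all others.
prioritize_recovery() {
    [ "$#" -gt 0 ] || { echo "usage: prioritize_recovery PGID..." >&2; return 2; }
    ceph pg force-recovery "$@"
}

# Ask Ceph to backfill the given PGs ahead of all others.
prioritize_backfill() {
    [ "$#" -gt 0 ] || { echo "usage: prioritize_backfill PGID..." >&2; return 2; }
    ceph pg force-backfill "$@"
}

# Return the given PGs to normal priority.
cancel_priority() {
    [ "$#" -gt 0 ] || { echo "usage: cancel_priority PGID..." >&2; return 2; }
    ceph pg cancel-force-recovery "$@"
    ceph pg cancel-force-backfill "$@"
}
```

For example, `prioritize_recovery 2.1f 2.3a` would raise the priority of two PGs (the IDs are placeholders). Candidates can be found with `ceph pg dump_stuck degraded`.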

EXPORT YOUR CEPHFS VIA NFS V4

NFSv3 is an old protocol (1995) that was designed mostly for UNIX-based operating systems. NFSv4 fills the gap and reaches Windows clients as well. It also provides a number of improvements, such as TCP-based communication, stronger security and a stateful protocol.

With NFSv4, you can now connect VMware and other hypervisors over the new protocol stack, with better performance, much easier firewall handling and many further improvements.

HIGH AVAILABILITY (HA) GROUPS FOR RGW

With our new release, you can set up HA groups not only for NFS, but also for RGW. This can be helpful for clients that cannot independently check the status of the services and migrate to another server.

INSTALL AND EDIT SERVICES ON SEVERAL SERVERS AT ONCE

Here comes another feature to save you time: until now, if you wanted to install or edit services (NFS, RGW, MDS) on several servers, you had to click on each system, go to the services view and then install a new service. A real time killer, isn’t it?

Here’s a better way: You now have the option to select multiple servers, and then install or configure the service on each of them.

BLUESTORE OSDS

Along with upgrading to Luminous, it’s now possible to get started with BlueStore OSDs. For now, we only support BlueStore OSDs with journal and data on the same disk, and will add external journal support in a future version. Keep in mind, though, that BlueStore is quite new and it’s quite possible that there are still bugs on the Ceph side.

On the other hand, BlueStore brings a major performance lift. On spinning disks, you can see a two- to threefold speedup compared to the old FileStore. BlueStore is also required if you want to make use of EC (erasure coding) pools to save some space with block storage (RBD).
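To make the erasure coding point concrete, here is a sketch of how an RBD image can keep its data in an EC pool using plain Ceph Luminous commands. This is the generic Ceph CLI route, not necessarily how croit sets it up; the function name, PG count and all pool and image names are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: RBD image whose data lives in an erasure-coded pool.
# Function name, PG count (64) and all arguments are illustrative assumptions.

create_ec_backed_image() {
    meta_pool=$1   # existing replicated pool that holds the image metadata
    data_pool=$2   # erasure-coded pool to create for the image data
    image=$3       # image name
    size=$4        # image size, e.g. 10G

    # Create the EC pool with the default erasure-code profile.
    ceph osd pool create "$data_pool" 64 64 erasure || return 1
    # RBD needs overwrites on the EC pool; this is where BlueStore OSDs are required.
    ceph osd pool set "$data_pool" allow_ec_overwrites true || return 1
    ceph osd pool application enable "$data_pool" rbd || return 1
    # Metadata stays in the replicated pool, data goes to the EC pool.
    rbd create --size "$size" --data-pool "$data_pool" "$meta_pool/$image"
}
```

A call such as `create_ec_backed_image rbd ecpool test 10G` would create a 10 GiB image named `test` in the replicated pool `rbd`, with its data stored in `ecpool`.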

AUTOMATE YOUR TEST ENVIRONMENTS

As we install Ceph test clusters quite often, we added a little program that installs small test clusters with identical hardware hassle-free.

After starting the fresh croit docker container, you can start the setup software called install.cluster.py. You’ll need to configure the script for your test cluster on first use. To copy the current version of our script to your management host, just issue a docker cp command or check out our GitHub repository at github.com/croit.

The scripts available on GitHub are also helpful if you want to train your employees, test new features or verify configuration changes.

BENEFIT FROM A MORE ROBUST HARDWARE DETECTION

Since the beginning of our journey to become the best Ceph Management Solution, we have always tried to make your life easier. Thanks to countless tests with different setups, we were able to optimize our Debian OS images and software-based hardware detection.

With our new method, the detection of lost drives, new hardware or system errors is much faster than before. Those changes provide you with more reliable information through the croit interface.

SUPPORT FOR LACP PXE BOOTS

With LACP, a group of network interfaces becomes a single logical one. This causes some additional problems for our beloved boot procedure. Thanks to our software engineers, booting from an LACP channel can now be used for production clusters.

RUN CLUSTER MAINTENANCE OUTSIDE BUSINESS HOURS

A storage cluster is rarely used around the clock at full performance. With our latest changes, we make use of off-hours and let Ceph scrub the disks then to ensure data consistency.

This helps to offer the best performance to your clients when needed, while maintaining data integrity at all times.
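In Ceph itself, the scrub window is governed by the `osd_scrub_begin_hour` and `osd_scrub_end_hour` options. The sketch below shows how such a window is evaluated, including a window that wraps around midnight, and how the options could be set by hand with `injectargs`. The function names and example hours are assumptions for illustration, not croit defaults.

```shell
#!/bin/sh
# Sketch of the hour check behind osd_scrub_begin_hour / osd_scrub_end_hour.
# Function names and example values are illustrative, not croit defaults.

# in_scrub_window HOUR BEGIN END: succeeds if HOUR (0-23) lies in the window.
in_scrub_window() {
    hour=$1; begin=$2; end=$3
    if [ "$begin" -lt "$end" ]; then
        # Window within a single day, e.g. 1 to 5.
        [ "$hour" -ge "$begin" ] && [ "$hour" -lt "$end" ]
    else
        # Window wraps around midnight, e.g. 22 to 6.
        [ "$hour" -ge "$begin" ] || [ "$hour" -lt "$end" ]
    fi
}

# Apply a scrub window to all running OSDs (not persistent across restarts).
set_scrub_window() {
    ceph tell 'osd.*' injectargs "--osd_scrub_begin_hour $1 --osd_scrub_end_hour $2"
}
```

With a window of 22 to 6, `in_scrub_window 23 22 6` and `in_scrub_window 3 22 6` succeed, while `in_scrub_window 12 22 6` fails.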

RESTART YOUR CLUSTER GRACEFULLY

As we offer newer OS images from time to time, you might sometimes feel the need to update your cluster. With small clusters you are able to do this yourself by hand, but with larger clusters the process can take hours if client impact is to be avoided.

Our restart functionality makes sure that each server gets rebooted, but just one at a time, while monitoring cluster health and operation. The next system is rebooted only once health is back to OK. This offers another pain-relieving solution for a typical administrative task.
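The idea of rebooting one server at a time can be sketched in a few lines of shell. This is not croit's implementation: `reboot_host` is a hypothetical hook that only prints a message here, and a real version would also have to wait for the host to actually go down and come back before trusting the health status.

```shell
#!/bin/sh
# Sketch of a rolling reboot; not croit's actual implementation.

POLL_INTERVAL=${POLL_INTERVAL:-30}   # seconds between health checks

# Hypothetical hook: replace with your own mechanism (ssh, IPMI, ...).
reboot_host() {
    echo "rebooting $1"
}

# Block until `ceph health` reports HEALTH_OK.
wait_for_health_ok() {
    until ceph health 2>/dev/null | grep -q '^HEALTH_OK'; do
        sleep "$POLL_INTERVAL"
    done
}

# Reboot the given hosts one at a time, requiring HEALTH_OK before each
# reboot and once more after the last one.
rolling_reboot() {
    for host in "$@"; do
        wait_for_health_ok
        reboot_host "$host"
    done
    wait_for_health_ok
}
```

Called as `rolling_reboot node1 node2 node3` (host names are placeholders), the loop never touches a second server while the cluster is still recovering from the first.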

LACP channels are our suggested way of connecting hosts to your network, in order to reach maximum performance and reliability.

MEMORIZE DATATABLE SETTINGS

For better usability, we now save your individual settings per datatable. For example, if you choose to show a field that’s normally hidden, it will still be shown when you come back to the same view, as if you never left.

UPDATE YOUR CROIT SOFTWARE

If you are already using croit, we strongly recommend upgrading to croit v1710 to get the best possible experience! Just enter the following two commands on the management node:

docker rm -f croit
docker run --net=host --restart=always --volumes-from croit-data --name croit -d croit/croit:1710

Disclaimer: Please make sure that you have a working backup before entering these commands.