Evaluating PG numbers
Getting the PG numbers right is crucial for a well-balanced and maintainable cluster. Let's see how to calculate them manually:
PGs per OSD
If you look at the CRUSH map, it will tell you how many PGs each OSD has. We are aiming for ~100 PGs per OSD; it's better to overshoot than to undershoot here. Just don't go overboard, or you might end up with PGs stuck in peering.
Manual PG calculations
Replicated pools make calculating PG numbers a bit easier. We want to make sure all PGs are about the same size.
Let's say we have 3 pools and 1000 OSDs:
| Pool      | Stored data | Raw total             |
|-----------|-------------|-----------------------|
| replica_3 | 250 TB      | * 3 = 750 TB          |
| ec_4_2    | 500 TB      | * (1 + 2/4) = 750 TB  |
| ec_8_3    | 1 PB        | * (1 + 3/8) = 1375 TB |
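As a sanity check, here is a small Python sketch of the raw-capacity math above; the pool names and sizes are just the values from this example, not from a real cluster:

```python
# Raw capacity a pool consumes, including replication / erasure-coding overhead.
# Sizes are in TB and taken from the example above.
def raw_total_tb(stored_tb, replicas=None, k=None, m=None):
    if replicas is not None:
        # replicated pool: every byte is stored `replicas` times
        return stored_tb * replicas
    # EC pool: k data chunks plus m coding chunks per object -> (k + m) / k overhead
    return stored_tb * (k + m) / k

print(raw_total_tb(250, replicas=3))   # 750.0 TB  (replica_3)
print(raw_total_tb(500, k=4, m=2))     # 750.0 TB  (ec_4_2)
print(raw_total_tb(1000, k=8, m=3))    # 1375.0 TB (ec_8_3)
```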
Let's calculate PG size if each pool had a single PG:
| Pool      | Raw total / chunks | PG size |
|-----------|--------------------|---------|
| replica_3 | 750 TB / 3         | 250 TB  |
| ec_4_2    | 750 TB / (4 + 2)   | 125 TB  |
| ec_8_3    | 1375 TB / (8 + 3)  | 125 TB  |
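The same per-chunk step as a sketch: each raw total is divided by the number of chunks a PG is split into (the replica count, or k + m for EC pools). Values are again the example's, not a real cluster's:

```python
# PG size per chunk if each pool had a single PG: raw total / chunks (in TB).
pools = {"replica_3": (750, 3), "ec_4_2": (750, 4 + 2), "ec_8_3": (1375, 8 + 3)}
for name, (raw_tb, chunks) in pools.items():
    print(f"{name}: {raw_tb / chunks:.0f} TB per PG chunk")
# replica_3: 250 TB, ec_4_2: 125 TB, ec_8_3: 125 TB
```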
Since we are aiming for ~100 PGs per OSD, and a replica_3 PG chunk holds twice as much data as an EC PG chunk (250 TB vs. 125 TB), we split that budget proportionally (sketched below the list) and end up with:
- 50 PGs per OSD for replica_3
- 25 PGs per OSD for ec_4_2
- 25 PGs per OSD for ec_8_3
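A minimal sketch of that proportional split; the 100-PG budget and per-chunk sizes are this example's assumptions:

```python
# Split the ~100 PGs-per-OSD budget in proportion to each pool's per-chunk PG size.
pg_chunk_size_tb = {"replica_3": 250, "ec_4_2": 125, "ec_8_3": 125}
budget_pgs_per_osd = 100

total = sum(pg_chunk_size_tb.values())
for pool, size_tb in pg_chunk_size_tb.items():
    print(pool, round(budget_pgs_per_osd * size_tb / total))
# replica_3 50, ec_4_2 25, ec_8_3 25
```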
Our final PG numbers are:
| Pool      | PGs per OSD * OSDs / chunks | Ideal pg_num  | Rounded to power of two |
|-----------|-----------------------------|---------------|-------------------------|
| replica_3 | 50 * 1000 / 3               | 16666.67      | 16384 PGs               |
| ec_4_2    | 25 * 1000 / (4 + 2)         | 4166.67       | 4096 PGs                |
| ec_8_3    | 25 * 1000 / (8 + 3)         | 2272.73       | 2048 PGs                |
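Putting it all together, here is a sketch of the final pg_num calculation, rounding to the nearest power of two; the 1000-OSD count and per-pool budgets are this example's assumptions:

```python
import math

def nearest_power_of_two(x):
    # pick the power of two closest to the ideal pg_num
    lower = 2 ** math.floor(math.log2(x))
    upper = lower * 2
    return lower if x - lower <= upper - x else upper

osds = 1000
pools = {              # pool: (PGs per OSD, chunks per PG)
    "replica_3": (50, 3),
    "ec_4_2":    (25, 4 + 2),
    "ec_8_3":    (25, 8 + 3),
}
for name, (pgs_per_osd, chunks) in pools.items():
    ideal = pgs_per_osd * osds / chunks
    print(f"{name}: ideal={ideal:.1f}, pg_num={nearest_power_of_two(ideal)}")
# replica_3: ideal=16666.7, pg_num=16384
# ec_4_2:    ideal=4166.7,  pg_num=4096
# ec_8_3:    ideal=2272.7,  pg_num=2048
```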
Autoscaler
Ceph pools rarely stay the same size forever, so keeping pg_num perfectly tuned by hand is unrealistic. Ceph upstream has built the autoscaler, which is designed to adjust pg_num automatically.
On its own, the autoscaler's recommendations are not ideal:
It starts resizing PGs whenever a pool shrinks or grows, which causes a lot of data movement. That's why the croit default keeps the autoscaler in warn mode, which means it only reports its recommendations as Ceph health warnings.
To improve its recommendations, you have a few options:
- You can specify the pool's target size:
ceph osd pool set $POOL_NAME target_size_bytes 100T
- You can specify the pool's target ratio (takes precedence over the target size):
ceph osd pool set $POOL_NAME target_size_ratio 1.0
Ceph will warn you when the pool grows beyond its target size, so you'll know when to update the value.
You can check recommendations with:
ceph osd pool autoscale-status
Why can't I manage the autoscaler in croit?
We are working on it :)