We are thrilled to unveil the new CephFS SnapDiff feature, a welcome advancement for CephFS backups. It lets users compare two snapshots of a Ceph File System (CephFS) and see every change made to the files between those snapshots. This enables efficient backups of large file systems without a full scan: by comparing snapshots, users or applications can quickly identify what changed since the last backup and transfer only the modified files, significantly reducing the time and resources a backup requires. Backups can therefore run more frequently, and data can be restored promptly after loss or corruption.
Developed by engineers from croit GmbH, the feature will arrive in Squid, the next release of Ceph.
SnapDiff is accessed through the libcephfs API, which can be used from both C/C++ and Python.
Here, we use the Python libcephfs bindings to illustrate how to use SnapDiff within your application.
The following function instantiates, mounts and returns a libcephfs object used to access the CephFS API.
    import cephfs as libcephfs


    def init_cephfs():
        # conffile='' asks libcephfs to read the default configuration files
        cephfs = libcephfs.LibCephFS(conffile='')
        cephfs.mount()
        return cephfs
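The function above leaves the matching shutdown to the caller. As a minimal sketch, the mount and shutdown pair can be wrapped in a context manager so that the file system handle is released even when the code in between raises. The name `mounted` is a hypothetical helper, not part of libcephfs; the only assumption is that the object passed in offers `mount()` and `shutdown()` methods, as the object used in this article does.

```python
from contextlib import contextmanager


@contextmanager
def mounted(fs):
    """Mount fs on entry and always shut it down on exit.

    fs is any object with mount() and shutdown() methods
    (hypothetical helper, not a libcephfs function).
    """
    fs.mount()
    try:
        yield fs
    finally:
        # Runs on normal exit and when the with-body raises
        fs.shutdown()
```

With the bindings imported as above, a possible use is `with mounted(libcephfs.LibCephFS(conffile='')) as cephfs: ...`, which replaces the explicit `mount()` and `shutdown()` calls.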
The next code snippet implements functions that generate a sample file set, which serves as the basis for illustrating SnapDiff. It consists of files and folders, some of which differ between the two snapshots of the file system.
    def write_file(cephfs, filename, data):
        fd = cephfs.open(filename, 'w', 0o755)
        cephfs.write(fd, data, 0)
        cephfs.close(fd)


    def prepare_fs(cephfs, testdir, snap1name, snap2name):
        print("Preparing...")
        cephfs.mkdir(testdir, 0o755)
        write_file(cephfs, testdir + b'/file-modified', b"1111")
        write_file(cephfs, testdir + b'/file-removed', b"2222")
        write_file(cephfs, testdir + b'/file-untouched', b"3333")
        cephfs.mkdir(testdir + b'/dir1', 0o755)
        write_file(cephfs, testdir + b'/dir1/file-modified', b"d1_1111")
        write_file(cephfs, testdir + b'/dir1/file-removed', b"d1_2222")
        write_file(cephfs, testdir + b'/dir1/file-untouched', b"d1_3333")
        cephfs.mkdir(testdir + b'/dir-removed', 0o755)
        write_file(cephfs, testdir + b'/dir-removed/file-removed', b"dremoved_5555")
        cephfs.mkdir(testdir + b'/dir-untouched', 0o755)
        write_file(cephfs, testdir + b'/dir-untouched/file', b"dir-untouched_file_content")

        # Create the first snapshot
        cephfs.mksnap(testdir, snap1name, 0o755)

        # Modify, remove and add files so that the second snapshot differs
        write_file(cephfs, testdir + b'/file-modified', b"1111+1")
        write_file(cephfs, testdir + b'/dir1/file-modified', b"d1_1111_1")
        cephfs.unlink(testdir + b'/file-removed')
        cephfs.unlink(testdir + b'/dir1/file-removed')
        cephfs.unlink(testdir + b'/dir-removed/file-removed')
        cephfs.rmdir(testdir + b'/dir-removed')
        write_file(cephfs, testdir + b'/file-new', b"6666")
        write_file(cephfs, testdir + b'/dir1/file-new', b"d1_6666")

        # Create the second snapshot
        cephfs.mksnap(testdir, snap2name, 0o755)
Let’s proceed by implementing a recursive function that traverses the folders and prints the changes (delta) between the two snapshots. Both snapshot names and the corresponding snapshot identifiers must be provided as parameters. The function creates the ‘snapdiff’ object with the ‘opensnapdiff()’ call, which takes the snapshots’ root path, the path of the specific subfolder under consideration, and the names of the two snapshots for which the delta is to be built. The SnapDiff content is then retrieved with the ‘snapdiff.readdir()’ call, which works much like standard directory reading. However, there are a few nuances to take into account:
The function returns only new, updated or removed files and omits unmodified ones. A new or updated file is labeled with the identifier of the more recent snapshot. Conversely, a removed file is marked with the identifier of the older snapshot.
Similarly, when dealing with directories, a removed directory is associated with the identifier of the older snapshot, while an updated directory is linked to the identifier of the newer snapshot. The key distinction lies in unmodified directories; they are not skipped and are labeled in the same manner as updated directories. Consequently, the traversal function must encompass these unmodified directories, leading to the exploration of all subdirectories, although they do not yield any file entries.
    def snapdiff_traversal(cephfs, path0, subpath, snap1name, snap1id, snap2name, snap2id):
        if not subpath.endswith(b"/"):
            subpath = subpath + b"/"
        path = path0 + subpath
        diff = cephfs.opensnapdiff(path0, subpath, snap1name, snap2name)
        cnt = 0
        e = diff.readdir()
        while e is not None:
            if e.d_name != b"." and e.d_name != b"..":
                cnt = cnt + 1
                # Entries labeled with the older snapshot id were removed;
                # all others are new or updated (unmodified directories also land here)
                status = "updated"
                if e.d_snapid == snap1id:
                    status = "removed"
                print(">>> ", path + e.d_name, status)
                if e.is_dir():
                    # Directories are always reported, so descend into each of them
                    cnt += snapdiff_traversal(cephfs, path0, subpath + e.d_name,
                                              snap1name, snap1id, snap2name, snap2id)
            e = diff.readdir()
        diff.close()
        return cnt
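For a backup tool, a list of actions is usually more useful than a printed listing. The following is a minimal sketch of that idea, assuming the entries have already been collected by a traversal like the one above as (path, snapshot id, is-directory) records. `DiffEntry` and `plan_backup` are hypothetical names introduced here for illustration; they are not part of libcephfs.

```python
from typing import Iterable, List, NamedTuple, Tuple


class DiffEntry(NamedTuple):
    """Hypothetical record of one snapdiff entry (not a libcephfs type)."""
    path: bytes
    snapid: int
    is_dir: bool


def plan_backup(entries: Iterable[DiffEntry],
                snap1id: int) -> Tuple[List[bytes], List[bytes]]:
    """Split snapdiff entries into paths to copy and paths to delete."""
    to_copy: List[bytes] = []
    to_delete: List[bytes] = []
    for e in entries:
        if e.snapid == snap1id:
            # Labeled with the older snapshot: gone in the newer one
            to_delete.append(e.path)
        elif not e.is_dir:
            # File labeled with the newer snapshot: new or modified
            to_copy.append(e.path)
        # Directories labeled with the newer snapshot may be unmodified,
        # so they are only traversed, never copied as a whole.
    # A child path is always longer than its parent, so sorting by
    # descending length deletes children before their directories.
    to_delete.sort(key=len, reverse=True)
    return to_copy, to_delete
```

This sketch deliberately ignores newly created empty directories, which a real backup tool would also have to recreate on the target.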
To achieve our goal, we invoke the ‘snapdiff_traversal()’ function on the root folder of the snapshots. Note that the calling function first uses the ‘snap_info()’ call on each snapshot path to obtain the relevant snapshot identifier.
    def test_snapdiff(cephfs, testdir, snap1name, snap2name):
        print("Running SnapDiff...")
        # Look up the identifier of each snapshot through its path under .snap
        snap1id = cephfs.snap_info(testdir + b"/.snap/" + snap1name)['id']
        snap2id = cephfs.snap_info(testdir + b"/.snap/" + snap2name)['id']
        cnt = snapdiff_traversal(cephfs, testdir, b"/", snap1name, snap1id, snap2name, snap2id)
        print("SnapDiff completed, found entries=", cnt)
        assert cnt == 10
Lastly, we have the ‘main()’ function together with a recursive ‘purge_dir()’ helper that makes it easy to reset the state of the file system.
    def purge_dir(cephfs, path, is_snap=False):
        # Recursively remove path together with its files, subdirectories and snapshots
        try:
            d = cephfs.opendir(path)
        except Exception:
            # Nothing to purge if the directory cannot be opened
            return
        if not path.endswith(b"/"):
            path = path + b"/"
        dent = cephfs.readdir(d)
        while dent:
            if dent.d_name not in [b".", b".."]:
                if dent.is_dir():
                    if not is_snap:
                        try:
                            snappath = path + dent.d_name + b"/.snap"
                            cephfs.stat(snappath)
                            purge_dir(cephfs, snappath, True)
                        except Exception:
                            pass
                        purge_dir(cephfs, path + dent.d_name, False)
                    else:
                        # Inside a .snap directory every subdirectory is a snapshot
                        cephfs.rmsnap(path, dent.d_name)
                else:
                    cephfs.unlink(path + dent.d_name)
            dent = cephfs.readdir(d)
        cephfs.closedir(d)
        if not is_snap:
            try:
                snappath = path + b".snap"
                cephfs.stat(snappath)
                purge_dir(cephfs, snappath, True)
            except Exception:
                pass
            cephfs.rmdir(path)


    def main():
        print("Setting up...")
        cephfs = init_cephfs()
        purge_dir(cephfs, b"/snapdiff_test")
        # Snapshot names are plain names, so they are passed without a leading slash
        prepare_fs(cephfs, b"/snapdiff_test", b"snap1", b"snap2")
        test_snapdiff(cephfs, b"/snapdiff_test", b"snap1", b"snap2")
        print("Tearing down...")
        purge_dir(cephfs, b"/snapdiff_test")
        cephfs.shutdown()
        print("Completed.")


    if __name__ == "__main__":
        main()
The script’s execution produces the following output:
SnapDiff completed, found entries= 10
As the output shows, the ten entries include all the directories, even the unmodified ‘dir-untouched’. However, no entries are reported for unchanged files such as ‘file-untouched’.
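The count of 10 can be cross-checked without a cluster. The sketch below models the two snapshots of the sample file set as plain dictionaries and applies the rules described earlier: every directory present in either snapshot is reported, while a file is reported only if it was added, changed or removed. `expected_entries`, `SNAP1` and `SNAP2` are hypothetical names for this illustration; the code mimics the behaviour described in this article and is not the libcephfs implementation.

```python
MISSING = object()  # Marks a path that does not exist in a snapshot

# Path -> file content, or None for a directory (paths relative to the test dir)
SNAP1 = {
    "file-modified": b"1111",
    "file-removed": b"2222",
    "file-untouched": b"3333",
    "dir1": None,
    "dir1/file-modified": b"d1_1111",
    "dir1/file-removed": b"d1_2222",
    "dir1/file-untouched": b"d1_3333",
    "dir-removed": None,
    "dir-removed/file-removed": b"dremoved_5555",
    "dir-untouched": None,
    "dir-untouched/file": b"dir-untouched_file_content",
}
SNAP2 = {
    "file-modified": b"1111+1",
    "file-untouched": b"3333",
    "file-new": b"6666",
    "dir1": None,
    "dir1/file-modified": b"d1_1111_1",
    "dir1/file-untouched": b"d1_3333",
    "dir1/file-new": b"d1_6666",
    "dir-untouched": None,
    "dir-untouched/file": b"dir-untouched_file_content",
}


def expected_entries(snap1, snap2):
    """Return the paths a full snapdiff traversal is expected to report."""
    reported = []
    for path in sorted(set(snap1) | set(snap2)):
        old = snap1.get(path, MISSING)
        new = snap2.get(path, MISSING)
        # Use the newer snapshot to decide the type, else fall back to the older
        is_dir = (new is None) if new is not MISSING else (old is None)
        # Directories are always reported; files only when they differ
        if is_dir or old != new:
            reported.append(path)
    return reported


print(len(expected_entries(SNAP1, SNAP2)))  # 10
```

The three paths left out of the result are exactly the unchanged files: ‘file-untouched’, ‘dir1/file-untouched’ and ‘dir-untouched/file’.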