SPDX-License-Identifier: Apache-2.0
Copyright (c) 2019-2021 Intel Corporation
Rook-Ceph
Overview
Edge applications based on Kubernetes* need to store persistent data, Kubernetes provides a persistent volume provisioning based on local disk directly attached to a single node. This solution has several shortcomings:
- Dynamic provisioning is not supported. Dynamic provisioning allows to automatically provision storage when it is requested by users.
- No data reliability. Only one copy of user data is stored in local node, this causes a single point of failure issue.
Ceph storage is introduced to resolve the problems, to integrate Ceph storage into Kubernetes cluster, we propose Rook-Ceph as storage orchestration solution.
Overview of Ceph
Ceph is an open-source, highly scalable, distributed storage solution for block storage, shared filesystems, and object storage with years of production deployments. It was born in 2003 as an outcome of Sage Weil’s doctoral dissertation and then released in 2006 under the LGPL license.
Ceph is a prime example of distributed storage, a distributed storage system is composed of many server nodes interconnected through a network to provide storage services externally as a whole. The benefit of distributed storage are:
- Large capacity and easy expansion
- High reliability. No single point of failure, data security and business continuity are guaranteed.
- High performance. Data shared and enable concurrent access.
- Data management and more features support. The storage cluster can be managed and configured easily, also provide functions such as data encryption/decryption and data deduplication, etc.
Ceph consists of several components:
- MON (Ceph Monitors) are responsible for forming cluster quorums. All the cluster nodes report to MON and share information about every change in their state.
- OSD (Ceph Object Store Devices) are responsible for storing objects and providing access to them over the network.
- MGR (Ceph Manager) provides additional monitoring and interfaces to external management systems.
- RADOS (Reliable Autonomic Distributed Object Stores) is the core of Ceph cluster. RADOS ensures that stored data always remains consistent with data replication, failure detection, and recovery among others.
- LibRADOS is the library used to gain access to RADOS. With support for several programing languages, LibRADOS provides a native interface for RADOS as well as a base for other high-level services, such as RBD, RGW, and CephFS.
- RBD (RADOS Block Device), which are now known as the Ceph block device, provide persistent block storage, which is thin-provisioned, resizable, and stores data striped over multiple OSDs.
- RGW (RADOS Gateway) is an interface that provides object storage service. It uses libRGW (the RGW library) and libRADOS to establish connections with the Ceph object storage among applications. RGW provides RESTful APIs that are compatible with Amazon* S3 and OpenStack* Swift.
- CephFS is the Ceph Filesystem that provides a POSIX-compliant filesystem. CephFS uses the Ceph cluster to store user data.
- MDS (Ceph Metadata Server) keeps track of file hierarchy and stores metadata only for CephFS.
Overview of Rook operator and CSI
Rook operator is an open-source cloud-native storage orchestrator that transforms storage software into self-managing, self-scaling, and self-healing storage services.
Rook-Ceph is the Rook operator for Ceph, it starts and monitors Ceph daemon pods, such as MONs, OSDs, MGR, and others, Rook-Ceph also monitors the daemon to ensure the cluster is healthy. Ceph MONs are started or failed over when necessary. Other adjustments are made as the cluster grows or shrinks.
Just like native Ceph, Rook-Ceph provides block, filesystem, and object storage for applications.
- Ceph CSI (Container Storage Interface) is a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems like Kubernetes. Ceph CSI is integrated with Rook and enables two scenarios:
- RBD (block storage): This driver is optimized for RWO pod access where only one pod may access the storage.
- CephFS (filesystem): This driver allows for RWX with one or more pods accessing the same storage.
- For object storage, Rook supports the creation of new buckets and access to existing buckets via two custom resources: Object Bucket Claim (OBC) and Object Bucket (OB). Applications can access the objects via RGW.
Rook-Ceph configuration and usage
To deploy the Rook-Ceph operator and Ceph to the Intel® Smart Edge Open cluster, rook_ceph_enabled: True
must be set in /inventory/default/group_vars/all/10-default.yml
, which is disabled by default. This will perform Rook operator and Ceph daemons install and deploy, images for Rook operator and Ceph daemon with CSI plugins are download from public docker repository.
Note: For Rook-Ceph to work it is expected that two SSDs are present in the platform.
Configuration
Rook-Ceph operator provide custom resource for CephCluster to configure Ceph parameters by user.
Pre-requisites
Currently, the deployment searches for any available raw drive in the node and then consumes it for Ceph. If there are 2 drives, 1 with OS and another drive with partitions then the second drive with partitions will be wiped out and consumed by Ceph. Also the deployment will wipeout if a Ceph OSD exists and will create a new Ceph OSD in the 2nd drive. Note that, creating paritions on the drive after deploying Ceph has to be avoided. If the user wants to manually prepare the drive then he can follow the steps to reset the drive to a usable state as described in : https://rook.io/docs/rook/v1.7/ceph-teardown.html#zapping-devices
Ceph Configuration
Specifies Ceph configuration in inventory/default/group_vars/all/10-default.yml
:
rook_ceph_enabled: True
rook_ceph:
mon_count: 1
host_name: ""
osds_per_device: "1"
replica_pool_size: 1
Ceph Resource Restraint
Since Rook-Ceph daemons deploy in Edge Node, considering that the resource consumption of the ceph daemon may affect the operation of the edge application, the default value of resource restraint for Ceph daemons is defined in roles/kubernetes/rook_ceph/defaults/main.yml
.
_rook_ceph_limits:
api:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "500m"
memory: "512Mi"
mgr:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "500m"
memory: "512Mi"
mon:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "500m"
memory: "512Mi"
osd:
requests:
cpu: "500m"
memory: "4Gi"
limits:
cpu: "500m"
memory: "4Gi"
Verify the Ceph OSD creation on the node
The user can do the following to verify the Ceph creation on the node by running
kubectl get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-provisioner-689686b44-rwzpc 6/6 Running 0 40m
csi-cephfsplugin-zv7hk 3/3 Running 0 40m
csi-rbdplugin-fn56f 3/3 Running 0 40m
csi-rbdplugin-provisioner-5775fb866b-dpph4 6/6 Running 0 40m
rook-ceph-crashcollector-silpixa00401187-f98464b49-4pvk4 1/1 Running 0 40m
rook-ceph-mgr-a-7cb77f9c6c-lx7tl 1/1 Running 0 40m
rook-ceph-mon-a-6d647c9546-bpp9r 1/1 Running 0 40m
rook-ceph-operator-54655cf4cd-kjbm8 1/1 Running 0 40m
rook-ceph-osd-0-7b45dfcc77-5nc95 1/1 Running 0 40m
rook-ceph-osd-prepare-silpixa00401187--1-5rs2b 0/1 Completed 0 40m
rook-ceph-tools-54474cfc96-xrvzw 1/1 Running 0 40m
Usage
After deployment of Rook operator and Ceph daemons, there is an example for VM application using PVC (Ceph block storage) provision by Ceph-CSI.
PVC VM application
Create a PVC using the following yaml file
- The PVC yaml file sample-pvc.yaml contains
apiVersion: cdi.kubevirt.io/v1beta1 kind: DataVolume metadata: name: ubuntu-reg-dv spec: source: registry: url: "docker://tedezed/ubuntu-container-disk:20.0" pvc: storageClassName: rook-ceph-block accessModes: - ReadWriteOnce resources: requests: storage: 10Gi
- Apply the PVC
kubectl apply -f sample-pvc.yaml
- Check that the PVC exists - it will take some time to import the image
kubectl get pvc NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE ubuntu-reg-dv Bound pvc-86050fce-e695-4586-b800-7bc477d9eb03 10Gi RWO rook-ceph-block 9s ubuntu-reg-dv-scratch Bound pvc-702e606b-3c0e-4508-b925-412aee318414 10Gi RWO rook-ceph-block 9s # kubectl get pods NAME READY STATUS RESTARTS AGE importer-ubuntu-reg-dv 1/1 Running 0 54s # kubectl get pvc # ubuntu-reg-dv-scratch goes away after it completes image upload NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE ubuntu-reg-dv Bound pvc-86050fce-e695-4586-b800-7bc477d9eb03 10Gi RWO rook-ceph-block 9s
- Create a yaml file for deploying the VM
apiVersion: kubevirt.io/v1alpha3 kind: VirtualMachine metadata: name: testvmceph spec: running: true template: metadata: labels: kubevirt.io/size: small kubevirt.io/domain: testvmceph spec: domain: cpu: cores: 2 devices: disks: - disk: bus: virtio name: containervolume - disk: bus: virtio name: cloudinitvolume interfaces: - name: default bridge: {} resources: requests: memory: 4096M networks: - name: default pod: {} volumes: - name: containervolume persistentVolumeClaim: claimName: ubuntu-reg-dv - name: cloudinitvolume cloudInitNoCloud: userData: |- #cloud-config chpasswd: list: | debian:debian root:toor expire: False
- Deploy the VM
kubectl apply -f sample-vm.yaml
- Check that the VM is running
# kubectl get vm testvmceph 3s Starting False # kubectl get vmi testvmceph 45s Running 10.245.225.34 silpixa00400489 True
- Log in to the VM and create a file
# kubectl virt console testvmceph (root toor) # touch myfileishere.txt
- Remove the VM
# kubectl delete vm testvmceph
- Re-deploy the VM
# kubectl apply -f vm.yaml
- Log in to the VM and check if the file exists
# kubectl console vm testvmceph (root toor) # ls
Limitations
There is a limitation for Rook-Ceph ansible role implementation in Smart Edge Open cluster, currently there is only support for single node and single disk deployment for Ceph. So by default, you don’t need to set replica_pool_size > 1.
Reference
For further details:
- Rook github: https://github.com/rook/rook
- Rook Introduction: https://01.org/kubernetes/blogs/tingjie/2020/introduction-cloud-native-storage-orchestrator-rook
- Kubernetes CSI: https://kubernetes-csi.github.io/docs/