A CEPH Sandbox: How to Install CEPH on LXC

CEPH is a free and open-source object storage system. This article describes the basic terminology, installation steps and configuration parameters required to build your own CEPH environment.

CEPH is designed as a distributed storage system that is highly fault-tolerant, scalable and configurable. It can run on a large number of distributed commodity machines, eliminating the need for very large central storage solutions.

This post aims to be a basic, short and self-contained article that explains the key details you need to understand and play with CEPH.

Environment

I’ve set up CEPH on my laptop with a few LXC containers. My setup has –

  • Host: Ubuntu 16.04 xenial 64-bit OS
  • LXC version: 2.0.6
  • A 40G disk partition that is free to use for this experiment.
  • CEPH Release – Jewel

The Ubuntu site has documentation on LXC configuration here, but it covers creating unprivileged containers. For this exercise, however, we’ll need “privileged” containers. Privileged containers aren’t considered secure, since container processes are mapped to the root user on the host, so they are used here only for experimenting.

See the documentation on the LXC site here to understand how to create privileged LXC containers.

Now, create the following privileged containers –

  • cephadmin
  • cephmon
  • cephosd0
  • cephosd1
  • cephosd2
  • cephradosgw

Here’s the command to create an Ubuntu Xenial 64-bit container. You could choose another distro, in which case some of the instructions might not apply. Using the names above, run the command once for each container –

host:~# lxc-create -n <container-name> -t download -- -d ubuntu -r xenial -a amd64
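
If you’d rather not type that six times, a small shell loop on the host can create all of the containers listed above (a sketch; adjust the names or the release if yours differ) –

host:~# for name in cephadmin cephmon cephosd0 cephosd1 cephosd2 cephradosgw; do lxc-create -n $name -t download -- -d ubuntu -r xenial -a amd64; done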

Though CEPH installations typically talk about separate private and public networks, for testing purposes the ‘lxcbr0’ bridge that comes with LXC is sufficient to work with CEPH.

Start the containers and install SSH servers on all of them –

host:~# lxc-start -n <container-name>
host:~# lxc-attach -n <container-name>
container:~# apt-get update
container:~# apt-get install openssh-server
container:~# exit

Repeat the above commands for all the containers you’ve created, then restart them all. Register the container names in your /etc/hosts so that you can log in to them easily.
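
For example, the /etc/hosts entries could look like the lines below. The IP addresses are placeholders for whatever lxcbr0 actually assigned to your containers (‘lxc-ls -f’ on the host shows the real addresses) –

10.0.3.11   cephadmin
10.0.3.12   cephmon
10.0.3.13   cephosd0
10.0.3.14   cephosd1
10.0.3.15   cephosd2
10.0.3.16   cephradosgw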

Disk Setup

On the 40G disk partition you’ve allocated for this exercise, perform the following –

  1. Delete the existing partition
  2. Create 3 new partitions
  3. Format each of them with the XFS filesystem.

CEPH recommends using XFS / BTRFS / EXT4. I’ve used XFS in my tests. I tried with EXT4 but received warnings related to limited xattr sizes while CEPH was being deployed.
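
As a rough sketch, assuming the spare 40G partition lives on /dev/sda and the three new partitions come out as /dev/sda8, /dev/sda9 and /dev/sda10 (your device names may well differ), the partitioning and formatting would look like –

host:~# fdisk /dev/sda        # delete the spare partition, create three new ones in its place
host:~# mkfs.xfs -f /dev/sda8
host:~# mkfs.xfs -f /dev/sda9
host:~# mkfs.xfs -f /dev/sda10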

Note down the device major and minor numbers for the partitions you’ve created from the above steps.
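
A simple way to find them is ‘ls -l’ on the devices; for block devices the two comma-separated numbers printed before the date are the major and minor numbers –

host:~# ls -l /dev/sda8 /dev/sda9 /dev/sda10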

Make each partition you’ve created available to one of the cephosd0, cephosd1 and cephosd2 containers. If the above operation resulted in the ‘/dev/sda8’, ‘/dev/sda9’ and ‘/dev/sda10’ devices, assign each device’s major and minor numbers to the corresponding cephosd<N> container. To do that, edit the LXC configuration file for each cephosd* container by performing these steps –

  • Log in to each cephosd<N> node and run
     container:~# mkdir -p /mnt/xfsdisk
  • Stop your cephosd containers (cephosd0, cephosd1, cephosd2) –
     host:~# lxc-stop -n cephosd<N>
  • On the host, create a corresponding ‘fstab’ file for each container (at /var/lib/lxc/cephosd<N>/fstab). You’d assign /dev/sda8 to cephosd0, /dev/sda9 to cephosd1 and so on. The fstab contains lines like the following, depending on how many disks / partitions you intend to share –
/dev/sda<N> <mount-point-in-container> <file-system> <options> <dump> <pass>

Example –

/dev/sda8 mnt/xfsdisk xfs noatime 0 0

IMPORTANT NOTE

The <mount-point-in-container> has no leading forward slash; it is not required there. Refer to this post on askubuntu.com for more details.

  • For each cephosd<N> node, edit its configuration file
host:~# cd /var/lib/lxc
host:~# cd cephosd<N>

Edit the container configuration file ‘config’. The lines below give the container permission to access the device inside LXC and mount it at the mount point defined in the fstab file. Add the following lines –

lxc.cgroup.devices.allow = b <maj>:<min> rwm
lxc.aa_profile = unconfined
lxc.mount = /var/lib/lxc/cephosd<N>/fstab
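
As a concrete example, if /dev/sda8 (reported by ‘ls -l’ with, say, major 8 and minor 8) is assigned to cephosd0, its config would gain –

lxc.cgroup.devices.allow = b 8:8 rwm
lxc.aa_profile = unconfined
lxc.mount = /var/lib/lxc/cephosd0/fstab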

Now start (or restart) the containers. With the above steps complete, the containers are ready for the CEPH installation.

CEPH Installation

Complete the pre-flight steps of the CEPH quick install from here. The pre-flight steps –

  • Set up the ‘cephadmin’ node with the ceph-deploy package.
  • Install ‘ntp’ on all the nodes (required wherever an OSD or MON runs).
  • Create a common ceph deploy user with password-less ssh and sudo access that will be used for the CEPH installation on all nodes (a minimal sketch follows this list).
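
A minimal sketch of that last step, using ‘cephdeploy’ as an example user name, run as root on every node –

container:~# useradd -m -s /bin/bash cephdeploy
container:~# passwd cephdeploy
container:~# echo "cephdeploy ALL = (root) NOPASSWD:ALL" > /etc/sudoers.d/cephdeploy
container:~# chmod 0440 /etc/sudoers.d/cephdeploy

Then, as that user on the admin node, generate a key and copy it to every node –

cephadmin:~$ ssh-keygen
cephadmin:~$ for node in cephmon cephosd0 cephosd1 cephosd2 cephradosgw; do ssh-copy-id cephdeploy@$node; done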

After the pre-flight steps are complete, check that –

  • Password-less access works from ‘cephadmin’ to all the other nodes in your cluster via the ceph deploy user you have created.
  • The XFS partition created earlier is mounted on all the OSD nodes under /mnt/xfsdisk (the sketch below shows one way to verify both).
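
For example, both checks can be run from ‘cephadmin’ (assuming the ‘cephdeploy’ user from the sketch above; repeat for cephosd1 and cephosd2) –

cephadmin:~$ ssh cephdeploy@cephosd0 sudo whoami          # should print 'root' without a password prompt
cephadmin:~$ ssh cephdeploy@cephosd0 df -hT /mnt/xfsdisk  # should show an xfs filesystem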

Next, complete the CEPH deployment. The steps with all the illustrations are available here; to get a good understanding of each command, read the descriptions at that link. Since our experiment has 3 OSDs and 3 monitor daemons, run the commands below from the ‘cephadmin’ node as the ceph deploy user created in pre-flight –

  • Create a directory and run the commands below from within it.
  • Designate the following nodes as CEPH monitors
cephadmin:~$ /usr/bin/ceph-deploy new cephmon cephosd0 cephosd1
  • Install CEPH on all nodes, including the admin node
cephadmin:~$ /usr/bin/ceph-deploy install cephadmin cephmon cephosd0 cephosd1 cephosd2 cephradosgw
  • Add the initial monitors and gather their keys
cephadmin:~$ /usr/bin/ceph-deploy mon create-initial
  • Prepare the disk on each OSD
cephadmin:~$ /usr/bin/ceph-deploy osd prepare cephosd0:/mnt/xfsdisk cephosd1:/mnt/xfsdisk cephosd2:/mnt/xfsdisk
  • Activate each OSD
cephadmin:~$ /usr/bin/ceph-deploy osd activate cephosd0:/mnt/xfsdisk cephosd1:/mnt/xfsdisk cephosd2:/mnt/xfsdisk
  • Run this command to ensure cephadmin can perform administrative activities on all the nodes in the CEPH cluster
cephadmin:~$ /usr/bin/ceph-deploy admin cephadmin cephmon cephosd0 cephosd1 cephosd2 cephradosgw
  • Check the health of the cluster (you should see HEALTH_OK if you’ve performed all the steps correctly)
cephadmin:~$ ceph health
  • Install and deploy the RADOS gateway instance (a quick check of the gateway follows this list)
cephadmin:~$ /usr/bin/ceph-deploy install --rgw cephradosgw
cephadmin:~$ /usr/bin/ceph-deploy rgw create cephradosgw
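
On Jewel the gateway created this way serves HTTP on Civetweb’s default port 7480; assuming that default, a quick check that it is answering is –

cephadmin:~$ curl http://cephradosgw:7480/

An anonymous request should come back with a short XML ‘ListAllMyBucketsResult’ document.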

NOTE

If you haven’t configured the OSD daemons to start automatically at boot, they won’t come up after an LXC restart, and running ‘ceph -s’, ‘ceph -w’ or ‘ceph health’ on the admin node will show HEALTH_ERR and a degraded cluster since the OSDs are down. In that case, manually log in to each OSD node and start the OSD instance with the command sudo systemctl start ceph-osd@<instance>; <instance> in our case is 0, 1 or 2.
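
If you’d rather not log in to each node, a loop like this starts them all from the admin node (assuming the ‘cephdeploy’ user from the earlier sketch, and that OSD ids 0, 1 and 2 ended up on cephosd0, cephosd1 and cephosd2 respectively; ‘ceph osd tree’ shows the actual mapping) –

cephadmin:~$ for i in 0 1 2; do ssh cephdeploy@cephosd$i sudo systemctl start ceph-osd@$i; done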

With the above steps, the cluster should be up with all the PGs showing ‘active+clean’ status.

Access data on cluster

To run these commands you’ll need a ceph.conf containing the monitor node information, plus the admin keyring file. For test purposes you can run them on the cephadmin node from the cluster directory you created in the first step of the CEPH Installation section.

To list all available data pools

cephadmin:~$ rados lspools

To create a pool

cephadmin:~$ rados mkpool <pool-name>

To store a file’s contents as an object in a pool

cephadmin:~$ rados put {object-name} {file-path} --pool=<pool-name>

To list all objects in a pool

cephadmin:~$ rados -p <pool-name> ls

To get a data object from the pool

cephadmin:~$ rados get {object-name} {outfile-path} --pool=<pool-name>
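
Putting those together, a small end-to-end test could look like this (‘testpool’, ‘hello-object’ and the file names are just examples) –

cephadmin:~$ rados mkpool testpool
cephadmin:~$ echo "hello ceph" > hello.txt
cephadmin:~$ rados put hello-object hello.txt --pool=testpool
cephadmin:~$ rados -p testpool ls
cephadmin:~$ rados get hello-object /tmp/hello.out --pool=testpool
cephadmin:~$ cat /tmp/hello.out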
