A couple of decades back, developers intending to build and test their applications on different operating systems (OS) or platforms would have been forced to buy multiple sets of hardware to do so. Some may instead have used the same set of hardware but set up complicated multi-boot sequences to be able to boot into different operating systems and development environments.
Over these couple of decades, server processing power and capacity increased, but most applications did not need all these additional resources. Then, hardware virtualization software came along that runs on top of physical servers to emulate a hardware system. This software allowed the creation of virtual machines (VMs), each of which runs its own guest OS irrespective of the host OS on the physical server. Running multiple VMs on the same physical server helped make better use of the server’s resources and thus reduced costs. It also helped developers quickly provision additional machines (VMs) as needed to develop and test their applications.
Since each VM included a full-blown system, it consumed a lot of host system resources (gigabytes of memory and several gigabytes of storage). In general, VMs provide an environment with more resources than most applications need. Starting up a new VM is quicker and more convenient than rebooting into a new system (as with multi-boot setups), but it still takes a few minutes.
Operating system (OS) virtualization allows virtual environments (commonly called containers) to run on top of the host OS - each of these containers shares the core component (kernel) of the host OS but has its own isolated user space. A container looks like a real OS to the programs that run within it, because it packages all (and only) the specific libraries and binaries that those programs need to run (making it lightweight). Compared to VMs, containers take up megabytes to a few gigabytes of memory and storage and start up in a few seconds. Though the concept of containers has existed for a long time in Linux and Unix systems, Docker augmented it with additional features that made this technology customizable and easy to use.
The Docker engine is the software that sits on top of the host OS and manages the creation (running) of containers. If containers are the equivalent of VMs, then Docker is the equivalent of the virtualization software. The blueprint for each Docker container is a Docker image, which has to be specified when you create (run) a container. For those familiar with object-oriented programming (OOP), a Docker image and a container are analogous to a class and an object (instance of a class), respectively. Just like in OOP, you could have multiple containers running simultaneously that are spawned from the same Docker image.
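For instance (using the official ‘nginx’ web-server image purely as an illustration; the commands themselves are explained in the sections below), two independent containers can be spawned from the same image:
docker run -d --name web1 nginx
docker run -d --name web2 nginx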
A Docker image is made up of a series of filesystem layers representing instructions that make up an executable software application. An image is an immutable binary file including the application and all other dependencies such as libraries, binaries and instructions necessary for running the application.
Pre-built images are available in Docker registries. One of the most popular registries is Dockerhub. In addition to numerous images created by individuals, Dockerhub also hosts ‘official’ images for various popular open-source software such as python, r-base, MySQL, Tomcat, etc. To begin experimenting with running and interacting with Docker containers, these images could serve as a good starting point. Some of the other common registries are Quay, Google Container Registry, and Amazon Elastic Container Registry.
Later, you will want to start building your own customized images to meet your specific needs. A new image is created from a ‘Dockerfile’, which is a text file listing the sequence of the above-mentioned instructions that constitute the image. We will briefly go over this process later. For more information, there are plenty of in-depth Docker guides online and the official Docker documentation is excellent. In addition, in most cases, you can also review the ‘Dockerfile’ of the images available in the registries.
But, before you can run a Docker image and create a container, you need to install the Docker engine on your system (the host).
Instructions are provided for Debian Stretch (Debian 9). Instructions for other Linux systems are similar and should be available online. In all the code segments, words beginning with ‘AA..’ refer to values that have to be supplied by the user for their specific case. For instance, ‘AAimage-name’ will refer to the specific Docker image that you will be applying the command to.
Update the package list and upgrade all installed packages
sudo apt update
sudo apt upgrade
Install the required dependencies to add new repositories over HTTPS (some of these may already be installed)
sudo apt install apt-transport-https ca-certificates curl software-properties-common gnupg2
Add Docker’s GPG key to your apt keyring
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
Verify that you now have the key with the fingerprint 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88 by searching for the last 8 characters of the fingerprint.
sudo apt-key fingerprint 0EBFCD88
Add the Docker stable repository to the apt-repository list
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian stretch stable"
Once again, update the package list (with the newly added repository) and install Docker Community Edition
sudo apt update
sudo apt install docker-ce
The Docker service starts automatically once installed. To verify the installation
sudo systemctl status docker
and/or
sudo docker -v
To run Docker commands without prepending sudo, you’ll need to add your user to the ‘docker’ group that is created during Docker installation.
sudo usermod -aG docker $USER
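Note that the group change takes effect only after you log out and log back in. After that, a quick way to verify is to run the small official ‘hello-world’ test image without sudo:
docker run hello-world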
The default location for all images, storage space for running containers, storage volumes that you may create (to be discussed later), etc., is /var/lib/docker. To change the default location, in case you have limited space in your /var partition, go through the following steps (assume the full path to your new location is /AAalt_loc).
Stop the Docker service
sudo service docker stop OR sudo systemctl stop docker
Edit the /etc/default/docker file to add the environment variables DOCKER_OPTS and DOCKER_TMPDIR
#DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4"
DOCKER_OPTS="-g /AAalt_loc"
DOCKER_TMPDIR="/AAalt_loc/tmp"
Create the new location, if it does not exist
sudo mkdir -p /AAalt_loc
Back up the existing files
sudo tar -zcC /var/lib/docker . > /AAalt_loc/docker_backup-$(date +%s).tar.gz
Move the existing files to the new location
sudo mv /var/lib/docker /AAalt_loc
Create a symlink of the new location to the default location
sudo ln -s /AAalt_loc/docker /var/lib/docker
Restart service
sudo service docker start OR sudo systemctl start docker
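To confirm that Docker is now using the new location, check the ‘Docker Root Dir’ line in the output of ‘docker info’:
docker info | grep 'Docker Root Dir'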
General syntax of a Docker command is
docker command [option] [subcommand] [arguments]
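For example, in ‘docker container ls -a’ (a command explained below), ‘container’ is the command, ‘ls’ is the subcommand, and ‘-a’ is an option.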
To get general info about the docker installation
docker info
One of the lines towards the end will be
Registry: https://index.docker.io/v1/
This address corresponds to Dockerhub, described above, and this line denotes that Dockerhub is the default registry for pulling (downloading) images. The default registry cannot be changed.
All images in Dockerhub are of the format
NAMESPACE/REPOSITORY:TAG
For example: rocker/tidyverse:3.4.0. TAG is optional and by default is set to ‘latest’. Some images will have different tags corresponding to different versions such as ‘devel’, ‘testing’, or some kind of numerical sequence like ‘3.4.0’.
To search for images in Dockerhub with specific keywords (for ex: python) in the name
docker search python
The keyword could occur in the NAMESPACE, the REPOSITORY, or the description of the repository
To download an image from Dockerhub onto your local system (TAG is optional and defaults to ‘latest’, if not present)
docker pull NAMESPACE/REPOSITORY:TAG
To download an image not hosted on Dockerhub (private registries, locally hosted registries, etc.)
docker pull AAother_registry:portNumber/NAMESPACE/REPOSITORY:TAG
If this other registry requires authentication, you first have to log in to it
docker login AAother_registry:portNumber
Another less commonly used option to obtain an image is a ‘tar’ file of the image that has been created by the same or an older version of Docker. This option is useful if you wish to share a Docker image among multiple machines without having to go through Dockerhub or any other registry.
docker load -i image-name.tar
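(The corresponding command to create such a ‘tar’ file from an image on your system is covered towards the end of this post; it is of the form ‘docker image save -o image-name.tar NAMESPACE/REPOSITORY’.)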
To list all the images available on your local system
docker images OR docker image ls
To delete a specific image (to reclaim storage space on your host system, for instance)
docker image rm NAMESPACE/REPOSITORY:TAG
To create (run) a container from an image (for instance, ‘AAimage_name’), the basic command is
docker run AAimage_name OR docker container run AAimage_name
Most practical Docker images will start a process, and the output of this process will be displayed in the same terminal. One such example is an image that is set up to start a web server.
If the image is not set up to run a process, you may wish to run it in interactive mode and run various commands within the container. In this case, you will create (run) a container with the ‘-it’ option (interactive, with a terminal)
docker run -it AAimage_name
and will be given a terminal prompt inside the running container, where you can run commands. Once you exit this terminal prompt, the container will stop.
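For instance, assuming the official ‘debian’ image has been pulled as described above, the following drops you into a bash shell inside a new container:
docker run -it debian /bin/bash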
If the image is set up to run a process, but you do not wish to see the process output displayed on the screen, you will run (create) the container in ‘detached’ mode using the ‘-d’ option.
docker run -d AAimage_name
Once the container has finished executing the process that got started with the container, it will stop running.
To find the list of running containers
docker ps
OR
docker container ls
To find the list of stopped containers
docker ps -a
OR
docker container ls -a
Another common option to use when creating a container is ‘--name’, which gives you an option to provide an easy-to-identify name for the container (for instance, ‘AAidentifiable_name’). If not, you will have to use the system-generated random name, or the jumble of random alphanumeric characters that serves as the container UUID, if you wish to interact with the container.
docker run --name=AAidentifiable_name -d AAimage_name
For example, to stop a running container,
docker container stop AAidentifiable_name
Similarly, to re-start a stopped container,
docker container start AAidentifiable_name
To remove a stopped container
docker container rm AAidentifiable_name
To automatically remove the container once it stops, use the ‘--rm’ option when you create a container
docker run --name=AAidentifiable_name -d --rm AAimage_name
In addition to the above two methods of running a process within a container (either in interactive mode or if the process is built into the image), you could also specify the process (for instance, ‘AArun_some_process’) when you run the container
docker run --name=AAidentifiable_name -d --rm AAimage_name AArun_some_process
docker run --name=AAidentifiable_name -d --rm AAimage_name /bin/bash -c "AArun_some_shell_command"
In this case, the container will stop (and get removed) once the process finishes executing.
To run additional processes on running containers
docker exec -d AAidentifiable_name AArun_some_additional_process
docker exec -it AAidentifiable_name /bin/bash
In the second case, you will be given a terminal prompt inside the running container, where you can run additional commands (for instance, if you wish to monitor any processes that may be running in the container).
To get statistics on the resources that are being used by running containers,
docker stats
When you create (run) a container, additional options to limit the resources (CPUs, memory, etc.) used by the container could be set. For a full list,
docker run --help
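For instance, a sketch that limits a container to 1 CPU and 512 MB of memory using the ‘--cpus’ and ‘--memory’ options:
docker run --name=AAidentifiable_name -d --cpus=1 --memory=512m AAimage_name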
Similarly, some of these options could be updated for a running container. For the syntax and complete list of options,
docker update --help
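For instance, a sketch that raises the memory limit of a running container (when raising ‘--memory’ above the current swap limit, ‘--memory-swap’ has to be raised along with it):
docker update --memory=1g --memory-swap=2g AAidentifiable_name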
Most practical Docker images are set up to run a service (for instance, a web server) on a specific port when a container is created from the image. To access this port in the container from the host machine, a user-defined port on the host machine is bound (mapped) to the port in the container. This is done using the ‘-p’ option when creating (running) the container
docker run --name=AAidentifiable_name -d --rm -p 1234:4321 AAimage_name
Here the service in the image is set up to run on port 4321. On the host computer, once the container is created, this service can be accessed on port 1234. For example, if this service were a web server, it can be reached on the host computer by going to ‘http://localhost:1234’ in any browser (or ‘https://localhost:1234’, depending on the web server). Multiple such ports could be mapped with multiple ‘-p’ options.
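For instance, a sketch that maps two ports (assuming the image exposes a second service on port 80):
docker run --name=AAidentifiable_name -d --rm -p 1234:4321 -p 8080:80 AAimage_name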
By default, any data created inside a container (such as files, or modifications to files) does not persist once the container no longer exists (once the container is stopped and removed). In other words, this data will not be accessible by other containers (or by the host system). There are two options for containers to store files on the host machine; on Linux hosts, there is an additional option too. There is an excellent comparison of these options in the official Docker documentation at https://docs.docker.com/storage/
The first option is a bind mount, where a folder on the host machine is mounted into a folder in the container with the ‘-v’ option when the container is created
docker run --name=AAidentifiable_name -d --rm -p 1234:4321 -v /AAhost_folder:/AAcontainer_folder AAimage_name
For example, we could use the current working folder on the host machine
docker run --name=AAidentifiable_name -d --rm -p 1234:4321 -v $PWD:/AAcontainer_folder AAimage_name
The second option is a Docker storage volume, created with
docker volume create AAvolume_name
where ‘AAvolume_name’ is user specified. Volumes are stored in a part of the host filesystem managed by Docker (/var/lib/docker/volumes, or /AAalt_loc/volumes if you changed the default location, on Linux). Non-Docker processes on the host system cannot easily modify this part of the filesystem, and the data in a storage volume can be accessed by all containers on this host by adding the ‘--mount’ option when creating the container
docker run --name=AAidentifiable_name -d --rm -p 1234:4321 --mount src=AAvolume_name,dst=/AAcontainer_folder AAimage_name
As is evident, this option is safer for the host system. It should be the preferred method for transferring data between containers, especially if we do not need direct access to the data on the host system.
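To list the storage volumes on the host, to inspect one (for instance, to find where on the host filesystem its data resides), or to remove one:
docker volume ls
docker volume inspect AAvolume_name
docker volume rm AAvolume_name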
If necessary, we could combine these by having both a bind mount and a storage volume mounted to two different locations in the container (Docker does not allow the same destination folder) when the container is created. Then, we can transfer files between these two destination folders within the container, as sketched after the command below. This could be a good option to periodically back up the data being created, modified, and shared between the containers (for instance, data in a containerized database application).
docker run --name=AAidentifiable_name -d --rm -p 1234:4321 -v /AAhost_folder:/AAcontainer_folder1 --mount src=AAvolume_name,dst=/AAcontainer_folder2 AAimage_name
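For instance, a sketch of such a backup, run from the host and assuming ‘tar’ is available inside the container:
docker exec AAidentifiable_name tar -zcf /AAcontainer_folder1/backup.tar.gz -C /AAcontainer_folder2 .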
If we need to run an X11-based GUI application from within the container and we have an X server running on the host (typically the case in Linux systems), we could achieve this by bind mounting specific folders on the host system to the corresponding folders in the container and setting the DISPLAY variable in the container to point to that of the host system. In this case, the GUI application in the container can only be executed after the container has been created and hence cannot be part of the image.
docker run --name=AAidentifiable_name -d --rm --net=host -e DISPLAY -v "$HOME/.Xauthority:/root/.Xauthority:rw" -p 1234:4321 -v /AAhost_folder:/AAcontainer_folder1 --mount src=AAvolume_name,dst=/AAcontainer_folder2 AAimage_name AAGUI_appl
Being able to create (run) containers out of pre-built, standardized images provides an opportunity to experiment with different software without the complexities of installing (and uninstalling) it, ensuring all dependencies are met, checking compatibility with other existing components, etc. More importantly, it provides a means to orchestrate the startup and shutdown of multiple containers to create an application. Using Docker Compose and swarms (not covered here, but discussed very clearly in the Docker documentation), such an application could be scaled very easily by starting (and shutting down) multiple containers on the same or multiple hosts. This could be made more efficient with the ability to easily create your own specialized images geared towards the specific needs of an application.
Ideally, each image - to be more specific, the container that runs off a specific image - is configured to perform a very specific task (or service) of your application. To create a Docker image, first create a Dockerfile with a series of instructions that represent the commands a user could call on the command line to assemble an image. Then, a Docker image can be built by running the following command in the same folder as the Dockerfile
docker build -t NAMESPACE/REPOSITORY .
(The NAMESPACE will not have any meaning at present, but we will see later how it becomes relevant when we plan to load this image onto a registry such as Dockerhub)
Review the Dockerfile for anandvl/octave for a very basic example.
The ‘FROM’ instruction denotes the ‘base’ image on which additional components are added to build the final image. The ‘base’ image will be one of the pre-built images already available in Dockerhub (or locally, on your host system). A Dockerfile must always start with this instruction
The ‘MAINTAINER’ instruction is more informational than a requirement
The ‘ENV’ instruction sets environment variables that will be needed for other parts of the image to work correctly
The RUN instruction will execute any commands in a new layer on top of the current image and commit the results. The resulting committed image will be used for the next instruction in the Dockerfile. In this example, a new repository location is added to the package list and additional software (octave) is installed from this repository location
For a Dockerfile with some additional steps, review the Dockerfile at anandvl/anaconda
The ‘EXPOSE’ instruction denotes the port numbers that are exposed by the container and these could be mapped to ports on the host using the command line options for ‘docker run’ as explained above
The ‘CMD’ instruction sets the executable to run when creating (running) a container from this image. The preferred form for this is CMD ["executable","param1","param2"]. If there are multiple CMD instructions, only the last one will take effect.
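Putting these instructions together, a minimal hypothetical Dockerfile (a sketch along the lines of the examples above, not the actual anandvl/octave file) could look like:
# base image from Dockerhub
FROM debian:stretch
MAINTAINER AAyour_name
# avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND noninteractive
# install the software this image is meant to provide
RUN apt-get update && apt-get install -y octave
# port on which a containerized service would listen (illustrative)
EXPOSE 4321
# executable to run when a container is created from this image
CMD ["octave"]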
One way to make sure that the entire image will get built correctly and work as intended once you build it from the Dockerfile is to start a container from the base image with an interactive shell (‘-it’ option) and then step through each of the instructions in the Dockerfile.
Once you have built the image using the ‘docker build’ command described above, you can start creating (running) containers with it on your host system. You could also use it as a base image for building more complex images. To share the image with others, you could either
host it on Dockerhub so that others could pull the image and use it
Create an account on Dockerhub. Create a new repository with a namespace (for free accounts, you are allowed only one namespace, which is typically your username), a repository name (image name), and short and long descriptions of the image. You can choose your repository to be public or private - free accounts are allowed only one private repository. You can load multiple images with different ‘TAG’s into the same repository.
On the host system, log in to Dockerhub from the command line
docker login
Set your Docker user ID
export DOCKER_ID_USER="AADockerhub_userid"
Create a tag (TAG) for the image if you wish.
docker tag NAMESPACE/REPOSITORY NAMESPACE/REPOSITORY:TAG
Push it to Dockerhub
docker push NAMESPACE/REPOSITORY:TAG
save the image to a ‘tar’ file and share the file directly (as mentioned earlier)
docker image save -o image-name.tar NAMESPACE/REPOSITORY
On the target system (one-time)
docker load -i image-name.tar
upload the Dockerfile to a GitHub repository and link your Dockerhub account to the GitHub repository (so that Dockerhub can automatically build and host the image)