Introduction to Docker: A Hands-On Guide
Since its release in 2013, Docker has taken the DevOps world by storm. In this article, we take a deeper dive into Docker, which is now synonymous with "container" technology.
By reading this article, you will compare VMs with Docker and also get a hands-on introduction to Docker. Containerization technology adds tremendous value to the current software landscape. Comparing and contrasting Docker with the well-understood world of VMs gives a better understanding of Docker and of its advantages and disadvantages relative to VMs. In the article, I delve into the terminology and the "how-tos" of Docker and describe the underlying technology that enables it.
The hands-on description below demonstrates how simple Docker is to use in real-life situations. The hands-on part includes building a Docker image, running a container, and the more advanced option of creating volumes to share data. I have also included a list of frequently used Docker commands to assist those starting out with this technology. I hope that after reading this article everyone has an idea of Docker technology, its ease of use, and the advantages it brings to the current technology landscape.
A ‘Container’ is an abstraction layer above the operating system (OS) that hides the underlying differences between OS distributions, for example between CentOS 7, CentOS 8, and Ubuntu. Containers provide a way to package an application and all of its dependencies (libraries, binaries, and configuration) into a single image. This container ‘Image’ can then be run in the containerization environment, providing environmental consistency across the development, test, and production lifecycle. This technology can solve several customer deployment problems, such as: a Postgres database version mismatch between the company's application and the version installed by the customer; an application failing to start due to a "java classpath error" that appears at the customer site but not during development testing; and a dependency used by CartoDB (a mapping library) missing in the customer environment. The value of containerization is immediately realized in these cases, as it eliminates issues caused by inconsistency between the development and production environments.
Docker is a client-server software program that enables packaging an application and its dependencies into an ‘image’, a virtual container. The second part of Docker is an operating-system-level virtualization layer that executes Docker ‘images’. Docker started out providing an environment in which these containers could run on any Linux server; support for Windows Server was added later. Docker, the eponymous company, owns the software and releases new versions of it. Two products are offered: Docker Community Edition (CE) and Docker Enterprise Edition (EE). The EE version is a paid subscription with additional features compared to the free CE version.
Understanding how virtual machine software works helps in understanding Docker. Virtual machines (VMs) provide an abstraction layer over the physical hardware, allowing multiple instances of an OS and its applications to be loaded on the same physical machine. Multiple VMs can run simultaneously on the same physical hardware. On the physical machine, a piece of software called a hypervisor runs first and manages the VM instances. Each virtual machine provides its own virtual hardware, including CPUs, memory, hard drives, network interfaces, and other devices. The virtual hardware is then mapped to the real hardware on the physical machine. This saves costs by reducing the need for physical hardware and its associated maintenance, and it also reduces power and cooling demand.
Docker vs. Virtual Machines:
There are several notable differences between Docker and virtual machines. A physical server running three VMs has a hypervisor and three separate OSs running on top of it. Docker, on the other hand, runs on a single instance of the OS. The OS is shared across Docker containers, and the images the containers are created from are read-only. This makes containers lightweight and leads to better resource utilization. The figure below, from the Docker website, compares virtual machines and Docker.
Figure 1 : Docker vs VM (Source Docker.com)
Table 1 compares Docker and virtual machines in four areas: size of the image, boot-up time, modularity, and security.
Size - Docker image sizes can start from tens of MB and are typically a few hundred MB. VM images include the entire OS and all the dependencies of the application, and can be several GB in size. Because Docker images are smaller, a server can host far more Docker images and run more containers than VMs.
Boot-up - Starting a VM requires the OS to boot first, after which the various services related to the application are started; this can take several minutes. A Docker container runs on top of the Docker layer and has the application up and running in a matter of seconds. Because Docker containers start quickly, they can be started only when needed, which keeps compute and memory resources free the rest of the time.
Modularity - Due to the lightweight nature of Docker, each Docker container can host a single microservice. Dedicating a VM to each microservice would be far more resource intensive.
Security - Docker provides less isolation between applications than VMs do. In Docker, all the containers share the same Docker layer, which sits on top of the OS. In a VM deployment model, only the hypervisor layer is shared, which provides far better isolation between VMs. This security concern can be addressed by running sensitive Docker containers in their own VM.
Table 1 : Docker vs VM - comparison points
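Two of the comparison points above, the shared OS and the boot-up time, can be checked directly on a Linux Docker host. The following is a minimal sketch, assuming a reachable Docker daemon and the ‘centos:7’ image used later in this article; the variable names are illustrative only. On Windows or macOS, Docker itself typically runs inside a small Linux VM, so the kernel comparison is expected to differ there.

```shell
#!/bin/sh
# Kernel release reported by the host.
host_kernel=$(uname -r)
echo "host kernel:      $host_kernel"

if docker info >/dev/null 2>&1; then
    # Pull first so that the download is not counted as start-up time.
    docker pull centos:7 >/dev/null

    # Time one full start/run/exit cycle of a container.
    start=$(date +%s)
    # --rm removes the container as soon as the command exits.
    container_kernel=$(docker run --rm centos:7 uname -r)
    end=$(date +%s)

    echo "container kernel: $container_kernel"
    echo "container start + exit took about $((end - start)) second(s)"

    if [ "$host_kernel" = "$container_kernel" ]; then
        echo "same kernel: the container shares the host OS kernel"
    else
        echo "different kernel: Docker is probably running inside a helper VM"
    fi
else
    echo "no Docker daemon reachable; skipping the container half"
fi
```

The timing uses whole seconds only, so it is a rough indication rather than a benchmark.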
Docker Architecture:
Docker is a client-server application with these major components:
A server - a daemon process (the dockerd command)
A client - the command-line interface (the docker command)
A REST API - used to talk to the daemon
Once the server is started on a ‘Docker Host’ machine, all Docker-related functionality is performed through the client, using the ‘docker’ command. The REST API is invisible to the user. Docker also provides mechanisms to create ‘networks’ and ‘data volumes’ that are used by Docker containers.
Specific terminology related to Docker includes:
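Although the REST API is normally hidden behind the ‘docker’ command, it can be called directly. The sketch below assumes a Linux host where the daemon listens on its default Unix socket, /var/run/docker.sock, and that curl is installed; the helper name docker_api is made up for this example.

```shell
#!/bin/sh
# Default Unix socket of the Docker daemon on Linux (override via DOCKER_SOCK).
DOCKER_SOCK="${DOCKER_SOCK:-/var/run/docker.sock}"

# docker_api PATH: send a GET request for PATH to the daemon's REST API.
# (Hypothetical helper, not part of Docker.)
docker_api() {
    curl --silent --unix-socket "$DOCKER_SOCK" "http://localhost$1"
}

if [ -S "$DOCKER_SOCK" ] && command -v curl >/dev/null 2>&1; then
    # Roughly what `docker version` asks the daemon; the answer is JSON.
    docker_api /version
    echo
    # Roughly what `docker ps` asks the daemon; the answer is a JSON array.
    docker_api /containers/json
    echo
else
    echo "no Docker socket at $DOCKER_SOCK (or curl missing); skipping"
fi
```

Access to the socket usually requires root or membership in the ‘docker’ group, the same permission the ‘docker’ command itself needs.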
Images: An image is a read-only template with instructions for creating a Docker container. To build your own image, you create a Dockerfile with a simple syntax for defining the steps needed to create the image and run it.
Containers: A container is a runnable instance of an image. You can create, start, stop, move, or delete a container.
Docker Registries: A Docker registry stores Docker images. Docker Hub and Docker Cloud are public registries. A corporation can host its own private registry.
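The relationship between images, containers, and registries can be sketched from the command line as follows. The private registry address registry.example.com:5000 is purely hypothetical; replace it with a registry you actually have access to, otherwise the push step is expected to fail.

```shell
#!/bin/sh
IMAGE="centos:7"
# Hypothetical private registry; not a real host.
REGISTRY="registry.example.com:5000"
# An image in a private registry is addressed by prefixing the registry host.
TARGET="$REGISTRY/$IMAGE"
echo "private image name: $TARGET"

if docker info >/dev/null 2>&1; then
    docker pull "$IMAGE"           # download the image from the public registry
    docker image ls "$IMAGE"       # images: read-only templates stored locally
    docker container ls --all      # containers: instances created from images
    docker tag "$IMAGE" "$TARGET"  # give the local image a registry-qualified name
    docker push "$TARGET" || echo "push failed (expected unless the registry exists)"
else
    echo "no Docker daemon reachable; skipping"
fi
```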
Underlying technology:
Docker is written in Go and uses features of the Linux kernel:
Namespaces - Docker uses a technology called namespaces to provide the isolated workspace called the container.
Control groups - On Linux, control groups limit an application to a specific set of resources.
Union file system (FS) - File systems that operate by creating layers, making them very lightweight and fast.
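These kernel features can be observed without any Docker-specific tooling, because every Linux process already lives in a set of namespaces and control groups. The following sketch assumes a Linux host with /proc mounted; the Docker half additionally assumes a reachable daemon and the ‘centos:7’ image.

```shell
#!/bin/sh
# Namespaces: each entry under /proc/self/ns is one namespace of this process.
if [ -d /proc/self/ns ]; then
    host_pid_ns=$(readlink /proc/self/ns/pid)
    echo "host PID namespace:      $host_pid_ns"
    # Control groups this shell belongs to.
    cat /proc/self/cgroup
else
    host_pid_ns="unavailable"
    echo "/proc/self/ns not found; this does not look like a Linux host"
fi

if docker info >/dev/null 2>&1; then
    # A container normally gets its own PID namespace, so the identifier
    # printed here is expected to differ from the host's.
    echo "container PID namespace: $(docker run --rm centos:7 readlink /proc/self/ns/pid)"
    # --memory and --cpus set limits that the kernel enforces through control groups.
    docker run --rm --memory 256m --cpus 0.5 centos:7 cat /proc/self/cgroup
    # Union file system: every row of the history is one layer of the image.
    docker image history centos:7
else
    echo "no Docker daemon reachable; skipping the container half"
fi
```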
Docker - Hands-on:
As with all technology, trying it hands-on gives an enhanced understanding of the product and its features. The barrier to trying Docker is fairly low. The simplest way to get started (on Windows) is to install the ‘Docker Toolbox’. Once it is installed, running a Docker container from the public Docker registry is very simple.
Example: Run a CentOS 7 container from the Docker registry:
$ docker container run -it --name centos-test -d centos:7
- The parameter after ‘--name’ is the name you give the container
- ‘centos:7’ is the Docker image name from the public Docker registry
Get the Bash shell of the container:
$ docker exec -i -t centos-test /bin/bash
In contrast, to get CentOS 7 up and running in a VM, you have to find the ISO image and install it, which can take a long time. With Docker, the CentOS 7 container can be up and running in less than a minute.
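Because the container above was started with ‘-d’, it keeps running in the background after the Bash shell is closed. A minimal sketch of checking on it and cleaning it up, assuming the container name centos-test from the example and a reachable Docker daemon:

```shell
#!/bin/sh
NAME="centos-test"

# Only act if a container with this name actually exists.
if docker container inspect "$NAME" >/dev/null 2>&1; then
    # Confirm that the container is running.
    docker ps --filter "name=$NAME"
    # Run a single command in the container; no interactive shell is needed.
    docker exec "$NAME" cat /etc/centos-release
    # Stop the container, then remove it so the name can be reused.
    docker stop "$NAME"
    docker container rm "$NAME"
else
    echo "container $NAME not found; run the example above first"
fi
```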
Building Docker Image:
Building a Docker image requires two steps:
Create a file named ‘Dockerfile’. Its statements are the instructions for creating the Docker image.
Once the ‘Dockerfile’ is ready, build the image with this command:
$ docker build -t schema-creator schema-creator/
- The parameter after ‘-t’ is the image name (user provided)
- The second parameter points to the directory that contains the ‘Dockerfile’
Example of a ‘Dockerfile’: this example creates a Docker image that creates tables in a Postgres database.
FROM centos:7
RUN yum -y install postgresql
RUN yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
RUN mkdir /opt/schema-updater
RUN mkdir /opt/schema-updater/lib
# A bare "RUN cd" only affects its own build step; WORKDIR persists for later instructions
WORKDIR /opt/schema-updater
COPY schema-updater/lib /opt/schema-updater/lib
COPY schema-updater/ /opt/schema-updater/
CMD cd /opt/schema-updater && ./schema-upgrader.sh
The ‘Dockerfile’ starts with the base ‘centos:7’ image from the public Docker registry. The ‘RUN’ instruction is then used to install dependencies and create directories while the image is being built. The ‘COPY’ instruction copies files from the local machine into the image being created. Finally, the ‘CMD’ instruction specifies the command that runs when a container is started from the built image using the ‘docker run …’ command.
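To see the difference between the build-time instructions (RUN, COPY) and the run-time instruction (CMD) without needing the schema-updater files, a throwaway image can be built from a temporary directory. This is a minimal sketch; the names hello.sh and hello-image are hypothetical and not part of the project described in this article.

```shell
#!/bin/sh
# Create a throwaway build context (all names here are hypothetical).
ctx=$(mktemp -d)

# A tiny script that will be copied into the image.
cat > "$ctx/hello.sh" <<'EOF'
#!/bin/sh
echo "hello from inside the container"
EOF
chmod +x "$ctx/hello.sh"

# The Dockerfile: base image, one build-time step, one copy, one start command.
cat > "$ctx/Dockerfile" <<'EOF'
FROM centos:7
RUN mkdir -p /opt/hello
COPY hello.sh /opt/hello/hello.sh
CMD ["/opt/hello/hello.sh"]
EOF

if docker info >/dev/null 2>&1; then
    # RUN and COPY are executed now, while the image is built.
    docker build -t hello-image "$ctx"
    # CMD is executed now, when a container is started from the image.
    docker run --rm hello-image
    # Remove the demo image again.
    docker image rm hello-image
else
    echo "no Docker daemon reachable; build context left in $ctx"
fi
```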
Docker Volumes:
There is often a need to share data between a service running in one container and a service running in another. For sharing data across containers, Docker provides ‘docker volume’. The steps to share data are: 1) Create a Docker volume. 2) Start the first container, mapping the volume to the container path where it stores or reads data. 3) Start the second container with the same volume, mapped to the path from which this container reads data.
$ docker volume create datastore-vol
$ docker run -v datastore-vol:/opt/data1 --name container1 container1/voltest1
$ docker run -v datastore-vol:/opt/data2 --name container2 container2/voltest2
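The three steps above can be tried end to end without the voltest images. The sketch below is an assumption-laden stand-in: it reuses the volume name datastore-vol and the two mount paths from the commands above, but uses the plain ‘centos:7’ image for both containers and a made-up file name, message.txt.

```shell
#!/bin/sh
VOLUME_NAME="datastore-vol"

if docker info >/dev/null 2>&1; then
    # 1) Create the volume (does nothing harmful if it already exists).
    docker volume create "$VOLUME_NAME"

    # 2) First container: the volume is mounted at /opt/data1 and a file is written to it.
    docker run --rm -v "$VOLUME_NAME":/opt/data1 centos:7 \
        sh -c 'echo "written by container1" > /opt/data1/message.txt'

    # 3) Second container: the same volume is mounted at /opt/data2 and the file is read back.
    docker run --rm -v "$VOLUME_NAME":/opt/data2 centos:7 \
        cat /opt/data2/message.txt

    # Show where Docker keeps the volume on the host.
    docker volume inspect "$VOLUME_NAME"
    # To delete the volume and its data afterwards: docker volume rm datastore-vol
else
    echo "no Docker daemon reachable; skipping"
fi
```

Both containers are removed by ‘--rm’ when they exit, yet the file survives in between, which shows that the data lives in the volume rather than in either container.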
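Frequently Used Docker Commands:
# List running containers
$ docker ps
# Create a user-defined network named test-net
$ docker network create test-net
# Run Postgres 9.6 on test-net, publishing port 5432 and mounting the host directory /c/Users at /data
$ docker run -p 5432:5432 -v /c/Users:/data --network test-net --name test-postgres postgres:9.6
# Run a container in the background with environment variables, a volume, and host port 8090 mapped to container port 8080
$ docker run -d --network test-net -e ENV_VAR1='value1' -e ENV_VAR2='8080' -v data-vol:/opt/data1 -p 8090:8080 --name container1 container/tomcat
# Build an image named schema-creator from the schema-creator/ directory
$ docker build -t schema-creator schema-creator/
# Open an interactive Bash shell in a running container
$ docker exec -i -t container1 /bin/bash
# Stop a running container
$ docker stop container1
# Remove a stopped container
$ docker container rm container1
# Build the image through Maven (once Docker is integrated with the Maven build)
$ mvn package -P docker -Ddocker.verbose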
Summary:
In this article, we explored various aspects of Docker technology. The comparison between VMs and containerization technology gives an improved understanding of the advantages and disadvantages of using Docker.
The hands-on description above demonstrated how simple Docker is to use in real-life situations. The hands-on part included building a Docker image, running a container, and the more advanced option of creating volumes to share data. I have also included a list of frequently used Docker commands to assist those starting out with this technology. I hope that after reading this article everyone has an idea of Docker technology, its ease of use, and the advantages it brings to the current technology landscape.
References: