When you start out to containerize your application, it can feel overwhelming with all the information you read about it on the internet. But it all boils down to getting a few basic things right to get your Docker image rolling. Let’s go through each of these pointers in this post.
Caveat: This is all garnered from my experience and worldview of building images.
Get a good base image.
It all starts with the FROM instruction at the top of your Dockerfile. There are many choices here. For instance, you could go with lean images like the ones based on Alpine, but it doesn’t hurt to go with your favorite distribution either. In fact, I’d strongly recommend going with a distribution you’ve already used. I usually start with Ubuntu images, but YMMV. Image size shouldn’t usually be a deciding criterion when you choose your base image. The learning curve with the distribution (like which package manager your team is most familiar with) and the availability of dependencies are better guiding points when choosing base images. There was an instance where I couldn’t build an Alpine-based image because a dependency wouldn’t compile, the reason being that Alpine uses musl, a different standard C library, instead of glibc. In short, go with a distribution you’re familiar with.
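To make this concrete, here is a minimal sketch of such a Dockerfile starting from a familiar Ubuntu base (the app and its requirements.txt are hypothetical; pinning a release tag keeps builds reproducible):

```dockerfile
# Ubuntu chosen for familiarity, not size; pin the tag for reproducibility.
FROM ubuntu:22.04

# Install only what the app needs, then clean the apt cache to keep the
# image lean.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY . .
```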
Identify the process that will run in your container.
Containers are closely modeled around the UNIX philosophy of one process doing one thing very well. If you are running a web server in a container, it should run only that, and nothing else. I’ve seen the classic antipattern of bundling all your processes into a single container and wrapping them with supervisord. While that might get you off the ground quickly, it isn’t the ideal way to run processes inside a container. If you have multiple processes, run them in their respective containers and mount their common parts in a volume. If you are running a PHP application, you would have Nginx and PHP-FPM running in two different containers with the code mounted as a common volume between them. Similarly, for a Python application, you would have Nginx and a gunicorn/uWSGI process running in two different containers.
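The PHP example above might look something like this compose sketch (service names, image tags, and paths are assumptions), with each process in its own container and the code shared through a volume:

```yaml
# Hypothetical sketch: Nginx and PHP-FPM in separate containers,
# sharing the application code via a common bind mount.
services:
  web:
    image: nginx:1.25
    ports:
      - "8080:80"
    volumes:
      - ./code:/var/www/html
    depends_on:
      - php
  php:
    image: php:8.2-fpm
    volumes:
      - ./code:/var/www/html
```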
Declutter your Docker image
You probably know by now that containers are ephemeral. Most real-world applications need to persist some form of data or state. Any data you want to persist beyond the lifecycle of a container must be written to a mounted volume and not anywhere else. Also, you should consider a multi-stage build when you have a non-trivial build process. For example, let’s say you use npm to compile your frontend code before you push it to production. Even though you are using Python or PHP to run your web app, you would otherwise have to install npm and its dependencies like node. Instead of cluttering your image with node just for building frontend assets, you can use an intermediate build stage to compile the frontend code, then carry forward only the resulting files while leaving everything else behind. The result is a smaller image containing only the packages required to run your app.
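The npm example could be sketched as a multi-stage Dockerfile like this (stage names, paths, and the build script are assumptions):

```dockerfile
# Stage 1: build frontend assets with node; nothing from this stage
# ships in the final image except what we explicitly copy out.
FROM node:20 AS frontend
WORKDIR /src
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build          # assumed to emit compiled assets in /src/dist

# Stage 2: the runtime image; node and npm never get installed here.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
COPY --from=frontend /src/dist ./static
```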
Use a docker compose file.
With so many moving parts like volumes, config, exposed ports etc., your docker run command can get quite complicated. A docker compose file codifies all of that into a single YAML file. Besides, you can maintain different variations of docker compose files for different environments and use other cool features of docker compose like inheritance. I can’t remember the last time I ran one-off docker run commands, except to debug or test new images.
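One way the per-environment variation works is through Compose’s file merging: a dev-only override file layered over the base file. A hypothetical sketch (service name and paths are assumptions):

```yaml
# docker-compose.override.yml — Compose merges this over
# docker-compose.yml automatically, so dev-only concerns like bind
# mounts and debug ports live here, not in the base file.
services:
  web:
    volumes:
      - .:/app            # live-edit the code during development
    ports:
      - "8000:8000"
```

For other environments you can merge files explicitly, e.g. `docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d`.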
Inject config via environment variables.
Any kind of configuration (e.g. the port where a process listens) or secret (e.g. an AWS access key and secret) should be injected into your container as environment variables. There are a few places where you can do this: the ENV instruction of your Dockerfile when building the image, and the --env argument when running it. You can also supply an environment file, keep a different one per environment, and keep them out of version control.
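In a compose file, the same idea looks like this (variable names and the image tag are illustrative; the .env file stays out of version control):

```yaml
# Env injection sketch: plain values inline, secrets from a file that
# is listed in .gitignore.
services:
  web:
    image: myapp:1.0
    environment:
      - APP_PORT=8000
    env_file:
      - .env              # e.g. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
```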
Run your process as a non-root user.
Although containers are isolated from one another and from parts of the host OS, it is good practice to run your process as a non-root user. Most of the official images relax this policy, but I consider it when building my images. You just have to create a user meant to run the main process and give that user permissions scoped to that process alone. For instance, if you’re running Nginx, you only have to grant access to the Nginx configuration directory and run it on a non-privileged port (not 80, since binding ports below 1024 requires root).
Also, it goes without saying that you should expose only those ports which are relevant to your application.
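A minimal Dockerfile sketch of both points, assuming a hypothetical app user and entrypoint:

```dockerfile
FROM ubuntu:22.04

# Dedicated user for the main process; no login shell needed.
RUN useradd --create-home --shell /usr/sbin/nologin appuser

WORKDIR /app
# Give the user ownership of only its own application files.
COPY --chown=appuser:appuser . .

USER appuser
EXPOSE 8000                # only the port the app actually serves on
CMD ["./run.sh"]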
Push it to a registry.
There is no reason to hold your docker image to yourself, even if you are a one-person team 🙂 You have to use it in your CI and possibly production as well. Even if you build intermediate images, you might start off from a base image you built as a starting point for all your projects. There are public registries and there are private registries. If you are building images which contain stuff like your company’s code, you should use a private registry. You get a free private registry when you sign up for a Gitlab account. Here’s more info on how to access it. You can host your own private Docker registry, although I’d suggest that only as a last resort. You are better off shipping features and building stuff 🙂 There are also commercial registry providers out there with more features like fine-grained access control.
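The workflow is just a handful of commands (registry host, namespace, and tag below are placeholders; the Gitlab registry uses this registry.gitlab.com naming scheme):

```shell
docker login registry.gitlab.com
docker build -t registry.gitlab.com/myteam/myapp:1.4.0 .
docker push registry.gitlab.com/myteam/myapp:1.4.0
```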
Make your Dockerfile a part of your code
Your Dockerfile travels everywhere your code does. Anyone onboarding onto your team should be able to pull it and build an image. The same goes for your docker compose files.
Log to stdout and stderr
Don’t write your process’s logs to files inside a container. As you might have guessed, they will perish alongside the container when it dies. You might argue that you can mount the directory where you write logs as a volume on the host machine, but that’s another antipattern. Write your process’s logs to stdout and stderr, and let your container runtime or log management system do the job of aggregating them.
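For processes that insist on writing to log files, the official nginx image uses a trick worth borrowing: symlink the log files to the container’s stdout and stderr so the output flows to Docker’s log driver.

```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends nginx \
    && rm -rf /var/lib/apt/lists/* \
    # Redirect nginx's file logs to the container's stdout/stderr,
    # the same approach the official nginx image takes.
    && ln -sf /dev/stdout /var/log/nginx/access.log \
    && ln -sf /dev/stderr /var/log/nginx/error.log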
Version your images
I’ve reserved the most important for last. You have to ensure that you tag and version your images. That way, you know which version is currently running. I’ve seen people version based on semantic versioning, dates, and application-specific context (like the PHP version). Any versioning methodology is better than using latest: with latest, you have no clue which latest you are actually running.
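In practice that’s just a matter of passing an explicit tag at build time (the name and semantic-version scheme here are illustrative):

```shell
docker build -t myapp:1.4.0 .
docker tag myapp:1.4.0 myapp:1.4     # optional coarser tag for convenience
docker push myapp:1.4.0
```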