Asserting container health when using Docker with Ansible

When using Docker with Ansible, containers are usually managed with the docker_container and docker_compose modules. These modules work well and offer many options, but an Ansible task only reports whether the task itself succeeded: that the inputs were valid and that the container was started. It does not check whether the container is actually healthy after starting up.

This is a problem if you rely on your build pipeline to tell you that a job really succeeded, i.e., that a healthy container is now doing its job. What you really want is for Ansible to also assert the health of the container after startup, and to fail the playbook if the container is not healthy.

To achieve this, two components are necessary:

Docker container healthcheck

Many containers do not have a healthcheck built in. Yet often, their underlying service provides an API endpoint for checking its health. For example, a webserver might provide a /health endpoint that returns 200 OK if the service is healthy and 500 Internal Server Error if it is not. To find out whether a healthcheck is already in place, look for the (healthy) suffix in the container overview (for example, the STATUS column of docker ps). If there is none, try to add one yourself.
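
As a rough sketch, a healthcheck can be declared directly in the task that starts the container, using the healthcheck option of docker_container. The image placeholder, the port 8080, and the /health path are assumptions for illustration, and the check only works if curl is available inside the image:

```yaml
- name: Start container with a healthcheck
  community.docker.docker_container:
    name: "<your container name>"
    image: "<your image>"
    state: started
    healthcheck:
      # Hypothetical endpoint and port; curl must exist inside the image.
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s      # time between two checks
      timeout: 5s        # a check that runs longer counts as failed
      retries: 3         # consecutive failures before "unhealthy"
      start_period: 20s  # grace period while the service boots
```

The timing values are only a starting point and should be tuned to how long the service actually needs to boot.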

Ansible task to check container health

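Now, the important part: The following docker_container_info task checks whether the container's State.Health.Status field is healthy. As configured, it retries this check 15 times with a delay of 10 seconds between checks, which gives the container roughly two and a half minutes to become healthy. This only works for containers that define a healthcheck, because the State.Health field is absent otherwise.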

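- name: Assert container health
  community.docker.docker_container_info:
    name: "<your container name>"
  # Store the inspect result so the until condition can read it.
  register: container_info
  # Repeat the task until Docker reports the container as healthy.
  until: "container_info.container.State.Health.Status == 'healthy'"
  retries: 15
  delay: 10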

When starting up, most containers take a while to become “healthy”; until then, Docker reports the health status as starting. Once the healthy state is reached, the playbook can safely continue. If it is not reached, the playbook fails after the last retry, and you know that something is wrong with your container.
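
If the failure itself should be easier to diagnose, one option is to wrap the check in a block with a rescue section. The following is a sketch rather than part of the original setup. It assumes the container defines a healthcheck, so that Docker records recent check results in State.Health.Log:

```yaml
- name: Wait for the container and report diagnostics on failure
  block:
    - name: Assert container health
      community.docker.docker_container_info:
        name: "<your container name>"
      register: container_info
      until: "container_info.container.State.Health.Status == 'healthy'"
      retries: 15
      delay: 10
  rescue:
    # State.Health.Log holds the output of the most recent healthcheck runs.
    - name: Show recent healthcheck results
      ansible.builtin.debug:
        var: container_info.container.State.Health.Log
    # A rescue section clears the error, so fail explicitly.
    - name: Fail the play
      ansible.builtin.fail:
        msg: "Container did not become healthy in time."
```

The explicit fail task matters: without it, the rescue section would mark the error as handled and the playbook would continue.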