Docker Space Usage Discrepancy

"docker system df --verbose" tells that a container is eating up a lot of space. But when I "docker exec -it <container> sh" into the container shell and run "du -h -d1 /", I don't find any dir that's taking up space
The most likely reason for this discrepancy is how Docker's layered filesystem works, specifically how it handles deleted files.
When you run du inside the container, you are seeing the current, merged filesystem. docker system df, however, reports space from the host's perspective: the layers of the container's image, the container's own writable layer, plus volumes and the build cache.
If a large file was added in a lower (earlier) layer and then "deleted" in a higher (later) layer, the file isn't actually gone. It's just marked as hidden in the top layer. The data still exists in the lower layer, consuming space. 👻
Think of it like this:
  • Layer 1 (Image Base): You add a huge 5 GB video file.
  • Layer 2 (Your Dockerfile): You run rm /path/to/huge/video.mp4.
Docker's copy-on-write system doesn't erase the file from Layer 1. Instead, it places a "whiteout" file in Layer 2 that says "the huge video file is deleted."
  • du inside the container: Looks at the merged view, where the whiteout hides the file, so the video appears to be gone and the space appears free.
  • docker system df on the host: Sees both Layer 1 (with the 5 GB file) and Layer 2 (with the whiteout marker). It correctly reports that the total space used is still over 5 GB.
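If you want to see this effect in isolation, here is a minimal sketch; the image tag whiteout-demo and the 500 MB file size are arbitrary placeholders:

```bash
# Build a tiny image that adds a large file in one layer and deletes it in the next.
cat > Dockerfile.demo <<'EOF'
FROM alpine
# Layer 1: create a 500 MB file
RUN dd if=/dev/zero of=/big.bin bs=1M count=500
# Layer 2: "delete" it (this only adds a whiteout marker on top)
RUN rm /big.bin
EOF

docker build -f Dockerfile.demo -t whiteout-demo .

# du inside the container reports almost nothing...
docker run --rm whiteout-demo du -sh /

# ...but the image still carries the 500 MB layer:
docker history whiteout-demo
docker image ls whiteout-demo
```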

Other Common Causes

While deleted files are the most common culprit, here are other possibilities:

1. Container Logs

The container's logs (stdout/stderr) are typically managed by a logging driver (like json-file) and stored on the host machine, not inside the container's filesystem. If the container is very verbose, this log file can grow to be enormous. du inside the container will never see this file.
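A quick way to see whether logs are the problem at all, before digging into a specific container, is to sum the json-file logs on the host (the path assumes the default Docker data root):

```bash
# Total size of all container json-file logs on the host
sudo du -shc /var/lib/docker/containers/*/*-json.log
```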

2. Volumes and Bind Mounts

If you have a volume or bind mount attached to the container, the data resides on the host filesystem. While du inside the container might see it, it's easy to overlook, and its space is managed outside the container's writable layer.
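To check this, list the container's mounts and see how much space named volumes take on the host. A sketch; the container and volume names are placeholders, and the /var/lib/docker path assumes the default local volume driver:

```bash
# Show every volume and bind mount attached to the container
docker inspect --format '{{json .Mounts}}' <container_id_or_name>

# Per-volume usage as Docker accounts for it
docker system df -v

# Or measure a named volume's data directory directly on the host
sudo du -sh /var/lib/docker/volumes/<volume_name>/_data
```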

3. Open File Descriptors

This is a general Linux issue. A process inside your container might have a file open that has since been deleted from the filesystem. The disk space for that file won't be freed until the process closes the file handle (or the process terminates).
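A quick check is to look for file descriptors that point at deleted files. A sketch, assuming the container has a shell; lsof is often missing from minimal images, so /proc is the fallback:

```bash
# List open file descriptors whose target file has been deleted
docker exec <container_id_or_name> sh -c 'ls -l /proc/*/fd 2>/dev/null | grep "(deleted)"'

# If lsof is available in the image, +L1 shows open files with a link count of zero
docker exec <container_id_or_name> lsof +L1
```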

How to Troubleshoot and Fix It 🕵️

Here’s a step-by-step guide to find the real source of the space usage.

Step 1: Analyze the Image Layers

This will help you find those "hidden" deleted files. The best tool for this is dive.
  1. Install dive if you don't have it (see the project's installation instructions).
  2. Run dive on your image:

```bash
dive <your-image-name>:<tag>
```

dive gives you a brilliant interactive TUI for exploring each layer of your image. Select a layer on the left and you can see exactly which files were added, modified, or deleted in it. Look for large layers and check whether their files were removed again in a later layer.
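If installing dive isn't an option, the built-in docker history command gives a rougher per-layer view that is still enough to spot an unexpectedly large layer:

```bash
# Show the size of each layer and the instruction that created it
docker history --no-trunc <your-image-name>:<tag>
```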

Step 2: Check the Container's Log File

Find where Docker is storing the container's log and check its size.
  1. Find the log path:

```bash
docker inspect <container_id_or_name> | grep LogPath
```

  2. Check its size on the host:

```bash
# The command above will return something like:
#   "LogPath": "/var/lib/docker/containers/abc.../abc...-json.log"
#
# Now check its size:
ls -lh /var/lib/docker/containers/abc.../abc...-json.log
```

If this file is huge, you've found a problem. You should configure log rotation for your containers in the Docker daemon's daemon.json file.
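For example, a minimal daemon.json for the json-file driver might look like this; the 10m/3 limits are arbitrary, and if you already have a daemon.json you should merge these keys into it rather than overwrite it:

```bash
# Write a daemon.json that enables log rotation, then restart the daemon.
# The new limits only apply to containers created after the restart.
sudo tee /etc/docker/daemon.json >/dev/null <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
sudo systemctl restart docker
```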

Step 3: Check for Changes in the Writable Layer

You can use docker diff to see what has been added (A), changed (C), or deleted (D) in the container's writable layer since it was created.
```bash
docker diff <container_id_or_name>
```
This can help you spot large temporary files or caches that were created during runtime.
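You can also get a quick read on how big the writable layer actually is with docker ps -s; the SIZE column is the writable layer alone, while the "virtual" figure includes the underlying image layers (the container name filter is a placeholder):

```bash
# SIZE = writable layer only; "virtual" = writable layer + image layers
docker ps -s --filter name=<container_name>
```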

Best Practices for Prevention 📦

  • Use Multi-Stage Builds: This is the #1 solution for the deleted-file problem. You use one stage to build your application (installing dependencies, compiling, etc.) and then copy only the necessary artifacts into a final, clean, minimal base image. The intermediate layers with all the build tools and source code are discarded; see the sketch after this list.
  • Combine RUN Commands: Chain your apt-get install (or equivalent) and cleanup commands in the same RUN instruction using &&. This ensures the temporary files are removed in the same layer they were created in.
    • Bad (creates a bloated layer):

```dockerfile
RUN apt-get update && apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*
```

    • Good (cleans up in the same layer):

```dockerfile
RUN apt-get update && apt-get install -y build-essential \
    && rm -rf /var/lib/apt/lists/*
```
  • Configure Log Rotation: As mentioned above, prevent container logs from growing indefinitely.
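Here is what a multi-stage build looks like in practice. This is only a sketch assuming a Go project; the myapp:slim tag, the golang:1.22 base image, and the build command are placeholders to adapt to your stack:

```bash
cat > Dockerfile <<'EOF'
# Stage 1: build with the full toolchain
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: copy only the compiled binary into a minimal image
FROM alpine
COPY --from=builder /out/app /usr/local/bin/app
ENTRYPOINT ["app"]
EOF

docker build -t myapp:slim .
docker image ls myapp:slim
```

Only the final stage ends up in the shipped image, so the compiler, source code, and caches never become layers you have to "delete" later.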