Web developers have had CI/CD figured out for a decade. Push to main, a pipeline builds your container, runs your tests, and deploys to production. Embedded engineers? We're still arguing about whether the build machine should have GCC 9 or GCC 11, and the "CI pipeline" is a cron job that runs make all and emails the output to three people.
Here's how I've been fixing that across multiple projects — from BMS validation to rail systems to my own products.
Why Embedded CI/CD Is Different
Embedded CI/CD isn't just compiling code. A proper pipeline needs to handle:
- Cross-compilation — you're not building for x86. The toolchain is ARM GCC, the linker scripts are custom, and the binary format is device-specific (.hex, .bin, .elf)
- Hardware-dependent testing — unit tests run fine in a container, but HiL tests need physical hardware connected to specific test benches
- Multi-target builds — one codebase, multiple MCU targets with different flash sizes, peripherals, and pin configurations
- Artifact management — firmware binaries, debug symbols, map files, and build manifests all need to be versioned and traceable
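The multi-target problem in particular maps cleanly onto a CI build matrix. Here's a sketch in GitHub Actions terms; the target names and the -DTARGET CMake option are hypothetical placeholders, and build-env is the toolchain image described in the next section:

# Multi-target build matrix (target names and -DTARGET option are hypothetical)
build:
  strategy:
    matrix:
      target: [stm32f4, stm32h7, nrf52840]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Build for ${{ matrix.target }}
      run: |
        docker run --rm -v $PWD:/workspace build-env:latest \
          bash -c "cd /workspace && cmake -B build -DTARGET=${{ matrix.target }} && cmake --build build"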
The Architecture
After implementing variations of this across three companies, I've landed on a pattern that works. The core idea: everything runs in Docker, and the only thing that changes between CI providers is the runner config.
Stage 1: Build
The build environment is a Docker image: a minimal Linux base plus the ARM GCC toolchain, CMake, and any vendor-specific SDK components. The build stage produces:
- Firmware binary (.bin / .hex)
- ELF with debug symbols
- Build manifest (git SHA, timestamp, compiler version, build flags)
- Static analysis output (if you're running MISRA checks or PC-lint)
The Docker image itself is versioned and stored in a registry. This eliminates the "works on my machine" problem permanently.
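As a concrete sketch, the build stage in GitHub Actions can be as small as this. The registry path, image tag, and project layout are assumptions; swap in your own:

# Build stage sketch (registry path, tag, and paths are assumptions)
- name: Build Firmware
  run: |
    docker run --rm -v $PWD:/workspace \
      ghcr.io/example/build-env:1.4.0 \
      bash -c "cd /workspace && cmake -B build && cmake --build build"
- name: Generate Build Manifest
  run: |
    # compiler version, git SHA, timestamp (add build flags as needed)
    docker run --rm ghcr.io/example/build-env:1.4.0 \
      arm-none-eabi-gcc --version | head -n 1 > build/manifest.txt
    echo "git_sha: $GITHUB_SHA" >> build/manifest.txt
    echo "built_at: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> build/manifest.txt

Note the pinned tag (1.4.0, not latest): that's what makes the versioned-image guarantee hold in practice.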
Stage 2: Unit & Integration Tests
These run inside the same container. I use either GoogleTest (for C/C++ firmware) or pytest (when the test harness is written in Python). The key is that these tests don't need hardware: they exercise software logic, protocol parsing, state machines, and error handling.
# GitHub Actions example
- name: Run Unit Tests
  run: |
    docker run --rm \
      -v $PWD:/workspace \
      build-env:latest \
      bash -c "cd /workspace/build && ctest --output-junit results.xml"
The --output-junit flag is critical. JUnit XML is the lingua franca of CI systems — GitHub Actions, GitLab CI, Jenkins, and Azure DevOps all understand it. This is also what Bud consumes for test reporting.
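To keep the raw XML attached to the run as well, a one-step upload works; in GitHub Actions terms (the artifact name is arbitrary):

# Keep the raw JUnit XML attached to the run
- name: Upload Test Results
  if: always()    # publish results even when tests fail
  uses: actions/upload-artifact@v4
  with:
    name: unit-test-results
    path: build/results.xml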
Stage 3: HiL/SiL Tests (Hardware- and Software-in-the-Loop)
This is where embedded CI diverges from web CI. You can't run HiL tests in a container — you need a physical test bench. The solution is a self-hosted runner connected to the test bench network.
- Container build completes in the cloud (stages 1-2)
- Artifacts are uploaded to shared storage (S3, Artifactory, or an SMB share)
- A self-hosted runner on the test bench network picks up the job
- The runner flashes the firmware to the target hardware
- Automated HiL tests execute via Bud Runner or a custom Python harness
- Results are published back to the CI dashboard as JUnit XML
Pro tip: always include a "bench health check" step before HiL tests. Power cycle the hardware, verify CAN bus connectivity, check power supply rails. You'd be surprised how many "test failures" are actually a loose cable or a bench that someone forgot to power on.
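Putting the whole stage together, the self-hosted job might look like the sketch below. The runner labels, the upstream job name, and the three bench scripts are hypothetical stand-ins for whatever your test bench actually uses:

# HiL job sketch (runner labels, scripts, and artifact names are hypothetical)
hil-tests:
  needs: build
  runs-on: [self-hosted, hil-bench]
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: firmware
    - name: Bench Health Check
      run: ./scripts/bench_health.sh    # power cycle, CAN check, rail check
    - name: Flash Target
      run: ./scripts/flash_target.sh firmware.bin
    - name: Run HiL Suite
      run: ./scripts/run_hil.sh --junit hil-results.xml
    - name: Upload HiL Results
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: hil-results
        path: hil-results.xml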
Stage 4: Reporting & Deployment
If all tests pass, the pipeline:
- Tags the firmware binary with the git SHA and build number
- Uploads to an artifact repository with metadata (target MCU, git diff since last release, test results summary)
- Generates a release note from commit messages
- For OTA-capable devices, stages the firmware for deployment via the update server
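One possible shape for that final stage, assuming GitHub Releases as the artifact store and the gh CLI available on the runner (the tag format and file names are assumptions):

# Release sketch (tag format and file names are assumptions)
- name: Tag and Publish Firmware
  run: |
    VERSION="fw-${GITHUB_RUN_NUMBER}-${GITHUB_SHA::8}"
    cp build/firmware.bin "firmware-${VERSION}.bin"
    gh release create "$VERSION" "firmware-${VERSION}.bin" \
      --generate-notes    # release notes generated from commit history
  env:
    GH_TOKEN: ${{ github.token }}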
Jenkins vs. GitLab CI vs. GitHub Actions
I've implemented this pattern on all three. My take:
- Jenkins — still the most common in automotive/industrial. Declarative pipelines with shared libraries work well, but Jenkins itself is a maintenance burden. Self-hosted by nature, which is actually an advantage for HiL runners.
- GitLab CI — clean YAML syntax, excellent runner management, and the Docker-in-Docker support is solid. My preference for teams that are already on GitLab.
- GitHub Actions — the easiest to set up and the best ecosystem for reusable workflows. The actions/cache action for Docker layers and actions/upload-artifact for firmware binaries make it very productive. My current default for new projects.
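For reference, the Docker-layer caching mentioned above is typically wired up through buildx plus actions/cache; the cache path and key scheme below are one common convention, not the only one:

# Docker layer caching sketch (cache path and key scheme are one common convention)
- uses: docker/setup-buildx-action@v3
- uses: actions/cache@v4
  with:
    path: /tmp/.buildx-cache
    key: buildx-${{ github.sha }}
    restore-keys: buildx-
- name: Build Toolchain Image
  run: |
    docker buildx build \
      --cache-from type=local,src=/tmp/.buildx-cache \
      --cache-to type=local,dest=/tmp/.buildx-cache,mode=max \
      -t build-env:latest --load .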
The One Thing That Matters Most
Tooling doesn't matter if the team doesn't trust the pipeline. I've seen teams invest months in CI/CD only to have engineers run tests manually because "the pipeline sometimes gives false positives." The pipeline must be reliable before it's fast. Start with a simple build+unit-test pipeline that never fails spuriously. Add HiL stages only when the foundation is solid. And always, always make the test results visible — dashboards, Slack notifications, PR check annotations. If people can't see the results, the pipeline doesn't exist.