Engineering Metrics That Matter: Measuring What Counts
Learn which engineering metrics actually predict team performance, how to measure them without creating perverse incentives, and how to use data to improve your team.
The Measurement Problem in Engineering
Engineering leaders face a paradox: you need data to make good decisions about your team, but the wrong metrics create perverse incentives that actively harm performance. Counting lines of code rewards verbosity. Tracking the number of tickets closed incentivizes splitting work into artificially small tickets. Measuring individual bug counts discourages engineers from taking on risky, high-impact projects.
The result is that many teams either measure nothing (flying blind) or measure everything (drowning in dashboards nobody looks at). Neither approach serves you well. What you need is a small set of meaningful metrics that give you genuine insight into your team's health, velocity, and quality - without creating the toxic side effects that bad metrics produce.
This guide covers the metrics that actually matter, how to collect them, and how to use them responsibly.
The DORA Metrics: Your Starting Point
The DevOps Research and Assessment (DORA) team at Google has spent years studying what separates high-performing engineering organizations from the rest. They identified four key metrics that reliably predict both engineering performance and business outcomes.
1. Deployment Frequency
What it measures: How often your team deploys code to production.
Why it matters: Deployment frequency is a proxy for batch size. Teams that deploy frequently are shipping smaller changes, which means lower risk per deployment, faster feedback loops, and quicker delivery of value to users.
Benchmarks:
- Elite: Multiple deploys per day
- High: Between once per day and once per week
- Medium: Between once per week and once per month
- Low: Less than once per month
How to measure it: Count the number of production deployments per week. Most CI/CD tools (GitHub Actions, GitLab CI, CircleCI) can report this automatically. If you release behind feature flags, count the activation of a flag that exposes new code as a deployment.
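If your CI/CD tool can export deployment timestamps, the weekly count is a few lines of code. A minimal sketch (the list of timestamps is illustrative - substitute your own export):

```python
from datetime import datetime
from collections import Counter

def deploys_per_week(deploy_times):
    """Group production deployment timestamps by ISO (year, week) and count them."""
    return dict(Counter(t.isocalendar()[:2] for t in deploy_times))

# Hypothetical timestamps exported from a CI/CD system
deploys = [
    datetime(2024, 3, 4, 10), datetime(2024, 3, 4, 15),
    datetime(2024, 3, 6, 9),  datetime(2024, 3, 12, 11),
]
print(deploys_per_week(deploys))  # -> {(2024, 10): 3, (2024, 11): 1}
```

Grouping by ISO week avoids off-by-one issues at year boundaries, since `isocalendar()` assigns each date an unambiguous (year, week) pair.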
Common pitfalls: Do not push teams to deploy more frequently by splitting work artificially or deploying trivial changes. The goal is to enable frequent deployment through smaller batch sizes, automated testing, and reliable CI/CD - not to game the number.
2. Lead Time for Changes
What it measures: The time from the first commit on a feature branch to that code running in production.
Why it matters: Long lead times mean slow feedback loops, stale branches that are painful to merge, and features that take longer to reach users. Short lead times indicate a smooth, automated path from development to production.
Benchmarks:
- Elite: Less than one hour
- High: Between one day and one week
- Medium: Between one week and one month
- Low: More than one month
How to measure it: Track the timestamp of the first commit on a branch and the timestamp of the deployment that includes that branch. Tools like LinearB, Sleuth, or Faros AI can automate this measurement by integrating with your Git provider and deployment pipeline.
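If you prefer to compute this yourself rather than adopt a tool, the calculation reduces to pairing each change's first-commit timestamp with its deployment timestamp. A sketch with hypothetical data; the median is usually more informative than the mean, which a single stale branch can distort:

```python
from datetime import datetime
from statistics import median

def lead_times(changes):
    """changes: list of (first_commit_at, deployed_at) datetime pairs."""
    return [deployed - committed for committed, deployed in changes]

# Hypothetical (first commit, deployed) pairs for three changes
changes = [
    (datetime(2024, 3, 4, 9),  datetime(2024, 3, 4, 17)),  # 8 hours
    (datetime(2024, 3, 5, 10), datetime(2024, 3, 7, 10)),  # 48 hours
    (datetime(2024, 3, 6, 8),  datetime(2024, 3, 6, 12)),  # 4 hours
]
times = lead_times(changes)
print(median(times))  # -> 8:00:00
```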
Common pitfalls: Lead time includes code review time, which is often the biggest contributor. Do not pressure reviewers to approve faster - instead, make PRs smaller and easier to review, and ensure team norms support prompt reviews.
3. Change Failure Rate
What it measures: The percentage of deployments that result in a failure requiring remediation (rollback, hotfix, or incident).
Why it matters: Deploying frequently is only valuable if those deployments are reliable. Change failure rate tells you whether your testing, review, and deployment processes are catching problems before they reach users.
Benchmarks:
- Elite: 0-15%
- High: 16-30%
- Medium: 31-45%
- Low: Greater than 45%
How to measure it: Track the number of deployments that are followed by a rollback, hotfix deployment, or incident within 24 hours. Divide by total deployments.
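The calculation itself is trivial once each deployment record carries a flag marking whether a rollback, hotfix, or incident followed within your chosen window. A sketch, assuming that flag has already been populated:

```python
def change_failure_rate(deploys):
    """deploys: list of dicts with a boolean 'failed' flag, set when a
    rollback, hotfix, or incident followed the deployment within 24 hours."""
    if not deploys:
        return 0.0
    return sum(d["failed"] for d in deploys) / len(deploys)

# Hypothetical month: 8 clean deployments, 2 that needed remediation
deploys = [{"failed": False}] * 8 + [{"failed": True}] * 2
print(f"{change_failure_rate(deploys):.0%}")  # -> 20%
```

The hard part is labeling, not arithmetic: linking a rollback or incident back to the deployment that caused it typically requires tagging deploys with a release identifier that your incident tooling also records.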
Common pitfalls: If tracking this metric causes teams to avoid deploying, you have implemented it wrong. The solution to high change failure rate is better testing and smaller changes, not fewer deployments.
4. Mean Time to Recovery (MTTR)
What it measures: How long it takes to restore service after a production incident.
Why it matters: Failures are inevitable. What matters is how quickly you detect, diagnose, and resolve them. Low MTTR indicates strong operational practices - good monitoring, clear runbooks, and effective incident response.
Benchmarks:
- Elite: Less than one hour
- High: Less than one day
- Medium: Between one day and one week
- Low: More than one week
How to measure it: Track incident start time (when the issue was detected) and resolution time (when service was restored). PagerDuty, OpsGenie, or even a simple incident log spreadsheet can provide this data.
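Even a spreadsheet-style incident log supports this calculation directly. A sketch over hypothetical (detected, resolved) pairs:

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """incidents: list of (detected_at, resolved_at) datetime pairs."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)

# Hypothetical incident log: one 30-minute and one 90-minute incident
incidents = [
    (datetime(2024, 3, 1, 14, 0), datetime(2024, 3, 1, 14, 30)),
    (datetime(2024, 3, 8, 2, 0),  datetime(2024, 3, 8, 3, 30)),
]
print(mean_time_to_recovery(incidents))  # -> 1:00:00
```

Note that the mean hides outliers: one week-long incident among many five-minute ones will dominate the average, so it is worth eyeballing the distribution too.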
Beyond DORA: Metrics for Team Health
DORA metrics tell you about your delivery pipeline, but they do not capture everything. Add these metrics to get a fuller picture.
Developer Experience (DX) Score
Survey your team quarterly with questions about their tools, processes, and overall satisfaction. Ask about specific friction points: build times, test reliability, documentation quality, and tooling gaps. A simple 1-10 scale on key dimensions gives you trends over time and highlights areas where investment will have the biggest impact on productivity.
PR Review Turnaround Time
Measure the time from when a pull request is opened to when the first substantive review is submitted. Long review times are one of the biggest hidden productivity killers. If PRs sit for more than 24 hours before review, your team's flow is being disrupted.
Target: First review within 4 business hours for most PRs.
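Given each PR's opened-at and first-review timestamps (both available from your Git provider's API), spotting PRs that breach the 24-hour warning line is straightforward. This sketch uses wall-clock hours; a real version would skip nights and weekends to approximate business hours:

```python
from datetime import datetime

def review_turnaround_hours(prs):
    """prs: list of (opened_at, first_review_at) pairs; returns hours per PR.
    Wall-clock hours only -- business-hours handling is deliberately omitted."""
    return [(review - opened).total_seconds() / 3600 for opened, review in prs]

# Hypothetical PRs: one reviewed in 3 hours, one that sat for 2 days
prs = [
    (datetime(2024, 3, 4, 9),  datetime(2024, 3, 4, 12)),
    (datetime(2024, 3, 4, 15), datetime(2024, 3, 6, 15)),
]
hours = review_turnaround_hours(prs)
print([h for h in hours if h > 24])  # -> [48.0]
```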
Time Spent on Unplanned Work
Track the percentage of engineering time spent on bugs, incidents, and urgent requests versus planned feature work. A healthy ratio is 70-80% planned work and 20-30% unplanned. If unplanned work exceeds 40%, your systems are too fragile and need stability investment.
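If your issue tracker can report hours (or ticket counts) by work category, the ratio is one division. A sketch assuming a simple category-to-hours mapping, where everything outside planned feature work counts as unplanned:

```python
def unplanned_ratio(hours_by_category):
    """hours_by_category: e.g. {'feature': 120, 'bug': 30, 'incident': 10}.
    Treats every category except 'feature' as unplanned work."""
    total = sum(hours_by_category.values())
    unplanned = total - hours_by_category.get("feature", 0)
    return unplanned / total

# Hypothetical week for a team of four
week = {"feature": 120, "bug": 30, "incident": 10}
print(f"{unplanned_ratio(week):.0%}")  # -> 25%
```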
Onboarding Time
Measure how long it takes a new engineer to submit their first meaningful pull request (not a trivial one-line fix). This metric reflects the quality of your documentation, development environment, and codebase clarity.
Target: First meaningful PR within the first two weeks.
Implementing Metrics Without Creating Toxicity
The single most important rule of engineering metrics is: never use metrics to evaluate individual engineers. Metrics are for understanding team and system performance, not for performance reviews.
Here is how to implement metrics responsibly:
Measure teams, not individuals. All metrics should be aggregated at the team level. The moment you rank individual engineers by deployment frequency or lines of code, you create an environment where people optimize for their metrics instead of for the team's success.
Share openly, discuss collaboratively. Make your metrics dashboard visible to the entire team. Review metrics in team retrospectives as conversation starters, not report cards. Ask "what is behind this trend?" rather than "why is this number bad?"
Focus on trends, not absolutes. A single week's data point means nothing. Look at four-to-six-week rolling averages and the direction of the trend. Are things improving, stable, or degrading?
Use metrics to identify systemic issues. If lead time is increasing, the question is not "who is being slow?" but "what is our process bottleneck?" The answer might be insufficient CI capacity, unclear requirements, large PR sizes, or understaffed code review.
Revisit your metrics periodically. Every six months, evaluate whether your metrics are still driving the right behavior. If a metric has been consistently healthy for a long time, consider replacing it with one that addresses a current challenge.
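The "trends, not absolutes" advice above is easy to operationalize: smooth weekly values with a trailing rolling average before reading anything into them. A minimal sketch with hypothetical weekly lead-time figures:

```python
def rolling_average(values, window=4):
    """Trailing rolling average over weekly metric values."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

# Hypothetical weekly median lead times, in days
weekly_lead_time_days = [3.0, 4.0, 5.0, 4.0, 6.0, 7.0]
print(rolling_average(weekly_lead_time_days))  # -> [4.0, 4.75, 5.5]
```

The raw series bounces around, but the smoothed series rising steadily from 4.0 toward 5.5 is the kind of signal worth raising in a retrospective.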
Building Your Metrics Dashboard
Start small. A simple dashboard with the four DORA metrics plus PR review turnaround time gives you 80% of the insight you need with 20% of the effort.
Tooling options:
- LinearB, Sleuth, or Faros AI - purpose-built for engineering metrics, integrating with Git, CI/CD, and project management tools
- Custom dashboards - pull data from GitHub's API and your CI/CD platform into Grafana or a simple web application
- Spreadsheets - for very early-stage teams, a weekly manual update in Google Sheets is better than no measurement at all
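For the custom-dashboard route, GitHub's REST API exposes a "List deployments" endpoint (GET /repos/{owner}/{repo}/deployments) whose records carry an environment and a created_at timestamp. The sketch below parses an inlined sample payload shaped like that response; a real dashboard would fetch the JSON with an authenticated request instead:

```python
import json
from collections import Counter

# Sample records shaped like GitHub's "List deployments" API response;
# in production you would fetch this JSON over HTTPS with an access token.
payload = json.loads("""
[
  {"id": 1, "environment": "production", "created_at": "2024-03-04T10:00:00Z"},
  {"id": 2, "environment": "staging",    "created_at": "2024-03-04T12:00:00Z"},
  {"id": 3, "environment": "production", "created_at": "2024-03-06T09:00:00Z"}
]
""")

# Keep production deployments only, then count per calendar day
prod = [d for d in payload if d["environment"] == "production"]
by_day = Counter(d["created_at"][:10] for d in prod)
print(len(prod), dict(by_day))  # -> 2 {'2024-03-04': 1, '2024-03-06': 1}
```

Filtering out staging deployments matters: mixing environments silently inflates deployment frequency.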
Review cadence:
- Weekly: Glance at deployment frequency and lead time to catch sudden changes
- Biweekly: Review all metrics in your engineering standup or retrospective
- Monthly: Deep dive with engineering leadership, correlating metrics with business outcomes and team changes
- Quarterly: Review whether you are measuring the right things
From Metrics to Action
Metrics are only valuable if they lead to action. For each metric that shows a concerning trend, follow this process:
- Investigate the root cause. Talk to engineers. Look at the data behind the metric. Identify the specific bottleneck or issue.
- Propose a targeted improvement. Formulate a specific hypothesis: "If we reduce PR size by 40%, lead time will decrease by at least 25%."
- Run the experiment. Implement the change for four to six weeks.
- Measure the result. Did the metric improve as expected? Did any other metric degrade?
- Decide and iterate. Keep what works, revert what does not, and move to the next improvement.
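Evaluating step four is a single comparison between the baseline window and the experiment window. A sketch with hypothetical numbers matching the lead-time hypothesis above:

```python
def pct_change(before, after):
    """Relative change between a baseline window and an experiment window."""
    return (after - before) / before

# Hypothetical 4-week averages before and after shrinking PR sizes
baseline_lead_time_days = 5.2
experiment_lead_time_days = 3.6
change = pct_change(baseline_lead_time_days, experiment_lead_time_days)
print(f"{change:+.0%}")  # -> -31%
```

A 31% drop would clear the "at least 25%" bar in the hypothesis, so the change stays; remember to also glance at change failure rate to confirm nothing degraded in exchange.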
This disciplined, data-informed approach to engineering improvement compounds over time, transforming your team's delivery capability.
At InfoDive Labs, we help engineering organizations implement meaningful metrics programs that drive real improvement without creating toxic incentives. Our consulting team has built metrics dashboards, designed review processes, and coached engineering leaders at companies from seed stage to enterprise scale. If you want to understand how your team is performing and where to invest for the biggest impact, we are here to help.