
What 2026 Research Says About Engineering Productivity

by Alina · Evaal AI



Findings from 2025 and 2026 research on measuring AI-augmented engineering productivity.

The measurement gap

94%
Business leaders who consider AI critical to their five-year strategy
Global enterprise survey, 2025
60%
Engineering and operational leaders who cite the lack of clear metrics as their largest obstacle
Same survey, engineering segment

Companies with mature measurement practices now combine three frameworks:

  • DORA for system throughput and stability.
  • SPACE for the contextual layer of satisfaction, communication, and flow.
  • DX Core 4 for speed, quality, effectiveness, and developer experience.
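As a rough illustration of what the DORA layer measures, the four keys can be computed from deployment and incident records. The function and record shapes below are hypothetical, not from the report:

```python
from datetime import timedelta

def dora_metrics(deploys, incidents, period_days):
    """Four DORA keys from hypothetical deploy/incident records.

    deploys:   dicts with 'committed_at', 'deployed_at' (datetime), 'failed' (bool)
    incidents: dicts with 'opened_at', 'resolved_at' (datetime)
    """
    n = len(deploys)
    deploy_frequency = n / period_days                         # deploys per day
    lead_time = sum((d["deployed_at"] - d["committed_at"] for d in deploys),
                    timedelta()) / n                           # mean commit-to-deploy
    change_failure_rate = sum(d["failed"] for d in deploys) / n
    restore_time = (sum((i["resolved_at"] - i["opened_at"] for i in incidents),
                        timedelta()) / len(incidents)
                    if incidents else timedelta())             # mean time to restore
    return deploy_frequency, lead_time, change_failure_rate, restore_time
```

SPACE and DX Core 4 add the survey and experience dimensions that no pipeline query captures.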

The 2025 DORA report frames AI as an amplifier rather than a productivity tool: it magnifies both existing strengths and underlying dysfunctions in the team.

The productivity paradox

GitHub and Microsoft research with 4,800 developers found large speed-ups on well-defined coding tasks.

Successful build rate: +84%
Task completion speed: +55.8%
Cycle time (commit to prod): −31.8%
PR review duration: −23%
PR volume: +10.6%
GitHub and Microsoft developer study, 2025

But METR's study of experienced open source contributors on complex tasks told a different story.

−24%
Change in completion time developers predicted on complex tasks (an expected speed-up)
Developer self-report
+19%
Measured change in completion time on those same tasks (an actual slowdown)
METR, 2025

Harvard's study of 62 million workers found that junior developer hiring dropped 9 to 10 percent within six quarters of AI adoption, raising questions about long-term skill formation.

The speed-up appears on simple, well-defined tasks. On complex architectural work it often disappears or reverses.

The rework trap

Workday's 2025 research found nearly 40 percent of the time AI saves is lost to rework: fixing hallucinations, rewriting robotic output, and double-checking results.

Reviewers spend roughly 4.6 times longer on AI-generated PRs than on human-authored ones. The rework burden falls disproportionately on the 25-to-34 age cohort. Only 12 percent of employees report receiving sufficient training to use AI effectively.

A team that ships 38 percent more PRs while spending 4.6 times more time reviewing each one has not necessarily become more productive.
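A back-of-the-envelope check of that claim, under the simplifying assumptions that every PR is AI-generated and baseline per-PR review time is otherwise constant:

```python
pr_volume_factor = 1.38    # 38% more PRs shipped
review_time_factor = 4.6   # per-PR review time vs. a human-authored PR

# Under these assumptions, total review hours scale multiplicatively.
total_review_load = pr_volume_factor * review_time_factor
print(round(total_review_load, 2))  # 6.35, i.e. roughly 6x the baseline review load
```

More output has been bought with far more review time, which is why throughput metrics alone mislead here.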

The economic shift

Cost is now measured by the token rather than the seat license.

Output tokens cost up to 10 times more than input tokens, and the reasoning tokens used by chain-of-thought models multiply visible output by 3 to 5 times. A Deloitte case study documented one healthcare enterprise with a 6 million dollar unplanned annual cost increase from token usage growing 8 to 10 percent per month.
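To make that arithmetic concrete, here is a hypothetical cost model. The 10x output premium, the 3-5x reasoning multiplier, and the 8-10 percent monthly growth come from the text; the per-million-token prices and workload sizes are invented:

```python
def monthly_token_cost(input_tokens, output_tokens, reasoning_multiplier=4,
                       input_price_per_m=1.0, output_price_per_m=10.0):
    """Estimate monthly spend: output billed at ~10x the input rate, and
    chain-of-thought reasoning multiplies billable output 3-5x (4 used here).
    Prices per million tokens are invented for illustration."""
    billable_output = output_tokens * reasoning_multiplier
    return (input_tokens / 1e6) * input_price_per_m \
         + (billable_output / 1e6) * output_price_per_m

def compounded(cost, monthly_growth, months):
    """Usage growing 8-10% per month compounds into a multiple within a year."""
    return cost * (1 + monthly_growth) ** months

base = monthly_token_cost(input_tokens=500e6, output_tokens=100e6)
print(round(base, 2))  # 4500.0 for this invented workload: $500 input, $4,000 output-side
print(round(compounded(base, 0.09, 12), 2))  # same workload after a year of 9% monthly growth
```

Note that the output side dominates even though the workload sends five times more input tokens than it receives back.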

Tiered model routing (most queries go to budget models, with premium models reserved for high-value work) typically delivers around 75 percent total-spend reduction with minimal quality impact.
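A minimal sketch of such a routing policy, with invented model names, prices, and a naive complexity heuristic; production routers use trained classifiers rather than keyword checks:

```python
# Invented tiers and per-million-token prices, for illustration only.
BUDGET = {"name": "small-model", "cost_per_m_tokens": 0.5}
PREMIUM = {"name": "frontier-model", "cost_per_m_tokens": 15.0}

def route(query: str) -> dict:
    """Send long or reasoning-heavy queries to the premium tier, the rest to budget."""
    hard_markers = ("prove", "architecture", "refactor", "migrate")
    if len(query) > 500 or any(m in query.lower() for m in hard_markers):
        return PREMIUM
    return BUDGET

def blended_cost(queries, tokens_per_query=2_000):
    """Total spend in dollars for a batch of queries under this policy."""
    return sum(route(q)["cost_per_m_tokens"] * tokens_per_query / 1e6
               for q in queries)
```

With nine of ten queries landing on the budget tier at these invented prices, blended spend falls to a small fraction of all-premium spend, which is the mechanism behind reductions on the order of the reported 75 percent.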

The cost question for engineering leaders is now "what is our cost per outcome" rather than "how many seats do we need."

What leaders want next

The wish list across the research focuses on orchestration rather than smarter underlying models:

  • Change traceability. Every AI assisted commit reconstructable in a postmortem, including which human validated the output.
  • Autonomous guardrails. Tools that intercept unsafe or off-topic prompts before they reach the LLM.
  • Cross context reasoning. AI that maintains coherent understanding across thousands of documents and projects.
  • Governed semantic layers. Consistent interpretation of business terms like churn or margin across the entire organization.
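As a toy illustration of the guardrail item above, a pre-LLM prompt filter might look like the following. The patterns are invented, and real guardrails use classifiers and policy engines rather than regexes:

```python
import re

# Invented patterns for illustration only.
UNSAFE_PATTERNS = [
    re.compile(r"(?i)\b(api[ _]?key|password|private key)\b"),  # secrets in prompts
    re.compile(r"(?i)\bdrop\s+table\b"),                        # destructive SQL
]

def check_prompt(prompt: str) -> tuple[bool, str]:
    """Return (allowed, reason) before the prompt ever reaches the LLM."""
    for pattern in UNSAFE_PATTERNS:
        if pattern.search(prompt):
            return False, f"blocked: matched {pattern.pattern!r}"
    return True, "ok"
```

The interception point matters more than the matching technique: the check runs before any tokens are sent, so a blocked prompt costs nothing and leaks nothing.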

A separate emerging category measures human readiness: skill atrophy risk, recovery time after a public AI error, and whether AI mediation is thinning peer collaboration.

The structural finding

The strongest predictor of realized AI value across the research is what companies do with the gains, rather than adoption rate or token spend itself.

McKinsey and Bain found that companies that reinvest AI returns into talent upskilling and process redesign achieve approximately 5 times the revenue gains of counterparts that do not.

≈5×
Revenue growth multiplier for companies that reinvest AI gains into upskilling and process redesign
McKinsey and Bain, 2025
25% → 75%
Share of new code at Google that is AI-generated, 2024 to early 2026
Google internal reporting

A few additional reference points from the same body of research:

  • Google: 25 percent of new code AI-generated in 2024, rising to 75 percent in early 2026.
  • Dropbox: regular AI users ship 20 percent more PRs while reducing change failure rate.
  • Microsoft: pioneered Bad Developer Days as a friction proxy for AI-augmented teams.
  • NBER customer service study: AI assistance increases issues resolved per hour by 13.8 percent, with the largest gain (35 percent) among the least experienced agents.

The signal across these findings is consistent: adoption alone produces modest gains, while redesigning processes and reinvesting the savings produces compounding ones.

Evaal AI builds engineering intelligence for teams of 30+. See how it works →