What 2026 Research Says About Engineering Productivity
by Alina · Evaal AI
Findings from the 2025 and 2026 research on measuring AI-augmented engineering productivity.
The measurement gap
Companies with mature measurement practices now combine three frameworks: DORA for system throughput and stability, SPACE for the contextual layer of satisfaction, communication, and flow, and DX Core 4 for speed, effectiveness, quality, and business impact.
The 2025 DORA report frames AI as an amplifier rather than a productivity tool: it magnifies both existing strengths and underlying dysfunctions in the team.
The productivity paradox
GitHub and Microsoft research with 4,800 developers established large speedups on well-defined coding tasks.
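But METR's study of experienced open source contributors working on complex tasks told a different story: developers using AI tools took about 19 percent longer to finish, even though they believed the tools had made them faster.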
Harvard's study of 62 million workers found junior developer hiring dropped 9 to 10 percent within six quarters of AI adoption, raising questions about long-term skill formation.
The speedup shows up on simple, well-defined tasks. On complex architectural work it often disappears or reverses.
The rework trap
Workday's 2025 research found nearly 40 percent of the time AI saves is lost to rework: fixing hallucinations, rewriting robotic output, double-checking results.
Reviewers spend roughly 4.6 times longer on AI-generated PRs than on human-authored ones. The rework burden falls disproportionately on the 25 to 34 age cohort. Only 12 percent of employees report sufficient training to use AI effectively.
A team that ships 38 percent more PRs while spending 4.6 times more time reviewing each one has not necessarily become more productive.
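The arithmetic behind that caution can be sketched in a few lines. This is an illustrative calculation, not a result from any of the studies cited: it assumes every PR is reviewed at the 4.6 times rate (an upper bound, since the figure applies to AI-generated PRs), and it rounds "nearly 40 percent" to 0.40. The function names and the 10-hour example are hypothetical.

```python
def review_load_multiplier(pr_increase: float, review_time_ratio: float) -> float:
    """Total review time relative to the pre-AI baseline.

    pr_increase: fractional growth in PR count (0.38 means 38 percent more PRs).
    review_time_ratio: review time per PR relative to baseline (4.6 means 4.6x).
    """
    if pr_increase < -1.0 or review_time_ratio < 0.0:
        raise ValueError("inputs must describe a non-negative workload")
    # More PRs, each taking longer to review, multiply together.
    return (1.0 + pr_increase) * review_time_ratio


def net_hours_saved(gross_hours_saved: float, rework_share: float) -> float:
    """Hours that remain after rework consumes a share of the gross saving."""
    if not 0.0 <= rework_share <= 1.0:
        raise ValueError("rework_share must be between 0 and 1")
    return gross_hours_saved * (1.0 - rework_share)


if __name__ == "__main__":
    # Assumption: the 38 percent and 4.6x figures describe the same team.
    load = review_load_multiplier(pr_increase=0.38, review_time_ratio=4.6)
    print(f"Total review time vs. baseline: {load:.2f}x")
    # Assumption: 10 gross hours saved, with 40 percent lost to rework.
    print(f"Net hours kept from 10 saved: {net_hours_saved(10.0, 0.40):.1f}")
```

Under these assumptions total review time comes to roughly 6.3 times the baseline, and 10 hours of gross savings leave about 6 after rework.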
The economic shift
Cost is now measured by the token rather than the seat license.
Output tokens cost up to 10 times more than input tokens. Reasoning tokens used by chain-of-thought models can inflate billed output to 3 to 5 times the visible output. A Deloitte case study documented one healthcare enterprise with a 6 million dollar unplanned annual cost increase from token usage growing 8 to 10 percent per month.
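A short sketch shows how these multipliers compound. It is a rough model under stated assumptions, not any provider's billing formula: reasoning tokens are assumed to be billed at the output rate, the output price is expressed as a ratio of the input price, and every price and token count is hypothetical.

```python
def annual_growth_factor(monthly_growth: float, months: int = 12) -> float:
    """How many times larger usage is after compounding monthly growth."""
    return (1.0 + monthly_growth) ** months


def request_cost(
    input_tokens: int,
    visible_output_tokens: int,
    input_price_per_million: float,
    output_price_ratio: float = 10.0,
    reasoning_multiplier: float = 1.0,
) -> float:
    """Cost of one request in the same currency as the input price.

    Assumption: billed output = visible output * reasoning_multiplier,
    and all of it is charged at the output rate.
    """
    billed_output = visible_output_tokens * reasoning_multiplier
    output_price_per_million = input_price_per_million * output_price_ratio
    total = (
        input_tokens * input_price_per_million
        + billed_output * output_price_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    for rate in (0.08, 0.10):
        factor = annual_growth_factor(rate)
        print(f"{rate:.0%} per month -> {factor:.2f}x after 12 months")
    # Hypothetical request: 1,000 input tokens, 500 visible output tokens,
    # input priced at 1.0 per million tokens, reasoning multiplier of 4.
    print(request_cost(1000, 500, 1.0, 10.0, 4.0))
```

Growth of 8 percent per month compounds to about 2.5 times the starting usage after a year, and 10 percent to about 3.1 times, which is how a modest-looking monthly increase turns into an unplanned annual cost.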
Tiered model routing (most queries to budget models, premium reserved for high-value work) typically delivers a 75 percent reduction in total spend with minimal quality impact.
The cost question for engineering leaders is now "what is our cost per outcome" rather than "how many seats do we need."
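As a minimal sketch of tiered routing and cost per outcome, the example below routes by a hypothetical "high"/"normal" value label and computes the blended spend. The tier names, the per-query prices, the 20 percent premium share, and the choice of what counts as an outcome are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_query: float  # hypothetical price, any currency


def route(query_value: str, budget: Tier, premium: Tier) -> Tier:
    """Send only queries labeled high value to the premium tier."""
    return premium if query_value == "high" else budget


def blended_cost_per_query(premium_share: float, budget: Tier, premium: Tier) -> float:
    """Average cost per query when premium_share of traffic goes premium."""
    if not 0.0 <= premium_share <= 1.0:
        raise ValueError("premium_share must be between 0 and 1")
    return (
        premium_share * premium.cost_per_query
        + (1.0 - premium_share) * budget.cost_per_query
    )


def spend_reduction(premium_share: float, budget: Tier, premium: Tier) -> float:
    """Fractional saving versus sending every query to the premium tier."""
    blended = blended_cost_per_query(premium_share, budget, premium)
    return 1.0 - blended / premium.cost_per_query


def cost_per_outcome(total_spend: float, outcomes: int) -> float:
    """Spend divided by delivered outcomes (for example, merged PRs)."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    return total_spend / outcomes


if __name__ == "__main__":
    budget = Tier("budget", 0.001)    # assumed price
    premium = Tier("premium", 0.020)  # assumed price, 20x the budget tier
    print(route("high", budget, premium).name)
    print(f"{spend_reduction(0.2, budget, premium):.0%} saved")
    print(cost_per_outcome(total_spend=4800.0, outcomes=120))
```

With these made-up prices and one in five queries routed to the premium tier, the saving lands near the 75 percent figure. The actual result depends on the price gap between tiers and on how much traffic genuinely needs the premium model.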
What leaders want next
The wish list across the research focuses on orchestration rather than smarter underlying models:
- Change traceability. Every AI-assisted commit reconstructable in a postmortem, including which human validated the output.
- Autonomous guardrails. Tools that intercept unsafe or off-topic prompts before they reach the LLM.
- Cross-context reasoning. AI that maintains coherent understanding across thousands of documents and projects.
- Governed semantic layers. Consistent interpretation of business terms like churn or margin across the entire organization.
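To make the first item concrete, here is one possible shape for a change traceability record. It is a hypothetical sketch, not a format taken from the research or from any existing tool: the field names, the choice to store a hash of the prompt rather than the prompt itself, and the JSON-lines output are all assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AIAssistedChange:
    """One audit entry per AI-assisted commit (hypothetical schema)."""
    commit_sha: str
    model_name: str
    prompt_sha256: str   # digest only, so the log carries no prompt text
    validated_by: str    # the human who reviewed and accepted the output
    validated_at: str    # ISO 8601 timestamp


def make_record(
    commit_sha: str,
    model_name: str,
    prompt: str,
    validated_by: str,
    validated_at: Optional[datetime] = None,
) -> AIAssistedChange:
    """Build a record, refusing entries with no named human validator."""
    if not validated_by.strip():
        raise ValueError("an AI-assisted change needs a human validator")
    when = validated_at or datetime.now(timezone.utc)
    return AIAssistedChange(
        commit_sha=commit_sha,
        model_name=model_name,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        validated_by=validated_by,
        validated_at=when.isoformat(),
    )


def to_json_line(record: AIAssistedChange) -> str:
    """Serialize with sorted keys so log lines are stable and diffable."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = make_record(
        commit_sha="abc123",             # placeholder value
        model_name="example-model",      # placeholder value
        prompt="refactor the parser",
        validated_by="reviewer@example.com",
    )
    print(to_json_line(record))
```

Appending one such line per commit to an append-only log would let a postmortem answer both "which model produced this" and "who signed off on it" without relying on memory.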
A separate emerging category measures human readiness: skill atrophy risk, recovery time after a public AI error, and whether AI mediation is thinning peer collaboration.
The structural finding
The strongest predictor of realized AI value across the research is what companies do with the gains, rather than adoption rate or token spend itself.
McKinsey and Bain found that companies that reinvest AI returns into talent upskilling and process redesign achieve roughly 5 times the revenue increase of those that do not.
A few additional reference points from the same body of research:
- Google: 25 percent of new code was AI-generated in 2024, rising to 75 percent in early 2026.
- Dropbox: regular AI users ship 20 percent more PRs while reducing change failure rate.
- Microsoft: pioneered Bad Developer Days as a friction proxy for AI-augmented teams.
- NBER customer service study: AI assistance increases issue resolution per hour by 13.8 percent, with the largest gain (35 percent) among the least experienced agents.
The signal across these findings is consistent: adoption alone produces modest gains, while redesigning processes and reinvesting the savings produces compounding ones.