The Over-Reliance Trap: When Your Best Analyst Stops Questioning the AI
Automation complacency is the AI SOC's most underrated risk. Why fluent explanations deepen over-reliance, and how to design tools that keep analysts questioning.
I want to describe something that is becoming more common in security operations, even if no one writes it into the incident report.
An alert fires. The AI analysis layer picks it up, gathers the context, and produces a clean verdict: closed as a false positive, with the reasoning attached. The analyst reads the first two lines and decides it sounds right. It usually is right. They approve it and move to the next one.
Multiply that by a few hundred alerts a day, across a few months, and something quietly shifts in the analyst. They stop reading the third line. Then they stop opening the evidence. Eventually the thing they trust becomes the verdict, and the analysis behind it becomes a formality they no longer check.
That is the over-reliance trap. And I believe it is the most underrated risk in the entire AI SOC conversation.
We have known about this for thirty years. We just called it something else.
None of this is new. It may be new to security, but human-factors researchers were studying it long before anyone put an LLM next to an alert queue.
In aviation, it is the pilot who trusts the autopilot so completely that they stop scanning the instruments. Raja Parasuraman and Dietrich Manzey, in their foundational review of the field, gave it a precise name: automation complacency, the insufficient monitoring and checking of an automated system when we assume it is working correctly. Their most uncomfortable finding, for anyone who thinks seniority is a defense: automation complacency appears in both novice and expert operators, and it cannot be removed with simple practice.
It is worth reading that again. Expertise does not protect you. Your best L2 or L3 is not immune to any of it. In some ways they are even more exposed, because they have the pattern recognition that makes skimming feel safe.
The twin failure is automation bias: when the automated recommendation is wrong and the human follows it anyway, because checking has started to feel unnecessary. Parasuraman and Manzey describe how this produces not only errors of omission (missing what the system missed) but also errors of commission (actively doing the wrong thing because the system advised it).
In a cockpit, that is a missed altitude warning. In a SOC, it is a confirmed true positive closed as benign because the AI was confident and the analyst was busy.
The painful part: better explanations can make it worse
This is where the modern research becomes genuinely counterintuitive, and where I think most AI SOC vendors are quietly making the problem worse while believing they are solving it.
If we want analysts to trust AI to the right degree, the instinct is to give them explanations and show the reasoning. Surely a verdict that arrives with a paragraph of justification is safer than a bare one.
A 2025 study presented at the CHI conference tested exactly this, with a pre-registered experiment of more than three hundred participants. The finding belongs on the wall of every product team building these tools: the presence of explanations increased reliance on both correct and incorrect answers. People trusted the explained answer more, whether or not it was right. Worse still, explanations made participants more confident in their own judgment and less likely to ask a follow-up question.
A fluent explanation does not make a wrong answer easier to catch. It makes it easier to accept.
This lines up with what cognitive scientists at UC Irvine called the calibration gap, in work published the same year in Nature Machine Intelligence. People consistently overestimate how reliable an LLM’s output is, and standard explanations do not help them tell a correct answer from an incorrect one. As the researchers put it, there is a disconnect between what the model knows and what the person thinks it knows.
So the industry’s default move, producing a confident summary with a tidy rationale, turns out to be the recipe for over-reliance. We are optimizing for the very thing that disarms the analyst.
What the analysts themselves are telling us
The part that gives me some hope is that working analysts seem to sense this, even when the tools work against it.
A longitudinal study by CSIRO’s Data61, run inside a live SOC with the MDR provider eSentire, examined more than three thousand real analyst queries to LLMs over ten months. The pattern was clear: analysts used the AI for sense-making and context-building, not for handing over the final call. They kept the judgment for themselves. The researchers’ recommendation to anyone designing these systems was direct: good tools should surface evidence over recommendations and preserve analyst autonomy rather than replace it.
A separate study surveying hundreds of SOC professionals across four continents found the same instinct from the other direction. Analysts were willing to accept AI output even at lower accuracy, as long as the reasoning was visible and grounded in evidence. They did not want a more confident machine. They wanted one that showed its work, so they could stay inside the loop instead of being walked past it.
To me, that is the whole point. The analysts are not the problem. The design is.
How this should shape the way we build
I do not think the answer is to make AI less capable, to constrain it, or to slow it down out of misplaced caution. The capacity problem in security operations is real, and AI is the only realistic way through it. Telling analysts to manually re-analyze every verdict defeats the entire purpose and burns out the very people we are trying to protect.
The answer is to design against complacency on purpose. A few principles I keep coming back to:
Make the product deliver the evidence layer, not just the verdict. If the analyst’s eye lands on a confidence score, you have already lost them to the trap. If it lands on the evidence chain, the missing telemetry, the detail that does not quite fit, you have kept them thinking and engaged.
Let the system say it does not know. A tool that always produces a clean answer trains the analyst to expect one and to trust it. A tool that can return inconclusive, and name exactly what is missing, keeps the human’s skepticism alive, because it models that skepticism itself.
Surface inconsistencies instead of hiding them. The same CHI study produced one genuinely useful result: when explanations contained visible inconsistencies, or when sources were shown, people relied less on the wrong answers. Friction, placed in the right spot, is a feature. A system that hides its own uncertainty to look polished is doing the analyst a disservice.
Treat over-reliance as a metric, not a feeling. If your analysts approve verdicts faster every month while looking at the underlying evidence less, that is not maturity. It is the trap closing. It is measurable, and it should be measured.
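To make that last principle concrete, here is a minimal Python sketch of what measuring it could look like. Everything in it is an assumption for illustration: the Review record, its field names, and the trap_closing heuristic are hypothetical, and they presume your tooling can log how long an approval took and whether the analyst opened the underlying evidence. It is a starting point under those assumptions, not a validated metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Review:
    """One analyst approval of an AI verdict (hypothetical log record)."""
    month: str                 # e.g. "2025-01"; ISO-style so months sort correctly
    seconds_to_approve: float  # time between verdict shown and approval
    evidence_opened: bool      # did the analyst open the underlying evidence?


def monthly_reliance(reviews):
    """Per month: median approval time and share of reviews with evidence opened."""
    by_month = defaultdict(list)
    for review in reviews:
        by_month[review.month].append(review)

    stats = {}
    for month in sorted(by_month):
        rows = by_month[month]
        stats[month] = {
            "median_seconds": median(r.seconds_to_approve for r in rows),
            "evidence_open_rate": sum(r.evidence_opened for r in rows) / len(rows),
        }
    return stats


def trap_closing(stats):
    """Crude signal (an assumption, not a standard): compared with the first
    month, the latest month shows faster approvals AND less evidence opened."""
    months = sorted(stats)
    if len(months) < 2:
        return False
    first, last = stats[months[0]], stats[months[-1]]
    return (
        last["median_seconds"] < first["median_seconds"]
        and last["evidence_open_rate"] < first["evidence_open_rate"]
    )


if __name__ == "__main__":
    sample = [
        Review("2025-01", 40.0, True),
        Review("2025-01", 60.0, True),
        Review("2025-01", 50.0, False),
        Review("2025-03", 10.0, False),
        Review("2025-03", 20.0, True),
        Review("2025-03", 15.0, False),
        Review("2025-03", 5.0, False),
    ]
    stats = monthly_reliance(sample)
    for month, values in stats.items():
        print(month, values)
    print("trap closing:", trap_closing(stats))
```

Comparing only the first and last month is deliberately blunt. A real version would likely look at the trend across all months and control for alert mix, since a month full of trivial alerts will also look fast. The point is only that both signals can be logged and tracked together.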
The question that stays with me
The promise of AI in the SOC was never to replace human judgment. It was to clear away the noise that buries it, and give that judgment more room to breathe.
But there is a version of this future where we gain the speed and lose the judgment anyway, not because the AI took it, but because we slowly stopped using it. The verdict came fast, it sounded right, and one day the analyst simply stopped looking behind it.
I do not think that outcome is inevitable. I think it is a design choice, made one product decision at a time. The tools that make it through the next few years will not be the ones that sound the most certain. They will be the ones that keep the human awake.
The best analyst is not the one who trusts the AI. It is the one who never quite stops questioning it.
Our job, as the people building these systems, is to make sure the tool rewards that instinct instead of dissolving it.
Written by Burhan Ünal Canmaya, Founding Member at Priam Cyber AI.
Sources:
- Parasuraman, R., and Manzey, D. (2010). Complacency and Bias in Human Use of Automation: An Attentional Integration. Human Factors, 52(3). https://journals.sagepub.com/doi/10.1177/0018720810376055
- Parasuraman, R., and Riley, V. (1997). Humans and Automation: Use, Misuse, Disuse, Abuse. Human Factors, 39(2). https://journals.sagepub.com/doi/10.1518/001872097778543886
- Kim, S. S. Y., Vaughan, J. W., Liao, Q. V., Lombrozo, T., and Russakovsky, O. (2025). Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies. CHI 2025. https://arxiv.org/abs/2502.08554
- Steyvers, M., et al. (2025). What large language models know and what people think they know. Nature Machine Intelligence. https://doi.org/10.1038/s42256-024-00976-7
- Singh, R., et al. (2025). LLMs in the SOC: An Empirical Study of Human-AI Collaboration in Security Operations Centres. https://arxiv.org/abs/2508.18947
- Rastogi, N., et al. (2025). Too Much to Trust? Measuring the Security and Cognitive Impacts of Explainability in AI-Driven SOCs. ACM CCS 2025. https://arxiv.org/abs/2503.02065