Attack-in-Depth: Breaking AI Defenses with AiDx


The last post made the case that AI guardrails tend to fail for the same reason, and that attackers beat them by chaining. AiDx is the tool we built to do that chaining on purpose. The name stands for Adversarial, in Depth, by eXperiments, and “in depth” is the whole point. If the defense is layered, the attack has to be too.

AiDx is not a full pentest framework, and it does not try to be. It finds the AI-layer foothold and hands off. Once a chain breaches, the rest of the engagement runs on the usual toolkit: Burp, custom scripts, whatever you already use. AiDx’s job is the entry vector those tools cannot reach on their own.

Three layers, composable

AiDx treats an attack as three layers you mix and match, each aimed at a different kind of defense.

Layer 1 is structural: how you wrap the payload. Spotlighting and anti-spotlighting, end-sequences, instruction prefixes, document wrappers. This layer goes after delimiting and the model’s parsing, the same delimiting defenders lean on for spotlighting.

Layer 2 is evasion: how you encode the payload. Around fifty transforms (base64, leetspeak, homoglyphs, morse, translation, and so on), and you can chain them. This layer beats input filters and keyword-matching classifiers.

Layer 3 is strategy: how you drive the attack over time. Best-of-n, crescendo, a RAG poisoner, an LLM-driven jailbreaker. These run at execution time and go after stateful defenses and single-shot classifiers, the controls a one-shot payload never touches.

A real bypass is usually one item from each layer, combined into a single chain. That is the part manual testing gets tedious fast, and the part AiDx makes quick.

How we actually use it

Most of the work happens in research mode, which is interactive. You poke the target, watch how it responds, adapt the payload, chain in an evasion or a strategy, and keep going until one combined chain gets through. You stay in the loop the whole time. The output is not a score, it is a working chain, and that chain becomes the AI-layer foothold for the engagement.

There is a benchmark mode for the other half of the job. Replay a chain, or a built-in suite, against a target and line the results up against the frameworks clients ask about: OWASP LLM Top 10, EU AI Act, NIST AI RMF, MITRE ATLAS, ISO/IEC 42001, GDPR. The same attack work becomes a deliverable a client can act on.

One more mode worth calling out: you can point AiDx at a guardrail directly, not just the app behind it. If a classifier or filter sits in front of the model, you can test that layer on its own and learn exactly what gets past it before you touch the real target.

Why this shape

The design follows from the previous post. Guardrails are layered and probabilistic, so a single payload is the wrong tool for the job. You want to break each layer with the technique built for it, learn what that layer actually does, then assemble the pieces. Attack in depth, because the defense is in depth. AiDx is that idea with the tedious parts automated.