Autonomous Bug Bounty Agent (with Scope-Enforcing Proxy + PoC Validator)

Status: Private beta / Early access

Focus: Authorized, in-scope security testing (VDP / Bug Bounty, black-box)

We’re three security researchers based in Tokyo building an autonomous agent framework that can map an application, plan targeted security hypotheses, and produce a human-reviewable report, all while enforcing strict safety constraints so it can’t wander out of scope.

There’s no public repo yet; this page shares architecture and learnings for feedback.

TL;DR

Multi-agent workflow: recon → hypothesis planning → class-specific testing → validation → report drafting.
All traffic passes through a scope-enforcing proxy (allowlist + rate/concurrency caps + logging).
Real-world validation (as of Feb 8, 2026): running against ~5 targets/week since late 2025.
U.S. Dept of Defense (DoD): 3 vulnerabilities triaged.
HackerOne ranking: reached #86 globally in VDP (90 Days) leaderboard.
Bug Bounty Programs: 2 duplicates, 1 under review.
Benchmarks: solved 84% of PortSwigger Web Security Academy labs autonomously.

What this is / isn’t

✅ This is

An autonomous testing engine for authorized scopes with human approval before submission.
A precision-focused system that validates its own findings and leaves only final approval and report submission to a human.

❌ This is not

A fully autonomous submit-to-bounty bot.
A general internet crawler or exploitation toolkit.
A replacement for structured, coverage-driven pentest methodology (yet).

The Architecture

The workflow mimics human red-team methodology while maintaining hard safety controls. Final submission is always decided by a human.

Input: Target URL and optional credentials (for grey-box testing)
Output: Drafted report for human review

Architecture (overview)

1) Initial Recon Agent: enumerates reachable endpoints in-scope, infers technology patterns, and builds an attack-surface map.

2) Coordinator: selects hypotheses, delegates to specialized agents, and manages budgets/rate limits/retries/stop conditions.

3) Specialized testing agents: focused workers (IDOR/SQLi/XSS) reduce hallucinations and apply class-specific evidence heuristics.

4) Validator + Report Drafting: replays key requests, runs negative checks, collects artifacts, and emits draft reports for human review.
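
The replay and negative-check idea in step 4 can be illustrated with a small sketch. This is not the project's implementation: `Observation`, `Probe`, `validate`, and the default of three replays are hypothetical names and values chosen for illustration. The assumption is that a finding counts as confirmed only when its evidence signal reproduces on every replay and never appears on a negative control request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    status: int
    body: str


# Hypothetical type: a callable that sends one request (through the proxy)
# and returns what was observed.
Probe = Callable[[], Observation]


def validate(candidate: Probe, control: Probe,
             signal: Callable[[Observation], bool], replays: int = 3) -> dict:
    """Confirm a finding only if the signal reproduces on every replay of the
    candidate request and never shows up on the negative control."""
    positives = [candidate() for _ in range(replays)]
    negatives = [control() for _ in range(replays)]
    reproduced = all(signal(o) for o in positives)
    control_clean = not any(signal(o) for o in negatives)
    return {
        "confirmed": reproduced and control_clean,
        "reproduced": reproduced,
        "control_clean": control_clean,
        # Minimal artifacts for the draft report: status and body length.
        "artifacts": [(o.status, len(o.body)) for o in positives + negatives],
    }
```

Requiring a clean negative control is what keeps a signal that appears on every response, vulnerable or not, from being reported as a finding.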

Execution environment

Python runtime (parsing, diffing, state handling)
Headless browser (DOM rendering, JS-driven flows)
Kali Linux shell (recon utilities, HTTP tooling, parsers)
All traffic routed through the scope-enforcing proxy

Safety model and guardrails

Safety is a hard constraint. This system is intended only for authorized testing.

Scope-Enforcing Proxy

Allowlist controls: FQDN/method constraints and optional headers
Throttling: max RPS and concurrency caps
Auditing: full allow/deny logging and reproducible traces
Default-deny: ambiguous requests are blocked
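
As a rough illustration of how these controls might combine, here is a minimal sketch of a per-request decision function. It is an assumption-laden example, not the proxy's actual code: `TokenBucket`, `decide`, the rule format (host name mapped to allowed methods), and the log record fields are all hypothetical. Concurrency caps and header constraints are omitted for brevity.

```python
import time
from urllib.parse import urlsplit


class TokenBucket:
    """Simple token bucket used here as a max-RPS throttle."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def take(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def decide(method: str, url: str, rules: dict, bucket: TokenBucket, log: list) -> bool:
    """Default-deny: allow only exact allowlisted hosts and methods, then
    apply the rate limit. Every decision is appended to the audit log."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        scheme = parts.scheme
    except ValueError:          # unparsable URL is ambiguous, so it is blocked
        host, scheme = "", ""
    allowed_methods = rules.get(host)
    if scheme not in ("http", "https") or allowed_methods is None:
        allow, reason = False, "host not in allowlist"
    elif method.upper() not in allowed_methods:
        allow, reason = False, "method not allowed"
    elif not bucket.take():
        allow, reason = False, "rate limit"
    else:
        allow, reason = True, "ok"
    log.append({"method": method, "url": url, "allow": allow, "reason": reason})
    return allow
```

Checking the allowlist before the rate limiter means out-of-scope requests never consume request budget, and logging denials alongside allows keeps the trace complete.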

Safe PoC policy

Prioritizes read-only verification patterns
Avoids destructive payloads and persistence attempts
Stops on instability or side-effect risk signals
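
One way such a policy could be expressed in code is sketched below. The names `is_read_only` and `StabilityGuard`, and every threshold, are hypothetical and illustrative only; the actual risk signals the system uses are not described on this page. The sketch treats server errors and slow responses as instability signals and recommends stopping once they dominate a sliding window of recent requests.

```python
from collections import deque

# A subset of the HTTP methods that RFC 9110 defines as safe (read-only).
READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}


def is_read_only(method: str) -> bool:
    """True if the method is in the read-only set preferred for verification."""
    return method.upper() in READ_ONLY_METHODS


class StabilityGuard:
    """Tracks recent responses and signals a stop when the target looks unstable.
    All thresholds are illustrative assumptions."""

    def __init__(self, window: int = 20, max_error_ratio: float = 0.25,
                 max_latency_s: float = 5.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)   # True marks an unhealthy response
        self.max_error_ratio = max_error_ratio
        self.max_latency_s = max_latency_s
        self.min_samples = min_samples

    def record(self, status: int, latency_s: float) -> None:
        self.samples.append(status >= 500 or latency_s > self.max_latency_s)

    def should_stop(self) -> bool:
        if len(self.samples) < self.min_samples:
            return False                      # not enough evidence yet
        return sum(self.samples) / len(self.samples) > self.max_error_ratio
```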

We do not publish exploit payloads or step-by-step compromise guidance.

Experimental Results (As of Feb 8, 2026)

Running against ~5 targets/week since late 2025.
VDP success: #86 globally on HackerOne VDP (90 Days), with 3 DoD vulnerabilities triaged.
BBP challenges: 2 submissions closed as duplicates, 1 report under review.
Key learning: there is an impact gap between technical correctness and business criticality; a technically valid finding is not necessarily one the program considers critical.

Performance (one representative run)

Wall time: ~2 hours
Model/API cost: low single-digit USD (varies by conditions)
Human time: review, verification, and final editing

Optimization priorities: high precision, strong evidence trails, strict scope adherence.

Limitations / current challenges

Performance still degrades on SPA-heavy targets, which demand deeper browser-state modeling.
Context growth can cause inefficient behavior or rare loops; mitigated via budgets, stop conditions, and summarization.
Coverage and reproducibility vary by exploration path, timing, and defenses.
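
The budget and stop-condition mitigation mentioned above can be sketched roughly as follows. `RunBudget`, its limits, and the way an action is fingerprinted are assumptions made for this example rather than the system's actual mechanism; summarization is not shown.

```python
import hashlib
from collections import Counter


class RunBudget:
    """Stops a run when the step budget is spent or when the same action is
    repeated too often against the same target (a likely loop)."""

    def __init__(self, max_steps: int = 200, max_repeats: int = 3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()

    def check(self, action: str, target: str) -> str:
        """Record one planned step and return 'continue' or a stop reason."""
        self.steps += 1
        key = hashlib.sha256(f"{action}\n{target}".encode()).hexdigest()
        self.seen[key] += 1
        if self.steps > self.max_steps:
            return "stop: step budget exhausted"
        if self.seen[key] > self.max_repeats:
            return "stop: repeated action (possible loop)"
        return "continue"
```

A coordinator would call `check` before each delegated step and end the run, or trigger summarization, on any result other than `"continue"`.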

Ethics

Authorized testing only within explicit VDP / bounty scopes.
Human-in-the-loop; no automatic submissions.
Scope enforcement via proxy and default-deny rules.
No harmful payload sharing.

Contact / Disclosure

Open to collaboration and feedback from teams building similar systems.

Email: info@layer8.jp