Autonomous Bug Bounty Agent (with Scope-Enforcing Proxy + PoC Validator)

Status: Private beta / Early access

Focus: Authorized, in-scope security testing (VDP / Bug Bounty, black-box)

We’re three security researchers based in Tokyo building an autonomous agent framework that maps an application, plans targeted security hypotheses, and produces a human-reviewable report. Strict safety constraints keep it from wandering out of scope.

There’s no public repo yet; this page shares architecture and learnings for feedback.

TL;DR

  • Multi-agent workflow: recon → hypothesis planning → class-specific testing → validation → report drafting.
  • All traffic passes through a scope-enforcing proxy (allowlist + rate/concurrency caps + logging).
  • Real-world validation (as of Feb 8, 2026): running on ~5 targets/week since late 2025.
  • U.S. Dept of Defense (DoD): 3 vulnerabilities triaged.
  • HackerOne ranking: reached #86 globally on the VDP (90 Days) leaderboard.
  • Bug Bounty Programs: 2 submissions closed as duplicates, 1 under review.
  • Benchmarks: solved 84% of PortSwigger Web Security Academy labs autonomously.

[Image: HackerOne Triage Verification]
[Image: HackerOne VDP Ranking]

What this is / isn’t

✅ This is

  • An autonomous testing engine for authorized scopes with human approval before submission.
  • A precision-focused system that validates its own findings and leaves only final approval and report submission to a human.

❌ This is not

  • A fully autonomous submit-to-bounty bot.
  • A general internet crawler or exploitation toolkit.
  • A replacement for structured, coverage-driven pentest methodology (yet).

The Architecture

The workflow mimics human red-team methodology while maintaining hard safety controls. Final submission is always decided by a human.

  • Input: Target URL and optional credentials (for grey-box testing)
  • Output: Drafted report for human review

Architecture (overview)

1) Initial Recon Agent: enumerates reachable endpoints in-scope, infers technology patterns, and builds an attack-surface map.

2) Coordinator: selects hypotheses, delegates to specialized agents, and manages budgets/rate limits/retries/stop conditions.

3) Specialized testing agents: focused workers (IDOR/SQLi/XSS) reduce hallucinations and apply class-specific evidence heuristics.

4) Validator + Report Drafting: replays key requests, runs negative checks, collects artifacts, and emits draft reports for human review.
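
To make the hand-offs between these four stages concrete, here is a minimal sketch of a coordinator that dispatches hypotheses to class-specific workers under a fixed budget, followed by a validator that confirms a finding only when every replay reproduces it and a negative control does not. All names (Hypothesis, Finding, Coordinator, validate, max_attempts) are hypothetical and chosen for illustration; this is not the framework's actual code, and the worker below is a stub that makes no network requests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Hypothesis:
    vuln_class: str  # e.g. "idor", "sqli", "xss"
    endpoint: str


@dataclass
class Finding:
    hypothesis: Hypothesis
    evidence: str


# A worker tests one hypothesis and returns a Finding or None.
Worker = Callable[[Hypothesis], Optional[Finding]]


@dataclass
class Coordinator:
    workers: Dict[str, Worker]
    max_attempts: int = 10  # hypothetical per-run budget

    def run(self, hypotheses: List[Hypothesis]) -> List[Finding]:
        findings: List[Finding] = []
        attempts = 0
        for h in hypotheses:
            if attempts >= self.max_attempts:
                break  # stop condition: budget exhausted
            worker = self.workers.get(h.vuln_class)
            if worker is None:
                continue  # no specialist for this class; costs no budget
            attempts += 1
            result = worker(h)
            if result is not None:
                findings.append(result)
        return findings


def validate(finding: Finding,
             replay: Callable[[Finding], bool],
             negative_check: Callable[[Finding], bool],
             replays: int = 3) -> bool:
    """Confirm only if all replays reproduce and the negative control stays clean."""
    reproduced = all(replay(finding) for _ in range(replays))
    return reproduced and not negative_check(finding)


def stub_idor_worker(h: Hypothesis) -> Optional[Finding]:
    # Stand-in for a real specialized agent.
    if h.endpoint.endswith("/orders/1"):
        return Finding(h, "response belonged to a second test account")
    return None


coordinator = Coordinator(workers={"idor": stub_idor_worker}, max_attempts=2)
candidates = coordinator.run([
    Hypothesis("idor", "/api/orders/1"),
    Hypothesis("xss", "/search"),          # skipped: no worker registered
    Hypothesis("idor", "/api/orders/2"),
    Hypothesis("idor", "/api/orders/3"),   # not reached: budget of 2 is spent
])
confirmed = [f for f in candidates
             if validate(f, replay=lambda f: True, negative_check=lambda f: False)]
```

The negative check is what separates a real signal from a response that looks the same for every input: if the control request also "succeeds", the finding is dropped rather than drafted.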

Execution environment

  • Python runtime (parsing, diffing, state handling)
  • Headless browser (DOM rendering, JS-driven flows)
  • Kali Linux shell (recon utilities, HTTP tooling, parsers)
  • All traffic routed through the scope-enforcing proxy
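
As one way to picture how the Python runtime and shell tools could be pointed at the proxy, the sketch below builds a subprocess environment and a urllib opener that both use a single proxy URL. The address 127.0.0.1:8080 and the helper names are assumptions for illustration. Environment variables only affect tools that honor them, so a real deployment would presumably also need network-level egress controls.

```python
import os
import urllib.request

PROXY_URL = "http://127.0.0.1:8080"  # hypothetical address of the scope-enforcing proxy


def proxied_env(proxy_url, base=None):
    """Return an environment for subprocesses that routes HTTP(S) via the proxy."""
    env = dict(os.environ if base is None else base)
    for key in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[key] = proxy_url
    # Drop bypass lists so no host silently skips the proxy.
    env.pop("NO_PROXY", None)
    env.pop("no_proxy", None)
    return env


def proxied_opener(proxy_url):
    """Return a urllib opener whose http and https requests go through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


env = proxied_env(PROXY_URL, base={"PATH": "/usr/bin", "NO_PROXY": "*"})
opener = proxied_opener(PROXY_URL)
```

The env dictionary can be passed as the env argument of subprocess.run when launching shell tooling; the headless browser would need its own proxy setting, which is not shown here.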

Safety model and guardrails

Safety is a hard constraint. This system is intended only for authorized testing.

Scope-Enforcing Proxy

  • Allowlist controls: FQDN/method constraints and optional headers
  • Throttling: max RPS and concurrency caps
  • Auditing: full allow/deny logging and reproducible traces
  • Default-deny: ambiguous requests are blocked
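
The four controls above can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the proxy's actual implementation: the class name ScopePolicy, the verdict strings, the token-bucket limiter, and the host app.example.test are all hypothetical. Anything that does not parse cleanly or match the allowlist exactly is denied, and every verdict is logged.

```python
import time
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlsplit


@dataclass
class ScopePolicy:
    allowed: dict                      # exact FQDN -> set of allowed HTTP methods
    max_rps: float = 2.0
    max_concurrency: int = 2
    clock: Callable[[], float] = time.monotonic
    log: list = field(default_factory=list)
    _tokens: float = field(default=0.0, init=False)
    _last: float = field(default=0.0, init=False)
    _in_flight: int = field(default=0, init=False)

    def __post_init__(self):
        self._tokens = self.max_rps    # start with a full token bucket
        self._last = self.clock()

    def decide(self, method: str, url: str) -> str:
        verdict = self._verdict(method, url)
        self.log.append((verdict, method, url))  # audit allow and deny alike
        return verdict

    def release(self) -> None:
        """Call when an allowed request finishes, freeing a concurrency slot."""
        self._in_flight = max(0, self._in_flight - 1)

    def _verdict(self, method: str, url: str) -> str:
        try:
            host = urlsplit(url).hostname
        except ValueError:
            return "deny:unparseable"
        if not host:
            return "deny:no-host"
        methods = self.allowed.get(host)  # exact match only, no suffix matching
        if methods is None:
            return "deny:out-of-scope"
        if method.upper() not in methods:
            return "deny:method"
        if self._in_flight >= self.max_concurrency:
            return "deny:concurrency"
        now = self.clock()
        self._tokens = min(self.max_rps,
                           self._tokens + (now - self._last) * self.max_rps)
        self._last = now
        if self._tokens < 1.0:
            return "deny:rate"
        self._tokens -= 1.0
        self._in_flight += 1
        return "allow"


fake_time = [0.0]  # injectable clock keeps the example deterministic
policy = ScopePolicy(allowed={"app.example.test": {"GET", "HEAD"}},
                     max_rps=1.0, max_concurrency=1,
                     clock=lambda: fake_time[0])
first = policy.decide("GET", "https://app.example.test/profile")
busy = policy.decide("GET", "https://app.example.test/settings")
policy.release()
throttled = policy.decide("GET", "https://app.example.test/settings")
lookalike = policy.decide("GET", "https://app.example.test.evil.example/")
```

Matching the hostname exactly, rather than by suffix or substring, is what keeps a lookalike domain such as the last one out of scope.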

Safe PoC policy

  • Prioritizes read-only verification patterns
  • Avoids destructive payloads and persistence attempts
  • Stops on instability or side-effect risk signals
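
A rough sketch of how such a policy might be enforced in code: a guard that permits only read-only HTTP methods and halts after a run of server errors. The class name StabilityGuard and the threshold of three consecutive 5xx responses are assumptions for illustration. The method check is also only a first filter, since a GET endpoint can still have side effects in practice.

```python
from dataclasses import dataclass

# Methods defined as safe (read-only) by HTTP semantics.
READ_ONLY_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


@dataclass
class StabilityGuard:
    max_consecutive_errors: int = 3  # hypothetical instability threshold
    consecutive_errors: int = 0
    halted: bool = False

    def permits(self, method: str) -> bool:
        """Allow a request only if testing has not halted and the method is read-only."""
        return not self.halted and method.upper() in READ_ONLY_METHODS

    def observe(self, status_code: int) -> None:
        """Track server errors; halt once too many occur in a row."""
        if status_code >= 500:
            self.consecutive_errors += 1
        else:
            self.consecutive_errors = 0
        if self.consecutive_errors >= self.max_consecutive_errors:
            self.halted = True  # stays halted until a human resets it


guard = StabilityGuard()
allowed_before = guard.permits("GET")
blocked_write = guard.permits("DELETE")
for status in (500, 502, 503):
    guard.observe(status)
allowed_after = guard.permits("GET")
```

Once halted, the guard refuses even read-only requests, so a struggling target is left alone until someone reviews what happened.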

We do not publish exploit payloads or step-by-step compromise guidance.

Experimental Results (As of Feb 8, 2026)

  • Running against ~5 targets/week since late 2025.
  • VDP success: #86 globally on HackerOne VDP (90 Days), with 3 DoD vulnerabilities triaged.
  • BBP challenges: 2 submissions closed as duplicates, 1 report under review.
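  • Key learning: a finding can be technically correct yet carry little business impact; this gap between technical correctness and business criticality affects how reports are received.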

Performance (one representative run)

  • Wall time: ~2 hours
  • Model/API cost: low single-digit USD (varies by conditions)
  • Human time: review, verification, and final editing

Optimization priorities: high precision, strong evidence trails, strict scope adherence.

Limitations / current challenges

  • Performance still degrades on SPA-heavy targets, which demand deeper browser-state modeling.
  • Context growth can cause inefficient behavior or rare loops; mitigated via budgets, stop conditions, and summarization.
  • Coverage and reproducibility vary by exploration path, timing, and defenses.
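
The budget and stop-condition mitigations mentioned above could look roughly like the following: a guard that counts steps against a budget and flags a suspected loop when the same action recurs too often within a sliding window. The class name LoopGuard, the thresholds, and the action strings are hypothetical, and summarization is not covered by this sketch.

```python
from collections import Counter, deque


class LoopGuard:
    def __init__(self, max_steps=50, window=10, max_repeats=3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=window)  # sliding window of recent actions
        self.steps = 0

    def should_stop(self, action: str):
        """Record an action; return a stop reason, or None to continue."""
        self.steps += 1
        self.recent.append(action)
        if self.steps > self.max_steps:
            return "budget-exhausted"
        if Counter(self.recent)[action] >= self.max_repeats:
            return "loop-suspected"
        return None


guard = LoopGuard(max_steps=5, window=4, max_repeats=3)
reasons = [guard.should_stop(a) for a in
           ("GET /a", "GET /b", "GET /a", "GET /a")]
```

Here the action string acts as a fingerprint of the step; a real agent would likely normalize it (for example, by stripping volatile parameters) so that near-identical retries are recognized as the same action.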

Ethics

  • Authorized testing only within explicit VDP / bounty scopes.
  • Human-in-the-loop; no automatic submissions.
  • Scope enforcement via proxy and default-deny rules.
  • No sharing of harmful payloads.

Contact / Disclosure

Open to collaboration and feedback from teams building similar systems.

Email: info@layer8.jp