agent hooks

Why this isn't a prompt assembler.

Most "memory for agents" packages stuff context into the next prompt. @remlabs/claude-code-hooks does something different: it captures errors at the moment they happen, summarizes them, and surfaces matching ones before the next attempt. Between-pass consolidation, not pre-attempt stuffing.

+15.33pp · SWE-bench Lite · Opus+REM · n=150 · p<0.05
the shape

Four handlers. One feedback loop.

Every coding agent already emits the signals you need: tool calls, exit codes, stderr, file paths. The hook layer catches them at the right moment, condenses them into a single error signature, and queries the prior error memory before the agent retries. That's it.
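The loop above can be sketched end to end. This is a hypothetical illustration, not the package's real API — every name here (runWithHooks, Attempt, the bare stderr-as-hint store) is invented to show the shape: read the memory before the attempt, write to it after a failure.

```typescript
// Hypothetical end-to-end sketch of the feedback loop; all names are
// illustrative, not the package's real API.
type Attempt = { ok: boolean; stderr: string };

const memory = new Map<string, string>(); // project tag -> prior-failure hint

function runWithHooks(tag: string, attempt: (hint: string | null) => Attempt): Attempt {
  const hint = memory.get(tag) ?? null;           // rerank: surface prior error
  const result = attempt(hint);                   // the agent's tool call
  if (!result.ok) memory.set(tag, result.stderr); // detectFailure + summarise
  return result;
}

// First pass fails and is stored; second pass sees the hint and succeeds.
const r1 = runWithHooks("django:edit", () => ({ ok: false, stderr: "offset drift" }));
const r2 = runWithHooks("django:edit", (hint) => ({ ok: hint !== null, stderr: "" }));
console.log(r1.ok, r2.ok); // false true
```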

01

deriveProjectTag

Builds a scope-tight retrieval key from {repo, tool, file_path}. Memory only matches inside the same repo and tool surface — cross-project leakage is what makes "memory for agents" lose more than it wins.

Fires on every tool call
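A minimal sketch of what a scope-tight key could look like. The real handler's signature is not public; the { repo, tool, file_path } input shape comes from the description above, while the lowercase colon-joined output format is an assumption.

```typescript
// Hypothetical sketch -- the real deriveProjectTag signature is not public.
// Scope the key to repo + tool + file basename so matches never cross projects.
interface ToolCallContext {
  repo: string;      // e.g. "django"
  tool: string;      // e.g. "Edit"
  file_path: string; // e.g. "django/db/models/sql/query.py"
}

function deriveProjectTag(ctx: ToolCallContext): string {
  const base = ctx.file_path.split("/").pop() ?? ctx.file_path;
  // One flat, exact-match key for the memory store.
  return [ctx.repo, ctx.tool, base].map((s) => s.toLowerCase()).join(":");
}

console.log(deriveProjectTag({
  repo: "django",
  tool: "Edit",
  file_path: "django/db/models/sql/query.py",
})); // "django:edit:query.py"
```

An exact-match key is the point: fuzzy matching across repos is precisely the cross-project leakage the handler exists to prevent.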
02

detectFailure

Captures stderr + the exit code when PostToolUse reports a non-zero exit. No stack-trace heuristics, no LLM judges — just the raw failure bytes, deterministic.

PostToolUse hook, exit_code != 0
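"Just the raw failure bytes" could be as small as this. A hypothetical sketch — the ToolResult and FailureRecord shapes are invented, not the package's types:

```typescript
// Hypothetical sketch -- the package's internals are private; this only
// illustrates "raw failure bytes, no heuristics".
interface ToolResult {
  exit_code: number;
  stderr: string;
}

interface FailureRecord {
  exit_code: number;
  stderr: string; // stored verbatim, never parsed
}

// Returns a record only when the call actually failed; null otherwise.
function detectFailure(result: ToolResult): FailureRecord | null {
  if (result.exit_code === 0) return null;
  return { exit_code: result.exit_code, stderr: result.stderr };
}

console.log(detectFailure({ exit_code: 1, stderr: "error: patch failed" }));
console.log(detectFailure({ exit_code: 0, stderr: "" })); // null
```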
03

summariseToolCall

Condenses to { error_signature, fix_applied }. The signature is the deduplication key — the same error + fix pair gets stored once, regardless of how many tasks hit it.

After fix is applied, before next call
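Signature-keyed dedup can be sketched with a Map. The field names follow the { error_signature, fix_applied } shape above; the in-memory store and boolean return are assumptions for illustration:

```typescript
// Hypothetical sketch of signature-keyed dedup; the real store is remote.
interface ErrorMemory {
  error_signature: string;
  fix_applied: string;
}

const store = new Map<string, ErrorMemory>();

// The signature is the dedup key: storing a known error is a no-op.
function summariseToolCall(signature: string, fix: string): boolean {
  if (store.has(signature)) return false; // already recorded
  store.set(signature, { error_signature: signature, fix_applied: fix });
  return true;
}

summariseToolCall("django.orm.outerref.apply_offset_drift", "emit --unified=5 diff");
summariseToolCall("django.orm.outerref.apply_offset_drift", "emit --unified=5 diff");
console.log(store.size); // 1
```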
04

rerank

Surfaces matching prior errors before the next attempt — not stuffed into context, but injected as a one-line "you've seen this before:" hint. The agent uses it as priors, not history.

PreToolUse hook, with prior-match
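The one-line hint could be rendered like this. A hypothetical sketch: the lookup-by-project-tag and the hint wording mirror the description above, but the data shapes are invented:

```typescript
// Hypothetical sketch: look up prior errors under the same project tag and
// render the one-line "you've seen this before" hint.
const priorErrors = new Map<string, { error_signature: string; fix_applied: string }[]>();

priorErrors.set("django:edit:query.py", [{
  error_signature: "apply_offset_drift",
  fix_applied: "emit unified diff with full context lines, no hunk headers",
}]);

// Returns the hint for the first match, or null when the tag is unseen --
// null means nothing is injected and the prompt stays untouched.
function rerank(projectTag: string): string | null {
  const matches = priorErrors.get(projectTag) ?? [];
  if (matches.length === 0) return null;
  const m = matches[0];
  return `you've seen this before: ${m.error_signature}; fix: ${m.fix_applied}`;
}

console.log(rerank("django:edit:query.py"));
console.log(rerank("flask:edit:app.py")); // null
```

Returning null on no-match is what keeps this "priors, not history": an unseen tag adds zero tokens to the prompt.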
install

Two files. One settings change.

The package is in private beta — the npm registry is gated until GA. The settings.json shape below is what the public package will install.

Terminal — preview (private beta)
# Private beta -- request access in Discord first
# https://discord.gg/ux8NYVfK2

$ npm install @remlabs/claude-code-hooks  # private beta
$ npx @remlabs/claude-code-hooks init     # writes settings.json
~/.claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Edit|Write",
        "hooks": [{
          "type": "command",
          "command": "npx @remlabs/claude-code-hooks pre"
        }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash|Edit|Write",
        "hooks": [{
          "type": "command",
          "command": "npx @remlabs/claude-code-hooks post"
        }]
      }
    ]
  },
  "env": {
    "REM_API_KEY": "bl_live_...",
    "REM_PROJECT_TAG": "auto"
  }
}
single-task walkthrough

One Django ORM task. Two passes.

Real SWE-bench example. Pass 1 fails on the apply step — the patch hunk targets a line that's already changed. Pass 2 sees the prior error pattern and applies the fix correctly the first time. This is the +15.33pp mechanism in a single task.

× Pass 1 — cold (no hooks)

Task: django__django-13315 — ORM filter with OuterRef raises TypeError.

Agent's patch: targets line 482 of django/db/models/sql/query.py.

Result: git apply fails — the line numbers in the unified diff don't match the actual file. Apply error. Task fails.

[failure stored to REM with signature: django.orm.outerref.apply_offset_drift]

✓ Pass 2 — with hooks active

PreToolUse fires. deriveProjectTag = {django, Edit, query.py}. Rerank finds 1 prior match: apply_offset_drift.

Hint surfaced: "you've seen this before — previous patch failed on offset drift; emit unified diff with full context lines, no hunk headers."

Agent's second patch: emits --unified=5 diff. git apply succeeds. Tests pass.

[recovered — one of 26 in the n=150 run]

The honest +15.33pp note

+15.33pp strict on SWE-bench Lite n=150, 95% CI [+9.33, +22.00], p<0.05. Cold Opus-4.7 was 30.0%; Opus-4.7 + REM hooks was 45.3%.

Mechanism is between-pass error consolidation. The lift comes from 26 recovered tasks (+) and 3 regressions (−), netting +23 strict. Apply-errors dropped 48% Pass 1 → Pass 2.

This is reproducible. Methodology, eval logs, and per-task diffs live at /benchmarks. If you want to verify, the run script and Docker image are in the repo — that's the entire point. We retired the older +16pp n=50 number when it failed to reproduce; we'll retire this one if it does the same.

30.0%
Opus-4.7 cold
45.3%
Opus-4.7 + REM
+15.33pp
strict lift
26 / 3
recovered / regressed
−48%
apply-errors
n = 150
tasks, seed=42

Errors are signal. Burn them once.

Most agents hit the same apply error six times before they figure it out. Hooks turn the first one into a memory the rest of the run uses.

Request beta access · Read methodology · Docs