Transparent methodology

How ContextClean decides what looks like signal, noise, or sensitive data

ContextClean is a deterministic text filter, not an AI diagnosis service. The current browser workbench applies three optional operations: remove known low-signal patterns for the selected log type, remove exact duplicate lines, and replace common credential patterns with explicit placeholders.
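The three operations can be treated as independent passes over the lines of a log. The sketch below illustrates that idea under stated assumptions and is not ContextClean's implementation. The `Options` field names, the two low-signal patterns, the `[REDACTED]` placeholder, and the fixed pass order are all hypothetical. Duplicate removal here is global (the first occurrence wins) rather than limited to adjacent lines.

```python
import re
from dataclasses import dataclass

@dataclass
class Options:
    # Hypothetical option names mirroring the three optional operations.
    remove_low_signal: bool = True
    remove_duplicates: bool = True
    redact_secrets: bool = True

# Illustrative patterns only; they are not ContextClean's actual rule set.
LOW_SIGNAL = [
    re.compile(r"^npm (?:WARN|notice)\b"),
    re.compile(r"^Post job cleanup"),
]
SECRET_ASSIGNMENT = re.compile(r"(?i)\b(api_key|token|secret|password)(\s*[=:]\s*)\S+")

def clean(text: str, opts: Options) -> str:
    lines = text.splitlines()
    if opts.remove_low_signal:
        # Pass 1: drop lines that match a known low-signal pattern.
        lines = [ln for ln in lines if not any(p.search(ln) for p in LOW_SIGNAL)]
    if opts.remove_duplicates:
        # Pass 2: keep only the first occurrence of each exact line, in order.
        lines = list(dict.fromkeys(lines))
    if opts.redact_secrets:
        # Pass 3: replace the value with an explicit placeholder, keep the key name.
        lines = [SECRET_ASSIGNMENT.sub(r"\1\2[REDACTED]", ln) for ln in lines]
    return "\n".join(lines)
```

Because each pass depends only on its input and the options, the same log always produces the same output. That property is what "deterministic" means here.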

Always preserve
Examples: Explicit errors, exception names, assertion differences, exit codes, application file paths.
Reason: These lines identify the failure or locate it in code controlled by the user.

Usually preserve
Examples: Caused-by chains, first application frames, failing commands, relevant compiler diagnostics.
Reason: They explain causality and execution context, but may need formatting rather than deletion.

Usually reduce
Examples: Exact duplicates, progress messages, cache hits, successful setup steps, post-job cleanup.
Reason: They describe activity around a failure without changing its first-pass diagnosis.

Mode-dependent
Examples: Framework internals, dependency frames, package-manager warnings, container layer output.
Reason: They are noise for application bugs but evidence when debugging the framework or environment itself.
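One way to express these four decisions as deterministic rules is a pattern list checked in priority order, so the most conservative decision wins when a line matches several. The `Decision` enum, the regular expressions, and the first-match-wins policy are assumptions for illustration. Real logs need far broader rules.

```python
import re
from enum import Enum
from typing import Optional

class Decision(Enum):
    ALWAYS_PRESERVE = "always preserve"
    USUALLY_PRESERVE = "usually preserve"
    USUALLY_REDUCE = "usually reduce"
    MODE_DEPENDENT = "mode-dependent"

# Hypothetical rules, checked in order; the first (most conservative) match wins.
RULES = [
    (Decision.ALWAYS_PRESERVE, re.compile(r"\b\w*(?:Error|Exception)\b|\bexit code \d+")),
    (Decision.USUALLY_PRESERVE, re.compile(r"^Caused by:|The above exception was the direct cause")),
    (Decision.USUALLY_REDUCE, re.compile(r"\bCACHED\b|^Downloading\b|^Post job cleanup")),
    (Decision.MODE_DEPENDENT, re.compile(r"node_modules/|site-packages/|node:internal/")),
]

def classify(line: str) -> Optional[Decision]:
    for decision, pattern in RULES:
        if pattern.search(line):
            return decision
    # No rule matched: leave the line for the user to judge.
    return None
```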

Mode-specific rules

Node.js: reduces internal module loader frames and package-manager warnings while retaining application frames and module names.
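A Node.js reduction rule can be written as a plain line filter. This sketch assumes that internal frames contain `node:internal/`, the prefix recent Node.js versions print for built-in modules, and that package-manager warnings start with `npm WARN`. It is slightly broader than the rule described above because it drops every internal frame, not only loader frames. The name `reduce_node` is hypothetical.

```python
import re

# Assumed shapes: internal frames mention "node:internal/", npm warnings start with "npm WARN".
NODE_INTERNAL_FRAME = re.compile(r"^\s+at .*node:internal/")
NPM_WARNING = re.compile(r"^npm WARN\b")

def reduce_node(lines: list[str]) -> list[str]:
    """Drop Node.js internal frames and npm warnings; every other line is kept,
    including application frames and the module name in the error message."""
    return [
        line
        for line in lines
        if not (NODE_INTERNAL_FRAME.search(line) or NPM_WARNING.search(line))
    ]
```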

Python: reduces library frames but preserves exception chaining, file locations, and application calls.
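Python tracebacks print each frame as a `File "...", line N, in name` header followed by indented source lines, so a library frame has to be dropped together with the lines under it. The sketch below treats any path under `site-packages` or `dist-packages` as library code, which is an assumption. It never removes exception lines or the chaining separators ("During handling of the above exception..." and "The above exception was the direct cause..."), so the chain survives. The name `reduce_python` is hypothetical.

```python
import re

FRAME = re.compile(r'^\s*File "(?P<path>[^"]+)", line \d+')
# Assumption: anything under site-packages or dist-packages counts as a library frame.
LIBRARY_PATH = re.compile(r"[/\\](?:site|dist)-packages[/\\]")

def reduce_python(lines: list[str]) -> list[str]:
    kept: list[str] = []
    in_library_frame = False
    for line in lines:
        match = FRAME.match(line)
        if match:
            # Each frame header decides whether the lines under it are kept.
            in_library_frame = bool(LIBRARY_PATH.search(match.group("path")))
            if in_library_frame:
                continue
        elif in_library_frame and line.startswith("    "):
            # Source excerpt (and 3.11+ caret markers) belonging to a library frame.
            continue
        else:
            in_library_frame = False
        kept.append(line)
    return kept
```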

React / Next.js: reduces React DOM and bundler internals while retaining component names, server/client differences, and source files.
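For React and Next.js, the same filtering idea applies to stack frames whose location points into React DOM or bundler runtime code. The marker strings below are assumed examples of such locations, not a complete list, and `reduce_react` is a hypothetical name. Only lines that look like stack frames are candidates for removal, so error messages, hydration warnings, and frames in the project's own source files stay.

```python
import re

# Assumed markers of framework and bundler internals; adjust to the traces you actually see.
INTERNAL_MARKERS = (
    "node_modules/react-dom/",
    "node_modules/next/dist/",
    "webpack/bootstrap",
    "webpack/runtime",
)
STACK_FRAME = re.compile(r"^\s+at ")

def reduce_react(lines: list[str]) -> list[str]:
    """Drop stack frames that point into React DOM or bundler internals.
    Non-frame lines (error messages, hydration warnings) are always kept."""
    return [
        line
        for line in lines
        if not (STACK_FRAME.match(line) and any(m in line for m in INTERNAL_MARKERS))
    ]
```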

Docker / CI: reduces transfer, cache, checkout, and cleanup narration while retaining the failed step, command, error, and exit code.
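Docker and CI narration is mostly line-shaped, so a short list of patterns can remove it while leaving the step header, the failing command, and the error with its exit code untouched. The patterns below assume BuildKit-style `#N CACHED` lines, classic `docker pull` layer messages, and the GitHub Actions `Post job cleanup.` line. They are illustrative and incomplete (checkout narration, for instance, is not covered), and `reduce_ci` is a hypothetical name.

```python
import re

# Illustrative narration patterns (assumed, not exhaustive).
NARRATION = [
    re.compile(r"^#\d+ CACHED$"),  # BuildKit cache hit
    re.compile(  # docker pull layer progress
        r"^[0-9a-f]{12}: (?:Pulling fs layer|Waiting|Download complete|Pull complete)$"
    ),
    re.compile(r"^Post job cleanup\.$"),  # GitHub Actions post step
]

def reduce_ci(lines: list[str]) -> list[str]:
    """Drop cache, transfer, and cleanup narration; keep steps, commands, errors, exit codes."""
    return [line for line in lines if not any(p.search(line) for p in NARRATION)]
```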

Secret detection scope

The workbench detects common bearer tokens, JWT-shaped strings, AWS access-key identifiers, and assignments to names such as api_key, token, secret, and password.
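A minimal redaction pass for these four categories might look like the sketch below. The regular expressions are assumptions written for illustration, not the workbench's rules: the bearer pattern accepts the token characters allowed by RFC 6750, the JWT pattern requires three dot-separated base64url segments starting with `eyJ`, and the AWS pattern covers `AKIA` (long-term) and `ASIA` (temporary) key IDs. The placeholder strings are hypothetical.

```python
import re

# Illustrative patterns; real detectors use more, and stricter, rules.
PATTERNS = [
    # "Bearer <token>" using RFC 6750 token characters.
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9\-._~+/]+=*"), "Bearer [REDACTED_TOKEN]"),
    # JWT-shaped: three base64url segments, header starting with "eyJ".
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED_JWT]"),
    # AWS access key IDs: AKIA or ASIA followed by 16 uppercase letters or digits.
    (re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    # Assignments to sensitive-looking names; the name and separator are kept.
    (re.compile(r"(?i)\b(api_key|token|secret|password)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
]

def redact(text: str) -> str:
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("customer_id=12345")` returns its input unchanged, because nothing about that name marks it as sensitive.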

It cannot recognize every proprietary token, customer identifier, private hostname, source-code secret, or value whose meaning depends on business context. Manual review remains mandatory.

Known failure modes

Exception chains are flattened

Removing the original (root) exception can leave only a generic wrapper. Preserve every distinct exception and its causal order.
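A cheap automated check for this failure mode is to compare the distinct exception names in the original and cleaned text. The sketch assumes exception names end in `Error` or `Exception`, which holds for many runtimes but not for every custom exception type. `exception_chain` and `chain_preserved` are hypothetical helpers.

```python
import re

# Assumption: exception names end in "Error" or "Exception", optionally dotted.
EXCEPTION_NAME = re.compile(r"\b(?:[A-Za-z_][\w.]*)?(?:Error|Exception)\b")

def exception_chain(text: str) -> list[str]:
    """Distinct exception names in order of first appearance."""
    return list(dict.fromkeys(EXCEPTION_NAME.findall(text)))

def chain_preserved(original: str, cleaned: str) -> bool:
    """True if the cleaned text still names every distinct exception, in the same order."""
    return exception_chain(original) == exception_chain(cleaned)
```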

Environment evidence disappears

A successful setup line may prove which runtime, dependency, or generated artifact was used. Restore it when environment drift is plausible.
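To see which environment evidence a cleaning pass removed, one rough heuristic is to list original lines that carry a version-like token and are missing from the result. The `VERSION_HINT` pattern and the helper name are assumptions. The pattern over-matches (IP addresses such as `10.0.0.1` also look like versions), so treat its output as a review list rather than a verdict.

```python
import re

# Hypothetical heuristic: a token such as "20.11.1", "3.11", or "v18.19.0".
VERSION_HINT = re.compile(r"\bv?\d+\.\d+(?:\.\d+)?\b")

def removed_environment_lines(original: list[str], cleaned: list[str]) -> list[str]:
    """Original lines with a version-like token that no longer appear in the cleaned output."""
    kept = set(cleaned)
    return [line for line in original if VERSION_HINT.search(line) and line not in kept]
```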

Repeated lines encode time

Identical or similar messages can represent retries or a loop. Exact deduplication is unsafe when the number of repetitions or their timing matters.
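When repetition itself is evidence, a safer alternative to dropping exact duplicates is run-length collapsing: each run of consecutive identical lines becomes one line with a count. This is a different technique from the workbench's exact deduplication, shown only as a sketch, and the `[repeated Nx]` suffix is an arbitrary choice. Lines that differ only by timestamp are not identical and are left alone.

```python
from itertools import groupby

def collapse_repeats(lines: list[str]) -> list[str]:
    """Collapse each run of consecutive identical lines into one line plus a count,
    so the number of retries survives instead of disappearing."""
    out = []
    for line, group in groupby(lines):
        count = sum(1 for _ in group)
        out.append(line if count == 1 else f"{line}  [repeated {count}x]")
    return out
```

For example, `["retrying", "retrying", "retrying", "ok", "retrying"]` becomes `["retrying  [repeated 3x]", "ok", "retrying"]`, which keeps both the retry count and the later, separate retry.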

A framework regression is misclassified

Framework internals are normally low-signal, but they become primary evidence when the bug is inside the framework version or adapter.

How to validate a cleaned result

Compare the cleaned output with the original instead of reviewing it in isolation.
Confirm the first explicit error, application location, and causal chain still exist.
Restore setup or environment lines when versions, configuration, or generated files may matter.
Search manually for secrets and private data after automatic redaction.
Add the expected behavior and recent change before requesting a diagnosis.
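Parts of this checklist can be automated. The sketch below lists the lines that cleaning removed, using `difflib.ndiff` from the Python standard library, and flags the case where the first explicit error was removed. What counts as an explicit error is an assumption encoded in `ERROR_LINE`, and the function names are hypothetical. The secret search and the added context in the last two steps still need a human.

```python
import difflib
import re
from typing import Optional

# Assumed definition of an "explicit error" line; extend it for your toolchain.
ERROR_LINE = re.compile(r"(?i)error|exception|failed|exit code \d+|\bERR!")

def first_error(lines: list[str]) -> Optional[str]:
    """The first line in the original that looks like an explicit error."""
    return next((ln for ln in lines if ERROR_LINE.search(ln)), None)

def removed_lines(original: list[str], cleaned: list[str]) -> list[str]:
    """Lines present in the original but not in the cleaned output."""
    # ndiff prefixes lines that exist only in the first sequence with "- ".
    return [d[2:] for d in difflib.ndiff(original, cleaned) if d.startswith("- ")]

def review(original: list[str], cleaned: list[str]) -> list[str]:
    """Return human-readable problems found by comparing the two versions."""
    problems = []
    err = first_error(original)
    if err is not None and err not in cleaned:
        problems.append(f"first explicit error removed: {err!r}")
    return problems
```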
