From 6c1df64b40d4af6aedb929f9fb54869a70542085 Mon Sep 17 00:00:00 2001
From: root
Date: Sat, 25 Oct 2025 13:27:29 +0200
Subject: [PATCH] chore: add Docker support and Claude Code documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Added a Dockerfile with a multi-stage build (artifacts + dev environment)
- Added .dockerignore to optimize builds
- Added CLAUDE.md documenting the architecture and workflow for Claude Code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 .dockerignore |  16 ++++
 CLAUDE.md     | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++
 Dockerfile    |  45 +++++++++
 3 files changed, 311 insertions(+)
 create mode 100644 .dockerignore
 create mode 100644 CLAUDE.md
 create mode 100644 Dockerfile

diff --git a/.dockerignore b/.dockerignore
new file mode 100644
index 0000000..d3ecaee
--- /dev/null
+++ b/.dockerignore
@@ -0,0 +1,16 @@
+.log
+node_modules
+sidebar.js
+web-ext-artifacts/
+lib/*
+yarn-error.log
+rentgen.zip
+
+# Generated PNG icons (build artifacts)
+assets/icons/*.png
+assets/icon-addon-*.png
+
+# Exception: do not ignore the `browser-api` directory inside `lib`
+!/lib/browser-api/
+
+Dockerfile
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..6ba8bef
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,250 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+**Rentgen** is a privacy-focused browser extension for Firefox and Chrome that intercepts web traffic, identifies third-party tracking, and visualizes stolen data (cookies, browsing history, etc.). It generates GDPR-compliant reports and email templates for Polish website administrators and the Personal Data Protection Office.
+
+**Language Note**: The codebase is in English, but the extension UI and generated reports are in Polish. Comments and documentation may be bilingual.
+
+## Build & Development Commands
+
+### Standard Build Workflow
+```bash
+npm install              # Install dependencies
+npm run build            # Build for Firefox (default)
+npm run build:firefox    # Build for Firefox explicitly
+npm run build:chrome     # Build for Chrome
+npm run create-package   # Package into web-ext-artifacts/ directory
+npm run build-addon      # Complete build: install + build + package
+```
+
+### Development Workflow
+```bash
+npm run watch            # Watch mode - auto-rebuild on file changes
+npm run watch:firefox    # Watch for Firefox
+npm run watch:chrome     # Watch for Chrome
+npm run ext-test         # Run extension in temporary Firefox profile (web-ext run)
+```
+
+### Quality Checks
+```bash
+npm run typecheck        # Run TypeScript type checking (tsc --noEmit)
+npm run lint             # Lint extension with web-ext lint
+```
+
+### Testing in Browser
+After building, load the temporary add-on:
+1. Firefox: Navigate to `about:debugging` → This Firefox → Load Temporary Add-on
+2. Chrome: Navigate to `chrome://extensions/` → Enable Developer Mode → Load unpacked
+
+**Note**: There are no automated test suites in this codebase. Testing is manual via browser extension loading.
+
+## Architecture Overview
+
+### Core Pattern: Event-Driven Singleton with Observer Pattern
+
+Rentgen uses a centralized **Memory** singleton (background script) that:
+- Intercepts all HTTP requests via `webRequest` API
+- Maintains hierarchical data structure: `origin → shorthost → RequestCluster → StolenDataEntry`
+- Emits `'change'` events when data updates
+- Drives UI re-renders in React components via custom `useEmitter` hook
+
+### Key Components
+
+#### Background Service (`background.ts` + `memory.ts`)
+- **Memory class** (extends SaferEmitter): Central orchestrator managing all extension state
+  - Listens to `webRequest.onBeforeRequest` and `webRequest.onBeforeSendHeaders`
+  - Maintains `clusters` map: `origin → Map`
+  - Emits `'change'` events to notify UI components
+  - Updates browser badge with domain count and color indicators
+  - Accessible globally via `getMemory()` singleton
+
+#### Network Interception Layer
+- **ExtendedRequest class** (`extended-request.ts`): Wraps individual HTTP requests
+  - Static registry: `ExtendedRequest.by_id[requestId]` for fast lookup
+  - Two-phase initialization: constructor (body) + `init()` method (headers)
+  - Detects third-party requests by comparing origins
+  - Extracts "stolen data" from: cookies, query params, pathname, headers, request body
+  - Generates HAR (HTTP Archive) format for reports
+  - Calculates priority scores based on data sensitivity
+
+- **RequestCluster class** (`request-cluster.ts`): Groups requests by `origin + shorthost`
+  - Aggregates StolenDataEntry items with deduplication
+  - Tracks expanded/collapsed state for UI
+  - Auto-marks suspicious entries (history exposure, tracking IDs)
+  - Emits `'change'` events on modifications
+
+#### Data Classification
+- **StolenDataEntry class** (`stolen-data-entry.ts`): Individual data points
+  - Sources: `cookie`, `pathname`, `queryparams`, `header`, `request_body`
+  - Classifications: `id` (tracking ID), `history` (browsing history), `location` (geolocation)
+  - Smart value parsing: recursively decodes Base64, JSON, URLs, nested structures
+  - Priority calculation: combines value length, origin exposure, data type
+  - Mark/unmark system for user selection in reports
+
+#### Browser API Abstraction (`lib/browser-api/`)
+- **Cross-browser compatibility layer** selected at build time via `TARGET` env var
+- `types.ts`: Unified interface for tabs, badge, webRequest, cookies APIs
+- `index.ts`: Exports Chrome or Firefox implementation based on `process.env.TARGET`
+- Standardizes differences (e.g., `browserAction` vs `action`)
+
+### Data Flow
+
+```
+HTTP Request Initiated
+    ↓
+Memory.onBeforeRequest → Create ExtendedRequest (capture body)
+    ↓
+Memory.onBeforeSendHeaders → ExtendedRequest.init() (capture headers)
+    ↓
+Extract stolen data (cookies, params, headers, body)
+    ↓
+Memory.register() → Check if third-party → Add to RequestCluster
+    ↓
+Memory.emit('change', shorthost) → Broadcast event
+    ↓
+React components (via useEmitter hook) → UI re-renders
+```
+
+### UI Components
+
+#### Sidebar (`components/sidebar/`)
+- **sidebar.tsx**: Main extension UI listing third-party domains
+- **stolen-data.tsx**: Renders RequestClusters with filtering options
+- **stolen-data-cluster.tsx**: Expandable cluster showing individual StolenDataEntry items
+- Filters: `minValueLength`, `cookiesOnly`, `cookiesOrOriginOnly`
+- Real-time updates via `useEmitter(Memory)` hook
+
+#### Report Window (`components/report-window/`)
+- Multi-stage report generation workflow:
+  1. **Survey**: User questionnaire (role, tone, gender pronouns) via `survey-react`
+  2. **Screenshot**: External service generates domain screenshots
+  3. **Preview**: Final email/report content with GDPR violation analysis
+- **deduce-problems.tsx**: Analyzes survey answers to identify GDPR violations
+- **har-converter.tsx**: Generates filtered HAR archives
+- **email-content.tsx**: Renders Polish email template (polite or harsh tone)
+
+#### Toolbar (`components/toolbar/`)
+- Browser action popup (top-right icon)
+
+### Build System
+
+- **esbuild** (`esbuild.config.js`): TypeScript → JavaScript bundler
+  - Entry points: toolbar.tsx, sidebar.tsx, report-window.tsx, background.ts, diag.tsx, styles
+  - External React libs loaded via globals (`globalThis.React`, `globalThis.ReactDOM`)
+  - SCSS plugin for styling
+  - Define flags: `PLUGIN_NAME`, `PLUGIN_URL`
+  - Watch mode available for development
+
+- **Target Selection**: Set `TARGET=firefox` or `TARGET=chrome` before build to select browser API implementation
+
+- **Manifest**: Currently uses Manifest V2 (Firefox). Chrome support is partial and being expanded.
+
+## Important Implementation Details
+
+### Third-Party Detection Heuristics
+When determining if a request is third-party (in `extended-request.ts`):
+1. Compare request origin with tab origin
+2. Check `documentUrl` and `originUrl` from webRequest details
+3. Use `urlClassification.thirdParty` if available
+4. Analyze `frameAncestors` for nested iframe scenarios
+5. Fall back to comparing hostnames
+
+### Stolen Data Extraction Strategy
+Data is extracted from multiple sources (priority order):
+1. **Cookies**: Via `browser.cookies.getAll()`
+2. **Query Parameters**: Parsed from URL search string
+3. **Pathname**: URL path segments
+4. **Headers**: Request headers (Cookie, Referer, etc.)
+5. **Request Body**: POST/PUT data (form data, JSON)
+
+### Value Parsing Chain
+`StolenDataEntry` recursively decodes values:
+1. Detect Base64 encoding → decode
+2. Detect URL encoding → decode
+3. Detect JSON → parse
+4. Detect nested URLs → extract
+5. Stop at maximum recursion depth
+
+### Auto-Marking Rules
+RequestClusters automatically mark entries as suspicious if:
+- Value exposes browsing history (referrer, path info)
+- Cookie length > 100 characters
+- Known trackers: Google Analytics, Facebook, DoubleClick, etc.
+- Classified as `id` or `history` type
+
+### Event System
+- **SaferEmitter** (`safer-emitter.ts`): EventEmitter wrapper with async emission
+  - Uses `setTimeout(..., 0)` to decouple events from synchronous request handling
+  - Prevents errors in listeners from breaking request flow
+- **useEmitter Hook** (`components/sidebar/sidebar.tsx`): React integration
+  - Increments counter state on each event to trigger re-renders
+  - Automatically subscribes/unsubscribes via `useEffect`
+
+## Common Development Tasks
+
+### Adding a New Data Source
+1. Add extraction logic to `ExtendedRequest.getAllData()` in `extended-request.ts`
+2. Define new source type in `StolenDataEntry.sources` in `stolen-data-entry.ts`
+3. Update classification logic in `StolenDataEntry.getClassification()`
+4. Update UI components to display the new source (if needed)
+
+### Supporting a New Browser
+1. Create implementation in `lib/browser-api/` (e.g., `safari.ts`)
+2. Update `lib/browser-api/index.ts` to export based on `TARGET` env var
+3. Add corresponding npm scripts in `package.json` (`build:safari`, etc.)
+4. Test browser-specific APIs for compatibility
+
+### Modifying Report Templates
+- **Email templates**: `email-template-polite.js`, `email-template-harsh.js`
+- **Problem deduction**: `components/report-window/deduce-problems.tsx`
+- **Survey questions**: `components/report-window/questions.tsx`
+- Templates are in Polish and follow GDPR complaint structure
+
+### Debugging Request Interception
+- Check `ExtendedRequest.by_id` registry for request lookup issues
+- Verify `Memory.clusters` structure for data organization
+- Use browser DevTools → Extensions → Background Page for logging
+- Enable `console.log` in `memory.ts` event listeners
+
+## Project Constraints
+
+- **No automated tests**: Manual testing only via browser extension loading
+- **Polish language**: UI and reports are Polish-focused (English i18n is future work)
+- **Manifest V2**: Primary target is Firefox; Chrome V3 migration is in progress
+- **External screenshot service**: Report generation depends on external API
+- **No minification**: Currently disabled in esbuild config (commented out)
+- **Node.js 16.x requirement**: Specified in README
+
+## Repository Information
+
+- **Primary Repository**: https://git.internet-czas-dzialac.pl/icd/rentgen (Gitea)
+- **Mirror**: GitHub (issues not accepted there)
+- **Issue Tracking**: Email kontakt@internet-czas-dzialac.pl
+- **License**: GPL-3.0-or-later
+- **Authors**: Kuba Orlik, Arkadiusz Wieczorek (Internet. Time to act! Foundation)
+
+## File Organization
+
+```
+rentgen/
+├── background.ts          # Extension entry point
+├── memory.ts              # Central state manager (Memory singleton)
+├── extended-request.ts    # HTTP request wrapper and data extraction
+├── request-cluster.ts     # Request aggregation by domain
+├── stolen-data-entry.ts   # Individual data point representation
+├── safer-emitter.ts       # EventEmitter wrapper
+├── util.ts                # Utility functions
+├── components/
+│   ├── sidebar/           # Main extension UI
+│   ├── toolbar/           # Browser action popup
+│   └── report-window/     # Report generation workflow
+├── lib/
+│   └── browser-api/       # Cross-browser API abstraction
+├── email-template-*.js    # Polish email templates
+├── esbuild.config.js      # Build configuration
+├── manifest.json          # Extension manifest (V2)
+└── assets/                # Icons and screenshots
+```
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..78cfa5d
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,45 @@
+# Rentgen Browser Extension - Docker Build
+#
+# Usage:
+#   Build and extract artifacts directly:
+#     docker buildx build . --output artifacts
+#
+#   Or traditional build (creates full development environment):
+#     docker build -t rentgen .
+#     docker run --rm rentgen ls -lh /app/web-ext-artifacts/
+#
+#   Run commands in the container:
+#     docker run --rm rentgen npm run build:chrome
+#     docker run --rm rentgen npm run typecheck
+
+# Build stage
+FROM node:lts AS builder
+
+WORKDIR /app
+
+# Copy package files for dependency installation (better layer caching)
+COPY package.json package-lock.json ./
+
+# Install dependencies
+RUN npm install
+
+# Copy source code (respecting .dockerignore)
+COPY . .
+
+# Build the extension for Firefox (default)
+RUN npm run build
+
+# Create the package
+RUN npm run create-package
+
+# Artifacts stage - only contains the built artifacts (for --output)
+FROM scratch AS artifacts
+
+# Copy only the built extension zip file to root
+COPY --from=builder /app/web-ext-artifacts/*.zip /
+
+# Default stage - full development environment
+FROM builder
+
+# Default command shows the built artifact
+CMD ["ls", "-lh", "/app/web-ext-artifacts/"]
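
Note on the CLAUDE.md added by this patch: its Data Flow section says `Memory.register()` checks whether a request is third-party, files it under `origin → shorthost`, and then emits `'change'`. The sketch below only illustrates that flow and is not taken from the Rentgen sources. `SketchRequest`, `SketchMemory`, `shorthost()`, and `isThirdParty()` are hypothetical names, the two-label `shorthost` rule is an assumption, only the last heuristic (hostname comparison) is modelled, and listeners are called synchronously here rather than through the `setTimeout(..., 0)` that `SaferEmitter` is described as using.

```typescript
// Hypothetical sketch: none of these names come from the Rentgen codebase.

interface SketchRequest {
  tabOrigin: string; // origin of the page that triggered the request
  url: string;       // full URL of the outgoing request
}

// Assumed rule: a "shorthost" is the last two hostname labels.
// This ignores multi-label public suffixes such as "co.uk".
function shorthost(hostname: string): string {
  return hostname.split(".").slice(-2).join(".");
}

// Models only the final fallback heuristic: compare (short) hostnames.
function isThirdParty(req: SketchRequest): boolean {
  const requestHost = new URL(req.url).hostname;
  const tabHost = new URL(req.tabOrigin).hostname;
  return shorthost(requestHost) !== shorthost(tabHost);
}

type ChangeListener = (changedShorthost: string) => void;

class SketchMemory {
  // origin -> shorthost -> request URLs (a plain array stands in for a RequestCluster)
  readonly clusters = new Map<string, Map<string, string[]>>();
  private listeners: ChangeListener[] = [];

  onChange(listener: ChangeListener): void {
    this.listeners.push(listener);
  }

  register(req: SketchRequest): void {
    // First-party requests are not tracked at all.
    if (!isThirdParty(req)) return;

    const host = shorthost(new URL(req.url).hostname);

    // Create the per-origin map on first use.
    let byHost = this.clusters.get(req.tabOrigin);
    if (!byHost) {
      byHost = new Map<string, string[]>();
      this.clusters.set(req.tabOrigin, byHost);
    }

    // Append the request to its shorthost bucket.
    const urls = byHost.get(host) ?? [];
    urls.push(req.url);
    byHost.set(host, urls);

    // Synchronous stand-in for emit('change', shorthost).
    for (const listener of this.listeners) listener(host);
  }
}
```

In this sketch, registering `https://cdn.tracker.net/p.gif?id=1` from a tab on `https://example.com` creates a `tracker.net` bucket under that origin and notifies listeners once, while `https://static.example.com/app.js` is dropped as first-party.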