An interactive visualization of publicly released emails connected to Jeffrey Epstein
This project is an interactive visualization of publicly released email records connected to Jeffrey Epstein. It maps communication patterns across the archive, including group conversations and one-to-one exchanges.
The goal is to make a large body of material more navigable while preserving its relational structure. The visualization emphasizes structural clarity rather than interpretation. It does not draw conclusions, and appearance in the dataset does not imply wrongdoing.
The archive contains correspondence among lawyers, journalists, assistants, financial advisors, and other professionals. Many of these interactions are routine. The visualization presents all of them without editorial filtering.
The source material is drawn from two public releases. The archive currently stands at:
| Metric | Count |
|---|---|
| Email threads processed | 22,707 |
| Individual messages | 39,987 |
| People tracked | 685 |
| Communication environments | 27 |
The dataset represents a subset of the total DOJ EFTA releases. Not all available datasets have been ingested yet, and the archive may grow as additional documents are processed. When viewing individual threads, the visualization provides source attribution with links to original documents.
Does appearing in this visualization imply wrongdoing?
No. Many people in this archive are lawyers, journalists, assistants, and other professionals who interacted with Epstein's orbit for entirely legitimate reasons. Presence in the dataset reflects only that a person's name appears in the released email records.
What are the "rooms"?
Rooms are groups of people who frequently appeared together in the same email conversations. They are generated algorithmically, not by editorial choice. A room represents a communication pattern, not a physical location or organizational unit.
What does "Talked About" mean?
The visualization distinguishes between people who directly sent or received emails (participants) and people whose names appear in email body text but who were not on the message (mentions). Being mentioned is not the same as being part of a conversation.
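The distinction can be sketched with a small classifier; the thread structure and field names below are illustrative, not the project's actual schema.

```python
# Minimal sketch of the participant/mention distinction.
# Field names ("senders", "recipients", "mentioned_in_body") are assumptions.

def classify_person(name, thread):
    """Return 'participant', 'mention', or 'absent' for a name in a thread."""
    on_message = set(thread["senders"]) | set(thread["recipients"])
    if name in on_message:
        return "participant"   # directly sent or received the email
    if name in thread["mentioned_in_body"]:
        return "mention"       # named in the body text only
    return "absent"

thread = {
    "senders": ["Alice"],
    "recipients": ["Bob"],
    "mentioned_in_body": ["Carol"],
}
```

Here Alice and Bob are participants, while Carol is only "Talked About".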
What do the thread badges mean?
Threads may carry badges: "Notable" for threads flagged by the scoring system, "Contradiction" for threads containing evidence that may conflict with someone's public claims, and "Revealing" for threads assessed as containing sensitive information. These are analytical markers, not editorial judgments.
How current is this data?
The dataset reflects a subset of documents available as of early 2026. Additional DOJ datasets are being processed over time. The visualization will be updated as new material is ingested.
Source documents are PDF files from the DOJ. Many are scanned images, so text is extracted using OCR where needed. The system then identifies email messages within each document and resolves sender/recipient identities — merging name variants, nicknames, and email addresses into single identities where possible.
The "rooms" are generated automatically through community detection: when the same group of people appears together across multiple email threads, the algorithm clusters them. These clusters are not hand-picked — they emerge from communication patterns in the data.
Individuals are flagged as "notable" through automated scoring based on Wikipedia presence, inferred role, mention frequency, and email volume. The system also evaluates threads for significance and checks for contradictions with participants' public statements.
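A weighted score over those four signals might look like the following sketch; the weights, caps, and threshold are invented for illustration and are not the project's actual values.

```python
# Illustrative notability scoring over the signals named above
# (Wikipedia presence, inferred role, mention frequency, email volume).
# All weights and the threshold are assumptions for this sketch.

def notability_score(person):
    score = 0.0
    if person.get("has_wikipedia"):
        score += 3.0
    if person.get("role") in {"lawyer", "journalist", "financier"}:
        score += 1.0
    score += min(person.get("mention_count", 0) / 50, 2.0)   # capped mention signal
    score += min(person.get("email_count", 0) / 100, 2.0)    # capped volume signal
    return score

def is_notable(person, threshold=3.5):
    return notability_score(person) >= threshold
```

Capping the frequency-based terms keeps a single prolific correspondent from dominating the categorical signals.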
AI disclosure. AI (Claude, by Anthropic) assists with entity extraction, name deduplication, role inference, topic labeling, and pattern detection. All AI outputs are treated as provisional.
Built with D3.js using SVG and HTML Canvas. All processing is client-side — the application loads pre-computed JSON files and renders them in the browser.
Entity resolution. Names are normalized via a curated alias table, heuristic nickname matching, and AI-assisted deduplication. Non-person entities are filtered out.
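The normalization step can be sketched as follows; the alias entries and heuristics here are invented examples, not the real curated table.

```python
# Hedged sketch of name normalization via an alias table plus simple
# heuristics. The alias map and the middle-initial rule are assumptions.

ALIASES = {"bill": "william", "bob": "robert"}  # hypothetical nickname map

def normalize_name(raw, alias_table=ALIASES):
    """Lowercase, strip punctuation, expand nicknames, drop single-letter initials."""
    parts = raw.lower().replace(".", "").replace(",", "").split()
    parts = [alias_table.get(p, p) for p in parts if len(p) > 1]
    return " ".join(parts)
```

Two variants of the same person then collapse to one key, e.g. "Bob Smith" and "Robert J. Smith" both normalize to "robert smith".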
Community detection. Clusters are identified using greedy modularity optimization on a co-presence graph, with edge weights reflecting interaction type (direct exchanges weighted highest).
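Assembling the weighted co-presence graph might look like this sketch; the specific weights are assumptions, and clustering the result would then fall to a greedy modularity routine such as networkx's greedy_modularity_communities.

```python
# Sketch of building the weighted co-presence graph described above.
# The weight values are invented; only the shape (direct exchanges
# weighted above mere co-presence) follows the description.

from collections import defaultdict
from itertools import combinations

WEIGHTS = {"direct": 3.0, "co_presence": 1.0}  # assumed weighting scheme

def build_copresence_graph(threads):
    """threads: dicts with 'participants' and optional 'direct_pairs'."""
    edges = defaultdict(float)  # (a, b) with a < b -> accumulated weight
    for t in threads:
        # everyone on the same thread gets a baseline co-presence edge
        for a, b in combinations(sorted(set(t["participants"])), 2):
            edges[(a, b)] += WEIGHTS["co_presence"]
        # direct sender/recipient pairs get the heavier weight on top
        for pair in t.get("direct_pairs", []):
            edges[tuple(sorted(pair))] += WEIGHTS["direct"]
    return dict(edges)
```

Repeated co-appearance across many threads accumulates weight on the same edges, which is exactly the signal modularity-based clustering picks up.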
Pipeline. Two-pass approach: an initial snapshot bootstraps the people index, enrichment scripts add Wikipedia data and AI analysis, then a second pass incorporates all signals for final scoring and community structure.
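The two-pass flow above can be reduced to a minimal skeleton; the stage functions here are placeholders standing in for the project's actual scripts, with enrichment stubbed out.

```python
# Skeleton of the two-pass pipeline shape. Stage names and the toy
# scoring rule are assumptions for this sketch.

def bootstrap_people_index(documents):
    """Pass 1: snapshot every distinct name seen in the documents."""
    return {name: {} for doc in documents for name in doc["names"]}

def enrich(people):
    """Enrichment: attach external signals (Wikipedia, AI analysis), stubbed."""
    return {name: dict(info, enriched=True) for name, info in people.items()}

def second_pass(documents, people):
    """Pass 2: final scoring once all signals are available (toy rule)."""
    for name in people:
        people[name]["score"] = sum(name in doc["names"] for doc in documents)
    return people

docs = [{"names": ["Alice", "Bob"]}, {"names": ["Alice"]}]
people = second_pass(docs, enrich(bootstrap_people_index(docs)))
```

The point of the structure is ordering: scoring runs only after the people index exists and enrichment has landed, so every signal is in place for the final pass.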