Methodology
How this data is collected, processed, and presented
This document describes the technical pipeline that powers Cooked. Every figure displayed on this site traces back to a federal public record. Where we compute derived metrics, the method, formula, and limitations are documented below.
Important: Statistical patterns are not evidence of wrongdoing. A campaign finance anomaly may reflect legitimate fundraising strategy, data collection artifacts, or reporting timing. Correlation between lobbying activity and voting records does not establish causation.
1. Data Sources
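All displayed figures originate from official federal disclosure systems. No private datasets, scraped content, or third-party aggregators are used as sources of displayed data; the two non-government datasets listed below (congress-legislators and Voteview) are used only for identifier matching and validation. The following APIs and datasets are queried directly: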
Federal Election Commission (FEC)
Endpoint: api.open.fec.gov/v1
Individual contributions (Schedule A) and independent expenditures (Schedule E) for all federal candidates. Data covers House, Senate, and Presidential races. FEC filings are reported by committees and released on a rolling basis. Negative amounts (refunds, redesignations) are clamped to zero during ingestion.
Congress.gov & Senate.gov
Endpoints: api.congress.gov/v3, Senate XML roll-call feed
Roll-call vote records for the U.S. House (via Congress.gov API) and U.S. Senate (via the official XML feed). Each vote row records the member, position (Yea/Nay/Not Voting), question text, and result. Vote data is available for the 117th Congress onward.
Lobbying Disclosure Act (LDA) API
Endpoint: lda.senate.gov/api/v1
LD-2 quarterly lobbying filings and LD-203 semi-annual PAC contribution reports filed by registered lobbying firms. LD-2 filings include client name, registrant (lobbying firm), issue codes, and free-text descriptions that often reference specific bills. LD-203 filings report contributions from lobbying firm PACs to federal candidates.
GovInfo BILLSTATUS
Endpoint: api.govinfo.gov
Official bill metadata from the Government Publishing Office. Used to resolve bill numbers referenced in vote questions and lobbying descriptions into titles, policy areas, and latest actions. Bill references in raw data are matched using a normalized type-number-congress key.
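As a minimal sketch of what a normalized type-number-congress key could look like, the function below turns a printed reference such as "H.R. 1234" into a string key. The function name `bill_key`, the key layout `type-number-congress`, and the set of recognized bill types are assumptions for illustration; the pipeline's actual key format is not specified here.

```python
import re
from typing import Optional

# Assumed set of bill-type codes (lowercase, punctuation removed).
KNOWN_BILL_TYPES = {"hr", "s", "hres", "sres", "hjres", "sjres", "hconres", "sconres"}

# A run of letters, periods, and spaces, followed by the bill number.
_BILL_RE = re.compile(r"^\s*([A-Za-z.\s]+?)\s*(\d+)\s*$")


def bill_key(reference: str, congress: int) -> Optional[str]:
    """Return a 'type-number-congress' key, or None if the reference is not recognized."""
    match = _BILL_RE.match(reference)
    if match is None:
        return None
    # Drop periods and spaces so "H.R." and "H. R." both become "hr".
    bill_type = re.sub(r"[.\s]", "", match.group(1)).lower()
    if bill_type not in KNOWN_BILL_TYPES:
        return None
    return f"{bill_type}-{int(match.group(2))}-{congress}"
```

With a key like this, a bill mentioned in a vote question and the same bill mentioned in a lobbying description resolve to one string, which can then be looked up against the bill metadata.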
congress-legislators Dataset
Source: github.com/unitedstates/congress-legislators
Open-source directory mapping every legislator to their official identifiers: bioguide ID, FEC candidate ID, LIS ID, THOMAS ID, and ICPSR ID. This crosswalk is used to link FEC contribution records to Congressional vote records for the same person. The dataset is periodically snapshot-versioned in the database to support incremental matching improvements.
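The crosswalk idea can be sketched as follows, assuming each legislator record has already been loaded into a dictionary with an `id` mapping that holds a `bioguide` string and a list of `fec` candidate IDs (the layout used by the congress-legislators files). The function name and the sample identifiers are placeholders, not real records.

```python
def build_fec_crosswalk(legislators: list[dict]) -> dict[str, str]:
    """Map each FEC candidate ID to the bioguide ID of the same legislator."""
    crosswalk: dict[str, str] = {}
    for person in legislators:
        ids = person.get("id", {})
        bioguide = ids.get("bioguide")
        if not bioguide:
            continue  # without a bioguide ID there is nothing to link to
        # One legislator can hold several FEC IDs (e.g., House and Senate runs).
        for fec_id in ids.get("fec", []):
            crosswalk[fec_id] = bioguide
    return crosswalk


# Placeholder records for illustration only.
sample = [
    {"id": {"bioguide": "X000001", "fec": ["H0XX00001", "S0XX00002"]}},
    {"id": {"bioguide": "X000002"}},  # no FEC ID on file
]
lookup = build_fec_crosswalk(sample)
```

A contribution row carrying candidate ID `H0XX00001` would then be attached to the politician keyed by `X000001`.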
Voteview (UCLA)
Source: voteview.com
Academic dataset of roll-call voting records maintained by UCLA political scientists. Used as a secondary validation source to cross-check member alignment and vote record consistency against the official Congress.gov data. Not used for primary display; diagnostic only.
2. Ingestion Pipeline
Source data is fetched through a series of specialized workers, each targeting one upstream API. Workers run on a recurring schedule and store records in dedicated tables per source.
Processing Order
The pipeline runs in a fixed order to respect data dependencies:
1. Identity: Fetch the congress-legislators dataset and build crosswalks linking bioguide IDs to FEC candidate IDs.
2. Congress: Ingest recent House and Senate roll-call votes.
3. FEC: Fetch contributions and independent expenditures for all federal candidates, using crosswalk IDs to match candidates.
4. GovInfo: Fetch bill metadata for bills referenced in votes and lobbying filings.
5. Lobbying (LD-2): Fetch quarterly disclosure filings from the LDA API.
6. LD-203: Fetch PAC contribution reports from lobbying registrants.
7. Normalize: Rebuild the derived graph from all raw tables (see Section 3).
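The fixed ordering above can be sketched as a small runner. The step names and the `run_pipeline` function are hypothetical; the sketch only illustrates that each worker runs after the steps it depends on, not how the real scheduler is built.

```python
from typing import Callable

# Step names are assumptions that mirror the processing order listed above.
PIPELINE_ORDER = [
    "identity", "congress", "fec", "govinfo", "lobbying_ld2", "ld203", "normalize",
]


def run_pipeline(workers: dict[str, Callable[[], None]]) -> list[str]:
    """Run one worker per step in the fixed order and return the completed steps."""
    completed: list[str] = []
    for step in PIPELINE_ORDER:
        workers[step]()  # a KeyError here means a step has no registered worker
        completed.append(step)
    return completed
```

Because the loop stops at the first exception, a failed FEC fetch in this sketch would prevent the normalize step from rebuilding the graph on partial data. Whether the real pipeline behaves this way is not stated.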
Data Quality Controls
- Contribution amounts are validated as finite numbers; non-numeric values are discarded.
- Dates are validated against ISO 8601 format; malformed dates are rejected at ingest time.
- Candidate matching uses crosswalk-backed identifiers where available, falling back to name-based search with state and office constraints.
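The amount and date checks could look roughly like the sketch below. The function names are hypothetical. The clamp to zero follows the FEC ingestion rule described in Section 1, and `date.fromisoformat` is used as a stand-in for ISO 8601 validation (before Python 3.11 it accepts only the `YYYY-MM-DD` form).

```python
import math
from datetime import date
from typing import Optional


def clean_amount(raw: object) -> Optional[float]:
    """Return a finite, non-negative amount, or None if the value is not numeric."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None  # non-numeric values are discarded
    if not math.isfinite(value):
        return None  # NaN and infinities are discarded
    return max(value, 0.0)  # refunds and redesignations are clamped to zero


def clean_date(raw: object) -> Optional[date]:
    """Return a parsed date, or None if the value is not a valid ISO 8601 date."""
    try:
        return date.fromisoformat(raw)
    except (TypeError, ValueError):
        return None  # malformed dates are rejected
```

A row would be kept only when both helpers return a value, for example `clean_amount("250.00")` and `clean_date("2024-01-15")`.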
3. Normalization & Graph Construction
Raw records from different sources use different identifier systems. A contributor in FEC data has no inherent link to a lobbying client in LDA data or a member in Congressional records. The normalization step reads all source records and builds a unified directed graph linking people, organizations, bills, votes, and issue areas.
Entity Resolution
Each entity (politician, contributor, committee, lobby client, lobbying firm, bill, issue area) is identified by a canonical key from its source system. For politicians, the bioguide ID serves as the primary key. For contributors, a composite of normalized name, employer, and occupation is used. For lobby clients, the LDA-assigned client ID is canonical. This approach is intended to represent the same real-world entity consistently across data sources.
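A composite contributor key might be built along these lines. The normalization rules (uppercase, punctuation removed, whitespace collapsed) and the `|` separator are assumptions for illustration; the actual rules are not documented here.

```python
import re


def _norm(text: str) -> str:
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.upper())
    return " ".join(text.split())


def contributor_key(name: str, employer: str, occupation: str) -> str:
    """Join the three normalized fields into one composite key."""
    return "|".join(_norm(part) for part in (name, employer, occupation))
```

Two rows that differ only in case, punctuation, or spacing then collapse to the same key, while a change of employer or occupation yields a different key for the same person.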
Edge Construction
The graph currently defines 12 active directed edge kinds. Each edge carries metadata (accumulated dollar amounts, vote positions, filing references) and links to evidence records that trace back to specific raw table rows with source record IDs.
| Edge | From | To | Source |
|---|---|---|---|
| Contribution | Individual | Committee | FEC Sched. A |
| Committee receipt | Committee | Politician | FEC Sched. A |
| Outside spending | PAC/Org | Politician | FEC Sched. E |
| Vote cast | Politician | Roll call | Congress.gov |
| Vote on bill | Roll call | Bill | GovInfo |
| Lobbying on issue | Lobby client | Issue area | LDA LD-2 |
| Lobbying on bill | Lobby client | Bill | LDA LD-2 |
| Lobby-politician link | Lobby client | Politician | LD-2 + vote join |
| Registrant-client | Lobbying firm | Client org | LDA LD-2 |
| Registrant PAC | Lobbying firm | Politician | LDA LD-203 |
| Bill policy area | Bill | Issue area | GovInfo |
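One way to picture an edge with accumulated metadata and evidence links is the sketch below. The class names, field names, and the `accumulate` helper are hypothetical; only the evidence fields (source table, record ID, label) follow the description in Section 7.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvidenceRef:
    source_table: str  # raw table the record came from
    record_id: str     # source record ID within that table
    label: str         # human-readable description


@dataclass
class Edge:
    kind: str
    src: str
    dst: str
    total_amount: float = 0.0
    evidence: list[EvidenceRef] = field(default_factory=list)


def accumulate(edges: dict[tuple[str, str, str], Edge], kind: str, src: str,
               dst: str, amount: float, ref: EvidenceRef) -> None:
    """Fold one raw record into the edge for (kind, src, dst), creating it if needed."""
    edge = edges.setdefault((kind, src, dst), Edge(kind, src, dst))
    edge.total_amount += amount
    edge.evidence.append(ref)
```

Keying edges by kind and endpoints means repeated contributions between the same pair add to one edge while each raw row stays reachable through its evidence reference.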
4. Statistical Signals
After graph construction, a set of signal builders analyze each politician's subgraph and emit structured findings. Each signal carries a confidence level (low, medium, high), a numeric score, a human-readable summary, and evidence references linking back to source records.
The system currently supports 23 signal kinds. A given refresh may materialize fewer kinds, depending on source coverage and on whether any politician crosses the minimum threshold for a given kind. Each supported signal is described below with its computation method and the minimum condition for it to appear. The confidence tier is assigned based on the magnitude of the finding.
5. Display Conventions
Name Formatting
FEC data stores individual names in “LAST, FIRST MIDDLE” format (all caps). During normalization, names are parsed and reordered to “First Last” using a title-case formatter. Organization names (PACs, lobbying clients, registrants) are title-cased without reordering. A suffix detection step prevents misinterpreting organization commas (e.g., “Brooklyn College Foundation, Inc.”) as personal name separators.
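The reordering and suffix check can be sketched as follows. The suffix list is a small hypothetical sample, dropping the middle name is an assumption based on the "First Last" target format, and `str.capitalize` is a naive title-case formatter that mishandles names such as "McDonald".

```python
# Hypothetical, non-exhaustive set of organization suffixes.
ORG_SUFFIXES = {"INC", "LLC", "CORP", "CO", "LTD", "LP"}


def _title_case(text: str) -> str:
    """Capitalize each whitespace-separated word, keeping punctuation in place."""
    return " ".join(word.capitalize() for word in text.split())


def format_name(raw: str) -> str:
    """Reorder 'LAST, FIRST MIDDLE' to 'First Last'; title-case organizations as-is."""
    before, comma, after = raw.partition(",")
    after_words = after.split()
    # A comma followed by a corporate suffix marks an organization, not a person.
    is_person = (
        bool(comma)
        and bool(after_words)
        and after_words[0].strip(".").upper() not in ORG_SUFFIXES
    )
    if not is_person:
        return _title_case(raw)
    return f"{_title_case(after_words[0])} {_title_case(before)}"
```

For example, `format_name("SMITH, JOHN QUINCY")` gives "John Smith", while "BROOKLYN COLLEGE FOUNDATION, INC." keeps its word order. An organization whose comma is not followed by a listed suffix would still be misread as a person in this sketch.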
Dollar Amounts
Amounts above $1M are shown with one decimal (e.g., $2.4M); amounts above $1K are shown in thousands (e.g., $43K); smaller amounts are displayed as whole dollars.
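A formatter for these rules might look like the sketch below. The treatment of exact boundaries (an amount of exactly $1,000 or $1,000,000) and the rounding mode are assumptions; here thousands are rounded down so that values just under $1M never display as "$1000K".

```python
def format_dollars(amount: float) -> str:
    """Format a non-negative dollar amount for display."""
    if amount >= 1_000_000:
        return f"${amount / 1_000_000:.1f}M"   # one decimal, in millions
    if amount >= 1_000:
        return f"${int(amount // 1_000)}K"     # whole thousands, rounded down
    return f"${amount:.0f}"                    # whole dollars
```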
Confidence Levels
Each signal is assigned a confidence level based on the strength and quantity of supporting evidence. “High” typically requires multiple corroborating data points above the threshold. “Medium” reflects a single strong signal or multiple weaker ones. “Low” is used for findings that meet the minimum threshold but lack strong corroboration.
Peer Comparisons
When the interface shows a ratio relative to peers (e.g., “7.6x peer avg”), this is the politician's value divided by the peer group mean. Peer groups are defined by office (House or Senate) and party affiliation. A minimum group size of 5 is required. The leave-one-out method is used: the politician being evaluated is excluded from the peer statistics so that their own value does not influence the baseline they are compared against.
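The leave-one-out ratio can be sketched as below. The function name is hypothetical, and the sketch assumes the minimum size of 5 is counted after the evaluated politician has been removed; the source does not say which way it is counted.

```python
from statistics import mean
from typing import Optional

MIN_PEER_GROUP = 5  # assumed to apply to the peers left after exclusion


def peer_ratio(values: dict[str, float], target: str) -> Optional[float]:
    """Return target value / mean of everyone else in the group, or None if undefined."""
    peers = [v for member, v in values.items() if member != target]
    if len(peers) < MIN_PEER_GROUP:
        return None  # group too small for a meaningful comparison
    peer_mean = mean(peers)
    if peer_mean == 0:
        return None  # avoid dividing by zero
    return values[target] / peer_mean
```

With one member at 76 and five peers at 10 each, the peer mean is 10 and the ratio is 7.6. Had the member been included in the mean, the baseline would be 21 and the ratio would shrink to about 3.6.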
6. Known Limitations
Incomplete contribution coverage
FEC Schedule A data is reported on a rolling basis. The database contains the most recent available filings but does not represent a complete historical record for every candidate. Concentration metrics (HHI, top donor share) are computed from imported rows only and may differ from figures derived from the full FEC dataset.
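For reference, the two concentration metrics named above can be computed as in this sketch, using the standard Herfindahl-Hirschman definition (the sum of squared shares). The function name and the 0-to-1 scale are assumptions; some presentations scale HHI to 0 through 10,000 instead.

```python
def concentration(amount_by_donor: dict[str, float]) -> tuple[float, float]:
    """Return (HHI, top donor share) computed from the imported rows only."""
    total = sum(amount_by_donor.values())
    if total <= 0:
        return 0.0, 0.0
    shares = [amount / total for amount in amount_by_donor.values()]
    hhi = sum(share * share for share in shares)  # sum of squared shares
    return hhi, max(shares)
```

Because both values depend on `total`, importing only part of a candidate's receipts changes every share, which is why these figures can differ from ones computed on the full FEC dataset.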
Vote coverage begins at the 117th Congress
Roll-call vote records are available from the 117th Congress (2021) onward. Politicians who served before this period will not have vote data, and party alignment metrics will reflect only recent legislative sessions.
Lobbying-to-politician links are indirect
LD-2 lobbying filings disclose which bills and issues a client lobbied on, but not which specific legislators were contacted. The link between a lobbying client and a politician is inferred by finding politicians who voted on bills referenced in the client's filings. This means the connection reflects shared legislative activity, not direct lobbying contact.
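The inference amounts to a join on shared bills, sketched below. The function name and input shapes are hypothetical: each input maps an entity to the set of bill keys it lobbied on or voted on.

```python
def infer_lobby_links(
    client_bills: dict[str, set[str]],
    politician_bills: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Link each client to each politician who voted on a bill the client lobbied on."""
    links: dict[tuple[str, str], set[str]] = {}
    for client, lobbied in client_bills.items():
        for politician, voted in politician_bills.items():
            shared = lobbied & voted  # bills present in both records
            if shared:
                links[(client, politician)] = shared
    return links
```

Since nearly every member votes on a floor bill, a single widely lobbied bill links its clients to most of a chamber, which is one concrete reason these links should not be read as evidence of contact.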
Name-based matching has limits
When crosswalk-backed identifiers are unavailable, candidate matching falls back to name similarity with state and office constraints. This can produce incorrect matches for common names. A nickname resolution table handles common variants (e.g., “Ted” matches both “Edward” and “Theodore”), but unusual name spellings may still cause missed matches.
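A nickname lookup could be as simple as the sketch below. The table holds hypothetical sample entries (only the "Ted" row comes from the text above), and the symmetric comparison is an assumption.

```python
# Hypothetical sample of a nickname table: nickname -> possible formal names.
NICKNAMES = {
    "TED": {"EDWARD", "THEODORE"},
    "BILL": {"WILLIAM"},
}


def first_names_match(a: str, b: str) -> bool:
    """True if the names are equal or one is a listed nickname of the other."""
    a, b = a.strip().upper(), b.strip().upper()
    return a == b or b in NICKNAMES.get(a, set()) or a in NICKNAMES.get(b, set())
```

Note that the match is not transitive: "Edward" and "Theodore" both match "Ted" but do not match each other, so a nickname alone cannot settle which formal name is meant.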
Independent politicians require confirmed caucus affiliation
For peer-anomaly grouping, independent politicians (e.g., Sen. Sanders, Sen. King) are mapped to their caucus party when that affiliation is confirmed in metadata. Independents without a confirmed caucus are excluded from peer comparisons rather than assigned to a default group.
Lobby-donate matching uses employer name heuristics
The lobby-donate overlap signal matches lobbying clients to contributor employers by normalized business name, stripping common suffixes (LLC, Inc., Corp, etc.). This fuzzy matching can produce false positives when two distinct organizations share a normalized name, or false negatives when the same organization uses substantially different names across FEC and LDA filings.
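The suffix stripping might look like the following sketch. The suffix set is a hypothetical sample and the function name is illustrative; the real list is presumably longer.

```python
import re

# Hypothetical sample of business suffixes to strip.
BUSINESS_SUFFIXES = {"LLC", "INC", "CORP", "CORPORATION", "CO", "LTD", "LP"}


def normalize_business_name(name: str) -> str:
    """Uppercase, drop punctuation, and remove trailing business suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.upper()).split()
    while words and words[-1] in BUSINESS_SUFFIXES:
        words.pop()
    return " ".join(words)
```

Both failure modes are easy to reproduce: "Acme Holdings, Inc." and an unrelated "Acme Holdings LLC" collapse to the same string (a false positive), while "Acme Holdings" and "Acme Global Holdings" stay distinct even if they are the same organization (a false negative).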
7. Evidence Traceability
Every signal and edge in the system carries evidence references that link back to specific records in the raw source tables. Each evidence reference includes the source table name, the record's unique identifier, and a human-readable label. On politician profile pages, evidence links resolve to the original filing on FEC.gov, Congress.gov, or lda.senate.gov.
The full evidence chain for any politician can be exported as a structured document from the profile page via the “Export evidence” link.
This methodology is current as of March 2026. The pipeline, signal definitions, and thresholds may be updated as new data sources are integrated or existing computations are refined. Material changes will be reflected on this page.