How this dashboard works
From raw legal text to the opportunity score, in plain English. Every number shown anywhere in this dashboard is traceable back through the steps below.
From official text to structured data
Five stages, each idempotent and reproducible.
- 01 Discovery. A curated regulatory matrix per jurisdiction is fed to a large language model with live web-search grounding, which identifies the statutes, regulations, circulars and guidance notes worth tracking. Output: 1,267 candidate norms across 23 jurisdictions.
- 02 Scrape. Each norm is fetched from its official source (national gazettes, regulator portals, statutory databases). When a primary source is unreachable because of strict firewalls, login walls or JavaScript-rendered single-page sites, the pipeline falls back to archive snapshots or to a headless browser, and logs the chosen path per norm.
- 03 Translate. Non-English texts are machine-translated; the original-language text is kept alongside, so legal verification can quote either version.
- 04 Analyse. A reasoning LLM reads each body and extracts thirteen structured signals: regime, status_regulatorio (regulatory status), the principal deadline plus its deadline type, seven exige_* ("requires") service triggers (audit, proof-of-reserves, pentest, AML/KYT, custody, formal verification, independent certification), free-text escopo (scope), and gap_ou_ambiguidade (gap or ambiguity). A second, deterministic pass (pure algorithm, no LLM) then re-validates every claim: the quote must be a substring of the body, triggers must pass an imperative-verb test, deadlines a temporal-anchor test, and regimes a regime-vocabulary test. Anything that fails the validator is reset to null. Phase 1 rule: every non-null value MUST come with a verbatim quote copied from the body. Better silence than a fabricated claim.
- 05 Aggregate & export. Per-country overviews are built from the underlying norms: any-true on triggers, the earliest non-past deadline, and six text-grounded coverage dimensions (issuance / custody / market abuse / AML / taxation / consumer protection) that drive the maturity tag. A flat CSV plus a typed graph JSON are exported and consumed by this dashboard.
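The deterministic re-validation pass in step 04 can be sketched roughly as follows. This is a minimal illustration, not the pipeline's code: the `Claim` type, the `validate` function and the imperative-verb and regime vocabularies are hypothetical; only the temporal anchors are taken from the deadline policy on this page. Each check maps to one test named in step 04, and any failure returns None, mirroring the "better silence" rule.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies for illustration only; the pipeline's real lists are not shown here.
IMPERATIVE_VERBS = ("must", "shall", "is required to", "deve", "muss")
TEMPORAL_ANCHORS = ("by", "before", "até", "spätestens", "deadline", "in force")
REGIME_VOCAB = {"licensing", "registration", "sandbox", "ban"}


@dataclass
class Claim:
    field: str             # e.g. "regime", "deadline", "exige_kyt_aml"
    value: object          # the extracted value (bool, str, date string, ...)
    quote: Optional[str]   # verbatim evidence quote returned by the LLM


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks in the body don't defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def _has_word(text: str, words) -> bool:
    """True if any keyword or phrase in `words` occurs with word boundaries on both sides."""
    return any(re.search(rf"\b{re.escape(w)}\b", text) for w in words)


def validate(claim: Claim, body: str):
    """Return the claim's value if it survives every check, otherwise None."""
    if claim.value is None or not claim.quote:
        return None  # Phase 1 rule: no verbatim quote, no claim
    quote = _norm(claim.quote)
    if quote not in _norm(body):
        return None  # the quote was not copied from the body
    if claim.field.startswith("exige_") and claim.value is True and not _has_word(quote, IMPERATIVE_VERBS):
        return None  # a TRUE trigger needs imperative language
    if claim.field == "deadline" and not _has_word(quote, TEMPORAL_ANCHORS):
        return None  # a deadline needs a temporal anchor
    if claim.field == "regime" and str(claim.value).lower() not in REGIME_VOCAB:
        return None  # a regime must come from the controlled vocabulary
    return claim.value
```

Whitespace is normalised before the substring test on the assumption that line breaks in a scraped body should not disqualify an otherwise verbatim quote; a stricter variant would compare raw strings.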
How norms connect across borders
A regulatory text never lives alone. The graph captures nine ways one norm can relate to another.
Binding inheritance
53 edges. The source norm legally transposes the target. Created when the body cites the anchor by ID and, since the Phase 1 rerun, only when a transposition verb (“transposes”, “implementa”, “umsetzt”) also appears in an evidence quote.
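A hedged sketch of the post-Phase-1 rule for binding inheritance, assuming each anchor exposes an official citation string (for example "2023/1114" for MiCA) that national bodies quote verbatim. `cites_anchor` and `is_binding_inheritance` are hypothetical names; the verb list contains only the three examples given above.

```python
import re

# Transposition verbs named in the text; a production list would cover more forms and languages.
TRANSPOSITION_VERBS = ("transposes", "implementa", "umsetzt")
_VERB_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TRANSPOSITION_VERBS)) + r")\b", re.IGNORECASE
)


def cites_anchor(body: str, anchor_citation: str) -> bool:
    """True if the anchor's official identifier (e.g. "2023/1114") appears verbatim in the body."""
    return anchor_citation in body


def is_binding_inheritance(body: str, anchor_citation: str, evidence_quotes: list[str]) -> bool:
    """Both conditions must hold: an ID citation alone is not enough for derivado_de."""
    return cites_anchor(body, anchor_citation) and any(_VERB_RE.search(q) for q in evidence_quotes)
```

A word-boundary match on a fixed verb list will miss inflections and German separable forms ("setzt ... um"), so a real list would need more variants per language.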
Soft inspiration
95 edges. The source norm aligns with the target without being legally bound by it. Captures non-EU jurisdictions referencing MiCA, every jurisdiction referencing the FATF Recommendations, and similar cases.
Cross-reference
1,025 edges. An explicit citation between two norms in the corpus, surfaced from the body text.
Applies to
143 edges. A jurisdiction-level overview pointing to its framework anchors.
Triggers service
0 edges. A norm requires a CertiK service offering. Driven by the seven boolean exige_* triggers.
Citation / Semantic
1,682 edges. Background relations: literal citations found in the body and model-suggested similarity. Lower signal; toggle them off in the graph view to focus on binding and soft inheritance.
Total edges: 2,999 across nine typed relations. See src/typed_relations.py for the derivation rules.
What the graph already reveals
The strongest signal in the dataset is which texts everyone else copies — and which only Europe copies.
Universal anchors
Norms cited by the largest number of other norms across the entire corpus. The FATF Recommendations dominate — every jurisdiction tracked follows them — and the EU's MiCA is the binding spine of the European cluster.
- 1. INTL-FATFRECS-2025 (161 inlinks). The FATF Recommendations: International Standards on Combating Money Laundering…
- 2. EU-MICA-2023 (84 inlinks). Regulation (EU) 2023/1114 of the European Parliament and of the Council of 31 Ma…
- 3. CH-AMLA-1997 (30 inlinks). Swiss AMLA / GwG (Anti-Money Laundering Act): base AML/CFT law; exchanges, cust…
- 4. EU-GDPR-2016 (28 inlinks). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 Apr…
- 5. EU-5AMLD-2018 (19 inlinks). Directive (EU) 2018/843 of the European Parliament and of the Council of 30 May…
- 6. CH-FINMASA-2007 (18 inlinks). Federal Act on the Swiss Financial Market Supervisory Authority (Financial Marke…
- 7. BR-LGPD-2018 (17 inlinks). Brazilian General Data Protection Law (LGPD - Law 13,709/2018); governs PSAV KY…
- 8. CH-FINSA-2018 (16 inlinks). Swiss FinSA / FIDLEG (Financial Services Act): conduct rules and prospectus req…
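The inlink tallies above, like the per-anchor counts in the binding and soft lists below, amount to counting citing norms per target. A minimal sketch, assuming edges are exported as (source_id, target_id, relation) triples and that an inlink means a distinct citing norm; `top_anchors` is a hypothetical helper, and its relation filter accepts names such as derivado_de or inspirado_em.

```python
from collections import defaultdict
from typing import Iterable, Optional

Edge = tuple[str, str, str]  # (source_id, target_id, relation), an assumed export shape


def top_anchors(edges: Iterable[Edge], n: int = 8,
                relation: Optional[str] = None) -> list[tuple[str, int]]:
    """Rank targets by the number of distinct norms pointing at them.

    With relation=None every edge type counts; pass e.g. "derivado_de" or
    "inspirado_em" to restrict the tally to one relation.
    """
    citing: dict[str, set[str]] = defaultdict(set)
    for src, dst, rel in edges:
        if src == dst or (relation is not None and rel != relation):
            continue  # ignore self-loops and other relations
        citing[dst].add(src)
    # Most-cited first; ties broken alphabetically for a stable order.
    ranked = sorted(citing.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(dst, len(srcs)) for dst, srcs in ranked[:n]]
```

Counting distinct sources rather than raw edges keeps a norm that cites MiCA through several relation types from being counted more than once.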
Binding transposition — only Europe goes here
The derivado_de ("derived from") relation requires legal transposition. Today every binding transposition is a European national text implementing an EU regulation or directive; non-EU jurisdictions use soft alignment instead.
- EU-MICA-2023 (30 national laws). Regulation (EU) 2023/1114 of the European Parliament and of…
- EU-GDPR-2016 (9 national laws). Regulation (EU) 2016/679 of the European Parliament and of t…
- EU-PROSPECTUSREG-2017 (5 national laws). Regulation (EU) 2017/1129 of the European Parliament and of…
- EU-TFR-2023 (5 national laws). Regulation (EU) 2023/1113 of the European Parliament and of…
- EU-DAC8-2023 (4 national laws). Council Directive (EU) 2023/2226 of 17 October 2023 amending…
Soft influence — where the world looks
The inspirado_em relation: non-binding alignment. FATF covers everyone; MiCA's soft reach extends well beyond the EEA.
- INTL-FATFRECS-2025 (79 followers). The FATF Recommendations: International Standards on Combati…
- EU-MICA-2023 (10 followers). Regulation (EU) 2023/1114 of the European Parliament and of…
- INTL-FSBGSCRECS-2023 (5 followers). High-level Recommendations for the Regulation, Supervision a…
- US-HOWEY-1946 (1 follower). Securities and Exchange Commission v. W. J. Howey Co., 328 U…
What makes a market mature
Maturity is not a structural heuristic. It's the count of regulatory dimensions actually addressed in the source text.
Six dimensions form the spine of a comprehensive crypto framework: token issuance, custody of client assets, market abuse, AML / KYT / Travel Rule, taxation, and consumer protection. For every analysed norm we detect which dimensions it touches, using the boolean triggers plus a multilingual word-boundary keyword scan. A jurisdiction's maturity is then the number of distinct dimensions in the union across all its norms (0 to 6).
The previous 40-norms / 3-regulators / anchor-before-2020 heuristic has been removed; see src/coverage.py.
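The coverage detection described above might look like this in outline. The keyword lists and the trigger-to-dimension mapping are hypothetical placeholders (the real ones presumably sit in src/coverage.py); the sketch shows only the mechanics: a TRUE trigger or a word-bounded keyword hit marks a dimension, and maturity is the size of the union across a jurisdiction's norms.

```python
import re

# Hypothetical EN / PT / DE keyword lists, for illustration only.
KEYWORDS = {
    "issuance": ("white paper", "offer to the public", "emissão"),
    "custody": ("custody", "safekeeping", "custódia", "verwahrung"),
    "market_abuse": ("market abuse", "insider dealing", "manipulação de mercado", "marktmanipulation"),
    "aml": ("money laundering", "travel rule", "lavagem de dinheiro", "geldwäsche"),
    "taxation": ("tax", "taxation", "imposto", "steuer"),
    "consumer_protection": ("consumer protection", "retail client", "consumidor", "verbraucherschutz"),
}

# Hypothetical trigger-to-dimension mapping (field names as used elsewhere on this page).
TRIGGER_DIMENSION = {"exige_seguranca_custodia": "custody", "exige_kyt_aml": "aml"}

# One case-insensitive, word-bounded alternation per dimension: "tax" matches "tax" but not "syntax".
_PATTERNS = {
    dim: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for dim, words in KEYWORDS.items()
}


def norm_dimensions(body: str, triggers: dict[str, bool]) -> set[str]:
    """Dimensions one norm touches: TRUE triggers plus keyword hits in the body."""
    dims = {TRIGGER_DIMENSION[t] for t, on in triggers.items() if on and t in TRIGGER_DIMENSION}
    dims |= {dim for dim, pattern in _PATTERNS.items() if pattern.search(body)}
    return dims


def maturity(norms: list[tuple[str, dict[str, bool]]]) -> int:
    """Size of the union of dimensions across a jurisdiction's norms (0 to 6)."""
    covered: set[str] = set()
    for body, triggers in norms:
        covered |= norm_dimensions(body, triggers)
    return len(covered)
```

Word boundaries are what keep "tax" from firing on "syntax"; they are less helpful for German compounds such as "Einkommensteuer", which would need a separate substring or stemming rule.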
The opportunity score
A composite of urgency, service intensity, and maturity. Tunable in one file.
- 40% Urgency. Days to the next regulatory deadline. Once a deadline is past due, urgency decays from 100 toward a floor of 50 over 12 months. Missing deadline + known regime = 30. Missing deadline + unknown regime = 0.
- 40% Service intensity. Share of the 14 CertiK services triggered (audit, pentest, AML/KYT, proof of reserves, …).
- 20% Market maturity. The text-grounded six-dimension count described under "What makes a market mature".
Formula in web/lib/scoring.ts. Weight sensitivity has been swept: nine jurisdictions stay in the top 12 under every weight combination. See the calibration note below.
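A Python sketch of the composite (the production formula is TypeScript, in web/lib/scoring.ts, and is not reproduced here). The 40/40/20 weights, the 14 services, the six dimensions, the 30/0 fallbacks and the 100-to-50 past-due decay come from the list above; the linear shape of both curves and the 365-day forward ramp for upcoming deadlines are assumptions made for illustration.

```python
from typing import Optional

WEIGHTS = {"urgency": 0.40, "service": 0.40, "maturity": 0.20}
N_SERVICES = 14
N_DIMENSIONS = 6
HORIZON_DAYS = 365  # assumption: 12 months for both the forward ramp and the past-due decay


def urgency(days_to_deadline: Optional[int], regime_known: bool) -> float:
    """0-100. Negative days mean the deadline is already past."""
    if days_to_deadline is None:
        return 30.0 if regime_known else 0.0
    if days_to_deadline < 0:
        # Past due: assumed linear decay from 100 to the 50 floor over 12 months.
        overdue = min(-days_to_deadline, HORIZON_DAYS)
        return 100.0 - 50.0 * overdue / HORIZON_DAYS
    # Upcoming: assumed linear ramp, 100 on the day itself, 0 at a year or more away.
    return max(0.0, 100.0 * (1 - days_to_deadline / HORIZON_DAYS))


def opportunity_score(days_to_deadline: Optional[int], regime_known: bool,
                      services_triggered: int, dimensions_covered: int) -> float:
    """Weighted 0-100 composite of urgency, service intensity and maturity."""
    u = urgency(days_to_deadline, regime_known)
    s = 100.0 * services_triggered / N_SERVICES
    m = 100.0 * dimensions_covered / N_DIMENSIONS
    return WEIGHTS["urgency"] * u + WEIGHTS["service"] * s + WEIGHTS["maturity"] * m
```

For example, a jurisdiction with no quotable deadline and an unknown regime, 7 of 14 services triggered and 3 of 6 dimensions covered scores 0.4 × 0 + 0.4 × 50 + 0.2 × 50 = 30.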
Deadline policy
A deadline is shown only when it can be quoted from the source text.
A retroactive audit (scripts/audit_deadlines.py) found that only 16 of 83 norm-level deadlines had body grounding. The remaining 67, many of them publication dates, founding years or inferred end-of-period markers wrongly flagged as deadlines, were removed from the vault. Since the Phase 1 rerun, a deadline reappears if and only if the body quotes the date within 80 characters of a temporal anchor ("by", "before", "até", "spätestens", "deadline", "in force").
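The grounding test can be expressed as a distance check between each verbatim occurrence of the date and each anchor keyword. A minimal sketch under two assumptions: the date is matched as the literal string the extractor quoted, and "within 80 characters" means the gap between the two text spans. `deadline_is_grounded` is a hypothetical name.

```python
import re

TEMPORAL_ANCHORS = ("by", "before", "até", "spätestens", "deadline", "in force")
WINDOW = 80  # characters, per the deadline policy

_ANCHOR_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TEMPORAL_ANCHORS)) + r")\b", re.IGNORECASE
)


def deadline_is_grounded(body: str, date_text: str, window: int = WINDOW) -> bool:
    """True if some verbatim occurrence of `date_text` lies within `window`
    characters of a temporal anchor (gap measured between the two spans)."""
    anchors = [(a.start(), a.end()) for a in _ANCHOR_RE.finditer(body)]
    for m in re.finditer(re.escape(date_text), body):
        for a_start, a_end in anchors:
            # Zero if the spans touch or overlap, else the characters between them.
            gap = max(a_start - m.end(), m.start() - a_end, 0)
            if gap <= window:
                return True
    return False
```

The word boundaries matter here too: without them, "by" would match inside "standby" and ground a date that has no real anchor.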
How much to trust each number
A separate, orthogonal score that never gates the ranking. It is displayed next to the ranking so the decision-maker can see what the data is built on.
For every jurisdiction we publish a 0-100 data-confidence score and a tier badge. Four orthogonal components: how much of the country's corpus is LLM-analysed (35%), how many of the six dimensions are covered (25%), how many distinct regulators contribute (15%), and the share of extracted fields backed by a Phase 1 verbatim quote (25%). The badge shown on the home table, the country profile and the recommended-moves cards reflects this score.
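The weighted sum behind the data-confidence score is simple enough to sketch. The four weights come from the paragraph above; how each component is normalised to 0..1 is assumed here, in particular the hypothetical REGULATOR_CAP that gives full credit at three distinct regulators. Tier thresholds are not stated on this page, so the sketch stops at the numeric score.

```python
from dataclasses import dataclass

WEIGHTS = {"analysed": 0.35, "dimensions": 0.25, "regulators": 0.15, "quoted": 0.25}
REGULATOR_CAP = 3  # assumption: three or more distinct regulators earn full credit


@dataclass
class CorpusStats:
    norms_total: int
    norms_analysed: int        # norms that went through the LLM analysis pass
    dimensions_covered: int    # 0..6
    distinct_regulators: int
    fields_non_null: int       # extracted fields with a value
    fields_quoted: int         # of those, fields backed by a Phase 1 verbatim quote


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def data_confidence(s: CorpusStats) -> float:
    """0-100 weighted sum of four components, each normalised to 0..1."""
    components = {
        "analysed": _ratio(s.norms_analysed, s.norms_total),
        "dimensions": s.dimensions_covered / 6,
        "regulators": min(s.distinct_regulators, REGULATOR_CAP) / REGULATOR_CAP,
        "quoted": _ratio(s.fields_quoted, s.fields_non_null),
    }
    return round(100 * sum(WEIGHTS[k] * v for k, v in components.items()), 1)
```

Because the weights sum to 1, a jurisdiction that maxes out every component scores 100, and an empty corpus scores 0.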
Calibration — gold set + F1 + drift gate
Every change to the extraction logic is measured against a hand-labelled ground truth.
The gold set is a stratified sample seeded with the current extraction; a human reviewer corrects each field and pastes a verbatim source quote, then flips reviewed: true. The comparator (scripts/gold.py report) reports per-field precision, recall and F1 against a saved baseline. CI fails any PR that drops a field by more than 5 percentage points.
Current measurement on a 13-row starter pack: support-weighted F1 = 0.80. The model is perfect on exige_certificacao_independente and escopo (1.00) and strong on regime (0.91). The two structural weaknesses are unchanged after the Phase 1 rerun: a bias toward TRUE on exige_seguranca_custodia (0.36) and exige_kyt_aml (0.50), where the source body uses descriptive prose rather than the imperative verbs the validator demands. Net effect for the decision-maker: those two triggers should be cross-checked manually before any commercial move; the other eleven fields are reliable.
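The comparator's metrics can be sketched as follows. This is not scripts/gold.py: treating null and FALSE as negatives, counting a wrong value as both a false positive and a false negative, and assuming the 5-point gate compares per-field F1 are all choices made for illustration.

```python
from dataclasses import dataclass


def asserted(value) -> bool:
    """A value counts as a positive claim unless it is null or FALSE."""
    return value is not None and value is not False


@dataclass
class FieldScore:
    precision: float
    recall: float
    f1: float
    support: int  # number of gold rows with a positive value


def score_field(pairs: list[tuple[object, object]]) -> FieldScore:
    """pairs = [(predicted, gold), ...] for one field across the gold set."""
    tp = fp = fn = 0
    for pred, gold in pairs:
        if asserted(pred):
            if pred == gold:
                tp += 1
            else:
                fp += 1  # wrong or unsupported claim
        if asserted(gold) and pred != gold:
            fn += 1  # gold value missed or mis-extracted
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    support = sum(1 for _, gold in pairs if asserted(gold))
    return FieldScore(p, r, f1, support)


def support_weighted_f1(scores: dict[str, FieldScore]) -> float:
    """Average of per-field F1, weighted by each field's support."""
    total = sum(s.support for s in scores.values())
    return sum(s.f1 * s.support for s in scores.values()) / total if total else 0.0


def drift_failures(current: dict[str, float], baseline: dict[str, float],
                   max_drop: float = 0.05) -> list[str]:
    """Fields whose F1 fell more than `max_drop` (5 percentage points) below baseline."""
    return sorted(f for f, base in baseline.items() if base - current.get(f, 0.0) > max_drop)
```

Under this convention a trigger the model sets to TRUE where the gold label is FALSE lowers precision only, which is how a bias toward TRUE shows up as a low F1 on fields like exige_seguranca_custodia.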
Honest limits
What this dashboard is NOT.
- Two jurisdictions (IN, SE) have only stub norms: the analysis pass returned no usable signals. They are shown for completeness, not for ranking.
- competidores_ativos (active competitors) and forca_relacionamento_certik (CertiK relationship strength) are intentionally empty: they require business-development input that the LLM cannot infer from public text.
- A handful of national gazettes are behind aggressive firewalls and were fetched from official archive snapshots; their content can be slightly stale until the next refresh cycle.
- The opportunity score is a heuristic — a high score is a candidate to investigate, not a closed sale.