How this dashboard works

From raw legal text to the opportunity score, in plain English. Every number shown anywhere in this dashboard is traceable back through the steps below.

From official text to structured data

Five stages, each idempotent and reproducible.

01
Discovery. A curated regulatory matrix per jurisdiction is fed to a large language model with live web-search grounding, which identifies the statutes, regulations, circulars and guidance notes worth tracking. Output: 1,267 candidate norms across 23 jurisdictions.
02
Scrape. Each norm is fetched from the official source (national gazettes, regulator portals, statutory databases). When a primary source is unreachable — strict firewalls, login walls, JavaScript-rendered single pages — the pipeline falls back to archive snapshots or to a headless browser, with the chosen path logged per norm.
03
Translate. Non-English texts are auto-translated; the original-language text is preserved alongside the translation so legal reviewers can quote either version.
04
Analyse. A reasoning LLM reads each body and extracts thirteen structured signals: regime, status_regulatorio, the principal deadline and its deadline type, seven exige_* service triggers (audit, proof-of-reserves, pentest, AML/KYT, custody, formal verification, independent certification), free-text escopo, and gap_ou_ambiguidade. A second, deterministic pass (pure algorithm, no LLM) then re-validates every claim with four tests: the quote must appear as a substring of the body, a trigger must rest on an imperative verb, a deadline on a temporal anchor, and a regime on regime vocabulary. Any value that fails validation is reset to null. Phase 1 rule: every non-null value MUST come with a verbatim quote copied from the body. Better silence than a fabricated claim.
05
Aggregate & export. Per-country overviews are built from the underlying norms: a trigger is true if any norm sets it, the deadline is the earliest one not yet past, and six text-grounded coverage dimensions (issuance / custody / market abuse / AML / taxation / consumer protection) drive the maturity tag. A flat CSV plus a typed graph JSON are exported and consumed by this dashboard.
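The deterministic pass in stage 04 can be pictured with a minimal sketch of two of its tests, substring presence and the imperative-verb check. The function names, the verb list, and the whitespace and case normalisation are all assumptions made for illustration; the real validator's vocabulary and matching rules are not shown on this page.

```python
import re
from typing import Optional

# Hypothetical verb list: the actual validator vocabulary is not documented here.
IMPERATIVE_VERBS = ("must", "shall", "is required to", "are required to", "deve", "muss")


def _normalise(text: str) -> str:
    """Collapse whitespace and lowercase, so line wrapping alone cannot fail a quote."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_in_body(quote: str, body: str) -> bool:
    """Substring-presence test: the quote must literally occur in the body."""
    q = _normalise(quote)
    return bool(q) and q in _normalise(body)


def has_imperative(quote: str) -> bool:
    """Imperative-verb test: the quote must contain an obligation verb as a whole word."""
    q = _normalise(quote)
    return any(re.search(rf"\b{re.escape(verb)}\b", q) for verb in IMPERATIVE_VERBS)


def validate_trigger(value: Optional[bool], quote: Optional[str], body: str) -> Optional[bool]:
    """Return the claim unchanged if it survives both tests, otherwise None."""
    if value is None or not quote:
        return None  # Phase 1 rule: no quote, no value
    if not quote_in_body(quote, body):
        return None  # quote was not copied from the body
    if value and not has_imperative(quote):
        return None  # assumption: only TRUE claims need an imperative verb
    return value
```

With this shape, a TRUE trigger backed by descriptive prose ("many custodians undergo audits") is reset to null, while the same trigger backed by "custodians must undergo an annual security audit" passes.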

Jurisdictions

23

tracked end-to-end

Norms

1,267

451 fully analysed

Verified deadlines

body-grounded only

High-confidence markets

of 23, after Phase 1 rerun

How norms connect across borders

A regulatory text never lives alone. The graph captures nine ways one norm can relate to another.

Binding inheritance

53 edges

derivado_de

The source norm legally transposes the target. Created when the body cites the anchor by ID and, since the Phase 1 rerun, a transposition verb (“transposes”, “implementa”, “umsetzt”) also appears in an evidence quote.

Soft inspiration

95 edges

inspirado_em

The source norm aligns with the target without being legally bound to it. Captures non-EU jurisdictions referencing MiCA, every jurisdiction referencing the FATF Recommendations, and similar.

Cross-reference

1,025 edges

referencia_cruzada

An explicit citation between two norms in the corpus, surfaced from the body text.

Applies to

143 edges

aplica_se_a

A jurisdiction-level overview pointing to its framework anchors.

Triggers service

0 edges

exige_servico

A norm requires a CertiK service offering. Driven by the seven boolean exige_* triggers.

Citation / Semantic

1,682 edges

citation / semantic

Background relations: literal citations found in the body and model-suggested similarity. Lower signal — toggle off in the graph view to focus on binding and soft inheritance.

Total edges: 2,999 across nine typed relations. See src/typed_relations.py for the derivation rules.
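As an illustration of how a derivation rule of this kind might look, here is a minimal sketch of the derivado_de test described above. It is not the code in src/typed_relations.py: the function name, the plain substring match on the anchor ID, and the fallback to referencia_cruzada are assumptions.

```python
from typing import Optional

# The three verbs quoted on this page; the real list may be longer.
TRANSPOSITION_VERBS = ("transposes", "implementa", "umsetzt")


def classify_link(body: str, evidence_quote: str, target_id: str) -> Optional[str]:
    """Type one candidate link from a norm body to an anchor norm.

    Returns None when the body never cites the anchor by ID.
    """
    if target_id not in body:
        return None
    quote = evidence_quote.lower()
    if any(verb in quote for verb in TRANSPOSITION_VERBS):
        return "derivado_de"  # ID citation plus a transposition verb in the quote
    # Assumption: an ID citation without a transposition verb is kept
    # as a plain cross-reference.
    return "referencia_cruzada"
```

For example, `classify_link("This act transposes EU-MICA-2023.", "This act transposes EU-MICA-2023", "EU-MICA-2023")` yields `"derivado_de"`, whereas a body that only says "See also EU-MICA-2023" yields `"referencia_cruzada"`.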

What the graph already reveals

The strongest signal in the dataset is which texts everyone else copies — and which only Europe copies.

Universal anchors

Norms cited by the largest number of other norms across the entire corpus. The FATF Recommendations dominate — every jurisdiction tracked follows them — and the EU's MiCA is the binding spine of the European cluster.

1 · INTL-FATFRECS-2025 · The FATF Recommendations: International Standards on Combating Money Laundering … · 161 inlinks
2 · EU-MICA-2023 · Regulation (EU) 2023/1114 of the European Parliament and of the Council of 31 Ma… · 84 inlinks
3 · CH-AMLA-1997 · Swiss AMLA / GwG — Anti-Money Laundering Act — base AML/CFT law; exchanges, cust… · 30 inlinks
4 · EU-GDPR-2016 · Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 Apr… · 28 inlinks
5 · EU-5AMLD-2018 · Directive (EU) 2018/843 of the European Parliament and of the Council of 30 May … · 19 inlinks
6 · CH-FINMASA-2007 · Federal Act on the Swiss Financial Market Supervisory Authority (Financial Marke… · 18 inlinks
7 · BR-LGPD-2018 · Brazilian General Data Protection Law (LGPD - Law 13,709/2018) — governs PSAV KY… · 17 inlinks
8 · CH-FINSA-2018 · Swiss FinSA / FIDLEG — Financial Services Act — conduct rules and prospectus req… · 16 inlinks
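A ranking like the one above can be derived from the exported edge list with a few lines. The sketch below assumes edges are available as (source_id, target_id) pairs and that an "inlink" means one distinct citing norm; whether the dashboard counts distinct sources or raw edges is not stated, so treat that as an assumption.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_by_inlinks(edges: Iterable[Tuple[str, str]], top: int = 8) -> List[Tuple[str, int]]:
    """Rank target norms by the number of distinct other norms pointing at them."""
    sources = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # a norm citing itself is not an inlink
            sources[dst].add(src)
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(sources.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(target, len(srcs)) for target, srcs in ranked[:top]]
```

Calling it on `[("A", "X"), ("B", "X"), ("A", "X"), ("A", "Y")]` returns `[("X", 2), ("Y", 1)]`: the duplicate A→X edge is counted once.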

Binding transposition — only Europe goes here

The derivado_de relation requires legal transposition. Today every binding transposition is a European national text implementing an EU regulation or directive; non-EU jurisdictions use soft alignment instead.

EU-MICA-2023 · Regulation (EU) 2023/1114 of the European Parliament and of … · 30 nat. laws
EU-GDPR-2016 · Regulation (EU) 2016/679 of the European Parliament and of t… · 9 nat. laws
EU-PROSPECTUSREG-2017 · Regulation (EU) 2017/1129 of the European Parliament and of … · 5 nat. laws
EU-TFR-2023 · Regulation (EU) 2023/1113 of the European Parliament and of … · 5 nat. laws
EU-DAC8-2023 · Council Directive (EU) 2023/2226 of 17 October 2023 amending… · 4 nat. laws

Soft influence — where the world looks

The inspirado_em relation: non-binding alignment. FATF covers everyone; MiCA's soft reach extends well beyond the EEA.

INTL-FATFRECS-2025 · The FATF Recommendations: International Standards on Combati… · 79 followers
EU-MICA-2023 · Regulation (EU) 2023/1114 of the European Parliament and of … · 10 followers
INTL-FSBGSCRECS-2023 · High-level Recommendations for the Regulation, Supervision a… · 5 followers
US-HOWEY-1946 · Securities and Exchange Commission v. W. J. Howey Co., 328 U… · 1 follower

What makes a market mature

Maturity is not a structural heuristic. It's the count of regulatory dimensions actually addressed in the source text.

Six dimensions form the spine of a comprehensive crypto framework: token issuance, custody of client assets, market abuse, AML / KYT / Travel Rule, taxation, and consumer protection. For every analysed norm we detect which dimensions it touches, using the boolean triggers plus a multilingual word-boundary keyword scan. A jurisdiction's maturity tier is then set by how many dimensions appear in the union across all its norms.

High

5+ of 6 dimensions

Medium

3-4 of 6 dimensions

Low

1-2 of 6 dimensions

The previous 40-norms / 3-regulators / anchor-before-2020 heuristic has been removed; see src/coverage.py.
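The tiering rule above reduces to a set union followed by a threshold lookup. The sketch below is an illustration, not the contents of src/coverage.py: the dimension labels are hypothetical identifiers, and the "Unrated" result for zero dimensions is an assumption, since this page only defines tiers for one dimension and up.

```python
from typing import Iterable, Set, Tuple

# Hypothetical identifiers for the six dimensions named above.
DIMENSIONS = frozenset(
    {"issuance", "custody", "market_abuse", "aml", "taxation", "consumer_protection"}
)


def maturity_tier(norm_dimensions: Iterable[Set[str]]) -> Tuple[str, int]:
    """Union the dimensions touched by each norm, then map the count to a tier."""
    covered = set()
    for dims in norm_dimensions:
        covered |= set(dims) & DIMENSIONS  # ignore labels outside the six dimensions
    count = len(covered)
    if count >= 5:
        return "High", count
    if count >= 3:
        return "Medium", count
    if count >= 1:
        return "Low", count
    return "Unrated", count  # assumption: zero coverage is not tiered on this page
```

Three norms covering {issuance, custody}, {aml} and {custody, taxation, market_abuse} union to five dimensions, so the jurisdiction is tagged High even though no single norm covers more than three.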

The opportunity score

A composite of urgency, service intensity, and maturity. Tunable in one file.

40% Urgency. Days to the next regulatory deadline. Past-due decays from 100 toward a 50 floor over 12 months. Missing deadline + known regime = 30. Missing deadline + unknown regime = 0.
40% Service intensity. Share of the 14 CertiK services triggered (audit, pentest, AML/KYT, proof of reserves, …).
20% Market maturity. The text-grounded six-dimension count described under "What makes a market mature".

Formula in web/lib/scoring.ts. Weight sensitivity has been swept; nine jurisdictions stay top-12 under every combination — see the calibration note below.
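The weighting can be sketched as follows. This is written in Python for consistency with the other examples on this page, not copied from web/lib/scoring.ts. Two parts are assumptions and are marked as such in the code: the past-due decay is taken to be linear, and the urgency of a future deadline (which this page does not spell out) is taken to ramp linearly from 0 at twelve months out to 100 on the deadline day.

```python
from typing import Optional

WEIGHTS = {"urgency": 0.40, "service": 0.40, "maturity": 0.20}
TOTAL_SERVICES = 14
TOTAL_DIMENSIONS = 6
HORIZON_DAYS = 365  # "12 months"


def urgency(days_to_deadline: Optional[int], regime_known: bool) -> float:
    """0-100 urgency; negative days_to_deadline means the deadline has passed."""
    if days_to_deadline is None:
        return 30.0 if regime_known else 0.0
    if days_to_deadline < 0:
        # Past due: 100 decaying to a floor of 50 over 12 months (linear shape assumed).
        elapsed = min(-days_to_deadline, HORIZON_DAYS) / HORIZON_DAYS
        return 100.0 - 50.0 * elapsed
    # ASSUMPTION: future deadlines ramp linearly from 0 (a year or more away) to 100 (today).
    remaining = min(days_to_deadline, HORIZON_DAYS) / HORIZON_DAYS
    return 100.0 * (1.0 - remaining)


def opportunity_score(
    days_to_deadline: Optional[int],
    regime_known: bool,
    services_triggered: int,
    dimensions_covered: int,
) -> float:
    """Weighted 0-100 composite of urgency, service intensity and maturity."""
    service = 100.0 * services_triggered / TOTAL_SERVICES
    maturity = 100.0 * dimensions_covered / TOTAL_DIMENSIONS
    return (
        WEIGHTS["urgency"] * urgency(days_to_deadline, regime_known)
        + WEIGHTS["service"] * service
        + WEIGHTS["maturity"] * maturity
    )
```

Under these assumptions, a jurisdiction with a deadline today, 7 of 14 services triggered and all six dimensions covered scores 0.4 × 100 + 0.4 × 50 + 0.2 × 100 = 80.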

Deadline policy

A deadline is shown only when it can be quoted from the source text.

A retroactive audit (scripts/audit_deadlines.py) found that only 16 of 83 norm-level deadlines were grounded in the body text. The remaining 67, many of them publication dates, founding years or inferred end-of-period markers wrongly flagged as deadlines, were removed from the vault. After the Phase 1 rerun, a deadline re-appears if and only if the body quotes the date within 80 characters of a temporal anchor ("by", "before", "até", "spätestens", "deadline", "in force").
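A minimal sketch of the temporal-anchor test, assuming the date is already available as the exact string found in the body and that the 80 characters are measured as the gap between the anchor and the date. The function name is hypothetical, and the anchor list contains only the six examples quoted above; the real list may be longer.

```python
import re

TEMPORAL_ANCHORS = ("by", "before", "até", "spätestens", "deadline", "in force")
WINDOW = 80  # maximum gap, in characters, between anchor and date

# Whole-word match, so that "by" does not fire inside "nearby".
_ANCHOR_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(a) for a in TEMPORAL_ANCHORS) + r")(?!\w)",
    re.IGNORECASE,
)


def deadline_is_grounded(body: str, date_text: str) -> bool:
    """True if some occurrence of date_text lies within WINDOW characters of an anchor."""
    anchors = [m.span() for m in _ANCHOR_RE.finditer(body)]
    for date in re.finditer(re.escape(date_text), body):
        for a_start, a_end in anchors:
            if a_end <= date.start():
                gap = date.start() - a_end  # anchor precedes the date
            else:
                gap = a_start - date.end()  # anchor follows the date
            if 0 <= gap <= WINDOW:
                return True
    return False
```

A sentence such as "Providers must obtain authorisation by 30 December 2024" passes, while "Published in the Official Journal on 9 June 2023" does not, which is the publication-date failure mode the audit removed.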

How much to trust each number

A separate orthogonal score, never used to gate the ranking — it sits next to it so the decision-maker can see what the data is built on.

For every jurisdiction we publish a 0-100 data-confidence score and a tier badge. Four orthogonal components: how much of the country's corpus is LLM-analysed (35%), how many of the six dimensions are covered (25%), how many distinct regulators contribute (15%), and the share of extracted fields backed by a Phase 1 verbatim quote (25%). The badge shown on the home table, the country profile and the recommended-moves cards reflects this score.
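The four weights combine as a plain weighted sum. In the sketch below, the way each component is normalised to a 0-1 range is an assumption, in particular the `regulator_cap` parameter, which is hypothetical: this page does not say how many regulators count as full diversity. The tier thresholds behind the badge are not given here either, so the sketch stops at the numeric score.

```python
WEIGHTS = {
    "analysed_share": 0.35,       # share of the corpus that is LLM-analysed
    "dimension_coverage": 0.25,   # dimensions covered, out of six
    "regulator_diversity": 0.15,  # distinct contributing regulators
    "quoted_share": 0.25,         # extracted fields backed by a verbatim quote
}


def _ratio(numerator: float, denominator: float) -> float:
    """Safe ratio clipped to [0, 1]; an empty denominator counts as 0."""
    if denominator <= 0:
        return 0.0
    return max(0.0, min(1.0, numerator / denominator))


def confidence_score(
    analysed_norms: int,
    total_norms: int,
    dimensions_covered: int,
    regulators: int,
    quoted_fields: int,
    extracted_fields: int,
    regulator_cap: int = 5,  # hypothetical saturation point
) -> float:
    """0-100 data-confidence score as a weighted sum of four 0-1 components."""
    components = {
        "analysed_share": _ratio(analysed_norms, total_norms),
        "dimension_coverage": _ratio(dimensions_covered, 6),
        "regulator_diversity": _ratio(regulators, regulator_cap),
        "quoted_share": _ratio(quoted_fields, extracted_fields),
    }
    return 100.0 * sum(WEIGHTS[name] * value for name, value in components.items())
```

For instance, half the corpus analysed, three of six dimensions, five regulators and 80% quoted fields gives 17.5 + 12.5 + 15 + 20 = 65.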

Calibration — gold set + F1 + drift gate

Every change to the extraction logic is measured against a hand-labelled ground truth.

The gold set is a stratified sample seeded with the current extraction; a human reviewer corrects each field and pastes a verbatim source quote, then flips reviewed: true. The comparator (scripts/gold.py report) reports per-field precision, recall and F1 against a saved baseline. CI fails any PR that drops a field by more than 5 percentage points.
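The comparison and the gate can be sketched as below for a boolean field. This is an illustration rather than the code behind scripts/gold.py: the function names are hypothetical, and the gate is assumed to act on per-field F1, which the text above does not state explicitly.

```python
from typing import Dict, List, Sequence, Tuple


def precision_recall_f1(gold: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Per-field precision, recall and F1 for one boolean field."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def drift_failures(
    baseline: Dict[str, float], current: Dict[str, float], max_drop: float = 0.05
) -> List[str]:
    """Fields whose F1 fell by more than max_drop (5 percentage points) versus the baseline."""
    tolerance = 1e-9  # guards against float noise at exactly the threshold
    return sorted(
        field
        for field, base in baseline.items()
        if base - current.get(field, 0.0) > max_drop + tolerance
    )
```

A CI step would then fail whenever `drift_failures(baseline, current)` is non-empty. A bias toward TRUE shows up here as low precision: predicting TRUE on every row where only some are gold-TRUE keeps recall at 1.0 while precision, and with it F1, drops.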

Current measurement on a 13-row starter pack: support-weighted F1 = 0.80. The model is perfect on exige_certificacao_independente and escopo (1.00) and strong on regime (0.91). The two structural weaknesses are unchanged after the Phase 1 rerun: a bias toward TRUE on exige_seguranca_custodia (0.36) and exige_kyt_aml (0.50), where the source body uses descriptive prose rather than the imperative verbs the validator demands. Net effect for the decision-maker: those two triggers should be cross-checked manually before any commercial move; the other eleven fields are reliable.
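For readers unfamiliar with the term, a support-weighted F1 averages the per-field F1 values, weighting each field by the number of gold examples behind it. The sketch below shows only the arithmetic; the function name is hypothetical and the numbers in the usage note are illustrative, not the gold-set figures.

```python
from typing import Dict, Tuple


def support_weighted_f1(per_field: Dict[str, Tuple[float, int]]) -> float:
    """per_field maps a field name to (f1, support); returns the support-weighted mean F1."""
    total_support = sum(support for _, support in per_field.values())
    if total_support == 0:
        return 0.0
    weighted = sum(f1 * support for f1, support in per_field.values())
    return weighted / total_support
```

With `{"field_a": (1.0, 3), "field_b": (0.5, 1)}` the result is (3 × 1.0 + 1 × 0.5) / 4 = 0.875, so a weak field with little support moves the headline number less than a weak field with many examples.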

Honest limits

What this dashboard is NOT.

Two jurisdictions (IN, SE) have only stub norms — the analysis pass returned no usable signals. They are shown for completeness, not for ranking.
competidores_ativos and forca_relacionamento_certik are intentionally empty — they require business-development input that the LLM cannot infer from public text.
A handful of national gazettes are behind aggressive firewalls and were fetched from official archive snapshots; their content can be slightly stale until the next refresh cycle.
The opportunity score is a heuristic — a high score is a candidate to investigate, not a closed sale.