How this dashboard works

From raw legal text to the opportunity score, in plain English. Every number shown anywhere in this dashboard is traceable back through the steps below.

1

From official text to structured data

Five stages, each idempotent and reproducible.

  1. Discovery. A curated regulatory matrix per jurisdiction is fed to a large language model with live web-search grounding, which identifies the statutes, regulations, circulars and guidance notes worth tracking. Output: 1,267 candidate norms across 23 jurisdictions.
  2. Scrape. Each norm is fetched from its official source (national gazettes, regulator portals, statutory databases). When a primary source is unreachable (strict firewalls, login walls, JavaScript-rendered single-page sites), the pipeline falls back to archive snapshots or to a headless browser and logs the chosen path per norm.
  3. Translate. Non-English texts are machine-translated; the original-language text is kept alongside so legal verification can quote either version.
  4. Analyse. A reasoning LLM reads each body and extracts thirteen structured signals: regime, status_regulatorio, the main deadline and its deadline type, seven exige_* service triggers (audit, proof of reserves, pentest, AML/KYT, custody, formal verification, independent certification), free-text escopo (scope), and gap_ou_ambiguidade (gaps or ambiguities). A second, deterministic pass (pure algorithm, no LLM) then re-validates every claim: the quote must appear as a substring of the body, triggers must pass an imperative-verb test, deadlines a temporal-anchor test, and regimes a regime-vocabulary test. Any value that fails the validator is stripped back to null. Phase 1 rule: every non-null value MUST come with a verbatim quote copied from the body. Better silence than a fabricated claim.
  5. Aggregate & export. Per-country overviews are built from the underlying norms: a trigger is true for a country if it is true for any of its norms, the country deadline is the earliest one not yet past, and six text-grounded coverage dimensions (issuance / custody / market abuse / AML / taxation / consumer protection) drive the maturity tag. A flat CSV and a typed graph JSON are exported and consumed by this dashboard.
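The Phase 1 re-validation in step 4 can be sketched for a single service trigger. This is a minimal Python illustration, not the pipeline's code: the Claim type, the IMPERATIVE_MARKERS list and the whitespace normalisation are assumptions; only the two checks themselves (verbatim substring presence, imperative-verb test) come from the description above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL marker list ("deve" is Portuguese and "muss" German for "must");
# the pipeline's real imperative-verb vocabulary is not published here.
IMPERATIVE_MARKERS = ("must", "shall", "is required to", "deve", "muss")


@dataclass(frozen=True)
class Claim:
    field: str             # e.g. "exige_kyt_aml"
    value: Optional[bool]  # extracted trigger value, or None
    quote: Optional[str]   # verbatim evidence quote copied from the body


def _squash(text: str) -> str:
    """Collapse whitespace runs so line wrapping in the body cannot break a match (assumption)."""
    return re.sub(r"\s+", " ", text).strip()


def validate_trigger(claim: Claim, body: str) -> Claim:
    """Return the claim unchanged if it passes both checks, otherwise strip it back to null."""
    if claim.value is None:
        return claim
    stripped = Claim(claim.field, None, None)
    # Phase 1 rule: a non-null value without a quote is discarded.
    if not claim.quote:
        return stripped
    quote = _squash(claim.quote)
    # Check 1: the quote must appear verbatim (as a substring) in the body.
    if quote not in _squash(body):
        return stripped
    # Check 2: the quote must contain an imperative marker, not just descriptive prose.
    pattern = r"\b(?:" + "|".join(re.escape(m) for m in IMPERATIVE_MARKERS) + r")\b"
    if re.search(pattern, quote, flags=re.IGNORECASE) is None:
        return stripped
    return claim
```

Applied to every exige_* field, anything that fails either check ends up as null, which is the "better silence" behaviour described in step 4.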
  • Jurisdictions: 23, tracked end-to-end
  • Norms: 1,267 (451 fully analysed)
  • Verified deadlines: 2, body-grounded only
  • High-confidence markets: 4 of 23, after the Phase 1 rerun

2

How norms connect across borders

A regulatory text never lives alone. The graph captures nine ways one norm can relate to another.

Binding inheritance (derivado_de): 53 edges

The source norm legally transposes the target. An edge is created when the body cites the anchor by ID and, since the Phase 1 rerun, a transposition verb (“transposes”, “implementa” (Portuguese/Spanish: implements), “umsetzt” (German: transposes)) also appears in an evidence quote.
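The two-condition rule for derivado_de can be expressed as a small predicate. This is a hedged Python sketch, not the code in src/typed_relations.py: the verb stems, the function name, and the reading of "cites the anchor by ID" as the anchor's citation string (for example 2023/1114) appearing as a whole token in the body are all assumptions.

```python
import re

# HYPOTHETICAL stems covering "transposes"/"transposition", "implementa" and "umsetzt"/"Umsetzung".
TRANSPOSITION_STEMS = ("transpos", "implementa", "umsetz")


def is_derivado_de(body: str, anchor_citation: str, evidence_quotes: list[str]) -> bool:
    """A binding-inheritance edge needs BOTH an explicit citation AND a transposition verb."""
    # Condition 1: the body cites the anchor by its identifier, not as part of a longer token.
    cites_anchor = re.search(rf"(?<!\w){re.escape(anchor_citation)}(?!\w)", body) is not None
    # Condition 2 (since the Phase 1 rerun): a transposition verb appears in an evidence quote.
    has_verb = any(
        stem in quote.lower() for quote in evidence_quotes for stem in TRANSPOSITION_STEMS
    )
    return cites_anchor and has_verb
```

A plain stem match like this misses German separable forms such as "setzt … um", so in this sketch the failure mode is a missing edge rather than a false claim of binding inheritance.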

Soft inspiration (inspirado_em): 95 edges

The source norm aligns with the target without being legally bound by it. Examples: non-EU jurisdictions referencing MiCA, and every jurisdiction referencing the FATF Recommendations.

Cross-reference (referencia_cruzada): 1,025 edges

An explicit citation between two norms in the corpus, surfaced from the body text.

Applies to (aplica_se_a): 143 edges

Links a jurisdiction-level overview to its framework anchors.

Triggers service (exige_servico): 0 edges

A norm requires a CertiK service offering. These edges are derived from the seven boolean exige_* triggers.

Citation / Semantic (citation, semantic): 1,682 edges

Background relations: literal citations found in the body and model-suggested similarity. They carry lower signal; toggle them off in the graph view to focus on binding and soft inheritance.

Total edges: 2,999 across nine typed relations; the six groups above account for 2,998 of them. See src/typed_relations.py for the derivation rules.


3

What the graph already reveals

The strongest signal in the dataset is which texts everyone else copies — and which only Europe copies.

Universal anchors

Norms cited by the largest number of other norms across the entire corpus. The FATF Recommendations dominate — every jurisdiction tracked follows them — and the EU's MiCA is the binding spine of the European cluster.

  1. INTL-FATFRECS-2025: The FATF Recommendations: International Standards on Combating Money Laundering… (161 inlinks)
  2. EU-MICA-2023: Regulation (EU) 2023/1114 of the European Parliament and of the Council… (84 inlinks)
  3. CH-AMLA-1997: Swiss AMLA / GwG (Anti-Money Laundering Act): base AML/CFT law; exchanges… (30 inlinks)
  4. EU-GDPR-2016: Regulation (EU) 2016/679 of the European Parliament and of the Council… (28 inlinks)
  5. EU-5AMLD-2018: Directive (EU) 2018/843 of the European Parliament and of the Council… (19 inlinks)
  6. CH-FINMASA-2007: Federal Act on the Swiss Financial Market Supervisory Authority… (18 inlinks)
  7. BR-LGPD-2018: Brazilian General Data Protection Law (LGPD, Law 13,709/2018): governs PSAV… (17 inlinks)
  8. CH-FINSA-2018: Swiss FinSA / FIDLEG (Financial Services Act): conduct rules and prospectus… (16 inlinks)
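Inlink counts like these can be computed by tallying, for each target norm, how many distinct source norms point at it. A minimal Python sketch, assuming edges arrive as (source, target, relation) tuples and that repeated edges between the same pair count once; whether the dashboard deduplicates this way is an assumption, and the norm IDs containing EXAMPLE in the test data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def top_anchors(
    edges: Iterable[tuple[str, str, str]],
    k: int = 8,
    relations: Optional[set[str]] = None,
) -> list[tuple[str, int]]:
    """Rank target norms by the number of distinct source norms linking to them."""
    pairs = set()
    for source, target, relation in edges:
        if relations is not None and relation not in relations:
            continue  # optionally restrict to e.g. {"derivado_de"} or {"inspirado_em"}
        if source != target:
            pairs.add((source, target))  # the set drops repeated source-target pairs
    inlinks = Counter(target for _, target in pairs)
    return inlinks.most_common(k)
```

Restricting relations to {"derivado_de"} or {"inspirado_em"} would plausibly yield counts of the kind shown in the two lists that follow.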

Binding transposition — only Europe goes here

The derivado_de relation requires legal transposition. Today every binding transposition is a European national text implementing an EU act (a regulation or, as with DAC8, a directive); non-EU jurisdictions use soft alignment instead.

  • EU-MICA-2023: Regulation (EU) 2023/1114 of the European Parliament and of… (30 national laws)
  • EU-GDPR-2016: Regulation (EU) 2016/679 of the European Parliament and of… (9 national laws)
  • EU-PROSPECTUSREG-2017: Regulation (EU) 2017/1129 of the European Parliament and of… (5 national laws)
  • EU-TFR-2023: Regulation (EU) 2023/1113 of the European Parliament and of… (5 national laws)
  • EU-DAC8-2023: Council Directive (EU) 2023/2226 of 17 October 2023 amending… (4 national laws)

Soft influence — where the world looks

The inspirado_em relation: non-binding alignment. FATF covers everyone; MiCA's soft reach extends well beyond the EEA.

  • INTL-FATFRECS-2025: The FATF Recommendations: International Standards on… (79 followers)
  • EU-MICA-2023: Regulation (EU) 2023/1114 of the European Parliament and of… (10 followers)
  • INTL-FSBGSCRECS-2023: High-level Recommendations for the Regulation, Supervision… (5 followers)
  • US-HOWEY-1946: Securities and Exchange Commission v. W. J. Howey Co., 328… (1 follower)

4

What makes a market mature

Maturity is not a structural heuristic. It's the count of regulatory dimensions actually addressed in the source text.

Six dimensions form the spine of a comprehensive crypto framework: token issuance, custody of client assets, market abuse, AML / KYT / Travel Rule, taxation, and consumer protection. For every analysed norm we detect which dimensions it touches, using the boolean triggers plus a multilingual word-boundary keyword scan. A jurisdiction's maturity is then the number of distinct dimensions in the union across all its norms.

  • High: 5+ of 6 dimensions
  • Medium: 3-4 of 6 dimensions
  • Low: 1-2 of 6 dimensions
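The dimension detection and the union-then-tier step can be sketched in Python as below. This is an illustration under stated assumptions, not src/coverage.py: the KEYWORDS map is a hypothetical handful of terms per dimension, the trigger-to-dimension mapping covers only the two exige_* fields named on this page, and the zero-dimension case (which the tiers above do not cover) returns None.

```python
import re
from typing import Optional

DIMENSIONS = ("issuance", "custody", "market_abuse", "aml", "taxation", "consumer_protection")

# HYPOTHETICAL multilingual keywords (English / Portuguese / German), a few per dimension.
KEYWORDS = {
    "issuance": ("white paper", "token issuance", "emissão"),
    "custody": ("custody", "custódia", "verwahrung"),
    "market_abuse": ("market abuse", "insider dealing", "marktmissbrauch"),
    "aml": ("money laundering", "lavagem de dinheiro", "geldwäsche"),
    "taxation": ("tax", "taxation", "tributação", "steuer"),
    "consumer_protection": ("consumer protection", "defesa do consumidor", "verbraucherschutz"),
}
# ASSUMED mapping from boolean trigger fields to the dimension each implies.
TRIGGER_DIMENSION = {"exige_seguranca_custodia": "custody", "exige_kyt_aml": "aml"}


def norm_dimensions(text: str, triggers: dict[str, bool]) -> set[str]:
    """Dimensions one norm touches: true triggers plus word-boundary keyword hits."""
    found = {dim for field, dim in TRIGGER_DIMENSION.items() if triggers.get(field)}
    for dim, words in KEYWORDS.items():
        pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(dim)
    return found


def maturity(per_norm: list[set[str]]) -> tuple[Optional[str], int]:
    """Union the dimensions across all norms of a jurisdiction and map the count to a tier."""
    covered = set().union(*per_norm) & set(DIMENSIONS)
    n = len(covered)
    if n >= 5:
        return "High", n
    if n >= 3:
        return "Medium", n
    if n >= 1:
        return "Low", n
    return None, n  # zero dimensions: no tier assigned in this sketch
```

Because the union is taken across norms, a jurisdiction can reach High without any single norm covering five dimensions.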

The previous heuristic (40 norms, 3 regulators, anchor dated before 2020) has been removed; see src/coverage.py.


5

The opportunity score

A composite of urgency, service intensity, and maturity. Tunable in one file.

  • 40% Urgency. Based on the days remaining to the next regulatory deadline. A past-due deadline decays from 100 toward a floor of 50 over 12 months. No deadline with a known regime scores 30; no deadline with an unknown regime scores 0.
  • 40% Service intensity. Share of the 14 CertiK services triggered (audit, pentest, AML/KYT, proof of reserves, …).
  • 20% Market maturity. The text-grounded six-dimension count from section 4.
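The weighting can be sketched in Python (the dashboard's own formula is TypeScript, in web/lib/scoring.ts). Assumptions, labelled in the code: every component is on a 0-100 scale, the past-due decay is linear, upcoming deadlines follow a hypothetical linear ramp from 100 (deadline today) to 0 (a year or more away), and maturity is the dimension count divided by six.

```python
from typing import Optional

WEIGHTS = {"urgency": 0.40, "service": 0.40, "maturity": 0.20}
TOTAL_SERVICES = 14
DAYS_PER_MONTH = 365.25 / 12


def urgency(days_to_deadline: Optional[float], regime_known: bool) -> float:
    """Urgency component on a 0-100 scale."""
    if days_to_deadline is None:
        return 30.0 if regime_known else 0.0
    if days_to_deadline < 0:
        # Past due: decay from 100 toward the floor of 50 over 12 months (ASSUMED linear).
        months_overdue = -days_to_deadline / DAYS_PER_MONTH
        return max(50.0, 100.0 - 50.0 * months_overdue / 12)
    # Upcoming: HYPOTHETICAL linear ramp, 100 on the deadline day down to 0 at 365+ days out.
    return max(0.0, 100.0 * (1 - days_to_deadline / 365))


def opportunity_score(
    days_to_deadline: Optional[float],
    regime_known: bool,
    services_triggered: int,
    dimensions_covered: int,
) -> float:
    """Weighted composite of urgency, service intensity and maturity (0-100)."""
    service = 100.0 * services_triggered / TOTAL_SERVICES
    maturity = 100.0 * dimensions_covered / 6
    return (
        WEIGHTS["urgency"] * urgency(days_to_deadline, regime_known)
        + WEIGHTS["service"] * service
        + WEIGHTS["maturity"] * maturity
    )
```

For example, a jurisdiction with no deadline but a known regime, 7 of 14 services triggered and all six dimensions covered scores 0.4 × 30 + 0.4 × 50 + 0.2 × 100 = 52.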

The formula lives in web/lib/scoring.ts. Weight sensitivity has been swept, and nine jurisdictions stay in the top 12 under every weight combination tested (see the calibration note below).


6

Deadline policy

A deadline is shown only when it can be quoted from the source text.

A retroactive audit (scripts/audit_deadlines.py) found that only 16 of 83 norm-level deadlines were grounded in the body text. The remaining 67, many of them publication dates, founding years or inferred end-of-period markers wrongly flagged as deadlines, were removed from the vault. Since the Phase 1 rerun, a deadline reappears if and only if the body quotes the date within 80 characters of a temporal anchor: "by", "before", "deadline", "in force", Portuguese "até" (by) or German "spätestens" (at the latest).
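The grounding test and the country-level choice of the earliest non-past deadline can be sketched in Python. This is an illustration, not scripts/audit_deadlines.py: exact string matching of the quoted date, case-insensitive anchor matching and measuring the 80-character window as the gap between the two matches are assumptions.

```python
import re
from datetime import date
from typing import Iterable, Optional

TEMPORAL_ANCHORS = ("by", "before", "até", "spätestens", "deadline", "in force")
WINDOW = 80  # maximum gap, in characters, between the quoted date and an anchor

_ANCHOR_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(a) for a in TEMPORAL_ANCHORS) + r")\b", re.IGNORECASE
)


def _gap(a: re.Match, b: re.Match) -> int:
    """Characters separating two matches (0 if they touch or overlap)."""
    return max(a.start() - b.end(), b.start() - a.end(), 0)


def is_grounded_deadline(body: str, date_text: str) -> bool:
    """True if date_text occurs in the body within WINDOW characters of a temporal anchor."""
    anchors = list(_ANCHOR_RE.finditer(body))  # searched on the full body to keep word boundaries
    for hit in re.finditer(re.escape(date_text), body):
        if any(_gap(hit, anchor) <= WINDOW for anchor in anchors):
            return True
    return False


def earliest_upcoming(deadlines: Iterable[date], today: date) -> Optional[date]:
    """Country-level deadline: the earliest grounded deadline that is not yet past."""
    return min((d for d in deadlines if d >= today), default=None)
```

A founding year or publication date with no anchor nearby fails the test, which is exactly the class of false deadline the audit removed.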


7

How much to trust each number

A separate, orthogonal score that never gates the ranking: it sits next to the ranking so the decision-maker can see what the data is built on.

For every jurisdiction we publish a 0-100 data-confidence score and a tier badge, built from four orthogonal components: the share of the country's corpus that has been LLM-analysed (35%), the number of the six dimensions covered (25%), the number of distinct regulators contributing (15%), and the share of extracted fields backed by a Phase 1 verbatim quote (25%). The badge on the home table, the country profile and the recommended-moves cards reflects this score.
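The weighted sum can be sketched in Python as follows. Assumptions: each component is normalised to 0-1 before weighting, and the regulator component saturates at a hypothetical REGULATOR_CAP of 5 distinct regulators; the real normalisation and the tier thresholds are not documented on this page.

```python
WEIGHTS = {"analysed": 0.35, "dimensions": 0.25, "regulators": 0.15, "quoted": 0.25}
REGULATOR_CAP = 5  # HYPOTHETICAL: regulator count at which that component saturates


def data_confidence(
    n_norms: int,
    n_analysed: int,
    n_dimensions: int,
    n_regulators: int,
    n_fields: int,
    n_quoted_fields: int,
) -> float:
    """0-100 data-confidence score from four components, each normalised to 0-1."""
    components = {
        "analysed": n_analysed / n_norms if n_norms else 0.0,
        "dimensions": n_dimensions / 6,
        "regulators": min(n_regulators, REGULATOR_CAP) / REGULATOR_CAP,
        "quoted": n_quoted_fields / n_fields if n_fields else 0.0,
    }
    return 100.0 * sum(WEIGHTS[name] * value for name, value in components.items())
```

Since the four weights sum to 1, a jurisdiction that maxes out every component scores 100.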


8

Calibration — gold set + F1 + drift gate

Every change to the extraction logic is measured against a hand-labelled ground truth.

The gold set is a stratified sample seeded with the current extraction: a human reviewer corrects each field, pastes a verbatim source quote, then sets reviewed: true. The comparator (scripts/gold.py report) reports per-field precision, recall and F1 against a saved baseline, and CI fails any PR that drops a field's score by more than 5 percentage points.
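The comparator's arithmetic can be sketched in Python. This is an illustration, not scripts/gold.py: the matching rule (a non-null prediction is a true positive only when it equals the reviewed value), the support definition (gold rows with a non-null value) and the function names are assumptions. The drift check mirrors the 5-percentage-point CI rule, applied here to F1, and the support-weighted mean is the kind of aggregate quoted for the starter pack.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FieldScore:
    precision: float
    recall: float
    f1: float
    support: int  # gold rows where the field has a non-null value


def score_field(gold: Sequence[Optional[object]], pred: Sequence[Optional[object]]) -> FieldScore:
    """Per-field P/R/F1: a non-null prediction is correct only if it equals the gold value."""
    if len(gold) != len(pred):
        raise ValueError("gold and prediction columns must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p is not None and p == g:
            tp += 1
            continue
        if p is not None:
            fp += 1  # predicted a value the reviewer did not confirm
        if g is not None:
            fn += 1  # reviewer's value was missed or contradicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = sum(g is not None for g in gold)
    return FieldScore(precision, recall, f1, support)


def support_weighted_f1(scores: dict[str, FieldScore]) -> float:
    """Mean F1 across fields, weighted by each field's support."""
    total = sum(s.support for s in scores.values())
    return sum(s.f1 * s.support for s in scores.values()) / total if total else 0.0


def drift_failures(
    baseline: dict[str, float], current: dict[str, float], max_drop_pp: float = 5.0
) -> list[str]:
    """Fields whose F1 fell more than max_drop_pp percentage points below the baseline."""
    return [
        field
        for field, base in baseline.items()
        if (base - current.get(field, 0.0)) * 100 > max_drop_pp
    ]
```

A CI job would fail whenever drift_failures returns a non-empty list.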

Current measurement on a 13-row starter pack: support-weighted F1 = 0.80. The model scores 1.00 on exige_certificacao_independente and escopo, and 0.91 on regime. Two structural weaknesses remain after the Phase 1 rerun: a bias toward TRUE on exige_seguranca_custodia (F1 0.36) and exige_kyt_aml (F1 0.50), on bodies that use descriptive prose rather than the imperative verbs the validator demands. Net effect for the decision-maker: cross-check those two triggers manually before any commercial move; the other eleven fields are reliable on this sample.


9

Honest limits

What this dashboard is NOT.

  • Two jurisdictions (IN, SE) have only stub norms — the analysis pass returned no usable signals. They are shown for completeness, not for ranking.
  • competidores_ativos (active competitors) and forca_relacionamento_certik (strength of the CertiK relationship) are intentionally empty: they require business-development input that an LLM cannot infer from public text.
  • A handful of national gazettes are behind aggressive firewalls and were fetched from official archive snapshots; their content can be slightly stale until the next refresh cycle.
  • The opportunity score is a heuristic — a high score is a candidate to investigate, not a closed sale.