Transforming how the world consumes regulatory information  


RegGenome is a University of Cambridge commercial spin-out

Our vision is to transform how the world consumes regulatory information. We provide machine-readable regulatory content that is dynamic, granular, and interoperable, all powered by AI-based textual information extraction techniques. This enables regulatory authorities to increase accessibility and dissemination of regulatory information and empowers organisations to deepen their regulatory intelligence and digitise their compliance and risk management processes.

RegGenome content is vendor-neutral, so that it can plug into any application or system, enabling the development of an efficient ecosystem of providers and users – all powered by and speaking the same language.

RegGenome and Cambridge Judge Business School are founding members of the Regulatory Genome Project

What is RegGenome?

A machine-readable structured content repository

The Challenge

The world’s financial systems could spiral out of control without regulation. Each financial crisis brings more checks and balances to protect us all, yet keeping track has become a major challenge – for both the regulated, and the regulators.

What if we take the rules in place for one sector, then multiply that by markets, countries and themes? Then allow for constant advances in crypto, cyber, AI and sustainability, account for socio-economic factors and filter through geopolitical lenses? The task is becoming unmanageable.

Keeping up with regulations and assessing the risks involves teams of people usually working with legacy infrastructure and proprietary applications – often duplicated from one function to another, and jurisdiction to jurisdiction. Meanwhile the barriers to entry for new specialist application providers keep rising, as do the risks of another financial crisis. 

This huge hive mind of regulation is critical to the global economy, but no-one has seen a clear path through it. Until now.

The Idea

What if we could identify and decode the DNA of regulation across the globe? What if the world’s regulatory content was translated to a common language and used to power a dynamic and effective compliance ecosystem? 

Just as the sequencing of the human genome transformed our understanding of biology, sequencing and organizing global regulation would transform our understanding of how to manage the risks, and respond.

With consistent classification, every piece of regulation could be tagged and made machine-readable. A thematic query could reveal the actionable information you need, in every relevant market. A global anti-money laundering (AML) requirement that took months to cascade might now take days.

And the efficiency gains of harnessing machines to do the reading, allowing humans to do the thinking, would be immense. 

The Solution

RegGenome is built on the company's licence to develop and bring to market technologies emerging from the Regulatory Genome Project. In practice, this means developing a content repository, structured according to the University of Cambridge Regulatory Genome information structure, that can be integrated into any third-party system and subsequently developed into products specific to individual businesses, sectors, and markets.

Our goal is to develop the world’s golden source for regulatory content. By translating human-readable regulations into machine-readable content, we supply best-in-class data that powers best-in-class applications.

The Regulatory Genome Project is a multi-year project launched by the University of Cambridge in 2020 to develop and support the adoption of an open-standard framework for classifying and organizing regulatory content, creating a community of members across regulatory agencies, financial services organizations, and academic researchers.

We offer opportunities for the industry to engage in commercial innovation with the Project. Find out more about the value for key stakeholders by reading the article below.

Financial Services Organizations

Application Providers

Regulators & Compliance Professionals


RegGenome is the repository of all actionable regulatory content, developed on top of the Cambridge Regulatory Genome Open Information Structure taxonomies with additional classifiers and data to meet market demand.  

The content structure uses a pre-public release version of the Cambridge Regulatory Genome root taxonomies.

Interoperable by design:

allowing firms to ‘Map once’ instead of ‘Mapping many’ reducing the need to incorporate multiple versions of content representations with each application onboarded

covering emerging regulatory themes spanning the long tail of regulatory jurisdictions

able to tailor to a firm’s internal risk framework and continuously refined by data-driven network effects

Dynamic and accurate:

highly automated capture of content through deep tech processing

The content is machine-readable and suitable for detailed searches across 40 different document types, including enforcement proceedings. It supports parameter, keyword and similarity searches at both document and paragraph level, as well as filtering for actionable content related to Policies, Systems and Controls, or Core Normative Statements.

Today, RegGenome covers 11 themes across more than 150 jurisdictions: Anti-money laundering (AML); Payments; Cybersecurity; Data Protection; Environment, Social, and Governance (ESG); Crypto Issuance; Crypto Intermediaries; Shareholding Disclosures; Board Accountability; Peer2Peer; and Equity Crowdfunding.

Industry Engagement

Empowering regulatory innovation

There are multiple ways for industry organizations to engage with RegGenome content and the underlying framework with the objective of obtaining deeper regulatory intelligence and enabling the digitisation of compliance and risk management processes.

We engage directly with Regulated Institutions and their vendors to provide machine-sequenced regulatory content for regulatory intelligence and for their risk and compliance systems and processes.

We engage Application Providers as partners whose solutions would benefit from incorporating RegGenome content, and help them develop best-in-class applications. As part of the partnership, RegGenome also certifies applications as being ‘RegGenome Compliant’ and supports them through a network program. The jurisdiction-agnostic taxonomy structure of the Cambridge Regulatory Genome (CRG) is the output of multiple cycles of iteration and validation by both regulators and regulated firms; certification signifies to the market that the application accurately represents this data structure, giving recipients a competitive advantage. Find out more here.

We convene Industry Special Interest Groups (SIGs) of professionals to collaborate on issues relating to regulatory change for a given industry vertical. SIGs are focused on providing tangible outputs, using RegGenome content to help participants with market visibility and risk mitigation for areas of common interest. Example outputs include peer-group benchmarking, special gap analysis reports, and common use case identification for further development. Find out more about SIGs here.

Sponsors join a rapidly growing collaboration network of regulatory agencies, companies and academic researchers and become part of a community with a shared mission to develop and support the adoption of an open standard framework for classifying regulatory content. Convening industry communities is key in responding to market challenges and driving adoption. 

Find out more about becoming a Sponsor of the Regulatory Genome Project

How RegGenome Works

RegGenome structures regulation into a powerful code

RegGenome content services are not bundled with any regulatory applications. We enable the Application Provider (RegTech) ecosystem by providing content that is interoperable by design, making it easier for institutions to integrate best-in-class applications and mitigate vendor lock-in. Firms no longer need to develop or subscribe to multiple, proprietary, vertical applications. They can just plug into a single source of content, then steer in any direction – with interoperability between systems.

Machine-readable documents are produced by trained, jurisdiction-agnostic classifiers that are optimised to allow like-for-like comparison across regulatory frameworks and powered by human-tagged training data drawn from a diverse set of regulatory content.


RegGenome’s repository consists of documents from hundreds of regulatory sources, collected automatically from regulators’ websites and document portals. AI-based models are applied to the documents to enrich them with metadata such as title, publisher, publication date, and version.

Documents are broken down into ‘snippets’ of text – e.g. paragraphs, clauses, or bullet point lists – allowing applications to make granular queries on the repository.
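
As a rough illustration of this granularity, the Python sketch below splits a document into snippets and keeps the document-level metadata alongside them. It is not RegGenome's implementation: the `Document` and `Snippet` classes, their field names, the sample values, and the rule of splitting on blank lines are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Snippet:
    """One granular unit of text: a paragraph, clause, or bullet list (hypothetical shape)."""
    index: int
    text: str


@dataclass
class Document:
    """Metadata fields named in the text, plus the document's snippets (hypothetical shape)."""
    title: str
    publisher: str
    publication_date: str
    version: str
    snippets: list[Snippet] = field(default_factory=list)


def split_into_snippets(raw_text: str) -> list[Snippet]:
    """Split on blank lines so each paragraph or bullet list becomes one snippet.

    Assumption: blank lines separate units. Real regulatory documents would
    need format-aware parsing (PDF, HTML, numbered clauses, and so on).
    """
    blocks = re.split(r"\n\s*\n", raw_text.strip())
    return [Snippet(index=i, text=b.strip()) for i, b in enumerate(blocks) if b.strip()]


raw = """Firms must verify customer identity before onboarding.

Records must be:
- kept for five years
- made available on request

Breaches must be reported to the regulator."""

doc = Document(
    title="Example AML Rulebook",       # invented sample values
    publisher="Example Regulator",
    publication_date="2024-01-01",
    version="1.0",
    snippets=split_into_snippets(raw),
)

for snippet in doc.snippets:
    print(snippet.index, repr(snippet.text))
```

Keeping a bullet list together as a single snippet, as here, means a query can return the whole requirement rather than an isolated bullet without its introductory sentence.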

RegGenome uses Natural Language Processing to identify whether snippets of text in the repository are related to specific concepts covered by the Cambridge Regulatory Genome (CRG). This process is based on real-world examples collected by RegGenome’s team of regulatory analysts, who use the sample data to ‘train’ statistical models. These models are applied to each snippet in the repository, applying ‘tags’ when a match is found with a concept in the CRG. Tags are applied at both the document and snippet level.
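
The train-then-tag loop described above can be sketched in a few lines of Python. The source does not say which statistical models RegGenome uses, so this example substitutes a deliberately simple one: a binary Naive Bayes scorer per concept, with equal priors and Laplace smoothing. The `ConceptModel` class, the `tag_document` function, the zero threshold, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class ConceptModel:
    """Binary Naive Bayes scorer for one concept (a stand-in, not RegGenome's model)."""

    def __init__(self, positives: list[str], negatives: list[str]):
        # Word counts from analyst-labelled examples for and against the concept.
        self.pos = Counter(t for s in positives for t in tokenize(s))
        self.neg = Counter(t for s in negatives for t in tokenize(s))
        self.vocab = set(self.pos) | set(self.neg)

    def score(self, text: str) -> float:
        """Log-likelihood ratio; above zero means the concept is the better fit."""
        v = len(self.vocab)
        n_pos, n_neg = sum(self.pos.values()), sum(self.neg.values())
        total = 0.0
        for t in tokenize(text):
            if t not in self.vocab:
                continue  # words never seen in training carry no evidence
            p = (self.pos[t] + 1) / (n_pos + v)  # Laplace smoothing
            q = (self.neg[t] + 1) / (n_neg + v)
            total += math.log(p / q)
        return total


def tag_document(
    snippets: list[str], models: dict[str, ConceptModel], threshold: float = 0.0
) -> tuple[list[set[str]], set[str]]:
    """Tag each snippet, then roll the snippet tags up to the document level."""
    snippet_tags = [
        {name for name, model in models.items() if model.score(text) > threshold}
        for text in snippets
    ]
    document_tags = set().union(*snippet_tags)
    return snippet_tags, document_tags


# Invented training examples standing in for analyst-collected data.
aml = [
    "Firms must verify customer identity to prevent money laundering",
    "Report suspicious transactions linked to money laundering",
]
cyber = [
    "Firms must protect systems against cyber attacks",
    "Report security incidents affecting network systems",
]
models = {"AML": ConceptModel(aml, cyber), "Cybersecurity": ConceptModel(cyber, aml)}

snippets = [
    "Banks should monitor transactions for money laundering",
    "Network systems need protection from cyber attacks",
    "The annual meeting is in June",
]
snippet_tags, document_tags = tag_document(snippets, models)
print(snippet_tags)
print(document_tags)
```

In this sketch the document-level tags are simply the union of the snippet-level tags, which is one possible reading of "tags are applied at both the document and snippet level"; a production system might instead classify the document as a whole.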

Once ‘tagged’ according to the Cambridge Regulatory Genome taxonomies, the content is offered for integration into other products via RegGenome’s suite of APIs.