Substra

Substra enables secure, decentralized machine learning across data silos using federated learning and blockchain.

Substra is an open-source framework that enables privacy-preserving, decentralized machine learning through federated learning techniques. Built to support sensitive and distributed data environments, Substra allows multiple organizations to collaboratively train AI models without having to share their data directly.

The framework is particularly useful in regulated sectors such as healthcare, finance, and public research, where data privacy and compliance are critical. Substra’s architecture combines federated learning with blockchain-based traceability to ensure transparency, auditability, and trust among all participants.

Developed by Owkin and maintained by the Substra Foundation, Substra is used in production by hospitals, research institutes, and pharmaceutical companies to accelerate machine learning on sensitive data without compromising data ownership or security.


Features
Substra includes a range of features that enable secure, collaborative AI across organizational boundaries.

Federated Learning
Substra enables decentralized model training by running learning tasks locally on each data holder’s infrastructure. The model learns from each dataset without the data leaving its secure environment.
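To make the idea concrete, here is a minimal sketch of local training. It is not Substra's API. It assumes a toy one-dimensional linear model (y = w * x) trained with plain gradient descent, and the DataHolder class and its records are hypothetical. The point it shows is that the participant runs training on its own records and hands back only the updated weight.

```python
"""Minimal sketch of local training in federated learning.

Assumption: a toy 1-D linear model y = w * x trained with plain gradient
descent. DataHolder is hypothetical and is NOT part of Substra's API.
"""
from dataclasses import dataclass, field


@dataclass
class DataHolder:
    """A hypothetical participant whose records never leave this object."""
    name: str
    _records: list[tuple[float, float]] = field(repr=False)

    def train_locally(self, w: float, lr: float = 0.05, epochs: int = 50) -> float:
        """Run gradient descent on local (x, y) pairs; return only the new weight."""
        for _ in range(epochs):
            # Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w
            grad = sum(2 * (w * x - y) * x for x, y in self._records) / len(self._records)
            w -= lr * grad
        return w  # the raw records are never returned or exposed


hospital_a = DataHolder("hospital_a", [(1.0, 2.0), (2.0, 4.1)])
w_after = hospital_a.train_locally(w=0.0)
print(round(w_after, 2))
```

The caller only ever sees the weight. In a real deployment the same pattern applies to the full parameter set of a neural network or other model.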

Privacy by Design
Since raw data never moves, Substra helps organizations comply with data protection regulations like GDPR, HIPAA, and similar privacy standards.

Blockchain-Based Audit Trail
All actions—such as training, model updates, and data access—are logged using a permissioned blockchain ledger for full traceability and trust.
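The tamper-evidence that a ledger provides can be illustrated with a simple hash chain. This is a simplified sketch, not Substra's ledger implementation. It has no consensus protocol or multi-party replication, and the actor, action, and asset names are hypothetical. Each entry stores the hash of the previous one, so editing any past entry breaks verification.

```python
"""Simplified hash-chained audit log (tamper-evident, append-only).

Assumption: illustration only; NOT Substra's ledger. No consensus or
replication between organizations is modeled.
"""
import hashlib
import json

GENESIS = "0" * 64
FIELDS = ("actor", "action", "asset", "prev_hash")


def _digest(body: dict) -> str:
    # Hash a canonical JSON encoding so the digest is deterministic
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, asset: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "asset": asset, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in FIELDS}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("hospital_a", "train", "model_v1")
log.append("orchestrator", "aggregate", "model_v2")
print(log.verify())                  # True
log.entries[0]["asset"] = "model_x"  # simulate tampering with a past record
print(log.verify())                  # False
```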

Modular Architecture
Substra separates roles for data providers, algorithm developers, and orchestrators. Each party maintains control over their assets while contributing to model development.

Support for Multiple ML Frameworks
Substra works with major machine learning libraries such as PyTorch, TensorFlow, and scikit-learn.

Docker-Based Execution
Training and evaluation tasks run in isolated Docker containers, which simplifies deployment and enhances security.
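As an illustration of what container isolation can look like, the sketch below builds a `docker run` command for a training task using only standard Docker CLI flags. The image name, mount paths, and choice of flags are hypothetical assumptions and are not taken from Substra's own task runner. The command is built but not executed.

```python
"""Sketch: building an isolated `docker run` command for a training task.

Assumptions: image name, paths and flag choices are hypothetical and are
NOT Substra's task runner; only standard Docker CLI flags are used.
"""
import shlex


def build_task_command(image: str, data_dir: str, output_dir: str) -> list[str]:
    return [
        "docker", "run",
        "--rm",                          # remove the container when the task ends
        "--network", "none",             # no network access, so data cannot be sent out
        "--read-only",                   # container root filesystem is read-only
        "-v", f"{data_dir}:/data:ro",    # local dataset mounted read-only
        "-v", f"{output_dir}:/output",   # only the output directory is writable
        image,
    ]


cmd = build_task_command("my-org/train-task:1.0", "/srv/datasets/imaging", "/srv/outputs/task-42")
print(shlex.join(cmd))
# To actually launch it on a host with Docker: subprocess.run(cmd, check=True)
```

Disabling networking and mounting the dataset read-only are two common ways to limit what a task can do with the data it is given.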

Compute Plan Management
Users define workflows for training, evaluation, and aggregation using compute plans, allowing for fully customizable and repeatable experiments.
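A compute plan can be thought of as a dependency graph of tasks. The sketch below orders a hypothetical plan with Python's standard `graphlib` module (Python 3.9+). The task names and the dict-based plan format are assumptions for illustration, not Substra's compute plan schema. Tasks that end up in the same stage have no dependencies on each other and could run in parallel.

```python
"""Sketch of a compute plan as a dependency graph of tasks.

Assumption: task names and the dict format are hypothetical, NOT Substra's
compute plan schema. Requires Python 3.9+ for graphlib.
"""
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on
compute_plan = {
    "train@hospital_a": set(),
    "train@hospital_b": set(),
    "aggregate@round1": {"train@hospital_a", "train@hospital_b"},
    "test@hospital_a": {"aggregate@round1"},
    "test@hospital_b": {"aggregate@round1"},
}


def schedule(plan: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; prepare() raises graphlib.CycleError on cycles."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted so the order is repeatable
        stages.append(ready)
        sorter.done(*ready)
    return stages


for i, stage in enumerate(schedule(compute_plan)):
    print(f"stage {i}: {stage}")
```

Sorting each stage keeps the schedule deterministic, which is one simple way to make repeated runs of the same plan behave identically.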

Multi-Stakeholder Collaboration
Multiple institutions can participate in a federated learning project while keeping data local and confidential.

Access Control
Granular permissions and identities ensure that only authorized users and systems can access specific components of the platform.
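A per-asset permission check might look like the sketch below. The `Permissions` shape (a public flag plus a set of authorized organization IDs) and the asset and organization names are a simplified, hypothetical model, not Substra's actual schema.

```python
"""Sketch of per-asset permissions checked before a task may use an asset.

Assumption: Permissions and Asset are simplified, hypothetical models,
NOT Substra's schema.
"""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Permissions:
    public: bool = False
    authorized_ids: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Asset:
    key: str
    owner: str
    permissions: Permissions


def can_process(asset: Asset, org_id: str) -> bool:
    """The owner always has access; others need public access or an explicit grant."""
    return (
        org_id == asset.owner
        or asset.permissions.public
        or org_id in asset.permissions.authorized_ids
    )


dataset = Asset("imaging-2023", owner="hospital_a",
                permissions=Permissions(authorized_ids=frozenset({"pharma_x"})))
print(can_process(dataset, "pharma_x"))  # True
print(can_process(dataset, "bank_y"))    # False
```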

Open Source
Available under the Apache 2.0 license, Substra can be deployed, audited, and extended by any organization.


How It Works
Substra operates using a decentralized computing model, where multiple parties collaborate to train a machine learning model without ever sharing their raw data.

Each participant installs Substra components locally, including a Node, a Backend, and optionally a Frontend UI. The data remains within each organization’s secured environment. Training algorithms are packaged in Docker containers and executed locally against that data.

Model updates are then aggregated across all participants, either sequentially or in parallel, depending on the defined compute plan. Throughout the process, all transactions—including training runs, performance evaluations, and updates—are recorded on a blockchain-based ledger to ensure transparency.
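The difference between parallel and sequential aggregation can be sketched with a toy model. This assumes a one-dimensional linear model (y = w * x), plain gradient descent, and hypothetical datasets. The parallel variant is a FedAvg-style weighted mean of local updates. Substra's own aggregation strategies may differ; this only contrasts the two patterns.

```python
"""Sketch contrasting parallel (FedAvg-style) and sequential training rounds.

Assumptions: toy 1-D linear model y = w * x, plain gradient descent and
hypothetical data. NOT Substra's built-in strategies.
"""


def local_step(w: float, records: list[tuple[float, float]],
               lr: float = 0.05, epochs: int = 5) -> float:
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in records) / len(records)
        w -= lr * grad
    return w


def parallel_round(w: float, silos: list[list[tuple[float, float]]]) -> float:
    """All silos start from the same w; updates are averaged, weighted by dataset size."""
    updates = [local_step(w, records) for records in silos]
    total = sum(len(records) for records in silos)
    return sum(u * len(records) for u, records in zip(updates, silos)) / total


def sequential_round(w: float, silos: list[list[tuple[float, float]]]) -> float:
    """The model visits silos one after another, each continuing from the last weights."""
    for records in silos:
        w = local_step(w, records)
    return w


silos = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w_par = w_seq = 0.0
for _ in range(20):
    w_par = parallel_round(w_par, silos)
    w_seq = sequential_round(w_seq, silos)
print(round(w_par, 3), round(w_seq, 3))
```

In this toy example both variants converge to the same weight, but they follow different paths. Parallel rounds let all silos train at once. Sequential rounds avoid an averaging step but keep each silo waiting for the previous one.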

Substra supports two modes of learning:

  • Cross-silo federated learning (between institutions)

  • Cross-device federated learning (across multiple machines)

The result is a robust, privacy-preserving AI training process where collaboration happens without compromising data control or security.


Use Cases
Substra is ideal for sectors and applications where data sensitivity and collaboration are both essential.

Healthcare and Medical Research
Hospitals and research institutes use Substra to jointly train predictive models on patient data such as imaging, genomics, and EHRs without sharing sensitive health records.

Pharmaceutical Research
Drug developers can partner with healthcare providers to build models for treatment response, trial design, and biomarker discovery while respecting data sovereignty.

Financial Services
Banks and insurers can use federated learning to build fraud detection, credit scoring, and risk models without pooling customer data in a centralized system.

Public Sector and Government
Agencies can collaborate on public health, transportation, or education data analytics projects while maintaining citizen data privacy.

Academic Collaborations
Universities and research labs can share algorithm development and model training across institutions without violating data usage agreements.

AI Consortia
Multi-party collaborations between industry players, startups, and institutions can use Substra as a shared platform for secure model development.

Legal and Compliance Analytics
Organizations handling regulated documents and communications can perform NLP-based analysis without exposing confidential content.


Pricing
Substra is a free and open-source platform licensed under the Apache 2.0 license.

This means:

  • Free for commercial and non-commercial use

  • No subscription or licensing fees

  • Source code fully accessible on GitHub

  • Supported by the Substra Foundation and community contributors

While the software itself is free, organizations may incur infrastructure costs for compute, container hosting, storage, and orchestration, especially in production deployments.



Strengths
Substra offers many advantages for privacy-conscious organizations seeking to implement secure machine learning workflows.

Data Privacy Protection
Keeps sensitive data at the source, eliminating the risks associated with data centralization or transfer.

Proven in Healthcare
Used in real-world medical AI projects, demonstrating robustness in highly regulated environments.

End-to-End Traceability
Blockchain-based audit trail provides full transparency over data usage, model updates, and access permissions.

Open Source and Transparent
Being open source promotes trust, collaboration, and extensibility across different use cases.

Flexible Architecture
Supports various machine learning frameworks and is easily containerized for reproducibility and scalability.

Secure by Design
Each computation is sandboxed in Docker containers, limiting access and exposure.

Community-Backed
Supported by an active foundation and ecosystem of contributors, ensuring continued development and support.

Designed for Collaboration
Built to handle the multi-party complexities of federated learning projects with role-based access and orchestration.


Drawbacks
Despite its strengths, Substra may not be the right solution for every organization.

Requires Technical Setup
Initial installation and configuration require familiarity with Docker, REST APIs, and backend systems.

Not a Plug-and-Play Tool
It is an infrastructure-level platform, not a user-facing application or automated ML system, so adopting it requires development effort.

Blockchain Overhead
The ledger-based tracking system adds complexity that may not be necessary for less regulated or small-scale projects.

Limited GUI Options
The command-line and API-based configuration can be daunting for non-technical users, though a web frontend exists.

No Built-in Data Science UI
Substra does not offer a native notebook or modeling interface; it relies on external ML code and tooling.


Comparison with Other Tools
Substra is often compared with frameworks like Flower, Federated AI Technology Enabler (FATE), and TensorFlow Federated (TFF).

Flower is a flexible and lightweight federated learning library but lacks built-in auditability and secure orchestration features.
FATE, originally developed by WeBank, is more enterprise-focused and supports vertical federated learning, but it is more complex to deploy and operate.
TensorFlow Federated offers tight integration with the TensorFlow ecosystem but provides less flexibility for custom orchestration and has no built-in ledger-based audit trail.

Substra stands out by combining privacy, transparency, and collaboration in a framework built from the ground up for real-world, multi-institutional AI projects.


Customer Reviews and Testimonials
While Substra does not publish commercial testimonials, it has been used in notable real-world projects, such as:

  • MELLODDY: A federated learning project involving ten pharmaceutical companies to improve predictive models of compound properties for drug discovery.

  • Hospitals in France: Applied to breast cancer and melanoma prediction by training across multiple medical centers.

  • Owkin Collaborations: Used in partnerships with major hospitals and labs for AI-powered diagnostics.

These applications validate Substra’s ability to deliver secure, collaborative AI in high-stakes environments.


Conclusion
Substra is a cutting-edge federated learning framework purpose-built for privacy, compliance, and collaboration. It empowers organizations to extract value from sensitive data while maintaining full control, transparency, and legal compliance.

Whether you’re a healthcare provider, research lab, financial institution, or government agency, Substra offers the tools to collaborate securely on machine learning projects without compromising trust or data protection.
