European open-source feature store and ML platform for the data-for-AI lifecycle
Hopsworks is a Stockholm-based ML platform and open-source feature store founded in 2016 as a spinout of KTH Royal Institute of Technology and RISE SICS AB by professor Jim Dowling. It provides a unified platform for the data-for-AI lifecycle — real-time and batch feature engineering, online feature serving via RonDB, model registry, model serving, and MLOps pipelines. Available as open-source community, managed serverless, and enterprise editions.
Headquarters
Stockholm, Sweden
Founded
2016
Pricing
Free / Pay-as-you-go / Contact Sales
EU Data Hosting
Yes
Employees
51-200
Open Source
Yes
The infrastructure layer for production machine learning has been built almost entirely by US companies. Databricks (San Francisco), Tecton (San Francisco), AWS SageMaker (Seattle), Google Vertex AI (Mountain View) — the organisations that set the standards for how enterprises manage ML features, train models, and serve predictions are, with a few exceptions, American. European ML teams frequently build on American infrastructure, accept US data residency by default, and navigate GDPR compliance as an afterthought.
Hopsworks is the significant exception. Founded in 2016 as a spinout of KTH Royal Institute of Technology and RISE SICS AB — Sweden's leading research institutes — by Professor Jim Dowling, the platform predates AWS SageMaker Feature Store (launched 2020) and can reasonably claim to have invented the open-source feature store concept. The company is headquartered in Stockholm (operating as Logical Clocks AB), subject to Swedish and EU law, and offers EU-hosted managed deployments on AWS Europe, Azure Europe, and GCP Europe regions.
The ML feature store problem Hopsworks solves is fundamental: without a dedicated platform, ML teams duplicate feature computation logic between training pipelines and serving code, creating "training-serving skew" where models behave differently in production than during training. Hopsworks unifies feature engineering, storage, versioning, and serving — so the same feature definitions used to train models are the same ones served at inference time.
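The skew problem can be made concrete with a minimal sketch in plain pandas (column names and the fraud-style features are hypothetical, not Hopsworks' API): a single feature function, plus the training-time statistics, is reused by both paths, so there is no second hand-ported copy of the logic to drift out of sync.

```python
import pandas as pd

def txn_features(df: pd.DataFrame, amount_mean: float, amount_std: float) -> pd.DataFrame:
    """One feature definition shared by training AND serving.

    Illustrative columns only; the point is that the logic lives in
    exactly one place.
    """
    out = pd.DataFrame({"account_id": df["account_id"]})
    out["amount_zscore"] = (df["amount"] - amount_mean) / amount_std
    out["is_foreign"] = (df["country"] != "SE").astype(int)
    return out

# Training path: batch-compute features over historical transactions.
history = pd.DataFrame({
    "account_id": [1, 2, 3],
    "amount": [50.0, 60.0, 4000.0],
    "country": ["SE", "SE", "US"],
})
mean, std = history["amount"].mean(), history["amount"].std()
train_X = txn_features(history, mean, std)

# Serving path: the SAME function (and the training-time statistics)
# run on the incoming transaction, so training and serving cannot
# silently compute different feature values.
incoming = pd.DataFrame({"account_id": [4], "amount": [55.0], "country": ["DE"]})
serve_X = txn_features(incoming, mean, std)
```

A feature store institutionalises this pattern: the definition is registered once, and both the training dataset and the online lookup are derived from it.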
Hopsworks provides both an offline feature store (for batch training pipelines) and an online feature store (for real-time inference). This distinction matters in production ML systems: offline features are computed in batch — typically using Spark or Flink on historical data — and stored for model training. Online features need to be served at low latency during inference — milliseconds, not seconds — because they are called at prediction time.
Most feature store implementations have weaker online serving than offline storage. Hopsworks addresses this with RonDB, a distributed in-memory key-value store derived from NDB Cluster technology, which backs the online feature store and delivers sub-millisecond feature lookups. This matters for ML applications like fraud detection, recommendation systems, and personalisation, where prediction latency is a product constraint.
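The access pattern RonDB serves is a primary-key read on the critical path of every prediction. A toy in-process stand-in (a dict; entity keys and feature names are hypothetical, and a dict says nothing about distributed performance) shows the shape of that read:

```python
import time

# Toy stand-in for the online feature store: features keyed by entity.
# In Hopsworks this lookup is backed by RonDB, distributed and in-memory;
# the dict only illustrates the access pattern, not the performance class.
online_store = {
    ("card", 42): {"txn_count_1h": 7, "declined_24h": 1},
    ("card", 43): {"txn_count_1h": 0, "declined_24h": 0},
}

t0 = time.perf_counter()
features = online_store[("card", 42)]   # single primary-key read at inference
elapsed_ms = (time.perf_counter() - t0) * 1000.0

# These features feed the model immediately, so this read sits inside
# the per-prediction latency budget -- hence the sub-millisecond target.
```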
Beyond the feature store, Hopsworks includes a model registry for versioning and managing trained ML models, and model serving infrastructure for deploying and serving predictions. The model registry tracks lineage from feature group versions through training data to the deployed model, which is critical for auditing and reproducing ML system behaviour — a growing regulatory requirement in EU contexts under the AI Act.
More recent versions of Hopsworks integrate vector similarity search, supporting embedding-based ML applications such as semantic search, RAG (retrieval-augmented generation), and recommendation systems. Vector search runs alongside the feature store, meaning embedding features and structured features can be served together in a unified low-latency lookup.
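The payoff of co-locating embeddings with structured features is that one lookup can answer both "what is similar?" and "what are its attributes?". A self-contained NumPy sketch (item ids, prices, and 3-dimensional embeddings are all illustrative) shows the idea:

```python
import numpy as np

# Toy unified row: a structured feature plus an embedding, keyed by item id.
store = {
    101: {"price": 9.99, "embedding": np.array([0.1, 0.9, 0.0])},
    102: {"price": 4.50, "embedding": np.array([0.8, 0.1, 0.1])},
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similar_items(query_vec: np.ndarray, k: int = 1) -> list:
    # Rank items by embedding similarity, then return structured
    # features from the very same rows -- no second store to query.
    scored = sorted(store.items(),
                    key=lambda kv: cosine(query_vec, kv[1]["embedding"]),
                    reverse=True)
    return [(item_id, row["price"]) for item_id, row in scored[:k]]

result = similar_items(np.array([0.0, 1.0, 0.0]))
```

In production the brute-force scan would be replaced by an approximate nearest-neighbour index, but the unified read pattern is the same.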
The community edition is fully open source under AGPLv3 and available on GitHub. Unlike many "open core" platforms that withhold commercially useful features to drive enterprise sales, Hopsworks' community edition includes the full feature store, model registry, and pipeline infrastructure. Teams can self-host on any cloud or on-premise infrastructure, inspect the code, and contribute to development.
The primary interaction model for Hopsworks is the Python HSFS (Hopsworks Feature Store) client library. ML engineers define feature groups, feature views, and feature pipelines in Python, and the platform handles storage, versioning, and serving underneath. The HSFS client integrates with Pandas, Polars, and Spark DataFrames, which means minimal changes to existing Python ML workflows when adopting Hopsworks.
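The workflow above can be sketched end to end. The pandas part runs anywhere; the client calls need a live Hopsworks project, so they are left as comments. Their names follow the public `hopsworks`/HSFS Python client, but the exact signatures should be checked against the current documentation, and the schema here is hypothetical.

```python
import pandas as pd

# Features computed in plain pandas -- the integration point HSFS targets.
df = pd.DataFrame({
    "account_id": [1, 2],
    "txn_count_7d": [14, 3],
    "avg_amount_7d": [52.0, 810.5],
})

# Requires a running Hopsworks cluster, hence shown as comments:
#
#   import hopsworks
#   project = hopsworks.login()                     # authenticate
#   fs = project.get_feature_store()
#   fg = fs.get_or_create_feature_group(
#       name="transactions_7d", version=1,
#       primary_key=["account_id"], online_enabled=True)
#   fg.insert(df)                                   # offline + online stores
#   fv = fs.get_or_create_feature_view(
#       name="fraud_detection", version=1, query=fg.select_all())
#   row = fv.get_feature_vector({"account_id": 1})  # low-latency online read
```

Because the write path takes an ordinary DataFrame, an existing pandas or Spark pipeline typically only gains the registration and insert calls.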
Hopsworks operates a three-tier model. The community edition is free: open source, self-hosted, full-featured, community-supported via GitHub and Slack. The serverless managed tier starts free (with usage limits) and scales to usage-based pricing beyond the free tier limits — designed for individual ML engineers and small teams who want managed infrastructure without an enterprise sales process.
Enterprise pricing is custom and includes dedicated managed clusters, private VPC networking, SAML/SSO, RBAC, dedicated customer success, and custom data residency arrangements. Enterprise is the tier for organisations with production ML systems, data governance requirements, and SLA needs.
For teams evaluating whether to self-host the community edition versus use the managed serverless tier, the calculation is typically: self-hosting costs engineering time for infrastructure management; the serverless tier costs money to buy that time back. Most teams starting with Hopsworks begin on serverless and migrate to enterprise when production scale and governance requirements mature.
Hopsworks' EU positioning is among the strongest in the ML platform space, and it is structural rather than cosmetic.
The company is headquartered in Stockholm, Sweden, an EU member state with one of the world's strongest data protection regimes (Datainspektionen, now the Swedish Authority for Privacy Protection, or IMY). Logical Clocks AB is a Swedish entity subject to Swedish law and EU GDPR. The managed cloud platform offers EU-region deployments on AWS Europe (Frankfurt, Ireland), Azure Europe (Netherlands, Ireland), and GCP Europe (Belgium, Frankfurt), meaning data can remain within EU borders throughout the ML lifecycle.
The managed cloud platform is SOC 2 Type II certified. The open source community edition allows organisations to self-host entirely within their own infrastructure — on-premise, in a private cloud, or in a specific EU data centre — with zero data leaving the organisation's control.
For European enterprises navigating GDPR compliance in ML systems — where training data and model outputs can contain personal data — Hopsworks' EU headquarters, EU hosting options, data lineage capabilities, and feature group versioning provide compliance tooling that US-headquartered ML platforms cannot offer with equivalent simplicity.
European enterprises building production ML systems on sensitive data (financial services, healthcare, telecommunications) where GDPR data residency requirements apply to training data and model outputs. Hopsworks' EU headquarters and EU hosting options provide structural GDPR alignment.
ML engineering teams replacing ad hoc feature pipelines who are experiencing training-serving skew, duplicated feature computation, or loss of model lineage. These are the canonical feature store problems Hopsworks was built to solve.
Research organisations and academia where the KTH/RISE spinout origin resonates, the open-source AGPLv3 licence is preferable to commercial tools, and the academic publication record (Professor Dowling's MLOps research) provides confidence in the platform's design decisions.
Teams evaluating Databricks or AWS SageMaker Feature Store who want a cloud-agnostic, EU-headquartered alternative with an open-source option.
If the priority is a cloud-agnostic, EU-headquartered feature store with open-source transparency, choose Hopsworks. If the priority is a low-code, notebook-first experience for early-stage data science teams, choose Deepnote or stay in plain Jupyter instead. If production ML is centred entirely on AWS and AWS-only dependencies are acceptable, SageMaker Feature Store will integrate more cleanly.
Hopsworks occupies a distinctive position: the original open-source feature store, founded at one of Europe's leading technical universities, with production-grade real-time serving, model registry, and vector search capabilities — all from a Stockholm-headquartered team subject to EU law. For European ML engineering teams building production systems, the combination of genuine technical depth, EU data sovereignty, and open-source transparency is not replicated elsewhere. The learning curve is real and the community is smaller than Databricks', but for teams who prioritise EU alignment and open-source auditability in their ML infrastructure, Hopsworks is the serious choice.
A feature store manages the computed attributes (features) that ML models use for training and predictions. Teams typically need one when they have more than one or two production ML models, when multiple teams need to share feature definitions, or when they are experiencing training-serving skew (models behaving differently in production than in training). Without a feature store, feature computation logic gets duplicated across training and serving pipelines with no guarantee of consistency.
The open-source community edition and free serverless tier are specifically designed for individual ML engineers and small teams. A data scientist can start with Hopsworks on the serverless tier with no infrastructure management and no upfront cost. Enterprise contracts are for organisations with production SLAs, private networking requirements, and dedicated support needs. The licensing model does not artificially restrict features to push small teams toward enterprise.
Hopsworks predates SageMaker Feature Store by several years and offers a more mature feature engineering workflow, including native Spark and Flink pipeline support, deeper online serving capabilities via RonDB, and a cloud-agnostic architecture (vs. SageMaker's AWS-only dependency). For European teams, Hopsworks' Swedish headquarters and EU-region managed deployments provide data sovereignty that AWS — as a US company under CLOUD Act jurisdiction — cannot structurally replicate.
Hopsworks was founded by Professor Jim Dowling, who holds a position at KTH Royal Institute of Technology in Stockholm. The company spun out of KTH and RISE SICS AB (Research Institutes of Sweden) in 2016. Dowling has published extensively on distributed ML systems, MLOps, and feature stores. This academic origin gives Hopsworks an unusually rigorous theoretical foundation for its design decisions, and active KTH and RISE research connections continue to influence the platform's development.
Hopsworks' Swedish headquarters makes it subject to EU GDPR directly. Managed deployments in EU cloud regions ensure data does not leave EU borders. Feature group versioning and data lineage tracking provide the audit trails needed to respond to GDPR data subject requests (right to erasure, right of access) in ML systems — a capability that generic ML platforms rarely address explicitly. Enterprise customers can additionally configure private VPC networking and custom data residency arrangements for specific compliance requirements.