Date
March 17, 2025
Reading Time
17 MIN.

Data for the People – How data products and APIs come together

API

By

Andreas Siegel

Data is the foundation of digital innovation. It enables informed decision-making, powers AI applications, and creates new business models. Yet, all too often, valuable data remains trapped in silos—unused and difficult to access.

There can be very valid reasons for this: Microservices, domain-driven design, and, more generally, the principle of information hiding or data encapsulation foster a software culture where data is treated as a hidden sanctuary. Access is granted only to those who have authority over it. Typically, these are the applications within whose context the data is generated or needs to be processed. As a result, these applications (along with their APIs) act as gatekeepers, strictly regulating who gets access—and to what. Essentially, due to this level of indirection, we can no longer say that we are providing access to data; rather, we are providing a service and its API, which "coincidentally" grants access to hidden data from the application context through its endpoints.

From this perspective, the service acts as a barrier that must first be built. When large volumes of data are to be made available, developing and deploying integration services requires considerable resources, an approach that is difficult to scale. Things become even more complicated when data needs to be shared with external partners.

However, data can provide immense added value and serve as the foundation for new business models. For these models to succeed, data provisioning must be as efficient as possible. So how can companies ensure efficient access to their data, both internally and for external partners? The answer: data products and APIs. Below, we illustrate how a modern architecture inspired by the Data Mesh concept bridges the gap between data products and APIs, unlocking new opportunities for data-driven innovation.

The First Building Block of a Scalable Data Strategy: Thinking of Data as a Product

The Data Mesh concept, introduced by Zhamak Dehghani, is based on the realization that centralized data lakes or data warehouses often lack the flexibility that modern enterprises require. Instead, Data Mesh advocates for a decentralized organization of data, where ownership and responsibility are distributed across individual business domains.

This starting point is quite similar to the microservices landscape outlined earlier, with one key distinction: A central concept in Data Mesh is the data product. Instead of considering data as merely a byproduct of operational applications (which remains hidden within dedicated databases), Data Mesh places data at the center—treating it as a standalone, value-generating entity.

A data product is not merely about marketing a database or dataset. It also encompasses governance rules, access controls, quality metrics, and standardized interfaces for consumption. All of this (and more) is described in the Data Contract, which—despite the name—is not a legal contract but a specification of the data being offered, similar to the OpenAPI Specification for APIs. Data Contracts define the format and quality of a data product to ensure reliable usage. This includes details on data structure, update frequency, and service level agreements (SLAs) for data access.
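To make this concrete, here is a minimal sketch of what a Data Contract might capture. All field names and values are illustrative assumptions, not taken from any specific Data Contract standard or from the article:

```python
# A minimal, hypothetical data contract for a "customer_orders" data product.
# All field names and values are illustrative assumptions.
data_contract = {
    "id": "customer_orders",
    "owner": "sales-domain-team",      # clear ownership
    "description": "Completed customer orders, one row per order",
    "schema": {                        # data structure
        "order_id": "string",
        "customer_id": "string",
        "order_total": "decimal",
        "ordered_at": "timestamp",
    },
    "update_frequency": "hourly",      # freshness guarantee
    "sla": {                           # service levels for data access
        "availability": "99.5%",
        "max_staleness_minutes": 90,
    },
    "access": {                        # access controls
        "classification": "internal",
        "allowed_roles": ["analyst", "reporting-service"],
    },
}
```

In practice such a contract would live in a machine-readable format (e.g. YAML) alongside the data product, so that tooling can validate it and consumers can discover it via a data catalog.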

This product-oriented approach establishes accountability and is crucial for the success of data-driven use cases. Key factors include:

  • Clear ownership: Every data product has a dedicated team responsible for its quality and availability.
  • Effective marketing: To be utilized within the company, data products must be discoverable—either through an internal data catalog system or targeted communication.
  • Comprehensive documentation: A well-documented data product facilitates integration and reduces support inquiries. For instance, it should clarify how the data is accessed and whether specific predefined queries are required.

Within an organization, there are numerous valid use cases for data products, including internal reporting, machine learning models, and real-time operational analytics. The key is to view them not just as technical assets but as products with clear value for end users. To maximize success, data products should be tailored to specific use cases, emphasizing their practical value. This business-oriented approach also enables non-technical departments to define data products.

Thus, data products become the first building block of a scalable data strategy. They primarily serve analytical purposes but can also be utilized by custom software applications. However, a limitation remains: Data products are designed for internal consumers, leveraging technologies such as SQL. External parties cannot access them directly due to the security risks and challenges of exposing internal systems and infrastructure. Additionally, some systems or SaaS solutions require an HTTP API for integration.

The Second Building Block of a Scalable Data Strategy: APIs as the Bridge

Such external access requires an additional abstraction layer—APIs.

Data products lay a strong foundation for data provisioning. However, making them accessible to external stakeholders requires flexible and standardized mechanisms. This is where APIs come into play as public interfaces. APIs are complemented by classic API management capabilities, including authentication, access control, and monetization—all essential for the secure and economically viable distribution of data. API management also adds significant value by standardizing protocols and simplifying access.

Moreover, APIs enhance the user experience when accessing data by offering additional functionalities such as filtering options, query mechanisms, and multiple export formats.
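A small sketch of what such an API layer might offer on top of raw data: exact-match filtering plus a choice of export formats. The function and parameter names are assumptions for illustration, not part of any specific product:

```python
import csv
import io
import json

def query(rows, filters=None, export_format="json"):
    """Filter a list of row dicts by exact-match criteria and serialize
    the result in the requested export format (hypothetical API surface)."""
    filters = filters or {}
    selected = [
        row for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]
    if export_format == "json":
        return json.dumps(selected)
    if export_format == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(selected)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {export_format}")

rows = [
    {"order_id": "1", "status": "shipped"},
    {"order_id": "2", "status": "open"},
]
print(query(rows, filters={"status": "shipped"}))
```

In a real deployment, the filter criteria and format would arrive as query parameters on an HTTP endpoint; the point is that these conveniences live in the API layer, not in the data product itself.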

At this point, integration work becomes necessary to provide these supplementary data-related functionalities. However, this does not mean that each API should be hand-built as a separate integration project. Creating a new, custom API for every data product through traditional API development is too time-consuming and costly.

Instead, integration should be efficient, scalable, and reusable. The key is automation, and the commonalities among internally provided data products serve as its foundation. By defining an API specification for the data product, supplemented by any necessary configuration parameters, we can automatically generate an API service with a standardized interface. This drastically reduces development effort while ensuring consistent API quality: developers only adjust the configuration file or API specification, without writing API code themselves.
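The generation step can be sketched as follows: from a data contract (here a hypothetical dict; the generator and its field names are assumptions), derive a minimal OpenAPI-style document that downstream API tooling could consume:

```python
import json

def generate_openapi(contract):
    """Derive a minimal OpenAPI document from a data contract dict.
    Simplification for illustration: every schema field maps to a string."""
    properties = {field: {"type": "string"} for field in contract["schema"]}
    return {
        "openapi": "3.0.3",
        "info": {"title": f"{contract['id']} API", "version": "1.0.0"},
        "paths": {
            f"/{contract['id']}": {
                "get": {
                    "summary": f"Read the {contract['id']} data product",
                    "responses": {
                        "200": {
                            "description": "A list of records",
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "array",
                                        "items": {
                                            "type": "object",
                                            "properties": properties,
                                        },
                                    }
                                }
                            },
                        }
                    },
                }
            }
        },
    }

contract = {
    "id": "customer_orders",
    "schema": {"order_id": "string", "order_total": "decimal"},
}
spec = generate_openapi(contract)
print(json.dumps(spec, indent=2))
```

A production generator would of course map types faithfully, add filtering parameters, and emit (or deploy) an actual service; the sketch only shows that the contract already contains everything needed to derive the interface mechanically.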

Conclusion: Efficient Data Utilization Over Mere Data Possession

In summary, our scalable data strategy architecture combines data platforms with automated API generation. The idea is that a data product does not just provide raw data but also includes metadata and configuration parameters that can be leveraged to automatically create an API.

The workflow at a glance:

  1. A data product is defined, including data structure, access controls, and quality metrics.
  2. An API is automatically generated from the data product based on a central configuration template.
  3. API management ensures governance and security, handling aspects such as access rights, billing models, and monitoring mechanisms.

This approach integrates the cross-functional capabilities of API management with the flexibility and user experience enhancements surrounding data access. By aligning data products with specific use cases, we maximize their value.

Data is the backbone of the digital economy. However, only through the combination of data products and APIs can its full potential be unlocked. Data products enable structured and responsible data provisioning within an organization, while APIs ensure that this data can be efficiently and flexibly integrated into applications—both internally and externally.

An architecture that emphasizes automation and code generation offers a crucial advantage: APIs are no longer manually developed standalone projects but rather scalable interfaces derived from data products.

The future belongs to organizations that do not just collect data but intelligently provision and utilize it. Data for the People!