A data hub is a central repository that collects, stores, and processes data from multiple sources, and determines how data is exchanged within the system landscape. In essence, it governs how data is structured and processed within an organization. All data from sources such as databases, applications, and sensors flows into the hub, where it is cleansed, transformed, and integrated into a unified data model.
In a hub architecture, all systems that exchange information for relevant business processes are connected to the hub via an interface. Instead of a point-to-point connection linking individual systems, all systems in the landscape are linked through the hub, making their communication more efficient and reliable. The data hub can also provide the basis for stakeholders in an organization to make data-driven decisions, offering them a single, integrated view of data that can be readily accessed and analyzed.
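To make the difference between point-to-point links and a hub concrete, the following rough calculation compares the number of interfaces in both setups. It assumes the worst case, in which every system must exchange data with every other system; the function names are illustrative only.

```python
def point_to_point_links(systems: int) -> int:
    """Interfaces needed if every pair of systems is linked directly."""
    return systems * (systems - 1) // 2


def hub_links(systems: int) -> int:
    """Interfaces needed if every system is linked only to the hub."""
    return systems


for n in (4, 10, 25):
    print(f"{n} systems: {point_to_point_links(n)} direct links vs. {hub_links(n)} hub links")
```

Under this assumption the number of direct links grows quadratically with the number of systems, while the number of hub interfaces grows linearly.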
Example of a multi-layered data hub structure
Overall, the technologies used in a data hub will vary depending on the specific needs and requirements of the organization, but they typically involve some combination of ETL processes, data warehousing technologies, data integration tools, and data quality and governance tools. Also, the choice of programming languages is highly important.
- Extract, transform, load (ETL) processes: ETL processes are commonly used to collect, cleanse, transform, and integrate data from multiple sources into a data hub.
- Data warehousing technologies: Data warehousing technologies provide a centralized repository for storing and managing large amounts of data. They are often used in conjunction with ETL processes to support data hubs.
- Data integration tools: Data integration tools are designed to connect and integrate data from a variety of sources, making it easier to access and use the data within a data hub.
- Data quality and governance tools: These tools are used to ensure the data within a data hub is accurate, consistent, and up-to-date, and to monitor and manage the data as it flows through the system.
- Languages: When setting up a data hub, the chosen programming language is also important. It makes sense to use languages that support the principle of parallel programming, allowing the simultaneous execution of certain operations on a server. Experience shows that Erlang, Go, TypeScript, or Elixir are particularly suitable.
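The ETL step described in the list above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the two sources ("erp" and "shop"), their field names, the `Product` schema, and the rule that the ERP record wins on conflicts are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Unified record in the hub's data model (hypothetical schema)."""
    sku: str
    name: str
    price_cents: int


def extract() -> dict[str, list[dict]]:
    # Stand-ins for real sources such as a database and an application export.
    return {
        "erp": [{"ArticleNo": " a-100 ", "Title": "Widget", "Price": "19.90"}],
        "shop": [
            {"sku": "A-100", "name": "widget", "price_cents": 1990},
            {"sku": "B-200", "name": "Gadget ", "price_cents": 4500},
        ],
    }


def transform(raw: dict[str, list[dict]]) -> list[Product]:
    # Cleanse (trim, normalise case) and map each source format onto Product.
    products: dict[str, Product] = {}
    for row in raw["erp"]:
        sku = row["ArticleNo"].strip().upper()
        price = round(float(row["Price"]) * 100)
        products[sku] = Product(sku, row["Title"].strip(), price)
    for row in raw["shop"]:
        sku = row["sku"].strip().upper()
        # Integration rule (an assumption): the ERP record wins on conflicts.
        products.setdefault(sku, Product(sku, row["name"].strip(), row["price_cents"]))
    return sorted(products.values(), key=lambda p: p.sku)


def load(products: list[Product], store: dict[str, Product]) -> None:
    # The "warehouse" here is just a dict keyed by SKU.
    for product in products:
        store[product.sku] = product


hub_store: dict[str, Product] = {}
load(transform(extract()), hub_store)
```

After the run, `hub_store` holds one cleansed record per SKU, regardless of which source delivered it.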
There is no single template for building a data hub. Depending on the individual use cases and requirements, there are a number of approaches that can be used to build the underlying system architecture. One strategy that has proven itself, however, is adopting a so-called hexagonal architecture approach [see figure]. Also known as the "ports and adapters" pattern, this is a software architecture pattern that separates the internal workings of a software system from its external interactions. This allows the core of the system to be isolated from the details of the specific ways in which it is used or integrated with other systems.
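A minimal sketch of the ports-and-adapters idea might look as follows. The port names (`ProductSource`, `ProductSink`), the `HubCore` class, and the record fields are hypothetical; the point is only that the core depends on abstract ports, while concrete adapters can be swapped without touching it.

```python
from typing import Protocol


class ProductSource(Protocol):
    """Inbound port: anything that can deliver raw product records."""

    def fetch(self) -> list[dict]: ...


class ProductSink(Protocol):
    """Outbound port: anything that can receive unified records."""

    def publish(self, record: dict) -> None: ...


class HubCore:
    """Core logic; it knows the ports but no concrete external system."""

    def __init__(self, source: ProductSource, sink: ProductSink) -> None:
        self._source = source
        self._sink = sink

    def sync(self) -> int:
        count = 0
        for raw in self._source.fetch():
            # Business rule of the core: normalise the record before passing it on.
            self._sink.publish({"sku": raw["id"].upper(), "name": raw["name"].strip()})
            count += 1
        return count


class InMemorySource:
    """Adapter standing in for a database or API client."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def fetch(self) -> list[dict]:
        return list(self._rows)


class ListSink:
    """Adapter standing in for a downstream system; it just collects records."""

    def __init__(self) -> None:
        self.received: list[dict] = []

    def publish(self, record: dict) -> None:
        self.received.append(record)


sink = ListSink()
core = HubCore(InMemorySource([{"id": "a-100", "name": " Widget "}]), sink)
synced = core.sync()
```

Replacing `InMemorySource` with an adapter for a real database would require no change to `HubCore`, which is also what makes the core easy to test in isolation.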
Hexagonal architecture is often used to implement systems that are highly flexible and extensible, as it allows new features and functionality to be added without having to make changes to the core of the system. In a data hub using this architecture, the system is divided into various loosely coupled components that communicate with their environment through ports and adapters. All data entering the hub is processed by a message broker. The advantage of this approach is that the hub is horizontally scalable and can therefore handle high data throughput. This is suitable for fast-growing companies as well as for business models with a fluctuating workload, for example online retailers with seasonal business. To ensure that no information is lost, the message broker also backs up every data change, so that nothing goes missing even if individual systems fail.
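How a broker can keep every change available for a system that was temporarily down can be illustrated with a toy append-only log. This is a sketch under simplifying assumptions: the `LogBroker` class is hypothetical, it keeps everything in memory, and a real broker would persist the log to disk and replicate it.

```python
class LogBroker:
    """Toy append-only broker: every change is kept, consumers track offsets."""

    def __init__(self) -> None:
        self._log: list[dict] = []
        self._offsets: dict[str, int] = {}

    def append(self, change: dict) -> int:
        # Store the change and return its position in the log.
        self._log.append(change)
        return len(self._log) - 1

    def poll(self, consumer: str) -> list[dict]:
        # Return everything the consumer has not yet acknowledged.
        return self._log[self._offsets.get(consumer, 0):]

    def ack(self, consumer: str, count: int) -> None:
        # Advance the offset only after the consumer processed `count` changes.
        self._offsets[consumer] = self._offsets.get(consumer, 0) + count


broker = LogBroker()
broker.append({"sku": "A-100", "stock": 5})
broker.append({"sku": "A-100", "stock": 4})

# The shop receives both changes but processes only the first, then fails.
first_batch = broker.poll("shop")
broker.ack("shop", 1)

# A further change arrives while the shop is down.
broker.append({"sku": "B-200", "stock": 9})

# After recovery the shop resumes from its offset and misses nothing.
missed = broker.poll("shop")
```

Because the offset only moves when a change is acknowledged, the unprocessed change and the one that arrived during the outage are both delivered after recovery.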
In addition, this architecture facilitates test-driven development and draws clear boundaries between infrastructure code (frameworks, databases, and message brokers) and business logic code. This allows for the sustainable development of a clean and maintainable application. A side effect of the modular architecture is that developers do not step on each other's toes. Since the components of the software are cleanly separated from each other, several developers can work on the same software in parallel. This also strengthens teamwork.
Furthermore, it makes sense to pursue an event-driven architecture approach: every change in the data, for example when product information is updated, is understood as an event and announced to the other systems. Each system then decides whether the announced event is relevant for its own application. Such an internal event system enables the export processes from the hub to scale.
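The event-driven approach can be sketched with a small in-process event bus. The `EventBus` class, the event types, and the two handlers are invented for illustration; in practice the announcement would typically go through the message broker rather than a Python list.

```python
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Announces every event to every subscribed system."""

    def __init__(self) -> None:
        self._subscribers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def announce(self, event: Event) -> None:
        # Each subscriber sees the event and decides on its own whether it is relevant.
        for handler in self._subscribers:
            handler(event)


shop_updates: list[Event] = []
pricing_updates: list[Event] = []


def shop_handler(event: Event) -> None:
    # The shop only cares about product information.
    if event["type"] == "product.updated":
        shop_updates.append(event)


def pricing_handler(event: Event) -> None:
    # The pricing system ignores everything except price changes.
    if event["type"] == "price.changed":
        pricing_updates.append(event)


bus = EventBus()
bus.subscribe(shop_handler)
bus.subscribe(pricing_handler)
bus.announce({"type": "product.updated", "sku": "A-100", "name": "Widget v2"})
```

The hub does not need to know which systems are interested in which change; adding another export target only means subscribing another handler.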
There is excellent off-the-shelf middleware available, some of which is configurable and simple to use. This means that simple use cases can be implemented quickly and easily. Enterprise Service Bus solutions offered by larger vendors can also be a workable option.
However, if requirements become unique and customized solutions are required, developing a company's own data hub is a viable option. The complexity of the IT landscape, not the size of the company, is decisive here. In general, the more applications are integrated into a system and the more closely they must exchange data with one another, the greater the benefits of implementing a data hub.
Still, developing solid IT infrastructure takes time and effort, and custom development is more difficult to plan than implementing more standardized solutions. Companies should set clear goals and milestones, so as not to get lost in the weeds or derail their project by starting with the most difficult areas. Finally, sector-specific expertise can go a long way towards a successful data hub project. This is especially true in B2B, where solving complex issues such as product availability, stock levels, or pricing often requires highly specific knowledge only found in specialized departments or with external service providers.