Data Platform - the centerpiece where it all comes together - Part 1
A data platform serves as the “link” between IoT connectivity and AI services. It enables the seamless integration and consolidation of data from various sources and acts as a robust foundation for AI services. In addition, a data platform can ensure data reliability, scalability, and robustness while taking security, governance, and cost constraints into account. In this blog post, I describe some basics of a data platform and the requirements it should fulfill. In the following blog post (Part 2), I will describe the components of a data platform and its functionalities.
1 The Platform Approach for the World of Digital Data
Software platforms have become indispensable in our modern world. They form the foundation for developing and running applications, services, and other software components by bundling the tools and components needed to build or use them.
Software platforms let developers build on existing functionality and focus on implementing their own ideas and solutions instead of reinventing fundamental technical details. Using a platform can thus save time and money by shortening development cycles and simplifying maintenance. Prominent examples are the app stores on mobile devices, which let users install applications with just a few clicks while also helping developers distribute new applications.
With the increasing collection and processing of data, particularly in the industrial sector, the demand for comparable platforms for the collection and processing of digital data is growing. The term “data platform,” often used synonymously for specific technological solutions, generally refers to a conglomerate of technologies and tools such as databases, data processing tools, and data analysis software.
A data platform enables the collection, transformation, and standardization of digital data from various sources, making processed data readily available for end users or enabling its utilization by software applications and business intelligence solutions.
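To make "transformation and standardization" more concrete, the following minimal Python sketch maps readings from two hypothetical sources onto one common schema: source A is assumed to report epoch timestamps and Fahrenheit, source B ISO 8601 timestamps with a UTC offset and Celsius. The source formats, field names, and the `StandardReading` type are illustrative assumptions, not part of any specific platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StandardReading:
    """Hypothetical common schema used inside the platform."""
    source: str
    sensor_id: str
    timestamp: datetime  # always timezone-aware, normalized to UTC
    temperature_c: float


def from_source_a(record: dict) -> StandardReading:
    # Assumed format of source A: epoch seconds and degrees Fahrenheit.
    return StandardReading(
        source="A",
        sensor_id=record["id"],
        timestamp=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        temperature_c=round((record["temp_f"] - 32) * 5 / 9, 2),
    )


def from_source_b(record: dict) -> StandardReading:
    # Assumed format of source B: ISO 8601 string with offset, Celsius as text.
    ts = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
    return StandardReading(
        source="B",
        sensor_id=record["sensor"],
        timestamp=ts,
        temperature_c=float(record["celsius"]),
    )


readings = [
    from_source_a({"id": "m1-t1", "ts": 1_700_000_000, "temp_f": 98.6}),
    from_source_b({"sensor": "m2-t1", "time": "2023-11-14T23:13:20+01:00", "celsius": "36.5"}),
]
# Both records now use the same units and the same UTC time base.
print(readings[0].timestamp == readings[1].timestamp)  # → True
```

Once every source is converted into the same schema, downstream consumers no longer need to know how each individual source encodes time or units.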
A data platform can encompass diverse types and forms of digital data, classified by their structure, origin, or generation rate. One particularly relevant classification for industrial applications distinguishes between time-series data and descriptive data, which are defined in the following sections.
1.1 Time-Series Data
Time-series data consists of observations recorded at specific time intervals, typically with uniform spacing. These data are commonly used to understand the behavior of a system or process over time. In industrial facilities, parameters such as temperatures, pressures, vibrations, or flow rates are frequently monitored and recorded.
Analyzing time-series data can reveal patterns, trends, and anomalies within a system or process, enabling services such as automatic fault detection, predictive maintenance, or general efficiency optimization (see my blog posts about AI services). When these data are captured and analyzed in real time, they provide insights into the performance and operational state of industrial processes and facilities. Real-time data streams are critical in industrial environments, allowing operators to respond quickly to changes and take proactive measures to prevent issues before they occur.
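As a minimal illustration of anomaly detection on time-series data, the sketch below flags a reading when it deviates from the mean of a trailing window by more than a chosen number of standard deviations (a rolling z-score). The window size of 5, the threshold of 3, and the vibration values are arbitrary assumptions; real systems often use more robust statistics or learned models.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return the indices of values that deviate strongly from the preceding window."""
    history = deque(maxlen=window)  # keeps only the last `window` values
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip the check if the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical vibration velocity readings in mm/s with one spike.
vibration_mm_s = [2.1, 2.0, 2.2, 2.1, 2.0, 2.1, 2.2, 6.5, 2.1, 2.0]
print(rolling_zscore_anomalies(vibration_mm_s))  # → [7]
```

Note that the spike itself enters the window afterwards and inflates the standard deviation for the next few readings; this is one reason why production setups often prefer median-based or model-based detectors.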
Examples of Time-Series Data:
- Machine Condition Data: Parameters such as temperature, pressure, humidity, flow rate, and vibration from machine components or systems. The measurements collected by these sensors over time can be used to monitor system health and detect potential issues before they become critical.
- Energy Consumption Data: Information about energy consumption in industrial facilities, including electricity, fuel, and other energy sources. Analyzing these data helps identify areas where energy usage can be reduced, leading to cost savings and enhanced sustainability.
- Environmental Data: Measurements of environmental factors such as air quality, water quality, and noise levels generated by industrial facilities. Collecting these data over time helps detect trends and patterns that indicate environmental concerns requiring intervention.
- Supply Chain Data: Tracking supply chain performance, including lead times, shipping and receiving activities, and inventory levels. Analyzing these data helps businesses identify opportunities to improve supply chain efficiency and reduce costs.
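Energy consumption data is often delivered as cumulative meter readings rather than as consumption per interval. The minimal sketch below, assuming hourly readings of a hypothetical cumulative kWh counter, derives the consumption of each interval by differencing consecutive readings and sums it per day. Treating a decreasing counter as a reset is an assumption made for this example.

```python
from collections import defaultdict
from datetime import datetime


def daily_consumption(readings):
    """readings: list of (timestamp, cumulative_kwh) tuples, sorted by time."""
    per_day = defaultdict(float)
    for (t_prev, kwh_prev), (t_curr, kwh_curr) in zip(readings, readings[1:]):
        delta = kwh_curr - kwh_prev
        if delta < 0:
            # Assumption: a decreasing counter means a reset or meter swap; skip it.
            continue
        # Attribute each interval's consumption to the day in which it starts.
        per_day[t_prev.date()] += delta
    return dict(per_day)


# Hypothetical hourly readings of one electricity meter around midnight.
meter = [
    (datetime(2024, 3, 1, 22), 1000.0),
    (datetime(2024, 3, 1, 23), 1012.5),
    (datetime(2024, 3, 2, 0), 1020.0),
    (datetime(2024, 3, 2, 1), 1031.0),
]
print(daily_consumption(meter))
# → {datetime.date(2024, 3, 1): 20.0, datetime.date(2024, 3, 2): 11.0}
```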
1.2 Descriptive Data
Descriptive data, in the context of industrial analysis, refers to information that describes the operational processes and equipment used in industrial environments. This includes data managed through CRUD operations (Create, Read, Update, Delete) on database records, operational data collected via interfaces such as APIs, and metadata that provides additional information about the collected data, such as its source, quality, and other relevant details.
Descriptive data encompasses a broad spectrum of information, including hardware specifications, production rates, machine downtime, maintenance logs, machine thresholds, and key performance indicators.
By analyzing descriptive data, companies can identify trends, patterns, and anomalies in their operations, leading to improved efficiency, cost reductions, and increased profitability. Descriptive data also plays a crucial role in predictive and prescriptive analytics, where historical data is used to make forecasts and define actions for future performance improvements.
Examples of Descriptive Data:
- Production Line Metrics: Data describing production processes, such as output types, quality metrics, and material consumption, used to identify trends, patterns, and performance issues.
- Maintenance History: Records of equipment maintenance, including repair logs, replacements, and preventive maintenance activities. These data help track performance, detect issues, and optimize maintenance schedules, particularly in predictive maintenance approaches.
- Operational/Financial Data: Machine performance data linked to financial information such as revenue and cost streams. For example, maintenance schedules can be optimized based on market demand. These organizational data help bridge the gap between shop-floor operations and corporate decision-making.
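Descriptive data becomes especially valuable when it is combined with time-series data. In the minimal sketch below, machine thresholds (one of the kinds of descriptive data listed above) are looked up by machine ID and used to flag sensor readings that exceed them. The `MachineInfo` fields, machine IDs, models, and threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineInfo:
    """Descriptive (master) data about a machine; the fields are illustrative."""
    machine_id: str
    model: str
    max_temperature_c: float


# Hypothetical descriptive data, keyed by machine ID.
MACHINES = {
    "press-01": MachineInfo("press-01", "HP-200", max_temperature_c=80.0),
    "press-02": MachineInfo("press-02", "HP-300", max_temperature_c=95.0),
}


def threshold_violations(readings):
    """readings: iterable of (machine_id, timestamp_iso, temperature_c) tuples."""
    violations = []
    for machine_id, ts, temp in readings:
        info = MACHINES.get(machine_id)
        if info is None:
            continue  # no descriptive data for this machine, so no threshold to check
        if temp > info.max_temperature_c:
            violations.append((machine_id, info.model, ts, temp))
    return violations


# Hypothetical time-series readings.
sensor_data = [
    ("press-01", "2024-03-01T08:00:00Z", 78.5),
    ("press-01", "2024-03-01T08:01:00Z", 82.0),
    ("press-02", "2024-03-01T08:00:00Z", 90.0),
]
print(threshold_violations(sensor_data))
# → [('press-01', 'HP-200', '2024-03-01T08:01:00Z', 82.0)]
```

Keeping thresholds in the descriptive data rather than hard-coding them means they can be changed per machine without touching the analysis logic.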
2 Requirements and Objectives of a Data Platform
A modern data platform serves multiple objectives for processing and utilizing data within a company. It must also fulfill various functional and non-functional requirements. The primary goals and requirements include:
Single Source of Truth
A data platform should centralize all company data to ensure consistency, accuracy, and reliability. Without a unified source, data may be scattered across different systems and departments, leading to inconsistencies, errors, and a lack of trust in the data.
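One simple way to consolidate records about the same entity from several systems into one authoritative version is a "latest update wins" merge keyed by a shared identifier. The sketch below assumes that every record carries an `asset_id` and an `updated_at` timestamp; the source systems, field names, and values are hypothetical, and real platforms usually need more elaborate matching and conflict-resolution rules.

```python
from datetime import datetime


def consolidate(*sources):
    """Merge records from several systems, keeping the most recently updated record per asset."""
    golden = {}
    for records in sources:
        for rec in records:
            key = rec["asset_id"]
            current = golden.get(key)
            # On equal timestamps the record seen first is kept.
            if current is None or rec["updated_at"] > current["updated_at"]:
                golden[key] = dict(rec)
    return golden


# Hypothetical extracts from two systems that both describe asset P-100.
erp_records = [
    {"asset_id": "P-100", "location": "Hall 1", "updated_at": datetime(2024, 1, 10)},
]
maintenance_records = [
    {"asset_id": "P-100", "location": "Hall 3", "updated_at": datetime(2024, 2, 5)},
    {"asset_id": "P-200", "location": "Hall 2", "updated_at": datetime(2023, 12, 1)},
]
golden = consolidate(erp_records, maintenance_records)
print(golden["P-100"]["location"])  # → Hall 3
```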
Self-Service
An essential requirement for a data platform is the ability for users to easily and flexibly access data and create analyses or reports. This is particularly important for non-technical roles, such as business analysts, who may not have the skills or knowledge to perform complex data queries. To meet these requirements, tools must be provided that are easy to use for all users, such as Microsoft Excel, Power BI, or other common analytics tools.
Data Democratization
A data platform supports data democratization by making data and analytics accessible, in principle, to everyone in the company. By offering standardized ways to access data, it ensures that every user can retrieve data in the same manner without needing to understand the configurations and structures of the various data sources. This simplifies data access compared to directly querying different purpose-built databases or data dumps. However, data must remain protected in terms of privacy and confidentiality, and only users with the appropriate access permissions may view it.
Transparency
In the spirit of transparency, data should make the company and its processes visible, enabling fact-based decision-making. The goal is to base decisions on reliable data rather than on gut feeling. Therefore, a data platform should provide a dashboard or reporting system that allows users to see the most important business metrics at a glance.
Agility
In our fast-paced world, companies must be able to respond to changing market conditions. This requires that new data sources can be quickly and easily integrated into the platform. A data platform must therefore be agile and adaptable to react swiftly to changes.
Performance
A data platform enables many analytical queries that would be impractical on purpose-built operational databases, which are typically not designed for analytical workloads. Additionally, because analyses run on the platform rather than on the production systems, those systems are not overloaded: data scientists do not have to worry about their analyses slowing down or even crippling them.
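A common technique behind this decoupling is to copy data from the operational system into the platform incrementally, so that analytical queries run against the platform copy. The minimal sketch below uses Python's built-in `sqlite3` module as a stand-in for both the production database and the platform, and tracks a "high-water mark" on an assumed `updated_at` column. The table name, columns, and rows are hypothetical, and timestamps are stored as ISO 8601 strings of identical format so that they compare correctly as text.

```python
import sqlite3


def incremental_extract(source, target, last_watermark):
    """Copy rows changed since last_watermark from source to target; return the new watermark."""
    rows = source.execute(
        "SELECT id, machine_id, status, updated_at FROM machine_events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Upsert into the platform copy so re-delivered rows overwrite older versions.
    target.executemany("INSERT OR REPLACE INTO machine_events VALUES (?, ?, ?, ?)", rows)
    target.commit()
    # The newest timestamp seen becomes the starting point of the next run.
    return rows[-1][3] if rows else last_watermark


schema = ("CREATE TABLE machine_events (id INTEGER PRIMARY KEY, machine_id TEXT, "
          "status TEXT, updated_at TEXT)")
production = sqlite3.connect(":memory:")  # stand-in for the operational database
platform = sqlite3.connect(":memory:")    # stand-in for the data platform
production.execute(schema)
platform.execute(schema)
production.executemany(
    "INSERT INTO machine_events VALUES (?, ?, ?, ?)",
    [(1, "press-01", "running", "2024-03-01T08:00:00"),
     (2, "press-01", "stopped", "2024-03-01T09:30:00")],
)
production.commit()

watermark = incremental_extract(production, platform, "1970-01-01T00:00:00")
print(watermark)  # → 2024-03-01T09:30:00
```

Each run reads only the rows changed since the previous run, which keeps the load on the production system small. Real pipelines additionally have to deal with late-arriving rows and deletions.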
Beyond the core requirements, there are additional secondary requirements, such as data security, data quality, and data timeliness. A data platform must ensure that data is protected against unauthorized access, that data quality remains high, and that data is always up to date.
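Data quality and timeliness can be monitored with simple rule-based checks. The minimal sketch below validates a single sensor record against three assumed rules: required fields are present, the value lies in a plausible range, and the record is not older than a maximum age. The field names, the range of -50 to 200 (assumed to be °C), and the 15-minute freshness limit are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def check_record(record, now, max_age=timedelta(minutes=15)):
    """Return a list of rule violations for one sensor record (an empty list means it passed)."""
    issues = []
    # Completeness: required fields must be present and not None.
    for field in ("sensor_id", "timestamp", "value"):
        if record.get(field) is None:
            issues.append(f"missing field: {field}")
    # Plausibility: assumed valid range for a temperature sensor.
    value = record.get("value")
    if value is not None and not -50.0 <= value <= 200.0:
        issues.append(f"value out of range: {value}")
    # Timeliness: the record must not be older than max_age.
    ts = record.get("timestamp")
    if ts is not None and now - ts > max_age:
        issues.append("stale record")
    return issues


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
ok = {"sensor_id": "t-1", "timestamp": now - timedelta(minutes=5), "value": 21.4}
bad = {"sensor_id": None, "timestamp": now - timedelta(hours=2), "value": 999.0}
print(check_record(ok, now))   # → []
print(check_record(bad, now))  # → ['missing field: sensor_id', 'value out of range: 999.0', 'stale record']
```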
Good documentation of data and exploration capabilities (such as a data catalog and data lineage) are also important secondary requirements. These ensure that users understand where the data comes from and how it can be used.
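Data lineage can be thought of as a directed graph that links each dataset to the datasets it was derived from. The sketch below represents a tiny, hypothetical catalog as a Python dictionary and walks the graph to list every upstream source of a dataset. All dataset names are invented for illustration, and dedicated data catalog tools store far richer metadata (owners, schemas, descriptions, transformations).

```python
# Hypothetical lineage: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "kpi_dashboard": ["oee_daily"],
    "oee_daily": ["machine_events_clean", "production_plan"],
    "machine_events_clean": ["machine_events_raw"],
    "machine_events_raw": [],
    "production_plan": [],
}


def upstream(dataset, lineage=LINEAGE):
    """Return all datasets that `dataset` depends on, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:  # iterative depth-first traversal of the lineage graph
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(lineage.get(parent, []))
    return seen


print(sorted(upstream("kpi_dashboard")))
# → ['machine_events_clean', 'machine_events_raw', 'oee_daily', 'production_plan']
```

The `seen` set prevents visiting a dataset twice when several paths lead to the same source.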
Overall, a data platform can help companies use their data more effectively and make better decisions based on reliable data. By meeting the requirements mentioned above, a data platform can serve as a single source of truth and increase trust in the data across the entire organization.