Data moves. Almost all sources of data have an element of dynamism and motion about them. Even data at rest in some form of archival storage tier will previously have led a more fluid life moving between applications, devices and network backbones and, inevitably, will also have reached its resting place via a transport mechanism.
But although almost all data moves, not all data moves at the same speed, at the same cadence and with the same kind of system circularity, size or value.
Continuous data in streams
In the always-on world of cloud computing, ubiquitous mobile devices and the planet’s new population of intelligent ‘edge’ machines in the Internet of Things (IoT), there exists a continuous flow of data that we naturally refer to as a stream.
So then, what is data streaming and how should we understand it and work with it?
Data streaming is a computing principle, and an operational system reality, in which (typically small) elements of data travel through an IT system in a time-ordered sequence. These elements are often referred to in the same breath as IT ‘events’: everything from a user pressing a mouse button or a keyboard keystroke, onward to the asynchronous changes that happen as applications execute code and perform their work. The smaller elements of data that form data streaming flows might be log files (small records tracing every behavioral step taken by applications and services), financial transaction logs, web browser activity records, IoT smart-machine sensor readings, geospatial or telemetry information, in-game video game movement and action data… and everything down to the smallest device instrumentation record. Each one creates a data droplet that forms part of a continuous flow, which ultimately makes up a stream.
An organization that works with a data-driven, data-centric and data-derived approach can take practical steps to analyze its data streaming pipelines in real time to gain a granular and accurate view of what’s happening in the business. By using a data streaming platform to perform sequentially-processed analysis of every data record in the stream, an organization can sample, filter, correlate and aggregate its data streaming pipeline and begin to create a new layer of business insight and control.
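To make the filter-and-aggregate idea concrete, here is a minimal, self-contained Python sketch of those operations over a time-ordered stream. The `Event` record, the sensor names and the rolling-average window are all hypothetical illustrations, not part of any real streaming platform’s API; production systems would do this work continuously inside the streaming platform itself.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Hypothetical event record: a timestamped sensor reading (illustrative only).
@dataclass
class Event:
    timestamp: float
    source: str
    value: float

def filter_stream(events: Iterable[Event], source: str) -> Iterator[Event]:
    """Filter step: keep only events from one source, preserving time order."""
    return (e for e in events if e.source == source)

def rolling_average(events: Iterable[Event], window: int) -> List[float]:
    """Aggregate step: average of the last `window` readings as each event arrives."""
    buffer: List[float] = []
    averages: List[float] = []
    for e in events:
        buffer.append(e.value)
        if len(buffer) > window:
            buffer.pop(0)  # discard the oldest reading outside the window
        averages.append(sum(buffer) / len(buffer))
    return averages

stream = [Event(t, "sensor-a", v) for t, v in enumerate([10.0, 12.0, 11.0, 13.0])]
print(rolling_average(filter_stream(stream, "sensor-a"), window=2))
# [10.0, 11.0, 11.5, 12.0]
```

The key property the sketch preserves is that each record is processed once, in arrival order, with only a bounded window of state held in memory, which is what distinguishes stream processing from batch processing over a stored dataset.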
Real-time data streaming apps
Starting simply, a business might choose to build real-time streaming alert applications that track minimum and maximum values, driving alarms and alerts when chosen metrics fall below or exceed pre-specified thresholds. Moving onwards, that same business might then apply Machine Learning (ML) algorithms to its data streaming pipeline to look for deeper trends that may be surfacing on a long-term (and eventually also a shorter-term) basis.
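A threshold-alert application of the kind described above can be sketched in a few lines. The readings and the threshold values below are hypothetical examples; a real deployment would evaluate each record as it arrives in the stream rather than over a finished list.

```python
def threshold_alerts(readings, low, high):
    """Flag readings that fall below `low` or exceed `high`.
    Returns (index, value, kind) tuples, one per out-of-range reading."""
    alerts = []
    for i, value in enumerate(readings):
        if value < low:
            alerts.append((i, value, "low"))
        elif value > high:
            alerts.append((i, value, "high"))
    return alerts

# Hypothetical temperature stream with alarm thresholds at 15 and 30 degrees.
print(threshold_alerts([22, 14, 25, 31, 28], low=15, high=30))
# [(1, 14, 'low'), (3, 31, 'high')]
```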
If that covers data streaming for (quite smart) dummies and tech-aware business people, then the essential takeaway is this: data streaming is a technology currently being applied to applications in every conceivable business vertical.
Data streaming players & open source
The shape of the data streaming market is typical of the enterprise cloud space at large. There are offerings from all of the major cloud services provider hyperscalers (AWS, Google Cloud Platform and Microsoft Azure), IBM has its finger in the pie, and a group of IT vendors traditionally known for their enterprise data management and integration platforms (Tibco is a good example) also enjoys a share of voice.
Then there is open source, which in this case centers on Apache Kafka, an open source data stream processing platform written in Java and Scala. Supporting the use of Kafka at the enterprise level is Confluent.
Confluent is a full-scale data streaming platform that enables users to access, store and manage data as continuous, real-time streams. Built by the original creators of Apache Kafka, Confluent expands the benefits of Kafka with enterprise-grade features while removing the burden of Kafka management and monitoring. Originally created by software developers working at LinkedIn back in 2011, Kafka has today evolved from a ‘simple’ messaging queue into a full data streaming platform. It can handle over one million messages per second, or trillions of messages per day.
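What makes Kafka more than a ‘simple’ messaging queue is its core abstraction: an append-only, time-ordered log that producers write to and that many consumers can read independently, each tracking its own position. The toy class below sketches that idea in plain Python under stated assumptions; it is not Kafka’s actual API (real Kafka adds partitioning, replication and durable storage), just an illustration of the log-and-offset model.

```python
class MiniLog:
    """Toy sketch of Kafka's core abstraction: an append-only, time-ordered
    log that producers write to and consumers read at their own offsets.
    Illustrative only; real Kafka partitions, replicates and persists the log."""

    def __init__(self):
        self.messages = []   # the append-only log
        self.offsets = {}    # consumer name -> next position to read

    def produce(self, message):
        self.messages.append(message)

    def consume(self, consumer):
        """Return every message this consumer has not yet seen."""
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:]
        self.offsets[consumer] = len(self.messages)
        return batch

log = MiniLog()
log.produce("order-created")
log.produce("payment-received")
print(log.consume("billing"))    # ['order-created', 'payment-received']
log.produce("order-shipped")
print(log.consume("billing"))    # ['order-shipped']
print(log.consume("analytics"))  # all three messages, read from the start
```

Note the difference from a traditional queue: consuming a message does not remove it, so the “billing” and “analytics” consumers each see the full, ordered history at their own pace.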
Providing organizations with cloud-native, simple, scalable data streaming pipelines, Confluent offers 120+ pre-built connectors for real-time integration between source and destination systems, in-flight processing of data streams and a set of security, governance and resiliency features designed to comply with enterprise regulations governing use cases across distributed mission-critical workloads.
According to the company, Confluent also enables customers to go beyond just real-time integration between data systems, supporting real-time stream processing and analysis to power real-time decision-making and modern applications. The technology proposition from the company hinges on the idea that companies might love Apache Kafka, but they may well hate managing it. Consequently, Confluent provides a cloud-native, fully managed service that promises to go above and beyond Kafka.
What that above-and-beyond factor means in operational terms is data streaming without the need to shoulder tasks like cluster sizing (determining the size of data backbone needed for any given data development lifecycle task), plus the ability to avoid over-provisioning cloud data systems (buying and paying for more compute, analysis and storage than needed) before data streaming is brought online. The company also offers failover design, infrastructure management, security, data governance and global availability.
Removing underlying mechanics
Positioning its core technology offering as a route to deriving value from business data without shouldering the management burden of its ‘underlying mechanics’ (such as how data is transported or integrated between disparate systems), the company says it simplifies connecting data sources to Kafka, which is clearly a key enabler for building streaming applications. Additionally, this connectivity factor helps secure, monitor and manage a Kafka infrastructure.
According to the organization’s core product pages, “Today, Confluent is used for a wide array of use cases across numerous industries, from financial services, omni-channel retail, and autonomous cars, to fraud detection, microservices and IoT.”
The process here is one of integrating both historical and real-time data within the platform, all of which (Confluent claims) creates a new category of software applications, i.e. data-driven ones capable of accessing a single source of data truth inside what the firm calls a universal data pipeline.
For its 2022 State of Data in Motion market analysis, Confluent surveyed some 1,950 IT and engineering leaders across six countries. One user expressed the trend in software engineering eloquently. “Real-time data streams are becoming core to how we serve customers and run our business,” said Yaël Gomez, vice president, global IT, integration and intelligent automation, Walgreens Boots Alliance.
Gomez explained that his department and company have used data in motion to manage customer engagement and to ensure vaccine and testing accessibility for patients, while also working to enable a differentiated online retail proposition in what the company calls a seamless omni-channel experience.
Modern business & data streaming
If this picture of data movement does anything, it hopefully conveys just how much data dynamism has changed over the last quarter century.
As Confluent co-founder and CEO Jay Kreps has explained, we used to live in a very ‘batch processing’ oriented world of business and data. Companies would close up shop for the day, week or month and take stock of where their product inventories were, how the staff were performing and perhaps take a look at what customer sentiment might look like. Company operational adjustments would happen on a quarterly basis, maybe, sometimes.
That business world does not exist any more. In the era of cloud, web, mobile ubiquity and the wider world of connected systems, organizations have to enable continuous movement and processing of data for better workflows, more automation, real-time analytics and differentiated digital customer experiences.
The term ‘modern’ is already over-used in technology circles, with vendors claiming to offer modern programming tools (low-code), modern databases (with smart big data analytics), modern automation systems (with huge AI power) and everything else you can think of, all the way up to modern user interfaces (capable of providing the same experience from desktop to tablet to smartphone or even kiosk and beyond).
But for all that modernization chatter, the one thing that might really be becoming modern is the business itself. A modern business runs hundreds of applications, cloud layers and services… all serving thousands (if not hundreds of thousands) of user and machine endpoints across a countless number of workflows.
The modern business will now be looking to harness event-driven applications using real-time data and that means data streaming. Don’t forget to paddle.