Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Factors influencing the demand of AI in today’s world. Commentary by Shubham A. Mishra, Global CEO and Co-Founder of Pixis
AI is fundamentally changing the way businesses communicate with their audiences by improving the accuracy of communication and targeting. With an expected growth of 40.2% over the next few years, AI will be transformative for any business, helping it provide a seamless customer experience. With AI, businesses can get data-backed insights into performance, which gives practitioners across the board clarity on the effectiveness of their efforts. Marketers will gain sharper insights into what the audience wants, and can optimize business growth accordingly. In the next few years, we will witness modern generative AI networks completely rebooting the landscape of digital content creation and empowering brands to hyper-tune their messaging to every single potential customer. With the shift to a cookieless web, AI is going to play an important role in promoting strategic processes, and will be in the front seat of executing any campaign because of its self-evolving nature and its power to optimize efficiency.
The Growing Impact of Data Storytelling and How to Harness It. Commentary by Mathias Golombek, CTO, Exasol
As more organizations become data-driven, they are using data storytelling to glean the most accurate, meaningful, and actionable insights from their data. Data storytelling provides the much-needed context for painting a clearer picture. Without this context, data insights can fall flat. For business leaders, data storytelling explains what the data is showing and why it matters. According to Exasol’s research, nearly all (92%) IT and data decision-makers surveyed agreed that storytelling is an effective means of delivering the findings of data and analytics. Given this trend, there is major demand for data storytellers across all industries, with companies seeking to build best-in-class data teams made up of people from different backgrounds with various skill sets. These modern data scientists need not just technical knowledge and advanced data science skills but also the ability to interpret data for business-focused stakeholders. To make data storytelling truly successful, organizations must empower knowledge workers to become more data savvy so that they can also interpret the data along with their more technical counterparts. Data storytelling isn’t just about being able to work a data platform; it’s also about data literacy skills and the ability to communicate more widely – understanding the business context and the importance of numbers, and then breaking those down into a pithy, compelling narrative. Fortunately, there are new, smarter self-service tools that help both teams turn data into stories, including easy-to-use BI tools, self-service data exploration and data preparation solutions, and auto-machine learning tools that enable nearly all employees to interpret complex information on their own, act on their findings and tell their own data stories.
Apple outage potentially caused by lack of connection between systems and data. Commentary by Buddy Brewer, New Relic Group Vice President & General Manager
As companies scale and tech stacks become more complex, the risk of outages will rise. Outages like this can happen to any company at any time. When an outage happens, the impact to the business can snowball really fast. Not only is the IT team trying to get the system back up and running, they are also fielding what can be a massive influx of requests, ranging from internal stakeholders up to the board level to customer complaints. Minimizing the time to understand the issue is critical. What makes this difficult is that most companies have observability data scattered everywhere. The first thing any company needs to do to fix the issue is to focus on connecting the data about their systems, ideally storing it all together so that they can gain a single-pane-of-glass view of their system and resolve issues quickly, minimizing the impact on their end users. Redundancy in the form of failovers, multi-cloud, and more is also important for the resilience of their system.
Why More Companies are Leaving Hadoop. Commentary by Rick Negrin, Vice President, Product Management, Field CTO at SingleStore
While Hadoop started off with the promise of delivering faster analytical performance on large volumes of data at lower costs, its sheen has worn off and customers are finding themselves stuck with complex and costly legacy architectures that fail to deliver insights and analytics fast. It doesn’t take more than a quick Google search to see why enterprises around the world are retiring Hadoop. It wasn’t built to execute fast analytics or support the data-intensive applications that enterprises demand. Moreover, Hadoop requires significantly more hardware resources than a modern database. That’s why more companies are seeking out replacements or, at the very least, augmenting Hadoop. As an industry, we must meet the needs of demanding, real-time data applications. We must ensure there are easier, more cost and energy efficient choices for users who need reliable data storage and rapid analytics for this increasingly connected world.
Snowflake’s new cloud service signals new trend of industry-based cloud offerings. Commentary by Clara Angotti, President at Next Pathway
Snowflake’s new Healthcare & Life Sciences Data Cloud is a great example of the new trend to vertically specialized cloud offerings and services. The use of the cloud is becoming purpose-driven as companies are choosing cloud data warehouse and cloud platforms based on their ability to enable specialized business change. Companies are looking for industry-specific solutions based on applications, services, security and compliance needs to drive unique business outcomes. As this market becomes more lucrative and competitive, the players will look to differentiate themselves through unique, vertical offerings.
New Data Privacy Laws. Commentary by David Besemer, VP/Head of Engineering at Cape Privacy
If data is kept encrypted when stored in the cloud, the risks associated with unauthorized access are mitigated. That is, even if data becomes inadvertently exposed to outside actors, the encryption maintains the privacy of the data. The key to success with this approach is to encrypt the data before moving it to the cloud, and then keep it encrypted even while processing the data.
How AI Puts a Company’s Most Valuable Asset to Work. Commentary by David Blume, VP of Customer Success, RFPIO
Every business leader asks themselves a common question: how do I get my employees to perform their best work? And while effective hiring, pay incentives and a positive workplace environment all play a large role, one aspect that often gets overlooked involves the tools that, once implemented, can improve every aspect of employee efficiency. That’s where machine learning-driven response management software comes into play. Response management software that incorporates machine learning helps employees at every level utilize their company’s most valuable resource: knowledge. When a company invests in technology that allows its workers to use all their content in an accurate, accessible and effective manner, it can bring wide-ranging and substantial benefits to the organization. For higher-level executives, this helps reduce repetitive questions from lower-level staff and minimizes errors that result from old or inaccurate data. For employees who have just joined a company, the onboarding process will be quicker and more streamlined, as many of their questions can be answered accurately and easily through a shared knowledge library. Used properly, response management software can improve employee productivity, resulting in increased ROI and boosted bottom lines.
Cloud Costs On the Rise? Developers are less than thrilled. Commentary by Archera CEO Aran Khanna
It’s no secret that cloud costs are devilishly hard to predict and control, but at least they trend downward over time. Every customer faces a visibility puzzle, making it tough to understand which team is causing cloud spending shocks – and whether they stem from traffic gains (good) or wasteful deployments (bad). Then throw in complex billing structures, countless resource and payment options, and the fact that customers consume before they pay. Small wonder “the house” always seems to win when the invoice arrives, regardless of which cloud provider the house is. You can now add price inflation to these challenges, with Google boosting some prices 100% starting October 1st. But some prices – certain archival storage at rest options, among others – are dropping. Some capacity limits are changing too: Always Free Internet egress will jump from 1 GB to 100 GB per month. Developers, overloaded trying to control cloud costs in a blizzard of choices, are not thrilled with the new “flexibility”. The answer? A Google FAQ encourages customers to “better align their applications” to these new business models to mitigate some of the price changes. IT cannot compare millions of choices, however, and still hope to get actual work done. They need a sustainable methodology and the capability for centralized, automated cloud resource management to correctly match provider options to consumption. We recommend a dynamic monthly sequence: establish visibility, then forecasts, budgets, governance, and contract commitments, and then monitor and adjust. This lets organizations avoid unnecessary spending, even when the world goes upside-down and prices inflate.
Removing Human Bias From Environmental Evaluation. Commentary by Toby Kraft, CEO, Teren
Historically, environmental data has been captured and evaluated through “boots on the ground” and local area experts – relying heavily on human interpretation to glean insights. Subject matter and local experts make decisions based on data retrieved from field surveys, property assessments and publicly available data that may or may not be up to date and accurate. Additionally, the human interpretation of these data lends itself to error and inconsistencies, which can have dramatic impacts on the companies using it, such as insurance and construction organizations. Remotely-sensed data, geospatial technology, and data science can eliminate human error in environmental data by automating highly accurate, relevant data capture, processing and interpretation. Automated analytics can extract unbiased, reliable and, most importantly, replicable insights across industries to detect, monitor and predict environmental changes. Companies have long used geospatial technology for large-scale asset management, however, the data is generally limited to infrastructure without much insight into the environmental conditions within which the asset is situated. Asset owners are now combining remotely-sensed data, machine learning, and geospatial technologies to manage the environmental data surrounding an asset and proactively mitigate potential threats. Insurance and construction firms can take note and apply the same methodology to underwriting and project scoping – saving time and lowering risk before an asset is even operational.
NVIDIA: “We Are A Quantum Computing Company.” Commentary by Lawrence Gasman, President of Inside Quantum Technology
Quantum has evolved to the point where a semiconductor giant and Wall Street darling like Nvidia is identifying itself as a quantum computing company. That’s a huge development for a company that’s been making strides by circling the market, creating the cuQuantum software kit for quantum simulations, currently used by Pasqal, IBM, Oak Ridge National Laboratory (ORNL) and others. The recent announcement of a new quantum compiler and a new software appliance to run quantum jobs in data centers is a further statement that Nvidia is intent on pursuing quantum market opportunities. They already serve the high-performance computing community with powerful processors and accelerated architectures. This shift will help embrace a unified programming model for hybrid classical-quantum systems.
The risks that come with big data: highlighting the need for data lineage. Commentary by Tomas Kratky, founder and CEO, MANTA
The benefits of harnessing big data are obvious — it feeds the applications powering our digital world today, like advanced algorithms, machine learning models, and analytics platforms. To get the desired value, we deploy tens or hundreds of technologies like streaming, ETL/ELT/reverse ETL, APIs or microservices. And such complexity actually poses some serious risks to organizations. A solid data management strategy is needed to remove any blind spots in your data pipelines. One heavily overlooked risk with big data architectures (or frankly any complex data architectures) is the risk of data incidents, extreme costs associated with incident resolution, and limited availability of solutions enabling incident prevention. An associated risk is low data quality. Data-driven decisions can only be as good as the quality of the underlying data sets and analysis. Insights gleaned from error-filled spreadsheets or business intelligence applications might be worthless – or in the worst case, could lead to poor decisions that harm the business. Thirdly, compliance has become a nightmare for many organizations in the era of big data. As the regulatory environment around data privacy becomes more stringent, and as big data volumes increase, the storage, transmission, and governance of data become harder to manage. To minimize compliance risk, you need to gain a line of sight into where all your organizational data has been and where it’s going.
Why semantic automation is the next leap forward in enterprise software. Commentary by Ted Kummert, Executive Vice President of Products and Engineering, UiPath
Demand for automation continues to skyrocket as organizations recognize the benefits of an automation platform in improving productivity despite labor shortages, accelerating digital transformation during pandemic-induced challenges, and enhancing both employee and customer experiences. Semantic automation enhances automation by reducing the gap between how software robots currently operate and the capacity to understand processes the way humans do. By observing and understanding how humans complete certain tasks, software robots powered by semantic automation can better understand the intent of the user, in addition to relationships between data, documents, applications, people, and processes. Robots that can understand higher levels of abstraction will simplify development and deployment of automation. Further, as software robots continue to learn how to complete tasks and identify similarities, organizations will see better outputs from their automations and can also find new opportunities to scale across the business.
The Best Data Science Jobs in the UK according to data. Commentary by Karim Adib, Data Analyst, The SEO Works
According to a report commissioned by the UK government, 82% of job openings advertised online across the UK require digital skills. While digital industries are booming, some industries are easier to break into than others, and some industries pay higher than others. The Digital PR team at The SEO Works gathered data on the average salary from top UK job boards Glassdoor, Indeed, and Prospects to reveal some of the most in-demand digital jobs along with how difficult it is to get started in them. All smart businesses and organizations now use data to make decisions, so there is a growing demand for these jobs. It’s also not too hard to get into Data Science compared with some other digital jobs. All three of the data science jobs analyzed fall between 40 and 60 on the difficulty score because of the long time frame associated with getting into data science and the degree requirements a lot of the jobs have. Data Analyst came out as the best salary to difficulty ratio in the study, with Data Scientist just behind it. More and more businesses nowadays are adopting data as a way to make informed decisions and as a result, the demand for those who work with data is constantly increasing, making it a great choice for those looking to get into the digital industry.
The Digital Transformation and Managing Data. Commentary by Yael Ben Arie, CEO of Octopai
Businesses have never had access to as much data as is flowing into corporations today. As the world goes through a digital transformation, the amount of information and data that a company collects is enormous. However, the data is only useful when it can be leveraged to improve processes, business decisions, strategy, etc. How can businesses leverage the data that they have? Are businesses even aware of all the data that they possess? And is the data in the hands of the right executives so that information can be used throughout the entire organization and in every department? After all, the digital transformation is turning everyone into a data scientist, and valuable data can’t be utilized solely by the BI department. For data to be leveraged effectively and to give the full picture, businesses need to automate data discovery; the difference is comparable to running a Google search versus going to the library. Manually researching data flows is almost impossible and leaves the data untrustworthy and prone to errors. Automating the process with a centralized platform that extracts metadata from all systems and spreadsheets and presents it in a visual map provides an accurate and efficient way to discover and track data and its lineage within seconds. This enables any user to find the data they need and make faster business decisions, and provides insight into the true metrics of the business. One of the most transformative aspects of the digital transformation is that data will become the foundation of corporate growth, propelling all corporate employees to take part in data discovery. As the digital transformation continues to move forward, we should expect to see more emphasis on verifying the accuracy and “truth” of the data that was uncovered.
Why adaptive AI is key to a fairer financial system. Commentary by Martin Rehak, Co-founder and CEO of Resistant AI
Despite the best intentions of regulators, the world’s financial system remains the biggest, most critical, and yet most obscure network underpinning our global economy and societies. Attempts to fight the worst the world has on offer — financial crime, organized crime, drug smuggling, human trafficking, and terrorist financing — are generally less than 0.1% effective. And yet, it is this very same system that the world’s largest economies are relying on to sanction Russia into curtailing its aggression in Ukraine. At the heart of this inefficiency is a natural mismatch any data scientist would recognize: an overwhelming amount of economic activity that needs to be detected, prioritized, analyzed, and reported by human financial crime investigators, and all within the context of jurisdictionally conflicting and ever-updating compliance regulations. Previous attempts to solve this problem with AI have usually relied on expensive and rigid models that both fail transparency tests with regulators and at catching ever-adaptive criminals who render them obsolete within months by adopting new tactics. Instead, the path to a financial system that actually benefits law-abiding citizens lies in nimble, fast-deploying and fast-updating multi-model AI anomaly detectors that can explain each and every finding at scale. That path will require constant collaboration between machines, financial crime investigators, and data scientists. Failure to build a learning cycle that includes human insights is metaphorically throwing our hands up at the idea of a fairer and safer financial future for all.
The need for strong privacy regulations in the US is greater than ever. Commentary by Maciej Zawadziński, CEO of Piwik PRO
The General Data Protection Regulation (GDPR) is the most extensive privacy and security law in the world, containing hundreds of pages’ worth of requirements for organizations worldwide. Though it was put into effect by the European Union (EU), it also imposes requirements on organizations anywhere that target or collect data related to people in the EU. Google Analytics is by far the most popular analytics tool on the market, but a recent decision of the Austrian Data Protection Authority, the DSB, states that the use of Google Analytics constitutes a violation of GDPR. The key compliance issue with Google Analytics stems from the fact that it stores user data, including information about EU residents, on US-based cloud servers. On top of that, Google LLC is a US-owned company and is therefore subject to US surveillance laws, such as the Cloud Act. Companies that collect data of EU residents need to rethink their choices, as more European authorities will soon follow the DSB’s lead, possibly resulting in a complete ban on Google Analytics in Europe. The most privacy-friendly approach would be to switch to an EU-based analytics platform that protects user data and offers secure hosting. This will guarantee that you collect, store and process data in line with GDPR.
Could AI support a future crisis by strategically planning a regional supply chain? Commentary by Asparuh Koev, CEO of Transmetrics
If you haven’t heard the word enough already, collaboration within the supply chain is what will provide sustainable, long-term futures for our retailers, shippers, manufacturers, and suppliers combined. And given the extent and size of today’s business networks, joining forces without AI is no longer an option. With everything from the pandemic to the Russia-Ukraine war causing unexpected havoc on an already beaten chain of backlogs, consolidating data sources, forecasting demand, and initiating ‘just-in-case’ stock planning is the move for successful supply chains. Armed with complete transparency and visibility of data, historically isolated functions can benefit from AI’s power to read multitudes of information in seconds and create optimal scenarios for capacity management, route optimization, asset positioning, and last-mile planning. Fully integrated supply chains can work with real-time and historical data to enhance pricing models by understanding the entire market, while improving early detection of disruptions in advance of a crisis. This enables scenario planning on a scale that has never been seen before, which is indispensable in a time of crisis.
US AI Bias Oversight? Commentary by Sagar Shah, Client Partner at Fractal Analytics
There was an era when explainability was the talking point, then came the era of fairness, and now it’s privacy and monitoring. Privacy and AI can co-exist with the right focus on policy creation, process compliance, advisory and governance. Companies need a lot of advisory assistance to learn the ethical use of AI with respect for transparency, accountability, privacy and fairness. “Privacy by Design” is a pillar which is becoming stronger in a cookie-less world. Many companies are exploring avenues to build personalization engines within this new normal, to make it a win-win for consumer experience and customized offerings. Differential privacy injects noise to decrease correlation between features. However, it is not a foolproof technique, since the injected noise can be traced backwards by a data science professional.
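To make the noise-injection idea above concrete, here is a minimal sketch of one widely used form of differential privacy, the Laplace mechanism, applied to a simple counting query. It is an illustration only, not the commentator's method: the function names, the sample data and the choice of epsilon are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of matching records (hypothetical helper)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Calling `private_count(ages, lambda a: a >= 40, epsilon=0.5)` returns the true count plus random noise; a smaller epsilon means more noise and stronger privacy. One reason the technique is not foolproof is that averaging many answers to the same query cancels the noise out, which is why practical deployments cap how many queries may be asked (a "privacy budget").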
How can bad data ruin companies? Commentary by Ben Eisenberg, Director of Product, Applications and Web at People Data Labs
As all businesses become more data-driven, questions of data quantity will give way to questions of data quality. And with businesses increasingly leaning heavily on data to guide more of their decision-making, the risk associated with bad data has grown. Where once bad data might have a limited impact, it can now proliferate across multiple systems and processes leading to widespread dysfunction. To avoid these problems, businesses should prioritize investing in data that is compliantly sourced. Data is increasingly regulated across states and regions, and it’s important that any data you acquire from a third party be fully compliant. That means checking your vendor’s privacy compliance and questioning their practices around ensuring compliance from their sources. Another tactic is keeping data fresh. Most of the data businesses rely on reflects individual human beings, and human beings are not static. Every year millions of people move, change jobs, get new contact information, take out new loans, and adopt new spending habits. The fresher your records, and the more often you enrich them with fresh data, the more likely you are to avoid data decay that can diminish the value of your data and lead to problems.
World Backup Day. Commentary by Pat Doherty, Chief Revenue Officer at Flexential
We’ve learned to expect the unexpected when it comes to business disruption, illustrating the immense need for proper backup solutions. In 2022, investment in Disaster Recovery-as-a-Service (DRaaS) will be a major theme for businesses of all sizes to ensure long-term business success and survival —no matter the disruption. Moving DRaaS to a secondary site cloud environment can ensure that data is safe and secure and that organizations can operate as normal even when employees are not on site.
World Backup Day. Commentary by Indu Peddibhotla, Sr. Director, Products & Strategy, Commvault
Enterprise IT teams today are increasingly starting to realize that backup extends far beyond serving as their ‘last line of defense’ against cyberattacks. It can now help them take the offense against cybercriminals, by allowing them to discover and remediate cyberattacks before their data is compromised. For example, data protection solutions now have the ability to detect anomalous behaviors indicating a threat to a company’s data. In addition, emerging technologies will soon allow enterprise IT teams to create deceptive environments that can trap cybercriminals. These features, coupled with other early warning capabilities, will allow companies to use their backups to detect, contain, and intercept cyberattacks before they can lock, alter, steal, or destroy their data.
World Backup Day. Commentary by Stephen McNulty, President of Micro Focus
When disasters occur, organizations suffer. That is why they see backups, recovery, and security of data and systems as crucial for business continuity. Backups are an essential practice to safeguard data, but they are not the most important step. While they do indeed ensure availability and integrity of data, I believe recovery strategies should take precedence. Here’s why – it is the ability to restore data and systems to a workable state, and within a reasonable time frame, that makes backups valuable. Without this ability, there is no point in performing the backup in the first place. Furthermore, backups must also be complemented with adequate security controls. To that end, business leaders should consider the Zero Trust model, which implements a collection of solutions covering a range of needs – from access control and privilege management to the monitoring and detection of threats. This will ultimately provide the best protection possible as information travels across devices, apps, and locations.
World Backup Day. Commentary by Brian Spanswick, CISO at Cohesity
While all eyes are on backup today, organizations must strive for holistic cyber resilience and recognize that backup is just one component of a much larger equation. Achieving true cyber resilience means developing a comprehensive strategy to safeguard digital assets, including integrated defensive and recovery measures that give organizations the very best chance of weathering the storm of a cyber attack. Organizations should embrace a next-gen data management platform that enables customers to adopt the 3-2-1 rule for data backups, ensure data is encrypted both in transit and at rest, enable multi-factor authentication, and employ zero trust principles. Only then can organizations address mass data fragmentation challenges while also reducing data proliferation. Further, backups that can be restored to a precise point in time deliver the business continuity required for organizations to not only survive attacks, but continue to thrive in spite of them.
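The 3-2-1 rule mentioned above is commonly stated as: keep at least three copies of the data, on at least two different types of media, with at least one copy offsite. A minimal sketch of checking a backup inventory against that rule follows; the `BackupCopy` type, its fields and the example labels are hypothetical, and the sketch assumes the primary copy counts as one of the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str    # free-form label, e.g. "primary-server" (hypothetical)
    media_type: str  # e.g. "disk", "tape", "cloud-object"
    offsite: bool    # True if stored away from the primary site


def check_3_2_1(copies):
    """Return a list of rule violations; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({copy.media_type for copy in copies}) < 2:
        problems.append("fewer than 2 media types")
    if not any(copy.offsite for copy in copies):
        problems.append("no offsite copy")
    return problems
```

Returning the list of violations rather than a single boolean lets a report say which part of the rule is unmet.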
World Backup Day. Commentary by Brian Pagano, Chief Catalyst and VP at Axway
It is important to distinguish between “syncing” and “backup”; most people conflate the two. To qualify as a backup, it should let you do a fresh install with complete data recovery. Sync is designed to allow you to work seamlessly across devices by pushing deltas to the cloud. But if something happens to corrupt your local copy, that corruption may get synced and propagate across your devices. Organizations can help customers with backups by allowing easy export of complete data in a standard (not proprietary) format. You as a user have the responsibility of keeping a copy of your data on a local or remote drive that is not connected to sync. You must do this periodically (either manually or with a script that triggers after a certain period).
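The periodic script described above could look something like the following minimal sketch, which writes a timestamped archive of a folder only when the newest existing archive is older than a chosen interval. The function names, the `backup-*.tar.gz` naming scheme and the interval are assumptions for illustration; the destination should sit outside the source folder and on a drive that is not part of any sync service.

```python
import tarfile
import time
from pathlib import Path


def latest_backup_age(dest_dir):
    """Return seconds since the newest archive was written, or None if none exist."""
    archives = list(Path(dest_dir).glob("backup-*.tar.gz"))
    if not archives:
        return None
    newest = max(archive.stat().st_mtime for archive in archives)
    return time.time() - newest


def backup_if_due(source_dir, dest_dir, max_age_seconds):
    """Archive source_dir into dest_dir if no recent archive exists.

    Returns the path of the new archive, or None if a backup was not due.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    age = latest_backup_age(dest)
    if age is not None and age < max_age_seconds:
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"backup-{stamp}.tar.gz"
    # Each run writes a complete, standalone archive in a standard format,
    # so a corrupted working copy cannot overwrite earlier backups.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=Path(source_dir).name)
    return archive
```

Run from a scheduler such as cron or Task Scheduler, a call like `backup_if_due("/home/me/documents", "/mnt/external/backups", 7 * 24 * 3600)` (paths hypothetical) would produce roughly one archive per week. Restoring from an archive now and then is the only way to confirm it really supports a full recovery.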
World Backup Day is a great reminder for businesses to take a closer look at their full business continuity and disaster recovery (BCDR) plans—which includes everything from the solutions they use to their disaster recovery run book. The shift to remote working completely transformed the way organizations protect and store their data. Today, there is a greater focus on protecting data no matter where it lives — on-prem, on the laptops of remote employees, in clouds and in SaaS applications. Recovery time objectives (RTOs) are increasingly shrinking in today’s always-on world, with goals being set in hours—if not minutes. Cybercriminals have taken advantage of the remote and hybrid work environments to conduct increasingly sophisticated cyberattacks, and the data recovery process post-incident has become more complex due to new cyber insurance requirements. These new regulations include critical audits and tests that businesses must comply with in order to restore their data and receive a payout after an attack—which can slow down the recovery process. With data protection becoming increasingly complex, more organizations are turning to vendors that provide Unified BCDR, which includes backup and disaster recovery, AI-based automation and ransomware safeguards as well as disaster recovery as a service (DRaaS). Unified BCDR has become a necessity due to the growing amount of data organizations must protect and the increasing number of cyberattacks taking place against businesses of all sizes.