Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this new regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Paradigm shifts needed to establish greater harmony between artificial intelligence (AI) and human intelligence (HI). Commentary by Kevin Scott, CTO, Microsoft.
“Comparing AI to HI is a long history of people assuming things that are easy for them will be easy for machines and vice versa, but it’s more the opposite. Humans find becoming a Grand Master of chess, or performing very complicated and repetitive data work, difficult, whereas machines are easily able to do those things. But on things we take for granted, like common sense reasoning, machines still have a long way to go. … AI is a tool to help humans do cognitive work. It’s not about whether AI is becoming the exact equivalent of HI; that’s not even a goal I’m working toward.”
Importance of utilizing the objectivity of data to discover areas of opportunity within organizations. Commentary by Eric Mader, Principal Consultant, AHEAD.
“During this era of accelerated tech adoption and digital transformation, more and more companies are turning to data collection and storage to fuel business decisions. This is, in part, because 2020 lacked the stability to serve as a functional baseline for business decisions in 2021. With this surge of organizations leaning on data and analytics, it’s essential to have a clear understanding of how this information can be helpful, but also harmful. These companies may inherently understand the importance of analyzing their data, but their biggest problem lies in their approach to preventing data bias. Accepting the natural appetite within organizations to lead their data to some degree is an important first step in capitalizing on the true objectivity of data. Luckily, there is one piece of advice that organizations must remember to avoid falling prey to confirmation bias in data analytics: Be careful how you communicate your findings. It’s a simple practice that can make all the difference. While the researchers analyzing the data should be aware of common statistical mistakes and their own biases towards a specific answer, careful attention should also be paid to how data is presented. Data teams have to avoid communicating their findings in ways that might be misleading or misinterpreted. By upholding a level of meticulousness within your data strategy, organizations can ensure that their data approach is working with them and not against them.”
Delivering Better Patient Experiences with AI. Commentary by Joe Hagan, Chief Product Officer at LumenVox.
“Call management is critical in the healthcare industry to support patient needs such as scheduling, care questions and prescription refills. However, data suggests that more than 50% of contact agents’ time is spent on resetting passwords for patient applications and portals – each password taking three or more minutes to reset. How can healthcare call centers overcome this time suck and better serve patients? AI-enabled technologies such as speech recognition and voice biometrics. According to a survey from LumenVox and SpinSci, nearly 40% of healthcare providers want to invest in AI for their contact centers in the next one to three years. The great digital disruption in healthcare that prioritizes the patient experience is here. As healthcare contact centers take advantage of technologies such as AI, they will be better equipped to deliver high quality service to patients.”
AI is the Future of Video Surveillance. Commentary by Rick Bentley, CEO of Cloudastructure.
“Not long ago, if you wanted a computer to recognize a car then it was up to you to explain to it what a car looks like: “It’s got these black round things called ‘wheels’ at the bottom, the ‘windows’ kind of go around the top half-ish part, they’re kind of rounded on top and shiny, the lights on the back are sometimes red, the ones up front are sometimes white, the ones on the side are sometimes yellow…” As you can imagine, it doesn’t work very well. Today you just take 10,000 pictures of cars and 50,000 pictures of close-but-not cars (motorcycles, jet skis, airplanes…) and your AI/ML platform will do a better job of detecting cars than you ever could. Intelligent AI and ML powered video solutions are the way of the future, and if businesses don’t keep up, they could be putting themselves and their employees at risk. Two years ago, more than 90% of enterprises were still stuck on outdated on-premises security methods, many with limited AI capabilities, if any. The industry is undergoing rapid transformation as it takes advantage of this technology and moves its systems to the cloud. Enhanced AI functionality and cloud adoption in video surveillance have allowed business owners and IT departments to monitor the security of their businesses from the safety of their homes. The AI surveillance solution can be accessed from any authorized mobile device and the video is stored safely off premises so that it cannot be hacked, and it is safe from environmental hazards. Additionally, powerful AI analytics allow intelligent surveillance systems to sort through large volumes of footage to identify interesting activity more than 10x faster and more accurately than manual, on-premises solutions. In the upcoming years, AI functionality will continue to get more and more advanced, allowing businesses to generate real-time insight and enable a rapid response to incidents, resulting in a more efficient and safer society.”
How Businesses Can Properly Apply and Maximize AI’s Impact. Commentary by Bren Briggs, VP of DevSpecOps at Hypergiant.
“Businesses that do not utilize AI and ML solutions will become increasingly irrelevant. This is not because AI or ML are magic bullets, but rather because utilizing them is a hallmark of innovative, resilience-first thinking. For businesses to maximize the impact of AI, they must first pose critical business questions and then seek solutions that streamline data management as well as improve and strengthen business processes. AI, when utilized well, helps companies predict problems and then act to swiftly respond to those challenges. I always encourage companies to focus on the basics: hire the experts, set up the models for success, and pick the AI solutions that will most benefit their organization. In doing that, companies should ramp up their data science and MLOps teams, which can help assess which AI problems are most likely to be successful and have a strong ROI. A cost/benefit analysis will help you determine if an AI integration is actually the best use of your company’s resources at any given time.”
AI as a “Software Developer” in the Context of Product Development. Commentary by Jonathan Grandperrin, CEO, Mindee.
“As artificial intelligence continues to take great strides and more machine learning based products for engineering teams emerge, a challenge arises: a pressing need for software developers to skill up and understand how AI functions and how to appropriately leverage it. AI’s capabilities to automate tasks and optimize multiple, if not all, processes have revolutionized the world. To reach the promised efficiency, AI must be integrated into all day-to-day products, such as websites, mobile applications, even products like smart TVs. However, a problem comes up for developers in this context: AI does not rely on the same principles as software development. Unlike software development, which relies on deterministic functions, AI is most of the time based on statistical approximation, which changes the whole paradigm from the point of view of a software developer. It is also the reason behind the rise of data science positions in software companies. To succeed, developers must hold the capacity to create AI models from scratch and provide technical teams with ML features that they can understand and utilize. Fortunately, it’s becoming increasingly common to see ML libraries for developers. In fact, those looking to ramp up on their ML skills can participate in intro courses that can easily extend their skillset.”
How to Solve ML’s Long Tail Problem. Commentary by Russell Kaplan, Scale AI’s Head of Nucleus.
“The most common problem in machine learning today is uncommon data. With ML deployed in ever more production environments, challenging edge cases have become the norm instead of the exception. Best practices for ML model development are shifting as a result. In the old world, ML engineers would collect a dataset, make a train/test split, and then go to town on training experiments: tuning hyperparameters, model architectures, data augmentation, and more. When the test set accuracy was high enough, it was time to ship. Training experiments still help, but are no longer the highest leverage. Increasingly, teams hold their models fixed while iterating on their datasets, rather than the other way round. Not only does this lead to larger improvements, it’s also the only way to make targeted fixes to specific ML issues seen in production. In ML, you cannot if-statement your way out of a failing edge case. To solve long tail challenges in this way, it’s not enough to make dataset changes: you have to make the right dataset changes. This means knowing where in the data distribution your model is failing. The concept of “one test set” no longer fits. Instead, high performing ML teams curate collections of many hard test sets covering a diversity of settings, and measure accuracy on each. Beyond helping inform what data to collect and label next to drive the greatest model improvement, the many-test-sets approach also helps catch regressions. If aggregate accuracy goes up but performance in critical edge cases goes down, your model may need another turn before you ship.”
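The many-test-sets idea in the commentary above can be illustrated with a minimal Python sketch. It is not Scale AI’s tooling; the slice names ("daytime", "night_rain"), the function names, and the toy predictions are all hypothetical and chosen only to show how aggregate accuracy can rise while a critical edge-case slice regresses.

```python
from typing import Dict, List, Tuple

# One evaluated example: (predicted label, true label).
Example = Tuple[int, int]


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples whose prediction matches the true label."""
    if not examples:
        raise ValueError("cannot compute accuracy on an empty test set")
    return sum(pred == true for pred, true in examples) / len(examples)


def per_slice_accuracy(slices: Dict[str, List[Example]]) -> Dict[str, float]:
    """Accuracy measured separately on each named test set."""
    return {name: accuracy(examples) for name, examples in slices.items()}


def aggregate_accuracy(slices: Dict[str, List[Example]]) -> float:
    """Accuracy over all test sets pooled together."""
    pooled = [ex for examples in slices.values() for ex in examples]
    return accuracy(pooled)


def find_regressions(old: Dict[str, float], new: Dict[str, float],
                     tolerance: float = 0.0) -> List[str]:
    """Names of slices where the new model is worse by more than `tolerance`."""
    return sorted(name for name in old
                  if name in new and new[name] < old[name] - tolerance)


# Hypothetical results for two model versions on the same two test sets.
baseline = {
    "daytime": [(1, 1)] * 95 + [(0, 1)] * 5,    # 95 of 100 correct
    "night_rain": [(1, 1)] * 8 + [(0, 1)] * 2,  # 8 of 10 correct
}
candidate = {
    "daytime": [(1, 1)] * 99 + [(0, 1)] * 1,    # 99 of 100 correct
    "night_rain": [(1, 1)] * 6 + [(0, 1)] * 4,  # 6 of 10 correct
}

regressed = find_regressions(per_slice_accuracy(baseline),
                             per_slice_accuracy(candidate))
print("aggregate improved:",
      aggregate_accuracy(candidate) > aggregate_accuracy(baseline))
print("regressed slices:", regressed)
```

In this toy data the pooled accuracy improves while the small "night_rain" slice gets worse, which is the situation the commentary says a single test set would hide.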
Top Data Privacy Considerations for M&As. Commentary by Matthew Carroll, CEO, Immuta.
“M&A deals continue to soar globally, with the technology, financial services, and insurance industries leading the pack. However, with the increase in deals comes an increase in deals that fall through. Research shows that one of the primary reasons M&A deals fall through at a surprisingly high rate (between 70% and 90%) is data privacy and regulatory concerns as more companies move their data to the cloud. M&A transactions lead to an instantaneous growth in the number of data users, but the scope of data used is often complex and risky, especially when it involves highly sensitive personal, financial or health-related data. With two companies combining their separate vast data sets, it’s imperative to find an efficient way to ensure that data protection methods and standards are consistent, that only authorized users can access data for approved uses, and that privacy regulations are adhered to across jurisdictions and the globe. Merging data is just the beginning. Once mergers are completed, the joint entities must be able to provide comprehensive auditing to prove compliance. Without a strong data governance framework, stakeholder buy-in, and automated tools that work across both companies’ data ecosystems, this can lead to unmanageable and risk-prone processes that inhibit the value of the combined data and could lead to data vulnerabilities.”
Why Training Your AI Systems Is More Complex Than You Think. Commentary by Doug Gilbert, CIO and Chief Digital Officer, Sutherland.
“Few enterprises, if any, are ready to deploy AI systems or ML models that are completely free from any form of human intervention or oversight. When training algorithms, it’s important to first understand the inherent risks of bias from the training environment, the selection of training data and algorithms based upon human expertise in that particular field, and the application of AI against the very specific problem it was trained to solve. Overlooking any or all of these can lead to unpredictable or negative outcomes. Human oversight, using methods such as Human-in-the-Loop (HitL), Reinforcement Learning, Bias Detection, and Continuous Regression Testing, helps ensure AI systems are trained adequately and effectively to deal with real-life interactions, work and use cases and create positive outcomes.”
Scientific vs. Data Science Use Cases. Commentary by Graham A. McGibbon, Director of Partnerships, ACD/Labs.
“Current scientific informatics systems support electronic representations of data obtained from experiments and tests which are often confirmatory analyses with interpretations. Data science is more often exploratory and the supporting systems typically rely on data pipelines and large amounts of clean and comprehensive data required for appropriate statistical treatments. Data science systems are founded on large volumes of data being “well-identified” via metadata, which is needed for the critical capability of machines to self-interpret these large datasets and subsequently derive correlations and predictions that are otherwise not obvious. Ultimately, some of these systems could cycle continuously and autonomously given sufficient coupling with automated data generation technologies. However, scientists want the ability to judge the output of their analyses and view and explore unanticipated features in their data along with any machine-derived interpretations. Consequently, these scientific consumers need representations of results that they can easily evaluate. When comparing the current output capabilities of data science systems to contemporary or historical scientific systems, they lack some of the semiotics that domain-specific scientists expect. As such, there remains a need to bridge data science and domain-specific science, particularly if changes are desired in the latter to make it machine-interpretable for further adoption. It’s important to understand that data science and domain-specific science will likely have to make adjustments for accommodating the other to ultimately reap the full benefit of generating human-interpretable knowledge outputs.”
Why Predictive Analytics are Increasingly Important for Smart, Sustainable Operations. Commentary by Steve Carlini, Vice President of Innovation and Data Center at Schneider Electric.
“In the data center world, predictive analytics are used mainly for critical components in the power and cooling architecture to prevent unplanned downtime. For example, a DCIM solution can look at UPS system batteries and collect data on the number of discharges, temperatures, and overall age to come up with recommendations for battery replacement. These recommendations are based on different risk tolerances. For example, the software will say something like, “Battery System 4 has a 20% chance of failure next week and a 50% chance of failure within 2 months.” Facility operators can then manage risk and make informed decisions regarding battery replacement. When using analytics on larger data centers, it is important that facility-level systems are included because they are the backbone of IT rooms. Power system software must cover the medium voltage switchgear, the busway, the low voltage switchgear, all the transformers, all the power panels, and the power distribution units. Cooling system software must cover the cooling towers, the chillers, the pumps, the variable speed drives, and the Computer Room Air Conditioners (CRACs). Due to the scale and level of machinery in larger data centers, it’s necessary that all systems are included for comprehensive predictive analytics. As edge data centers become a critical part of the data center architecture and are deployed at scale, DCIM software benefits from unlimited cloud storage and data lakes. Predictive analytics become highly valuable as almost none of these sites are manned with service technicians. The DCIM system can leverage predictive analytics with a certain degree of linkage and automation to dispatch service personnel and replacement parts. As more data is collected, the accuracy of these analytics leveraging machine learning models will become trusted. This is already in process, as even today operators of mission critical facilities have the ability to plan or design systems with less physical redundancy and rely on the software for advanced notifications regarding battery health.”
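To make the risk-tolerance idea in the commentary above concrete, here is a minimal Python sketch of how an operator might turn failure forecasts into a replacement deadline. It is not how any particular DCIM product works; the `BatteryForecast` class, the `replace_within_days` function, and the mapping of “next week” to 7 days and “2 months” to 60 days are assumptions made for illustration, using the example probabilities quoted in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BatteryForecast:
    """Hypothetical forecast: horizon in days -> probability of failure by then."""
    name: str
    failure_prob_by_days: Dict[int, float]


def replace_within_days(forecast: BatteryForecast,
                        risk_tolerance: float) -> Optional[int]:
    """Shortest horizon whose failure probability exceeds the tolerance.

    Returns None when no forecast horizon exceeds the operator's tolerance,
    meaning no replacement is recommended yet.
    """
    for days in sorted(forecast.failure_prob_by_days):
        if forecast.failure_prob_by_days[days] > risk_tolerance:
            return days
    return None


# The example from the text: 20% next week (assumed 7 days),
# 50% within 2 months (assumed 60 days).
system4 = BatteryForecast("Battery System 4", {7: 0.20, 60: 0.50})

for tolerance in (0.10, 0.30, 0.60):
    deadline = replace_within_days(system4, tolerance)
    if deadline is None:
        print(f"tolerance {tolerance:.0%}: no replacement recommended yet")
    else:
        print(f"tolerance {tolerance:.0%}: replace within {deadline} days")
```

A risk-averse operator (10% tolerance) would act within the week, while one accepting up to 30% risk could schedule the replacement within the two-month window.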