Looking Beyond the Incumbent: Setting New Goals for Machine Learning & Artificial Intelligence for 2022 – insideBIGDATA

Machine learning and artificial intelligence have given us great things in the last several years. Along with recommending the next TV show to binge or new song to listen to, machine learning and artificial intelligence have improved safety in energy, financial services and transportation, as well as helped pharmaceutical companies like Johnson & Johnson speed COVID-19 vaccine development.

CEOs, politicians, doctors, teachers, researchers—people from all walks of life have benefited, yet the triumphs of machine learning and artificial intelligence have also exposed their limitations. For instance, what do you do if you are looking for a cure for one of the 6,800 rare diseases in the world (source: National Human Genome Research Institute)?

Or what if you are a small company that wants to take on Google or Amazon? Lacking access to unlimited data and GPUs, should you give up now, and just resign yourself to permanent second-class status? And while we are on the subject of class, what do we say to diversity-minded HR professionals dealing with racial, gender and class biases coded right into their Talent Acquisition software?


The bottom line is that without access to deep stores of the right data, machine learning and artificial intelligence can be worse than useless; they can deliver results that cost companies revenue and damage their brands. A recent report from Gartner found that the average US company loses between $9.7 million and $14.2 million annually to bad data. At the aggregate level, IBM estimates that bad data drains companies of more than $3 trillion per year.

As we begin setting priorities for 2022, I would like to propose an alternative for organizations of all sizes to develop more effective ways to leverage machine learning and artificial intelligence. The goal is not to disparage anyone’s approach; rather, it is to offer a framework that empowers smaller stakeholders, alongside major players seeking to innovate. 

Problems with the Incumbent Approach to Data

Like the assembly-line workers, packers, and delivery people who are the links of the real-world supply chain, data scientists, data labelers, and program managers comprise the data-driven, virtual supply chain upon which machine learning and artificial intelligence depend. For us to improve how we leverage data, we need to forge a new chain that doesn’t perpetuate flaws.

In an encouraging recent story, Etsy gave its army of 5 million artisans AI and other machine learning tools. The immediate objective was to help sellers hurt by the pandemic pivot toward essential items like face masks and hand sanitizer. The tools Etsy gave its community included the same cutting-edge data science, AI and marketing applications used by major retailers.

The approach yielded immediate results. With the traditional supply chain in tatters and the demand for masks surging, Etsy’s shares have climbed 600% since a March 2020 pandemic low, and active buyers and sellers have doubled to 90 million and 5 million, respectively. Thanks to renewed momentum, analysts are betting that Etsy will hit a 30% sales increase by end-of-year 2021. Whether this bottom-up approach will turn the artisan marketplace into the “anti-Amazon” remains to be seen, but, for now, the company seems to be on the right track.

Large companies are also getting in on the action. When developing Alexa, Etsy’s (and everybody else’s) nemesis Amazon realized that its internal testing team would not generate enough data, so it turned to an outside firm that rented out apartments and homes around Boston filled with electronics embedded with the technology. Contractors were instructed to read scripts containing “open-ended queries.” The process went on six days a week for six months, with 20+ intelligent devices per test site capturing every grunt and syllable.

According to Brad Stone in Amazon Unbound (Simon & Schuster, 2021), the raw, unlabeled data generated was so helpful to Amazon developers that the program was expanded to ten cities around the US. Today, Alexa has gained 100,000 skills since launch (source: TechCrunch), approximately 10.8% of digital buyers used Amazon Alexa for online shopping in 2020 (source: eMarketer), and 130 million Alexa-powered Echo speakers are expected to be sold by 2025 (source: eMarketer).

The lesson here is that in order to create a new offering, both Amazon and Etsy had to step outside their incumbent approach to managing data. The reason, as Herbert Roitblat explains in Algorithms Are Not Enough (MIT Press, 2020), is that machine learning and artificial intelligence over-rely on existing data to make predictions. On their own, they can’t solve problems for which the data is missing or biased.

What Organizations Should Do in 2022

For organizations launching new data-driven offerings in 2022, stepping outside your existing data stack is the first step. The next, as we learned from both Etsy and Amazon, is putting tools in the hands of domain experts and business owners. Unlike the incumbent approach, which gives all the power to specialists, this lean approach speeds development by removing unnecessary complexity.

According to a recent “State of Data Science” survey of 4,200 professionals in 140 countries by Anaconda, data scientists spend 45% of their time on data preparation, 19% on data loading, 21% on visualization—leaving only 11% to 12% of their day for model selection, training and scoring. With the average data scientist earning $100,560 (source: U.S. Bureau of Labor Statistics), the incumbent approach to data is costly, as well as frequently inefficient.

Knowing the problem you are solving is always key. If you are a startup or an established organization launching a new offering, the way to go is to step outside your stack, put tools in the hands of business and domain owners, and create an interactive loop. Like the sprints used by agile product specialists, an interactive loop favors AI-assisted exploration of large sets of unlabeled data. Discovery and quick actions by subject matter experts lead to evolving schemas, more targeted exploration, and new discoveries. Interactivity goes both ways: the models become more accurate and robust, and the subject matter experts become smarter and more knowledgeable.
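The interactive loop described above can be sketched in a few lines of code. This is a minimal, illustrative example of one common way to implement it—uncertainty sampling, where the model surfaces the examples it is least sure about and a subject matter expert labels them, growing the schema as new categories are discovered. All class, function, and variable names here are hypothetical, not taken from any particular platform:

```python
# A minimal human-in-the-loop labeling sketch (uncertainty sampling).
# Illustrative only: names and the toy scoring model are assumptions.
from dataclasses import dataclass, field


@dataclass
class InteractiveLoop:
    unlabeled: list                                # raw, unlabeled examples
    labeled: dict = field(default_factory=dict)    # example -> label
    schema: set = field(default_factory=set)       # evolving set of labels

    def propose(self, score):
        """AI-assisted exploration: surface the example the current model
        is least certain about (probability closest to 0.5)."""
        return min(self.unlabeled, key=lambda x: abs(score(x) - 0.5))

    def label(self, example, label):
        """A subject matter expert labels the example; unseen labels
        extend the schema, so the taxonomy evolves with discovery."""
        self.unlabeled.remove(example)
        self.labeled[example] = label
        self.schema.add(label)


# Toy usage: a "model" that is confident about mugs but unsure about masks.
loop = InteractiveLoop(unlabeled=["mask listing", "mug listing", "soap listing"])
score = lambda x: 0.9 if "mug" in x else 0.55
candidate = loop.propose(score)        # least-certain example goes to the expert
loop.label(candidate, "essential-item")
```

Each pass through `propose` and `label` is one turn of the loop: the expert’s answers retrain the model, the retrained model picks better questions, and the schema grows as new categories surface.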

As Eric Ries advises in The Lean Startup, the interactive loops should continue until the business, the data, and the machine learning tools demonstrate 1:1 alignment. Done right, your organization will not simply work better and smarter; it will unlock the true capabilities of machine learning and artificial intelligence to transform how all of us—regardless of data access and processing muscle—live and do business in 2022 and beyond.

About the Author

Patrice Simard, a former Microsoft executive, is CEO and Co-founder of Intelus.ai, the leading no-code/no-data science platform.
