As humans, we learn both through explicit instruction under the supervision of an expert and autonomously through interactions with our environment. Starting from early childhood, our brains are constantly absorbing sensory inputs and building connections between inputs and experiences. The human brain exhibits a dramatic expansion in size in our formative years, allowing us to retain these early lessons as we age. Arguably, most of our education about how to operate in the real world comes from autonomous learning and from our ability to generalize such learnings in contextually appropriate ways.
In contrast, despite remarkable recent advances, artificial intelligence systems still rely disproportionately and often wholly on learning with supervision. And even the most knowledgeable AI agents can lack the ability to apply common-sense reasoning. For example, a question such as “how long would it take to swim to the moon?” may elicit an “I don’t know” instead of “you cannot swim to the moon.”
In seeking to advance AI to the next level of performance, researchers today are starting to explore foundational elements of generalizability and autonomous learning. For example, recently they’ve been exploring increasingly large neural network models for language processing and computer vision tasks. This can be seen as an attempt to create essential learning infrastructure that enables connections between inputs and outputs (or actions/experiences). Just creating larger models is not a scalable approach in the long run, but for now, these models allow researchers to expand the limits of learning without direct supervision.
In my role at Amazon as VP of Alexa AI, I have found that there are three emerging directions in research across the industry that are exciting and offer the promise of ushering in a new “age of self” in AI. One of these directions is AI being able to continuously learn through interactions (self-learning). Another involves it gaining the ability to perform some common-sense reasoning tasks without being pre-programmed for them, by maintaining a sense of its state in the context of its operating environment (self-awareness). Importantly, the third direction is making AI’s capabilities more readily accessible to everyone (self-service).
Self-learning is crucial for AI to improve and expand its capabilities without human intervention. A recent advancement that falls into this category is the AI system Generative Pretrained Transformer 3 (GPT-3), which learns to summarize and compose text simply by reading a lot of text. Ask GPT-3 to “write a poem in the voice of E.E. Cummings about the pandemic,” and the program will output an uncanny reflection of the poet’s style—his tone, characteristic use of punctuation, and imagery.
GPT-3’s ability is predicated upon its self-learning: when a developer inputs an unfamiliar command, the system infers what might have been intended by drawing on patterns it absorbed from the text it was trained on (for instance, the foundational principles of English grammar and language, E.E. Cummings’ body of work, and notable events surrounding the pandemic) and uses them to compose a logical, informed response.
Today, when asked “What is the time?” a voice service such as Alexa leverages knowledge of the device location to provide the time of day for that time zone. In the absence of that knowledge, the user would have to say “What is the current time in Los Angeles?” even when the user and device are both in Los Angeles. Clearly even a limited self-awareness (in this case, of its current location) allows an AI to interact naturally with a human user.
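The time-of-day example can be sketched in a few lines of Python. This is only an illustration of the idea of filling in an unstated detail from device context, not a description of how Alexa is built; the device registry, the device name, and the function `answer_time_query` are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical device registry (an assumption for this sketch): each device
# stores the UTC offset, in hours, of the place where it was set up.
# Fixed offsets ignore daylight saving time; a real system would use full
# time-zone rules instead.
DEVICE_UTC_OFFSETS = {"kitchen-speaker": -8}  # e.g. Los Angeles in winter


def answer_time_query(device_id: str,
                      spoken_offset: Optional[int] = None,
                      now_utc: Optional[datetime] = None) -> str:
    """Answer "What is the time?", filling in the location from device context."""
    # Prefer a location the user named; otherwise fall back to what the
    # device knows about itself.
    if spoken_offset is not None:
        offset = spoken_offset
    else:
        offset = DEVICE_UTC_OFFSETS.get(device_id)
    if offset is None:
        # No self-knowledge and no hint from the user: ask for clarification.
        return "Which city do you mean?"
    now_utc = now_utc or datetime.now(timezone.utc)
    local = now_utc.astimezone(timezone(timedelta(hours=offset)))
    return f"It is {local:%H:%M}."
```

With the device's location on record, the user never has to name a city; without it, the only sensible reply is a clarifying question, which is the friction that self-awareness removes.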
A key question for researchers is, how do we scale this self-awareness to enable far more complex yet natural and frictionless experiences? A home AI assistant that maintains understanding of ambient state—such as time of day, thermostat readings, and recent actions—and employs common-sense reasoning to make inferences that also incorporate world knowledge could enable truly magical experiences.
Consider a self-driving car working in conjunction with a home AI assistant. On Mondays at 8 am, you usually leave for work and turn off the lights in your house, which prompts your smart home to start your car’s engine so that it can warm up (it’s December, and your car knows the weather is brisk that day). Upon entering the car, you use your voice assistant to ask for the optimal route to work, so the system identifies your current location and destination, reviews traffic data to infer an efficient route, and then begins your journey. Further, the AI detects that it is likely to rain and recommends you take an umbrella.
Yes, some of these visions of the future can appear to be beyond the edge of ambition, but accomplishing even a part of them in the near term can benefit users. For example, Alexa hunches can recognize anomalies in a customer’s daily routines, such as a light left on at night, and suggest corrections, such as offering to turn the light off. Powered by common-sense reasoning, self-awareness can go further: if a customer turns on the television five minutes before the kids’ soccer practice is scheduled to end, AI might infer that the customer needs a reminder about pickup.
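To make the light-left-on example concrete, here is a deliberately simple Python sketch of routine-based anomaly detection. It is an assumption-laden toy, not the method behind Alexa hunches: the function name `hunch`, the `min_support` threshold, and the idea of comparing against a list of past states are all invented for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def hunch(history: Sequence[str], current_state: str,
          min_support: float = 0.8) -> Optional[str]:
    """Suggest the usual state if the current one departs from a strong routine.

    history: states observed at this time of day on previous days
             (hypothetical data, e.g. "on"/"off" for a light at 11 pm).
    Returns the state to suggest, or None when no suggestion is warranted.
    """
    if not history:
        return None  # nothing learned yet, so no basis for a hunch
    usual_state, count = Counter(history).most_common(1)[0]
    # Only speak up when the routine is consistent enough to trust.
    if count / len(history) >= min_support and current_state != usual_state:
        return usual_state
    return None


# A light that was off on nine of the last ten nights is on tonight.
suggestion = hunch(["off"] * 9 + ["on"], current_state="on")
if suggestion is not None:
    print(f"The light is usually {suggestion} at this hour. Change it?")
```

The threshold keeps the assistant quiet when the household has no stable habit, which matters as much as catching the anomaly: a suggestion made on weak evidence is just an interruption.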
As AI enters the “age of self,” and self-learning and awareness continue to advance, smart systems will also include self-service features, thereby democratizing AI. Users without any AI expertise are already able to customize smart systems and utilize their devices in ways that in the past may have been technically limiting or intimidating, if not impossible. Currently, developers and programmers who have software expertise but no AI expertise can build new Alexa skills and capabilities. In this coming “age of self,” the aspiration is for individuals with no programming experience to accomplish similar tasks.
For example, an increased emphasis on low- and no-code machine learning frameworks will result in advances that allow users to train, test and deploy deep learning models without needing to write novel code. This will enable people worldwide to influence and shape the future development of AI capabilities for a diverse set of use cases–from healthcare to content delivery to education.
Of course, the age of self is not upon us just yet. Most machine learning models are still black boxes and are often unable to explain the rationale behind specific outcomes. This lack of self-explanation with current machine learning models—the ability for AI to explain in detail how it interprets and deduces information to determine a specific action—is a gap that has attracted many bright minds who are already demonstrating encouraging results. But every year, with contributions from a global research community, we are making progress toward this new age that will more fully redeem the promise of AI for everyone, everywhere.
Prem Natarajan is Vice President of Alexa AI at Amazon and leads a multidisciplinary science, engineering, and product organization focused on areas such as natural language understanding. Before joining Amazon, he was senior vice dean of engineering at the University of Southern California and executive vice president and principal scientist for speech, language, and multimedia at Raytheon BBN Technologies.