- MIT, Cornell, and McGill University researchers have developed an AI system that can independently learn the rules and patterns of human languages.
- The model can also learn higher-level linguistic patterns that apply across several languages, which helps it achieve better results.
- The model was trained and evaluated on problems from linguistics textbooks covering 58 distinct languages.
- The researchers want to apply this approach in the future to uncover surprising answers to challenges in other disciplines.
Human languages are notoriously complex, and linguists have long doubted that a machine could be taught to analyze speech sounds and word structures the way human investigators do.
Self-learning AI that can understand language patterns
Researchers at MIT, Cornell University, and McGill University, however, have taken a step in that direction: they have developed an artificial intelligence system that learns the rules and patterns of human languages on its own.
When given words along with examples of how those words change in a language to express grammatical functions such as tense, case, or gender, this machine-learning model generates rules that explain why those word forms vary. For example, it might discover that the letter “a” must be added to the end of a word in Serbo-Croatian to turn the masculine form feminine.
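As a toy illustration of rule discovery (our own sketch, not the paper's actual algorithm), the Serbo-Croatian example can be framed as recovering the single suffix that explains every base/derived word pair:

```python
# Toy illustration, not the paper's method: given word pairs in which one
# form is derived from the other, recover the single suffix rule that
# explains every pair -- echoing the Serbo-Croatian example, where
# appending "a" turns a masculine adjective feminine.

def infer_suffix_rule(pairs):
    """Return the one suffix that maps each base form to its derived form
    across all pairs, or None if no single suffix works."""
    suffixes = set()
    for base, derived in pairs:
        if not derived.startswith(base):
            return None  # the change is not a pure suffix
        suffixes.add(derived[len(base):])
    return suffixes.pop() if len(suffixes) == 1 else None

# Real Serbo-Croatian adjectives: nov/nova ("new"), star/stara ("old").
rule = infer_suffix_rule([("nov", "nova"), ("star", "stara")])
print(rule)  # "a"
```

The real system learns far richer rules than plain suffixation, but the principle is the same: a compact rule is induced from a handful of word pairs.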
The model can also learn higher-level linguistic patterns that apply across several languages, which helps it achieve better results.
The model was trained and evaluated on problems from linguistics textbooks covering 58 distinct languages. Each problem included a different set of words and word-form changes. For 60 percent of the problems, the model produced a valid set of rules explaining those changes.
The system could be used to test linguistic hypotheses and uncover subtle commonalities in how different languages alter words. It is especially noteworthy because it builds models that people can readily understand, and it does so from small amounts of data, such as a few hundred words.
Furthermore, rather than relying on one large dataset for a single task, the system uses many small datasets. This is closer to how scientists propose hypotheses: they examine multiple related datasets and develop models to explain phenomena across all of them.
Kevin Ellis, an assistant professor of computer science at Cornell University and the paper’s primary author, stated, “One of the motivations of this work was our desire to study systems that learn models of datasets that are represented in a way that humans can understand. Instead of learning weights, can the model learn expressions or rules? And we wanted to see if we could build this system so it would learn on a whole battery of interrelated datasets, to make the system learn a little bit about how to model each one better.”
MIT faculty members Adam Albright, a professor of linguistics; Armando Solar-Lezama, a professor and associate director of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of CSAIL; as well as senior author Timothy J. O’Donnell, assistant professor of linguistics at McGill University, also contributed to the paper. The findings were published in Nature Communications.
Teaching artificial intelligence human languages
The researchers opted to investigate the relationship between phonology (the study of sound patterns) and morphology (the study of word structure) to construct an AI system that could automatically generate a model from many related datasets.
Because many languages share essential characteristics, and textbook problems highlight specific linguistic phenomena, data from linguistics textbooks provided an appropriate testbed. College students can solve these textbook problems in fairly straightforward ways, but they typically bring prior knowledge of phonology from earlier lessons when reasoning about new problems.
Ellis, who received his Ph.D. from MIT and was mentored jointly by Tenenbaum and Solar-Lezama, first heard about morphology and phonology in an MIT course co-taught by O’Donnell, a postdoc at the time, and Albright.
Albright said, “Linguists have thought that to understand the rules of a human language, to empathize with what it is that makes the system tick, you have to be human. We wanted to see if we can emulate the kinds of knowledge and reasoning that humans (linguists) bring to the task.”
The researchers used a machine-learning technique known as Bayesian Program Learning to create a model that could learn a set of word-building rules, known as a grammar. With this technique, the model solves a problem by writing a computer program.
In this case, the program is the grammar the model believes is the most plausible explanation for the words and meanings in a linguistics problem. The researchers built the model with Sketch, a popular program synthesizer developed by Solar-Lezama at MIT.
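The core Bayesian idea can be sketched in a few lines (our assumptions, not the authors' Sketch-based code): each candidate grammar is scored by a simplicity prior plus a data-fit likelihood, and the highest-scoring one is kept as the most plausible explanation.

```python
# Minimal sketch of the Bayesian Program Learning idea. For simplicity, a
# "grammar" here is just a suffix to append to a stem -- a toy stand-in for
# the programs the real system synthesizes.

def log_prior(grammar):
    # Shorter programs are a priori more plausible (description-length prior).
    return -len(grammar)

def log_likelihood(grammar, data):
    # Each observed (stem, surface form) pair the grammar fails to explain
    # incurs a heavy penalty.
    misses = sum(stem + grammar != form for stem, form in data)
    return -10.0 * misses

def best_grammar(candidates, data):
    # Pick the maximum a posteriori grammar: prior plus likelihood.
    return max(candidates, key=lambda g: log_prior(g) + log_likelihood(g, data))

data = [("nov", "nova"), ("star", "stara")]  # hypothetical feminine forms
print(best_grammar(["", "a", "ova"], data))  # "a" explains both forms cheaply
```

The trade-off between the two terms is what lets the model prefer a short rule that explains the data over either an empty grammar or an overly specific one.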
However, Sketch can take a long time to reason about the most plausible program. To get around this, the researchers had the model work in pieces: it writes a small program to explain some of the data, then a larger program that modifies the first to cover more data, and so on.
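This piecewise strategy can be illustrated with a toy loop (our own sketch, not the actual Sketch pipeline): rules learned from an early batch of data are kept, and the rule set is extended only where later data remains unexplained.

```python
# Toy version of the incremental strategy: extend an existing rule set
# only for word pairs the current rules fail to explain, instead of
# re-synthesizing a program from scratch on each batch.

def extend_rules(rules, batch):
    """Add a suffix rule for each (stem, form) pair the current rules miss."""
    for stem, form in batch:
        if any(stem + suffix == form for suffix in rules):
            continue  # already explained by an existing rule
        if form.startswith(stem):
            rules.append(form[len(stem):])
    return rules

rules = []
batches = [
    [("nov", "nova")],                         # first, a tiny program: add "a"
    [("star", "stara"), ("grad", "gradovi")],  # later data adds a plural rule
]
for batch in batches:
    rules = extend_rules(rules, batch)
print(rules)  # ["a", "ovi"]
```

Growing the program this way keeps each synthesis step small, which is what makes the search tractable.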
They also designed the model so that it learns what “good” programs tend to look like. Because related languages share basic principles, the model can pick those principles up from easy Russian problems and apply them to a harder problem in Polish, which makes the Polish problem simpler to solve.
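One way to picture this transfer (a hypothetical sketch, not the paper's implementation) is as a learned prior over rule templates: templates that recur in grammars solved for earlier problems score higher when evaluating candidates for a new, related language.

```python
from collections import Counter

# Hypothetical sketch of cross-problem transfer: rule templates seen while
# solving earlier problems (say, Russian ones) receive a higher prior when
# scoring candidate grammars for a related language (say, Polish).

def learn_template_prior(solved_grammars):
    counts = Counter(t for grammar in solved_grammars for t in grammar)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def prior_score(grammar, prior, unseen=0.01):
    score = 1.0
    for template in grammar:
        score *= prior.get(template, unseen)  # unseen templates score low
    return score

# Templates reused across hypothetical Russian solutions.
russian = [["SUFFIX", "VOICING"], ["SUFFIX"]]
prior = learn_template_prior(russian)

# For a new Polish problem, a grammar built from the familiar SUFFIX
# template outranks one built from a template never seen before.
print(prior_score(["SUFFIX"], prior) > prior_score(["METATHESIS"], prior))  # True
```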
Tackling textbook problems with the new AI
When they tested the model on 70 textbook problems, it found a grammar that correctly matched the whole set of words in the problem in 60% of cases and most of the word-form alterations in 79% of cases.
The researchers also tried pre-programming the model with knowledge it “should” have learned had it taken a linguistics course, and it then solved every problem better. Albright stated, “One challenge of this work was figuring out whether what the model was doing was reasonable. This isn’t a situation where there is one number that is the single right answer. There is a range of possible solutions which you might accept as right, close to right, etc.”
The model frequently generated surprising solutions. In one case, it uncovered not just the expected answer to a Polish language problem but also another valid solution that exploited an error in the textbook. According to Ellis, this shows the model can “debug” linguistics analyses.
The researchers also ran experiments showing that the model could learn general templates of phonological rules that it could apply across problems.
According to Ellis, “One of the things that was most surprising is that we could learn across languages, but it didn’t seem to make a huge difference. That suggests two things. Maybe we need better methods for learning across problems. And maybe, if we can’t come up with those methods, this work can help us probe different ideas we have about what knowledge to share across problems.”
The researchers want to apply this approach in the future to uncover surprising answers to challenges in other disciplines. They could potentially use the method in other cases where higher-level knowledge may be applied across interconnected databases. For example, Ellis suggests they may create a system to deduce differential equations from information on the motion of various objects.
He added, “This work shows that we have some methods which can, to some extent, learn inductive biases. But I don’t think we’ve quite figured out, even for these textbook problems, the inductive bias that lets a linguist accept the plausible grammars and reject the ridiculous ones.”
T. Florian Jaeger, a University of Rochester professor of brain and cognitive sciences and computer science who was not involved in the research, stated, “This work opens up many exciting avenues for future research. I am particularly intrigued by the possibility that the approach explored by Ellis and colleagues (Bayesian Program Learning, BPL) might speak to how infants acquire language.”
“Future work might ask, for example, under what additional induction biases (assumptions about universal grammar) the BPL approach can successfully achieve human-like learning behavior on the type of data infants observe during language acquisition. I think it would be fascinating to see whether inductive biases that are even more abstract than those considered by Ellis and his team — such as biases originating in the limits of human information processing (e.g., memory constraints on dependency length or capacity limits in the amount of information that can be processed per time) — would be sufficient to induce some of the patterns observed in human languages,” he added.
This work was funded in part by the Air Force Office of Scientific Research, the Center for Brains, Minds, and Machines, the MIT-IBM Watson AI Lab, the Natural Sciences and Engineering Research Council of Canada, the Fonds de Recherche du Québec – Société et Culture, the Canada CIFAR AI Chairs Program, the National Science Foundation (NSF), and an NSF graduate fellowship.