Researchers from MIT, Cornell and McGill University have developed a new machine learning model that self-discovers linguistic rules that often match those created by human experts

The human ability to form theories about the world is a fundamental characteristic of intelligence. This ability is most evident in the recorded history of science, but it also emerges in subtler ways in everyday perception and in child development. Developing techniques to understand, and possibly even automate, the process of theory development is a fundamental goal of both artificial intelligence and computational cognitive science.
Linguists have long believed that training a machine to analyze speech sounds and word patterns the way humans do would be a challenge. However, scientists from MIT, Cornell University, and McGill University have now made progress in this area: they have demonstrated an AI system that can teach itself the grammar and phonological structure of a human language.
Given a set of words and examples of how those words change to express different grammatical features (such as tense, case, or gender) in a language, the machine learning model develops rules that explain why the forms of those words vary. The model can also automatically learn higher-level patterns that are common to many languages, which improves its results.
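To make the idea of rules that explain varying word forms concrete, below is a minimal sketch in Python. It assumes a hypothetical toy dataset loosely modeled on the English plural suffix, written in a simplified phonemic spelling, and encodes each rule as an ordered regular-expression rewrite across a '+' morpheme boundary. Neither the data nor the rule encoding comes from the paper; the names `Rule`, `RULES`, and `surface_form` are illustrative only.

```python
import re
from dataclasses import dataclass

# Hypothetical toy data: simplified phonemic spellings, not data from the paper.
VOICELESS = "ptkfs"  # voiceless consonants in this toy inventory
SIBILANTS = "sz"     # sibilants that trigger vowel epenthesis


@dataclass(frozen=True)
class Rule:
    """A context-sensitive rewrite rule expressed as a regular expression."""
    name: str
    pattern: str
    replacement: str

    def apply(self, form: str) -> str:
        return re.sub(self.pattern, self.replacement, form)


# Ordered rules; '+' marks the boundary between stem and suffix.
RULES = [
    # Insert a schwa between a stem-final sibilant and the suffix /z/.
    Rule("epenthesis", rf"(?<=[{SIBILANTS}])\+z", "+əz"),
    # Devoice the suffix /z/ to /s/ after a voiceless consonant.
    Rule("devoicing", rf"(?<=[{VOICELESS}])\+z", "+s"),
]


def surface_form(stem: str, suffix: str, rules: list) -> str:
    """Join stem and suffix, apply each rule in order, then drop the boundary."""
    form = f"{stem}+{suffix}"
    for rule in rules:
        form = rule.apply(form)
    return form.replace("+", "")


if __name__ == "__main__":
    for stem in ["dog", "kat", "bus"]:
        print(stem, "->", surface_form(stem, "z", RULES))  # dogz, kats, busəz
    # Applying the same rules in the opposite order mispredicts "bus".
    print("reversed order:", surface_form("bus", "z", list(reversed(RULES))))  # buss
```

Rule order matters here: running devoicing before epenthesis turns "bus" into "buss", which is why a grammar in this setting behaves like an ordered program rather than an unordered set of rules.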
The researchers trained and evaluated the model on problems from linguistics textbooks covering 58 different languages. Each problem contained a specific set of words and the corresponding changes in their forms. For 60% of the problems, the model produced the correct rules to describe these word-form changes.
This approach could be used to explore linguistic hypotheses and to examine subtle variations in how words change across many languages. What makes it notable is that the system learns models that people can easily understand, and it does so from small amounts of data, such as a few dozen words. In addition, the system learns from many small datasets rather than a single large one. This is closer to how scientists propose hypotheses: they look at many related datasets and develop models that explain the phenomena in them.
In their quest to create an AI system that can automatically learn a model from many related datasets, the researchers chose to study the interaction between phonology (the study of sound patterns) and morphology (the study of word structure).
Because many languages share similar core characteristics, and textbook exercises highlight specific linguistic phenomena, data from linguistics textbooks provided an excellent testbed. College students can solve textbook problems fairly easily, but they usually draw on prior knowledge of phonology from earlier lessons when reasoning about new problems.
The researchers used a machine learning technique called Bayesian program learning to build a model that learns a grammar, that is, a set of rules for assembling words. With this technique, the model solves a problem by writing a computer program.
In this case, the program is the grammar that the model judges to be the most plausible explanation of the words and their meanings in a linguistics problem. The researchers built the model using Sketch, a popular program synthesizer developed at MIT by Armando Solar-Lezama.
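Choosing the most plausible grammar can be illustrated as a maximum a posteriori search: score each candidate grammar by a prior that favors short programs plus a likelihood that rewards explaining the observed words, and keep the best. This is a minimal sketch under stated assumptions, not the authors' implementation: the model described here searches with the Sketch synthesizer, whereas this example brute-forces every ordering of a tiny, hypothetical pool of rules. The toy data, the description-length prior, and the `cost_per_rule` and `noise` values are all illustrative assumptions.

```python
import itertools
import math
import re

# Hypothetical pool of candidate rules: name -> (regex pattern, replacement).
# '+' marks the stem/suffix boundary; the plural suffix is underlyingly /z/.
CANDIDATE_RULES = {
    "epenthesis": (r"(?<=[sz])\+z", "+əz"),
    "devoicing": (r"(?<=[ptkfs])\+z", "+s"),
    "deletion": (r"\+z", "+"),  # distractor rule that deletes the suffix
}

# Hypothetical observations: (stem, observed plural form).
DATA = [("dog", "dogz"), ("kat", "kats"), ("bus", "busəz"), ("map", "maps")]


def generate(stem, grammar):
    """Run an ordered grammar (a tuple of rule names) on stem + /z/."""
    form = f"{stem}+z"
    for name in grammar:
        pattern, replacement = CANDIDATE_RULES[name]
        form = re.sub(pattern, replacement, form)
    return form.replace("+", "")


def log_prior(grammar, cost_per_rule=2.0):
    """Description-length style prior: every extra rule lowers plausibility."""
    return -cost_per_rule * len(grammar)


def log_likelihood(grammar, data, noise=1e-3):
    """Explained examples get probability 1 - noise; unexplained ones get noise."""
    return sum(
        math.log(1 - noise) if generate(stem, grammar) == observed else math.log(noise)
        for stem, observed in data
    )


def candidate_grammars(rule_names):
    """Every ordered sequence of distinct rules, including the empty grammar."""
    for k in range(len(rule_names) + 1):
        yield from itertools.permutations(rule_names, k)


def map_grammar(data):
    """Brute-force maximum a posteriori grammar: argmax of log prior + log likelihood."""
    return max(
        candidate_grammars(list(CANDIDATE_RULES)),
        key=lambda g: log_prior(g) + log_likelihood(g, data),
    )


if __name__ == "__main__":
    print(map_grammar(DATA))      # ('epenthesis', 'devoicing')
    print(map_grammar(DATA[:2]))  # ('devoicing',)
```

With only `dog` and `kat` as data, the prior's penalty on extra rules makes `('devoicing',)` the winner; once `bus` is observed, the epenthesis rule earns its cost. This trade-off between simplicity and fit is the core of the Bayesian approach, regardless of how the search itself is carried out.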
The researchers also tested whether the model could learn universal templates of phonological rules that could be applied across all problems.
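One simple way to picture how knowledge shared across many small problems can help is a hierarchical prior: estimate how often each kind of rule appears in grammars already solved for other languages, then use those frequencies as the prior when scoring grammars for a new problem. The sketch below assumes hypothetical rule-type labels, hypothetical solved grammars, and an additively smoothed frequency estimate; it illustrates the general idea only and does not reproduce how the paper represents or learns its universal rule templates.

```python
import math
from collections import Counter

# Hypothetical labels for kinds of phonological rules (not the paper's templates).
TEMPLATES = ["assimilation", "epenthesis", "deletion", "metathesis"]

# Hypothetical grammars already solved for four other textbook problems,
# each recorded as the list of rule types it uses.
SOLVED = [
    ["assimilation"],
    ["assimilation", "epenthesis"],
    ["assimilation", "deletion"],
    ["epenthesis"],
]


def fit_template_prior(solved_grammars, templates, alpha=1.0):
    """Estimate P(template) across problems with additive (Laplace) smoothing."""
    counts = Counter(t for grammar in solved_grammars for t in grammar)
    total = sum(counts.values()) + alpha * len(templates)
    return {t: (counts[t] + alpha) / total for t in templates}


def log_prior(grammar, template_probs):
    """Log prior of a new grammar whose rule types are drawn independently."""
    return sum(math.log(template_probs[t]) for t in grammar)


if __name__ == "__main__":
    probs = fit_template_prior(SOLVED, TEMPLATES)
    print(probs)
    # A rule type seen often elsewhere is cheaper than an unseen one.
    print(log_prior(["assimilation"], probs) > log_prior(["metathesis"], probs))  # True
```

Because each rule contributes a negative log-probability, grammars with more rules are still penalized, but rules of a commonly seen type cost less than rare ones, so experience from earlier problems steers the search on a new one.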
In the future, the researchers hope to apply this approach to problems in other areas. The method could also be used more broadly, wherever higher-level knowledge can be applied across related datasets.
This article is written as a research summary by Marktechpost Staff based on the research paper 'Synthesizing theories of human language with Bayesian program induction'. All credit for this research goes to the researchers on this project.
Tanushree Shenwai is a Consulting Intern at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and is very interested in the application areas of artificial intelligence in various fields. She is passionate about exploring new technological advances and their application in real life.