Is It Possible to Get Rid of Datasets in AI? – Analytics Insight


A generative model, a type of machine-learning model, takes significantly less memory to keep or transfer than a dataset.

MIT researchers have developed a method for training a machine-learning model that, rather than requiring a dataset of real images, uses a generative model to produce highly realistic synthetic data, which can then train another model for downstream vision tasks.

Their findings indicate that a contrastive representation-learning model trained solely on synthetic data can learn visual representations that rival, and in some cases surpass, those learned from real data.

Synthetic data can also sidestep some of the privacy and usage-rights concerns that limit how real data may be shared. And a generative model could potentially be edited to remove certain attributes, such as race or gender, helping to overcome biases in traditional datasets.

Generating synthetic data

Once a generative model has been trained on real data, it can produce synthetic data that is nearly indistinguishable from the original. Training typically involves showing the generative model millions of images containing objects of a particular class (such as cars or cats), after which it learns to produce similar objects.
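The train-once, sample-forever workflow described above can be sketched with a deliberately tiny stand-in for a real generative model. Here the "generator" is just a multivariate Gaussian fitted to the real data; the names and setup are illustrative assumptions, not the MIT team's actual model. The point is that the fitted model (a mean vector and covariance matrix) is far smaller to store or share than the dataset itself, yet can emit an endless stream of new samples:

```python
import numpy as np

def fit_gaussian_generator(real_data):
    """Fit a toy 'generative model': a multivariate Gaussian over the
    real data. A stand-in for a real GAN or diffusion model, used only
    to illustrate the train-once, sample-forever workflow."""
    mean = real_data.mean(axis=0)
    cov = np.cov(real_data, rowvar=False)
    return mean, cov

def sample_synthetic(mean, cov, n, seed=0):
    """Draw n fresh synthetic samples from the fitted model."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)

# "Real" dataset: 1,000 eight-dimensional points. The fitted model is
# just (mean, cov) -- a fraction of the memory of the data it replaces.
rng = np.random.default_rng(42)
real = rng.normal(loc=3.0, scale=1.5, size=(1000, 8))
mean, cov = fit_gaussian_generator(real)
synthetic = sample_synthetic(mean, cov, 500)
```

A real image generator replaces the Gaussian with a deep network, but the economics are the same: keep the (comparatively small) model, discard or never share the raw data.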

According to Jahanian, researchers can use a pretrained generative model to produce a continuous stream of unique, realistic images resembling those in the model’s training dataset, effectively at the flip of a switch.

But generative models are even more useful because they learn how to transform the underlying data on which they are trained, he explains. If the model has been trained on photos of vehicles, it can “imagine” how a car might appear in new scenarios — situations it hasn’t seen before — and then generate images that show the car in different positions, colors, or sizes.
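The "imagining new scenarios" idea can be illustrated by steering a generator's latent code. The decoder below is a hypothetical stand-in (a fixed random linear map, not a trained network): nudging the latent code by small offsets yields transformed versions of the same underlying object, analogous to a real generator producing new poses, colors, or sizes:

```python
import numpy as np

# Hypothetical decoder standing in for a pretrained generator G(z):
# a fixed random linear map from an 8-d latent space to a 64-d "image".
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))

def generate(z):
    """Decode a latent code into a synthetic sample."""
    return np.tanh(z @ W)

def nearby_views(z, n_views=4, step=0.1, seed=1):
    """Steer the latent code by small random offsets, producing
    transformed versions of the 'same' object (in a real generator:
    new positions, colors, or sizes)."""
    rng = np.random.default_rng(seed)
    return [generate(z + step * rng.normal(size=z.shape))
            for _ in range(n_views)]

anchor = rng.normal(size=8)   # latent code for one "object"
views = nearby_views(anchor)  # several variations of that object
```

Each element of `views` is a distinct but closely related sample, which is exactly the raw material contrastive learning needs.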

Contrastive learning, which exposes a machine-learning model to large numbers of unlabelled images so it learns which pairs are similar and which are different, requires multiple views of the same image.

The researchers linked a pre-trained generative model to a contrastive learning model in such a way that the two models could automatically operate together. According to Jahanian, the contrastive learner may tell the generative model to generate several perspectives of an object and subsequently learn to recognize that thing from various angles.

“It was like putting two puzzle pieces together. The generative model can help the contrastive technique acquire better representations, since it can offer us alternative views of the same object,” he explains.
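The pairing described above can be made concrete with the standard contrastive objective (an InfoNCE-style loss, as used in SimCLR-type methods; the article does not name the exact loss, so treat this as a representative sketch). Matched rows of `anchors` and `positives` are two views of the same object; every other row in the batch serves as a negative:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Toy InfoNCE contrastive loss. Low when each anchor is most
    similar to its own positive view and dissimilar to all others."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # all-pairs similarity
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diagonal(probs)).mean()

rng = np.random.default_rng(0)
base = rng.normal(size=(16, 32))                  # 16 "objects"
views = base + 0.01 * rng.normal(size=base.shape) # near-identical views
shuffled = rng.permutation(base)                  # mismatched pairs

aligned_loss = info_nce(base, views)      # low: views line up
random_loss = info_nce(base, shuffled)    # high: pairs are mismatched
```

In the researchers' setup, the generator supplies the paired views; here the pairing is faked with small noise, but the loss behaves the same way: correctly matched views score much lower than mismatched ones.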

Even better than the real thing

The researchers compared their technique to a number of different image classification models that had been trained using actual data and discovered that it performed as well as, if not better than, the other models.

A generative model has the added benefit that, in principle, it can produce an unlimited number of samples. The researchers therefore also evaluated how the number of samples affected the model’s performance, and found that in some cases generating more unique samples yielded further gains.
