
DALL-E 2, the future of AI research, and OpenAI’s business model

By Ben Dickson | Posted on April 14, 2022

Artificial intelligence research lab OpenAI made headlines again, this time with DALL-E 2, a machine learning model that can generate stunning images from text descriptions. DALL-E 2 builds on the success of its predecessor DALL-E and improves the quality and resolution of the output images thanks to advanced deep learning techniques.

The announcement of DALL-E 2 was accompanied by a social media campaign by OpenAI’s engineers and its CEO, Sam Altman, who shared striking images created by the generative machine learning model on Twitter.

DALL-E 2 shows how far the AI research community has come toward harnessing the power of deep learning and addressing some of its limitations. It also offers a glimpse of how generative deep learning models might finally unlock creative applications for everyone to use. At the same time, it is a reminder of the obstacles that remain in AI research and the disputes that have yet to be settled.

The beauty of DALL-E 2

Like other milestone OpenAI announcements, DALL-E 2 comes with a detailed paper and an interactive blog post that shows how the machine learning model works. There’s also a video that provides an overview of what the technology is capable of doing and what its limitations are.

DALL-E 2 is a “generative model,” a special branch of machine learning that creates complex output instead of performing prediction or classification tasks on input data. You provide DALL-E 2 with a text description, and it generates an image that fits the description.
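To make the distinction concrete, here is a toy sketch contrasting the two interfaces. Both functions are hypothetical stand-ins, not OpenAI’s API: a discriminative model scores its input against known classes, while a generative model synthesizes new data from a description.

```python
import numpy as np

# Hypothetical stand-ins to contrast the two interfaces; neither is OpenAI's API.

def classify(image: np.ndarray) -> dict:
    # Discriminative: score the input against a fixed set of known classes.
    return {"astronaut": 0.9, "horse": 0.1}

def generate(prompt: str) -> np.ndarray:
    # Generative: synthesize new data; here, a placeholder 64x64 RGB image.
    return np.random.rand(64, 64, 3)

image = generate("An astronaut riding a horse as a pencil drawing")
print(image.shape)  # (64, 64, 3)
```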

Generative models are a hot area of research that received much attention with the introduction of generative adversarial networks (GANs) in 2014. The field has seen tremendous improvements in recent years, and generative models have been used for a wide variety of tasks, including creating artificial faces, deepfakes, synthesized voices and more.

However, what sets DALL-E 2 apart from other generative models is its capability to maintain semantic consistency in the images it creates.

For example, the following images (from the DALL-E 2 blog post) were generated from the prompt “An astronaut riding a horse.” One version of the prompt ends with “as a pencil drawing” and the other with “in photorealistic style.”

[Image: DALL-E 2 renderings of “an astronaut riding a horse” as a pencil drawing and in photorealistic style]

The model remains consistent in drawing the astronaut sitting on the back of the horse and holding their hands in front. This kind of consistency shows itself in most examples OpenAI has shared.

The following examples (also from OpenAI’s website) show another feature of DALL-E 2: generating variations of an input image. Instead of a text description, you provide DALL-E 2 with an image, and it tries to generate other versions of the same scene. In doing so, it maintains the relations between the elements in the image, including the girl, the laptop, the headphones, the cat, the city lights in the background, and the night sky with moon and clouds.

[Image: DALL-E 2 variations of an image of a girl with a laptop, headphones, and a cat, with city lights and a night sky in the background]

Other examples suggest that DALL-E 2 seems to understand depth and dimensionality, a great challenge for algorithms that process 2D images.

Even if the examples on OpenAI’s website were cherry-picked, they are impressive. And the examples shared on Twitter show that DALL-E 2 seems to have found a way to represent and reproduce the relationships between the elements that appear in an image, even when it is “dreaming up” something for the first time.

In fact, to prove how good DALL-E 2 is, Altman took to Twitter and asked users to suggest prompts to feed to the generative model. The results, shared in the ensuing Twitter thread, are fascinating.

The science behind DALL-E 2

DALL-E 2 takes advantage of CLIP and diffusion models, two advanced deep learning techniques created in the past few years. But at its heart, it shares the same concept as all other deep neural networks: representation learning.

Consider an image classification model. The neural network transforms pixel colors into a set of numbers that represent its features. This vector is sometimes also called the “embedding” of the input. Those features are then mapped to the output layer, which contains a probability score for each class of image that the model is supposed to detect. During training, the neural network tries to learn the best feature representations that discriminate between the classes.
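As a rough illustration, here is a minimal PyTorch sketch (a toy model, not the architecture of any production system): the embedding is simply the output of the network’s feature-extraction layers, which a final linear head then maps to class probabilities.

```python
import torch
import torch.nn as nn

# Minimal sketch: a backbone maps pixels to a feature vector (the "embedding"),
# and a linear head maps that vector to per-class scores.
class TinyClassifier(nn.Module):
    def __init__(self, num_classes: int = 10, embed_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # collapse spatial dimensions
            nn.Flatten(),
            nn.Linear(16, embed_dim),  # the embedding of the input
        )
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        embedding = self.backbone(x)                  # features learned during training
        return self.head(embedding).softmax(dim=-1)   # probability per class

probs = TinyClassifier()(torch.randn(1, 3, 64, 64))
print(probs.shape)  # torch.Size([1, 10])
```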

Ideally, the machine learning model should be able to learn latent features that remain consistent across different lighting conditions, angles and background environments. But in practice, deep learning models often learn the wrong representations. For example, a neural network might think that green pixels are a feature of the “sheep” class because all the images of sheep it has seen during training contain a lot of grass. Another model that has been trained on pictures of bats taken during the night might consider darkness a feature of all bat pictures and misclassify pictures of bats taken during the day. Other models might become sensitive to objects being centered in the image and placed in front of a certain type of background.

Learning the wrong representations is partly why neural networks are brittle, sensitive to changes in the environment and poor at generalizing beyond their training data. It is also why neural networks trained for one application need to be fine-tuned for other applications — the features of the final layers of the neural network are usually very task-specific and can’t generalize to other applications.

In theory, you could create a huge training dataset that contains all kinds of variations of data that the neural network should be able to handle. But creating and labeling such a dataset would require immense human effort and is practically impossible.

This is the problem that Contrastive Language-Image Pre-training (CLIP) solves. CLIP trains two neural networks in parallel on images and their captions. One of the networks learns the visual representations in the image and the other learns the representations of the corresponding text. During training, the two networks try to adjust their parameters so that images and their matching descriptions produce similar embeddings.
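A minimal sketch of this contrastive objective is below, assuming `image_emb` and `text_emb` are the encoder outputs for a batch of matched image-caption pairs (the temperature value is illustrative). Each image is pulled toward its own caption and pushed away from every other caption in the batch, and vice versa.

```python
import torch
import torch.nn.functional as F

# Sketch of a CLIP-style symmetric contrastive loss. image_emb and text_emb
# are assumed to be batch outputs of the two encoders, shape [batch, dim].
def clip_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise cosine similarities: entry [i, j] compares image i to caption j.
    logits = image_emb @ text_emb.t() / temperature
    # The matching caption for image i sits on the diagonal (index i).
    targets = torch.arange(len(logits))
    # Pull matched pairs together, push mismatched pairs apart, in both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
```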

One of the main benefits of CLIP is that it does not need its training data to be labeled for a specific application. It can be trained on the huge number of images and loose descriptions that can be found on the web. Additionally, without the rigid boundaries of classic categories, CLIP can learn more flexible representations and generalize to a wide variety of tasks. For example, if an image is described as “a boy hugging a puppy” and another described as “a boy riding a pony,” the model will be able to learn a more robust representation of what a “boy” is and how it relates to other elements in images.

CLIP has already proven to be very useful for zero-shot and few-shot learning, where a machine learning model is prompted on the fly to perform tasks it hasn’t been explicitly trained for.
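For instance, zero-shot image classification with CLIP-style embeddings needs no extra training: embed the image, embed one caption per candidate class, and pick the closest caption. A sketch, with random tensors standing in for the outputs of the two encoders:

```python
import torch
import torch.nn.functional as F

# Zero-shot classification sketch: compare an image embedding against the
# embeddings of candidate captions and return the best match.
def zero_shot(image_emb: torch.Tensor, caption_embs: torch.Tensor,
              captions: list) -> str:
    sims = F.normalize(image_emb, dim=-1) @ F.normalize(caption_embs, dim=-1).t()
    return captions[sims.argmax().item()]

captions = ["a photo of a cat", "a photo of a dog", "a photo of a horse"]
# Random tensors stand in for the outputs of trained CLIP encoders.
best = zero_shot(torch.randn(1, 512), torch.randn(3, 512), captions)
print(best)
```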

The other machine learning technique used in DALL-E 2 is “diffusion,” a kind of generative model that learns to create images by gradually adding noise to its training examples and then learning to reverse the process. In this respect, diffusion models resemble autoencoders, which transform input data into an embedding representation and then reproduce the original data from the embedding information.
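The forward (“noising”) half of that process is simple enough to sketch directly. A minimal version following the standard DDPM formulation, with an illustrative noise schedule:

```python
import torch

# Sketch of the "noising" half of diffusion training: blend an image with
# Gaussian noise at a chosen step t; the model is then trained to predict
# (and remove) that noise. The schedule values here are illustrative.
def noisy_sample(x0: torch.Tensor, t: int, num_steps: int = 1000):
    beta = torch.linspace(1e-4, 0.02, num_steps)       # per-step noise amounts
    alpha_bar = torch.cumprod(1.0 - beta, dim=0)[t]    # cumulative signal kept at step t
    noise = torch.randn_like(x0)
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
    return x_t, noise  # the model learns to predict `noise` from (x_t, t)

x_t, eps = noisy_sample(torch.randn(3, 64, 64), t=500)
```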

DALL-E 2 first trains a CLIP model on images and captions. It then uses the CLIP model to train the diffusion model: the diffusion model uses CLIP to generate embeddings for the text prompt and its corresponding image, and then tries to generate the image that corresponds to the text.
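Putting the pieces together, the generation pipeline can be sketched at a high level. The three components below are stubs standing in for trained networks; the DALL-E 2 paper refers to the middle step as the “prior,” which maps a text embedding to a matching image embedding.

```python
import torch

# High-level sketch of the DALL-E 2 pipeline; each function is a stub
# standing in for a trained network.
def clip_text_encoder(prompt: str) -> torch.Tensor:
    return torch.randn(512)                               # stub: CLIP text embedding

def prior(text_emb: torch.Tensor) -> torch.Tensor:
    return text_emb + 0.1 * torch.randn_like(text_emb)    # stub: text -> image embedding

def diffusion_decoder(image_emb: torch.Tensor) -> torch.Tensor:
    return torch.randn(3, 64, 64)                         # stub: denoise into an image

def generate(prompt: str) -> torch.Tensor:
    text_emb = clip_text_encoder(prompt)   # 1. embed the text with CLIP
    image_emb = prior(text_emb)            # 2. map it to a CLIP image embedding
    return diffusion_decoder(image_emb)    # 3. decode the embedding into pixels

img = generate("An astronaut riding a horse")
```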

Disputes over deep learning and AI research

For the moment, DALL-E 2 will only be made available to a limited number of users who have signed up for the waitlist. Since the release of GPT-2, OpenAI has been reluctant to release its AI models to the public. GPT-3, its most advanced language model, is only available through an API; there is no access to the model’s actual code and parameters.

OpenAI’s policy of not releasing its models to the public has not sat well with the AI community and has attracted criticism from some renowned figures in the field.

DALL-E 2 has also resurfaced some of the longtime disagreements over the preferred approach toward artificial general intelligence. OpenAI’s latest innovation has certainly proven that with the right architecture and inductive biases, you can still squeeze more out of neural networks.

Proponents of pure deep learning approaches jumped on the opportunity to slight their critics, among them cognitive scientist Gary Marcus, whose recent essay “Deep Learning Is Hitting a Wall” endorses a hybrid approach that combines neural networks with symbolic systems.

Based on the examples that have been shared by the OpenAI team, DALL-E 2 seems to manifest some of the common-sense capabilities that have so long been missing in deep learning systems. But it remains to be seen how deep this common-sense and semantic stability goes, and how DALL-E 2 and its successors will deal with more complex concepts such as compositionality.

The DALL-E 2 paper mentions some of the limitations of the model in generating text and complex scenes. Responding to the many tweets directed his way, Marcus pointed out that the DALL-E 2 paper in fact proves some of the points he has been making in his papers and essays.

Some scientists have pointed out that despite the fascinating results of DALL-E 2, some of the key challenges of artificial intelligence remain unsolved. Melanie Mitchell, professor of complexity at the Santa Fe Institute, raised some important questions in a Twitter thread.

Mitchell referred to Bongard problems, a set of challenges that test the understanding of concepts such as sameness, adjacency, numerosity, concavity/convexity and closedness/openness.

“We humans can solve these visual puzzles due to our core knowledge of basic concepts and our abilities of flexible abstraction and analogy,” Mitchell tweeted. “If such an AI system were created, I would be convinced that the field is making real progress on human-level intelligence. Until then, I will admire the impressive products of machine learning and big data, but will not mistake them for progress toward general intelligence.”

The business case for DALL-E 2

Since switching from non-profit to a “capped profit” structure, OpenAI has been trying to find the balance between scientific research and product development. The company’s strategic partnership with Microsoft has given it solid channels to monetize some of its technologies, including GPT-3 and Codex.

In a blog post, Altman suggested a possible DALL-E 2 product launch in the summer. Many analysts are already suggesting applications for DALL-E 2, such as creating graphics for articles (I could certainly use some for mine) and doing basic edits on images. DALL-E 2 will enable more people to express their creativity without needing specialized skills or tools.

Altman suggests that advances in AI are taking us toward “a world in which good ideas are the limit for what we can do, not specific skills.”

In any case, the more interesting applications of DALL-E 2 will surface as more and more users tinker with it. For example, the idea for Copilot and Codex emerged as users started using GPT-3 to generate source code.

If OpenAI releases a paid API service a la GPT-3, then more and more people will be able to build apps with DALL-E 2 or integrate the technology into existing applications. But as was the case with GPT-3, building a business model around a potential DALL-E 2 product will have its own unique challenges. A lot of it will depend on the costs of training and running DALL-E 2, the details of which have not been published yet.
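Purely as speculation, such a service might look something like the sketch below. The endpoint, parameters, and response shape are all hypothetical, loosely modeled on the style of the GPT-3 API; none of them are real.

```python
import requests

# Hypothetical sketch of a paid DALL-E 2 API; the endpoint, parameters,
# and response shape are invented for illustration only.
response = requests.post(
    "https://api.example.com/v1/images/generate",   # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "a bowl of soup as a planet in space",
        "n": 4,                  # number of images to generate
        "size": "1024x1024",     # output resolution
    },
)
images = response.json()["data"]  # hypothetical response shape
```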

And as the exclusive license holder to GPT-3’s technology, Microsoft stands to be the main winner of any innovation built on top of DALL-E 2, because it will be able to deploy it faster and more cheaply. Like GPT-3, DALL-E 2 is a reminder that as the AI community continues to gravitate toward larger neural networks trained on ever-larger datasets, power will continue to be consolidated in a few very wealthy companies that have the financial and technical resources needed for AI research.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2022
