For a long time, people have not really thought of computers as creative or original. Computers are fantastic at automating mechanical and repetitive tasks, but surely inspiration and creativity are the sole domain of humans?
Over the past few years there have been huge advances in artificial intelligence for content generation, and the assumption that machines cannot innovate or create is rapidly eroding.
AI is already being used professionally in many areas of content creation. For example, Natural Language Generation (NLG) is being used by journalists to generate instant articles, by marketers to create email campaigns, and by advertisers to personalise content.
Alongside text-based content, Computer Vision is regularly being used for image, graphics, and video editing. Most professional image and video editing tools now incorporate AI techniques into their software to automate and improve the quality of the editing process.
This year, image generation models such as OpenAI’s DALL-E 2, Google’s Imagen, and Stable Diffusion have made leaps and bounds. They can take text prompts, or image prompts, and generate entirely new images with incredible accuracy and fidelity.
In true Blue Peter fashion, here are some examples I made earlier using the open-source Stable Diffusion model, along with their text prompts.
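To make this concrete, here is a minimal sketch of how images like these can be generated from a text prompt with the open-source Stable Diffusion model, using Hugging Face’s diffusers library (the checkpoint name and prompt are just illustrative, and a GPU is assumed):

```python
# Minimal text-to-image sketch with Hugging Face's diffusers library.
# Assumes `pip install diffusers transformers torch` and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion checkpoint (illustrative model id).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Generate an image from a text prompt and save it to disk.
prompt = "an astronaut riding a horse, in the style of an oil painting"
image = pipe(prompt).images[0]
image.save("astronaut.png")
```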
These models, DALL-E 2, Stable Diffusion, Imagen, etc., are known as Diffusion Models and have been gaining a lot of traction. Diffusion Models work by successively adding noise to training data, and then learning to reconstruct the training data by reversing this noising process. After training, the Diffusion Model can be used to generate new data by passing randomly sampled noise through this trained denoising process.
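As a rough sketch of the core idea (not any particular model’s implementation), the forward “noising” step of a diffusion model, and the quantity the network learns to predict, can be written in a few lines of PyTorch:

```python
# Toy sketch of the diffusion forward process (DDPM-style).
import torch

T = 1000                                    # number of noising steps
betas = torch.linspace(1e-4, 0.02, T)       # noise schedule
alpha_bars = torch.cumprod(1.0 - betas, 0)  # cumulative signal retained

def add_noise(x0, t):
    """Forward process: blend clean data x0 with Gaussian noise at step t."""
    eps = torch.randn_like(x0)
    xt = alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * eps
    return xt, eps

# Training minimises mse(model(xt, t), eps): the network learns to predict
# the added noise, so that at generation time it can start from pure noise
# and progressively denoise it into a new sample.
```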
Beyond images, models that can generate video have very recently been released by the big tech companies, including Meta’s Make-a-Video and Google’s Imagen Video, which can generate short videos from text, image, or video prompts.
When it comes to music, there’s an entire industry being built around AI applications for creating music, such as Amper Music and Google Magenta’s NSynth Super, and Sony used its Flow Machines system to release a song created by AI called “Daddy’s Car”. AI-based mastering services are also being used for optimizing the listening experience on different devices, and AI is being used to recommend new music in streaming apps.
In the world of digital learning, educational content such as digital textbooks, lessons, and study guides can be generated with the help of AI.
Many types of content that we see in our day-to-day lives are starting to be generated by AI; it’s likely that you have already read an article, listened to a piece of music, or watched a film that at least partly involved artificial intelligence in its creation.
So how close are we to AI writing long-form content like a novel? Much closer than you would think. Language models, which AI researchers use to understand and generate natural language, have recently made huge strides.
One of the historical problems with processing very long passages of text is that the language models used struggled to remember how different parts of the text relate to each other, partly due to something called the “vanishing (and exploding) gradient problem”. So generating a fake tweet is easy, generating a poem is harder, and generating an entire novel is much harder still.
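A toy calculation illustrates the problem: in older recurrent models, gradients are multiplied by a similar factor at every step of a long sequence, so they shrink or blow up exponentially with its length (the numbers here are purely illustrative):

```python
# Illustration of vanishing and exploding gradients over a long sequence:
# repeated multiplication by a factor slightly below or above 1.
for factor in (0.9, 1.1):
    grad = 1.0
    for _ in range(200):  # 200 time steps
        grad *= factor
    print(f"factor={factor}: gradient scale after 200 steps = {grad:.3e}")

# factor=0.9 -> ~7.1e-10 (vanishes); factor=1.1 -> ~1.9e+08 (explodes)
```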
However, AI researchers have been building bigger language models with better techniques, using huge amounts of data and vastly more computational power. These language models are much better at understanding and generating larger passages of text.
A great example of this is OpenAI’s GPT-3 model, trained on masses of text at an estimated cost of $4.6 million. The GPT-3 model has around 175 billion parameters, ten times more than its closest rival, and has been shown to generate surprisingly convincing text that could fool many readers into thinking it was written by a human.
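GPT-3 itself sits behind a paid API, but the same kind of text generation can be sketched with a much smaller open model such as GPT-2, via Hugging Face’s transformers library (the prompt and settings are just illustrative):

```python
# Text generation with GPT-2, a much smaller open cousin of GPT-3.
# Assumes `pip install transformers torch`.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator(
    "Artificial intelligence is transforming content creation because",
    max_length=60,           # total length of prompt plus continuation
    num_return_sequences=1,  # how many continuations to sample
)
print(result[0]["generated_text"])
```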
Right now, we’re at the stage where AI can generate a convincing poem or article, but not a whole novel. It wouldn’t be surprising if, eventually, language models become proficient enough to write novels the length of War and Peace.
One person’s definition of quality may differ drastically from another’s, and quality content is highly subjective, especially where art or literature is concerned. So there is no exact measure of “quality content”. However, other measures are used to estimate the ability of models to generate text or images, such as the bilingual evaluation understudy (BLEU) score for language models and the Fréchet inception distance (FID) for image generation models. The central idea behind the BLEU score is that “the closer a machine translation is to a professional human translation, the better it is”; similarly, the FID score compares generated images with a set of real ones (the ground truth).
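For instance, here is a small sketch of computing a BLEU score with the NLTK library, comparing a machine-generated sentence against a human reference (the sentences and bigram weighting are just illustrative):

```python
# BLEU with NLTK: the closer the candidate is to the human reference,
# the higher the score (between 0 and 1).
from nltk.translate.bleu_score import sentence_bleu

reference = [["the", "cat", "sat", "on", "the", "mat"]]  # human reference(s)
candidate = ["the", "cat", "is", "on", "the", "mat"]     # machine output

# Weight unigram and bigram overlap equally (short sentences have few 4-grams).
score = sentence_bleu(reference, candidate, weights=(0.5, 0.5))
print(f"BLEU score: {score:.3f}")  # ~0.707 for this pair
```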
In practice, though, AI models don’t consider “quality” directly. Most deep learning models are trained on huge datasets of text, images, videos, and so on, without any consideration for quality beyond the quality of the data they are trained on.
That being said, “quality” is something that could be implicitly learned. For example, a technique called reinforcement learning could be used to generate many different content variations on a website and gradually improve that content based on user feedback or behaviour over time.
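As a minimal sketch of that idea, a simple epsilon-greedy bandit could pick between content variants and gradually favour whichever earns the best user feedback (all names and numbers here are hypothetical):

```python
# Toy epsilon-greedy bandit for feedback-driven content selection.
import random

variants = ["headline_a", "headline_b", "headline_c"]  # hypothetical variants
shows = {v: 0 for v in variants}   # how often each variant was displayed
clicks = {v: 0 for v in variants}  # how often it was clicked
EPSILON = 0.1                      # fraction of the time we explore

def choose_variant():
    """Mostly exploit the best click rate so far, occasionally explore."""
    if random.random() < EPSILON or not any(shows.values()):
        return random.choice(variants)
    return max(variants, key=lambda v: clicks[v] / max(shows[v], 1))

def record_feedback(variant, clicked):
    """Update the statistics after showing a variant to a user."""
    shows[variant] += 1
    clicks[variant] += int(clicked)
```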
Misuse of AI-generated content is already a big risk, and one that we are underestimating. One danger is that AI models are being used to generate fake news articles and social media posts that can influence elections or scam consumers.
Another risk is that AI models are being used to generate “deepfakes”, for example to create fake pornography that uses people’s likenesses without their consent, or to falsify videos of politicians. Researchers are working on building systems to identify and take down fake content using these same AI techniques. Meta have created the Deepfake Detection Challenge, inviting researchers to develop models to detect deepfakes, and more work in this area will be essential to minimize the risks associated with AI-generated content.
Bias is absolutely a risk too: since deep learning models are trained on historical data, there is a danger that AI models will contain biases against certain demographic groups that have been learned from the data.
For example, language models trained on articles from the internet may reproduce the gender stereotypes found in society, so new art or content generated using these biased models may not be truly representative.
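One simple way to observe such bias is to probe a pretrained masked language model and compare the probabilities it assigns to gendered words in the same sentence (a small sketch using Hugging Face’s transformers; the sentences are just examples):

```python
# Probing a masked language model for gender associations.
# Assumes `pip install transformers torch`.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for template in ("the doctor said that [MASK] was busy.",
                 "the nurse said that [MASK] was busy."):
    # Restrict predictions to "he"/"she" and compare their scores.
    for p in fill(template, targets=["he", "she"]):
        print(template, "->", p["token_str"], f"{p['score']:.3f}")
```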
On the flip side, AI can also be used to reduce bias if used carefully, and ethical AI is an important and active field of research into how to measure and remove these biases.
Content creators should not see the growth of AI as a threat, but rather as a great opportunity to find new, exciting ways to enhance, inspire, and expedite the way they create content. We could see AI tools being used by content creators as a kind of “AI muse”, generating different content options for the creator to consider and cherry-pick. This is nothing new: creative professionals have always benefitted from using new technology and tools in their work. Many industry-standard tools already have AI deeply embedded in them, such as Photoshop, whose object selection, neural filters, and content-aware fill, to name a few, are based on AI.
In summary, the ability of AI to generate content is improving at an exponential rate, and we will likely see an explosion of AI-generated content in our daily lives. With this come huge opportunities for individuals and organisations to generate exciting new content, as well as many new challenges and risks. We can hope that the AI research community and creative professionals will work together to navigate this complex, emerging landscape and unlock the potential of AI content generation while mitigating its risks.