A year ago the idea that it would take seconds to create a photorealistic image of anything you could imagine by typing a few words would have sounded bonkers. 

Text-to-image AI art generators StarryAI and Wombo Dream were fun, but they weren’t realistic. They offered more of a stylized, dream-like quality. Now we have tools like Midjourney, Stable Diffusion, and Dall-E 2 that turn words into any picture you can imagine.

Today, AI video tools occupy a space similar to the one StarryAI held in the early days of image prompting. While none can take an idea and magically transform it into a fully produced movie or commercial, that reality is inevitable. The questions are how long it will be before we get there, and which tools can help marketers create video content in the meantime. The answer to the former is “probably a long time”, but a number of video tools are emerging that show glimpses of how AI can take us from text-to-image to text-to-video.

Even Meta is betting on the technology with its yet-to-be-released Make-a-Video tool.

Runway ML’s Gen-1: Transform Existing Video with AI

Runway ML is an AI-powered video editing suite from some of the creators of Stable Diffusion, one of the leading text-to-image generators. The company has been around for a while, but its newest release is the one that will likely make it famous.

Gen-1’s features all revolve around transforming an existing video by mapping a new visual concept or style over it. Images can apparently be prompted within the system, but early beta users report that the best results come from mapping an existing image – typically one generated in Midjourney – over a video to create a new result.

A walk to the store could become a walk to Mordor in Lord of the Rings, or perhaps something less dangerous, like a quick jaunt on the surface of the moon.

Here’s an early hands-on video from Karen X Cheng, the creative behind the AI-designed cover of Cosmo magazine.

The company also has a cryptic video on Twitter teasing another new release coming up later in March.

Marketer Takeaway: As the technology is rolled out to more users, it will become apparent that this tool opens up a whole new approach to creative filmmaking. Basic models can be transformed into entire scenes. A hodgepodge of stock videos can be edited together into a seamless narrative in a single visual style. Experimentation and creative thinking will be key to maximizing its potential in production.

D-ID’s Videos from Photos

On the surface, D-ID might look like another tool that provides a template for talking avatar heads to make sales pitches with computer-generated voices. It does that fairly well and economically.

Users can upload a photo, use an existing avatar, or generate an avatar with text-to-image in the platform. Then, the user either types text and selects an AI voice, or uploads an original audio file. The image and speech are then rendered as an animated video of the person speaking. From there, we’re off to the races: the video can be embedded anywhere.

The interesting feature of D-ID is that you can upload a photo of anyone and have it speak. You could upload a photo of your CEO, for example, and use their recorded voice to present personalized one-to-one messages in a B2B scenario, or embed videos into presentations. With tools from companies like ElevenLabs, you can even synthesize your own voice or that of a spokesperson, making it quick and easy to generate videos with just text and a photo. We could have synthesized the real Colonel Sanders voice from archival footage for the example below, but we don’t have the rights to use it.

One might also use an AI image generator like Midjourney for the avatar, as we did above.

For an in-depth walkthrough using a photorealistic avatar, here’s a great tutorial on YouTube.

Marketer Takeaway: D-ID can rapidly generate internal and external video communications. Try replicating yourself with a photo and a voice clone to make videos of yourself presenting material without actually presenting it. On social media, the platform can produce interesting content when used strategically and creatively. Multiple characters can also be edited together to create conversations and narratives.

The SCS Collaborative AI Art Video

While other AI video tools exist, we’d be remiss to leave out our own mass collaboration experiment, turned around in a week of spare time. It combines over 120 AI art images built with ControlNet, a new Stable Diffusion tool that generates images around an image prompt, like the Nike logo in this example.

We held an internal contest to encourage staff to learn AI art prompting, then combined the results into a video montage. The opening is fully AI-animated, and all the other visuals you see were created by AI.

For the introductory scene, we used an AI video tool called Kaiber, which diffuses one image into another in an animated video output.

While Kaiber currently delivers a dream-like animated feel, that’s exactly how the early image generators started before they were refined into more and more lifelike visuals.

Marketer Takeaway: While it’s still early days, over the next few years we’ll see AI play a more important role in video production. It will lower the cost of special effects and other post-production techniques and bring new tools to the average content creator, elevating the quality of content you might see from the latest social influencer you’ve brought onto your brand roster.

Also published on Medium.

