Cloudinary is anything but ordinary. Particularly in the innovation department.
The image and video platform, which boasts an intelligent digital asset management (DAM) system trusted by global brands, has been furiously injecting generative AI features across its portfolio. And man, there’s some stuff to talk about.
While some of these features are in the “knock your socks off” category, many are what I might deem practical – reflecting Cloudinary’s clear understanding of its customer needs and its execution of a solid strategy to surface the right kinds of features.
As with other vendors concentrating on AI transformation, this news is the latest in a string of enhancements. I recently dug into their Cloudinary Assets platform, which was unveiled at DAM New York late last year. The API-first offering streamlines every stage of the asset lifecycle, including customizable creative review and approval workflows, portals for controlling asset sharing, and intuitive AI solutions that automate manual tasks.
Pretty slick, and you can read the analysis here.
Now, the company has introduced even more Gen AI advancements with the unveiling of AI Vision. It’s a shot in the arm for developers and brands, giving them unprecedented control and insight into their visual media, and enabling users to edit, optimize, and transform assets at scale with ease and efficiency.
According to Cloudinary, AI Vision does what standard LLMs can’t. It utilizes a generative multimodal large language model to interpret and respond to visual content queries and prompts, driving the automation of critical processes like content moderation, image classification, and custom tagging. This unique combo helps businesses streamline their “content moderation ops” while improving classification capabilities at scale.
Cloudinary also announced several enhancements to its most popular generative AI tools including Generative Enhance, Generative Fill, Generative Restore, and Generative Upscale. According to the company, thousands of users rely on these Gen AI capabilities to make their jobs faster and easier. But there’s more to it than that.
“Managing images at scale isn’t just about speed – it’s about ensuring accuracy, brand compliance, and efficiency across thousands of assets, teams, and touchpoints,” said Nadav Soferman, co-founder and Chief Product Officer at Cloudinary. “AI Vision brings automation and intelligence to these critical workflows, allowing brands to instantly tag, moderate, and transform images with confidence. Combined with our suite of generative AI tools, this means faster go-to-market times, less manual work, and seamless delivery of optimized visuals everywhere.”
Let’s get a first look at AI Vision and some of its features.
The concept of “brand safety” is intuitive enough, but the DAM-centric definition could use a little more context.
Brand safety has its roots in the advertising industry, where marketing messages have often walked a tightrope. While it represents a spectrum of critical concerns, brand safety is generally defined as a set of measures established to protect a company’s image and reputation from negative or damaging influence. This can apply to a wide range of questionable or inappropriate content across multiple categories, including the notorious “Dirty Dozen” as defined by the Interactive Advertising Bureau (think drugs, crime, and the like).
It goes without saying that brand safety is a strategic consideration for enhancing brand equity. According to one study, almost half of marketers are concerned with maintaining these standards to avoid damaging their brand’s image and reputation and eroding consumer trust.
In an age of misinformation, “fake news,” and other hyperbolic content, it’s easy to imagine how quickly and easily a brand’s safety can be tested. A single slip can result in significant consequences, from impacting budgets to painful lawsuits.
As enterprises amass larger caches of visual content – and AI-generated imagery expands the surface area – a digital asset management system has never been more critical in maintaining order amidst the chaos. This is where Cloudinary's investment in more robust AI capabilities, which focus on bringing value to workflows at scale, is filling a critical gap in a market that's moving at warp speed.
Built to address the complexity of managing large-scale visual media libraries while elevating brand safety, AI Vision is Cloudinary’s answer to the conundrum. It brings the power of generative AI to its intelligent DAM, automating media management and enabling precise, scalable, brand-specific content workflows.
Source: Cloudinary website.
What’s worth noting is how AI Vision offers capabilities that go beyond basic automation. One of the most profound is its custom taxonomy and image classification, which allows users to easily categorize, search, and locate assets based on detailed criteria.
By providing a set of tags with specific descriptions, teams can categorize images according to their branding and organizational needs. Search based on a photo’s background color or subject orientation – all without needing to train or fine-tune any tagging models. Demographics can also be built into an automated workflow that can analyze images at scale.
Source: Cloudinary website.
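To make the "tags with specific descriptions" idea concrete, here is a minimal sketch of how a team might express a custom taxonomy as data before handing it to a tagging service. The article does not show the request format, so the `TagDefinition` class, the `build_tagging_request` helper, and the `source` / `tag_definitions` field names are all illustrative assumptions; check Cloudinary's documentation for the real payload shape.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TagDefinition:
    """One entry in a custom taxonomy: a tag plus a plain-language description."""
    name: str
    description: str


def build_tagging_request(image_url: str, definitions: list[TagDefinition]) -> dict:
    """Assemble a request body for a description-driven tagging call.

    The field names below are assumptions for illustration only.
    """
    if not definitions:
        raise ValueError("at least one tag definition is required")
    names = [d.name for d in definitions]
    if len(set(names)) != len(names):
        raise ValueError("tag names must be unique")
    return {
        "source": {"uri": image_url},
        "tag_definitions": [asdict(d) for d in definitions],
    }


if __name__ == "__main__":
    taxonomy = [
        TagDefinition("white_background", "Is the product shot on a plain white background?"),
        TagDefinition("subject_facing_left", "Is the main subject oriented toward the left?"),
    ]
    # Hypothetical image URL, used only to show the resulting structure.
    print(build_tagging_request("https://example.com/images/shoe.jpg", taxonomy))
```

The point of the design is that the taxonomy lives in descriptions rather than in a trained model, so adding a category means adding one line of data, not retraining anything.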
Imagine detecting the presence of specific branding elements within an image to help identify sensitive or inappropriate content. AI Vision’s advanced content moderation and compliance tools make this possible, allowing for more accurate and detailed image analysis. This can be a godsend when processing thousands of assets.
AI Vision also provides a Visual Question Answering (VQA) feature. It’s sort of a “DAMbot” that allows you to ask complex, image-specific queries and receive actionable, precise responses that streamline and improve your media workflows, like generating SEO-ready metadata or descriptive alt text. Very practical – and incredibly useful.
As I noted in my earlier review of its Assets platform, Cloudinary started driving in the AI fast lane as soon as generative AI hit the mainstream. Since launching its first set of Gen AI tools in 2023, the company has focused on high-utility features that address the real-world needs and pressures of developers and brands.
Along with its AI Vision launch, Cloudinary has also been busy expanding and enriching its broader capabilities, arming users with more creative muscle while reducing the complexity and costs associated with their visual media workflows. This is putting greater control in one place, allowing teams to move faster – and more confidently – than ever before.
Now running on a more advanced, fine-tuned model, Cloudinary’s Generative Fill – its most popular Gen AI feature – delivers greater contextual accuracy. This means better results when expanding an image to fill white space or to fit new aspect ratios.
There are plenty of tools out there for removing backgrounds from images, but most deliver “iffy” results that require additional hand-tuning in Photoshop. Cloudinary has upleveled its own Background Removal and Background Replacement features, providing more precise removal and swapping of backgrounds based on the image’s foreground – even for super complex assets.
Cloudinary offers a cool Gen AI “playground” on its website for exploring some of these features. As with many image-focused AI tools, I got variable results after running it a few times. But overall, I was astounded at how fine-grained the output was based on the detail of the foreground image. It was also incredibly fast.
Source: Cloudinary website.
There’s a new feature called Generative Extract that intelligently isolates specific elements like products, objects, or people from images. This allows you to create layered content that’s optimized for any channel, which is especially powerful when used in combination with Cloudinary’s overlay feature. You can even make a grayscale mask of an extracted area and use it in conjunction with other transformations – a degree of control that image pros will find especially useful.
Another handy pair of features, similar to what we’ve seen in third-party AI platforms, is Generative Enhance and Generative Restore. These enable you to remove noise, correct imperfections, automatically sharpen up details, and enhance the overall quality of any image.
Tools like PicWish have done this seamlessly for a couple of years, but Cloudinary has incorporated it into its core offering, allowing you to easily revive old or damaged images while preserving critical details for professional results. Again, you can test drive this yourself via their online playground.
Source: Cloudinary website.
Along the same lines, there’s also a solidly practical feature called Generative Upscale that seamlessly expands image resolution without impacting quality or introducing pixel expansion artifacts. This is a fantastic one for marketing teams that need high-impact images for large-scale use cases, regardless of the original asset's quality.
Source: Cloudinary website.
Finally, Cloudinary has further streamlined video management tasks at scale with its new AI Video Transcription and Chaptering. These tools auto-generate transcripts and chapters on upload using Cloudinary’s Video API and the Video Player Studio in its intelligent DAM. With accessibility being a core objective of digital content and video assets, this step can save significant time and help brands ensure digital equality.
If there’s one area where a DAM really proves its value, it’s governance. As enterprises collect more digital “stuff,” the risk to brand safety increases exponentially. An AI-powered digital asset management system can help provide the guardrails for maintaining brand integrity while avoiding the pitfalls in this “Wild West” of exploding image content.
Cloudinary is really marinating its platform with AI. And by all accounts, it’s paying off. By bringing more of these high-value features into one place, they’re eliminating the constant back-and-forth from different platforms and streamlining the media workflow. I also like how they’re conscious of where humans fit in the loop, and many of these new features are enabling creators while providing control at the right place in the workflow. AI innovation is coming hard and fast, and it’s increasingly difficult to balance UI and human experience within the broader automation potential.
I’ve tested a few of the new upgrades in the sandbox, and they seem to deliver as promised. At the same time, it’s important to note that quality may vary – in other words, we’re still in the “prompting with a rope” game. That said, I came away with better results than I sometimes do using Photoshop or other tools in the Gen AI toolbox. That’s a credit to the more fine-grained output in their unique LLM multi-modality.
The transformation URL API is, as expected, complicated – and caters to developers building composable connections for asset delivery structures. The documentation is incredibly rich and covers multiple dimensions of control, including image extraction, shape cutouts, and other elements of visual enhancement. There’s a lot of power here.
Source: Cloudinary documentation
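For readers who haven't worked with URL-based transformations, a rough sketch of the idea follows: each transformation is a short parameter, parameters applied together are joined with commas, and successive steps are chained with slashes ahead of the asset's public ID. The `build_url` and `prompt_param` helpers and the `my-cloud` account name are hypothetical, and the parameter strings used (`e_gen_restore`, `b_gen_fill`, `e_extract`, and so on) reflect my reading of Cloudinary's public documentation rather than anything confirmed in this article, so verify them against the current reference before relying on them.

```python
from urllib.parse import quote

BASE = "https://res.cloudinary.com"


def prompt_param(effect: str, prompt: str) -> str:
    """Format a prompt-driven effect, URL-encoding the natural-language prompt."""
    return f"{effect}:prompt_{quote(prompt, safe='')}"


def build_url(cloud_name: str, public_id: str, steps: list[list[str]],
              resource_type: str = "image", delivery_type: str = "upload") -> str:
    """Compose a delivery URL from an ordered list of transformation steps.

    Parameters inside one step are comma-joined (applied together);
    steps are slash-joined (applied one after another).
    """
    chain = "/".join(",".join(step) for step in steps if step)
    parts = [BASE, cloud_name, resource_type, delivery_type]
    if chain:
        parts.append(chain)
    parts.append(quote(public_id, safe="/."))
    return "/".join(parts)


if __name__ == "__main__":
    # Restore an old image, then pad it to 16:9 with generative fill.
    print(build_url("my-cloud", "samples/shoe.jpg", [
        ["e_gen_restore"],
        ["b_gen_fill", "c_pad", "ar_16:9", "w_1200"],
    ]))
    # Isolate an element described in plain language.
    print(build_url("my-cloud", "samples/shoe.jpg", [
        [prompt_param("e_extract", "the shoe")],
    ]))
```

Because the whole pipeline is encoded in the URL, the same asset can be delivered in many variants without storing each one separately, which is a large part of why this model suits composable stacks.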
With the growing demand for visual content across channels and devices, Cloudinary is really pushing the boundaries with its pure-play solutions and API-first capabilities – which support composable, MACH-centric architectures. This flexibility will appeal to large enterprises looking to tap these expanded capabilities with AI while attaining the compliance benefits of a unified DAM. There’s also the freedom and flexibility of building composable stacks with content management systems and DXP platforms that leverage these intelligent DAM capabilities.
It's also worth noting that Cloudinary is fervently dedicated to developers and content creators. Their code and no-code tools are giving businesses more runway to deliver engaging visual experiences in a competitive digital landscape and to stay fully in control of their vector.
AI Vision is a great step forward. I think we can expect even more streamlined enhancements that build on these native capabilities.
As I said: Cloudinary is anything but ordinary.