OpenAI Reportedly Prepping to Launch Suno/Udio Rival
For months, the AI-music world has largely been a Suno vs. Udio boxing match, with the two biggest names in the biz trading viral hits, feature drops, and fan-fueled remixes like heavyweight contenders.
But now, the juggernaut of mass-adopted generative AI is warming up in the corner: OpenAI.
According to a report from The Information, the company behind ChatGPT and the video platform Sora is developing an AI music generator that turns text and sound prompts into full compositions — not unlike what Suno and Udio already do. But insiders say this one could go deeper: integrating musical score data, human-labeled rhythm and harmony, and possibly even instrument-aware layering that reacts to existing audio in real time.
That last part would be new territory. Imagine Udio’s polished pop meets Suno’s creative chaos — then add a model that actually understands how a melody and chord progression work together.
Whisper and Jukebox: OpenAI’s First Forays
OpenAI’s been here before. (Well, sort of.) In 2020, it dropped Jukebox, a wild research model that generated raw audio, complete with vocals and instrumentation, years before most people had even heard of AI songs. It was messy, fascinating, and way too computationally expensive for everyday use. But the ambition was there.
Since then, OpenAI has focused on voice tech (Whisper for transcription, ChatGPT’s voice mode for conversation), leaving music mostly untouched.
The rumored new system reportedly blends text-to-music generation with audio-to-music augmentation (both of which Suno and Udio also do).
The Information reports that OpenAI has tapped students affiliated with Juilliard to help annotate musical scores, though the school itself says it’s not officially involved. If true, it points to a model trained on something deeper than raw audio: real musical structure, the kind you only learn in a conservatory.
OpenAI’s Jukebox Was a Trailblazer for Suno, Udio, and Others
Back in 2020, OpenAI’s Jukebox was an early prototype for what we’re seeing today:
- It generated entire songs — verses, choruses, vocals — but with unpredictable fidelity and warped phrasing.
- It learned directly from raw audio, not MIDI or symbolic notation.
- The results sounded like long-lost demo tapes from alternate universes.
- It demanded massive compute power — sometimes hours to render a few seconds of sound.
Now, five years later, OpenAI seems ready to scale that same DNA into something modern: real-time generation, structured control, and a broader creative toolkit that actually competes with the sleek interfaces of Suno, Udio, Producer.AI, and others.
The Bigger Picture: The AI Music Arms Race
If Suno and Udio were the spark, OpenAI just might be the supernova. With its GPT models, Sora video engine, and Microsoft-scale infrastructure, the company has the resources to link text, image, video, and music into a single creative flow.
Picture this: you write a scene in ChatGPT, render it in Sora, and score it instantly with OpenAI’s new music model — no exports, no DAW, no latency. That’s the kind of vertical integration no other platform can match right now.
There’s no public timeline yet… no beta, no waitlist, nada… but make no mistake: when OpenAI decides to make noise, they don’t whisper.