Model Overview
Diffusion Architecture · 2025.1
Tempolor 3.5
Built on a diffusion model and continuous music-audio representations, Tempolor 3.5 generates high-fidelity 44.1 kHz stereo music. With an inference real-time factor below 0.1, full-song generation reaches industry-leading speed.
44.1kHz Stereo
Diffusion Architecture
ControlNet Melody Control
Inpaint / Repaint
RTF < 0.1
Overview

Tempolor 3.5 is built on a diffusion model and continuous music-audio representations, and it generates high-quality 44.1 kHz stereo music. Technically, this version moves from a paradigm that emphasizes semantic-skeleton modeling to a high-fidelity paradigm centered on continuous acoustic representations. The shift markedly improves sound texture, spatial layering, dynamics, and the overall naturalness and completeness of the listening experience.

With controllable-generation capabilities such as ControlNet and Inpaint / Repaint, Tempolor 3.5 is more editable and controllable in scenarios such as melody control, lyric editing, and local repainting, which provides a technical foundation for finer-grained music creation and interactive editing.

For inference efficiency, the real-time factor (RTF, the inference time divided by the duration of the generated audio) is below 0.1, which puts full-song generation speed at an industry-leading level.

Performance

Tempolor 3.5 has a clear advantage in fine listening detail and atmospheric expression. The model renders reverb tails, dynamic swells, textural layering, and spatial depth more naturally, which makes results more immersive in emotion-driven content such as lyrical, ethereal, suspenseful, and epic pieces.

Compared with the previous two generations, 3.5 cares not only about "writing it right" but also about "sounding good," which makes it better suited to scenarios that demand a more complete listening experience, such as film scoring, atmospheric music, and brand mood music.

Vocals are a particular strength: the model delivers excellent quality across singing technique, timbre, fidelity, and lyric clarity.

* Data as of May 2025

Real-Time Factor (RTF), lower is faster

Yue: 12
Udio V1.5: 1.48
Suno v4: 0.84
Mureka v5.5: 0.27
DiffRhythm v1.0: 0.1
AceStep v1.0: 0.063
Tempolor V3.0: 0.02

Inference time for 120 s of audio, in seconds

Yue: 1200
Udio V1.5: 177
Suno v4: 100
Mureka v5.5: 32
DiffRhythm v1.0: 12
AceStep v1.0: 3.84
Tempolor V3.0: 2.5

Tempolor V3.0 Speed
Industry-leading commercial music-generation model. At an RTF of 0.02, Tempolor V3.0 generates 2 minutes of music in 2.5 seconds.

* Based on NVIDIA RTX 4090
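
To make the speed figures easier to read, here is a minimal Python sketch of the standard real-time factor definition: generation time divided by the duration of the audio produced. The function names `real_time_factor` and `inference_seconds_for` are hypothetical helpers written for this illustration; they are not part of any Tempolor API. The only inputs taken from this page are the 2.5 s and 120 s figures.

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


def inference_seconds_for(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: expected generation time for a given clip length."""
    return rtf * audio_seconds


if __name__ == "__main__":
    # Figures quoted on this page: 2.5 s to generate 120 s of audio.
    rtf = real_time_factor(2.5, 120.0)
    print(f"RTF = {rtf:.3f}")
    # An RTF below 1.0 means generation is faster than playback.
    print(f"faster than real time: {rtf < 1.0}")
```

An RTF below 1.0 means the audio is generated faster than it plays back. Measured values depend on hardware and clip length, so the figures above apply to the stated test setup only.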

Demo
15th National Games AI Theme Song (Official Theme Song): Cinematic, Anthem
36Kr WISE AI Theme Song (Conference Theme Song): Cinematic, Anthem
Boonie Bears (Trailer): Cinematic, Trailer
The Spinning Washing Machine (Digital-Human MV): Electronic, MV
Wind and Moon of My Hometown: Chinese Folk, Ballad
街角のソノリティ: J-Pop, City Pop
Wind Over the Hills: Folk, Acoustic
Hearing the Galaxy: Cinematic, Ambient