
Tempolor 4.6 is Tempolor's current flagship music-generation model. Architecturally, it builds on earlier model versions with a redeveloped bottleneck. Its generation paradigm stays consistent with how music is actually created: a hierarchical, progressive representation system made up of a musicality codec, a music-semantic codec and a music-acoustic codec produces high-quality 48 kHz stereo music.
This version decomposes music generation into representation-learning and generation tasks at three levels, which yields coarse-to-fine structured generation: the high level handles musicality and structural organization, the middle level handles semantics and content expression, and the low level handles acoustic detail and high-fidelity reconstruction. This hierarchical path, 'from high-level semantics to fine-grained acoustics', is the current mainstream paradigm and balances musicality with audio quality.
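The three-level decomposition described above can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the level names follow the text, but the token rates, the codebook size, the function names and the random stand-in for the learned generators are hypothetical and say nothing about how Tempolor 4.6 is actually implemented. The only point is the control flow, where each finer level is generated at a higher token rate and is conditioned on the level above it.

```python
import random

# Hypothetical token rates (tokens per second of audio). The source does not
# state real rates; they only need to increase from coarse to fine.
LEVELS = [("musicality", 1), ("semantic", 25), ("acoustic", 75)]
VOCAB_SIZE = 1024  # hypothetical codebook size


def generate_level(rate, duration_s, condition, rng):
    """Stand-in for a learned generator at one level.

    Each output token is derived from the time-aligned token of the coarser
    sequence `condition` plus random noise, so finer levels depend on coarser ones.
    """
    n_tokens = rate * duration_s
    tokens = []
    for i in range(n_tokens):
        # Pick the coarser token that covers the same position in time.
        parent = condition[i * len(condition) // n_tokens]
        tokens.append((parent * 31 + rng.randrange(VOCAB_SIZE)) % VOCAB_SIZE)
    return tokens


def coarse_to_fine(prompt, duration_s, seed=0):
    """Run the levels in order, feeding each level's tokens to the next one."""
    if not prompt or duration_s <= 0:
        raise ValueError("prompt must be non-empty and duration_s positive")
    rng = random.Random(seed)
    condition = list(prompt.encode("utf-8"))  # the prompt conditions the top level
    outputs = {}
    for name, rate in LEVELS:
        tokens = generate_level(rate, duration_s, condition, rng)
        outputs[name] = tokens
        condition = tokens  # the next, finer level is conditioned on this one
    return outputs


levels = coarse_to_fine("slow piano ballad", duration_s=10)
print({name: len(tokens) for name, tokens in levels.items()})
```

With these assumed rates, a 10-second request yields 10, 250 and 750 tokens at the three levels. In a real system the final acoustic tokens would then be decoded to a waveform by the acoustic codec, a step this sketch leaves out.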
In terms of controllability, the model goes beyond ordinary generation and also supports precise Remix rewriting, fine-grained audio editing and more.
Tempolor 4.6 strikes a more dependable balance among long-form structural planning, lyric-carrying capacity and audio fidelity. Thanks to multi-layer codec coordination and the LLM's long-range organization ability, the model keeps thematic motifs coherent and emotion consistent across longer generation spans.
This version shows clearer layering and spatial separation among drums, bass, harmony and vocals. It not only builds a structurally complete framework, but also comes closer to mature delivery standards in fine listening details.
The model's emotional expression and arrangement texture are especially delicate in slow-tempo, relaxing styles, which makes it the strongest option in the current Tempolor series for brand theme music, commercial-grade demos and complex lyric writing.
The evaluation uses Chinese and English test sets (30 each, 60 in total) and compares Tempolor 4.6 against Mureka v9, Suno v5.5 and MiniMax V2.6 under the Meta Audiobox Aesthetics and SongEval evaluation systems. In both tables, ↑ means higher is better.

Meta Audiobox Aesthetics:

| Model | CE↑ (Content Enjoyment) | CU↑ (Content Usefulness) | PC↑ (Production Complexity) | PQ↑ (Production Quality) |
|---|---|---|---|---|
| Tempolor v4.6 | 7.7251 | 7.9596 | 6.2263 | 8.3291 |
| Suno v5.5 | 7.7156 | 7.9949 | 6.3399 | 8.3184 |
| Mureka v9 | 7.6324 | 7.8275 | 6.5859 | 8.1604 |
| MiniMax V2.6 | 7.6872 | 7.9131 | 6.4197 | 8.2175 |

SongEval:

| Model | Musicality↑ | Coherence↑ | Naturalness↑ | Memorability↑ | Clarity↑ |
|---|---|---|---|---|---|
| Tempolor v4.6 | 4.4419 | 4.5639 | 4.3438 | 4.5710 | 4.4458 |
| Suno v5.5 | 4.3616 | 4.4814 | 4.2565 | 4.4885 | 4.3634 |
| Mureka v9 | 4.4763 | 4.5928 | 4.4167 | 4.5873 | 4.4523 |
| MiniMax V2.6 | 4.2315 | 4.3668 | 4.1447 | 4.3463 | 4.2244 |
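
To make the two tables easier to read at a glance, the following minimal sketch transcribes the scores above and reports the top model per metric and a given model's rank. The dictionary layout and the helper names `best_per_metric` and `rank_of` are illustrative choices, not part of any evaluation toolkit.

```python
# Scores copied from the two tables above (higher is better for every metric).
AUDIOBOX = {
    "Tempolor v4.6": {"CE": 7.7251, "CU": 7.9596, "PC": 6.2263, "PQ": 8.3291},
    "Suno v5.5": {"CE": 7.7156, "CU": 7.9949, "PC": 6.3399, "PQ": 8.3184},
    "Mureka v9": {"CE": 7.6324, "CU": 7.8275, "PC": 6.5859, "PQ": 8.1604},
    "MiniMax V2.6": {"CE": 7.6872, "CU": 7.9131, "PC": 6.4197, "PQ": 8.2175},
}
SONGEVAL = {
    "Tempolor v4.6": {"Musicality": 4.4419, "Coherence": 4.5639, "Naturalness": 4.3438,
                      "Memorability": 4.5710, "Clarity": 4.4458},
    "Suno v5.5": {"Musicality": 4.3616, "Coherence": 4.4814, "Naturalness": 4.2565,
                  "Memorability": 4.4885, "Clarity": 4.3634},
    "Mureka v9": {"Musicality": 4.4763, "Coherence": 4.5928, "Naturalness": 4.4167,
                  "Memorability": 4.5873, "Clarity": 4.4523},
    "MiniMax V2.6": {"Musicality": 4.2315, "Coherence": 4.3668, "Naturalness": 4.1447,
                     "Memorability": 4.3463, "Clarity": 4.2244},
}


def best_per_metric(table):
    """Return {metric: model with the highest score on that metric}."""
    metrics = next(iter(table.values())).keys()
    return {m: max(table, key=lambda model: table[model][m]) for m in metrics}


def rank_of(table, model, metric):
    """Return the 1-based rank of `model` on `metric` (1 = highest score)."""
    ordered = sorted(table, key=lambda name: table[name][metric], reverse=True)
    return ordered.index(model) + 1


print(best_per_metric(AUDIOBOX))
print(best_per_metric(SONGEVAL))
print({m: rank_of(SONGEVAL, "Tempolor v4.6", m) for m in SONGEVAL["Tempolor v4.6"]})
```

On these numbers, Tempolor v4.6 leads CE and PQ in Audiobox Aesthetics, while Suno v5.5 leads CU and Mureka v9 leads PC. On SongEval, Mureka v9 has the highest score on all five metrics and Tempolor v4.6 is second on each.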

Comparison of the time needed to generate 120 s of music audio (inference on an NVIDIA L20 GPU)