ACE-Step 1.5 for Music Generation
Create stunning music using ACE Step 1.5
ACE-Step 1.5
Music Generation
Text to Audio
Nodes & Models
WorkflowGraphics
PrimitiveStringMultiline
CheckpointLoaderSimple
ace_step_1.5_turbo_aio.safetensors
ModelSamplingAuraFlow
EmptyAceStep1.5LatentAudio
TextEncodeAceStepAudio1.5
ConditioningZeroOut
KSampler
VAEDecodeAudio
SaveAudioMP3
ACE-Step 1.5 text-to-music generation. Write a style prompt, add lyrics, set your BPM and key, and generate a full song.
ACE-Step 1.5 is an open-source music foundation model with a hybrid architecture: a language model plans song structure and a diffusion transformer renders the audio. It can output up to 10 minutes of coherent music from a single run. The turbo model included in this workflow generates in seconds on consumer GPUs with under 4 GB of VRAM.
The workflow ships with two complete default prompts as production-ready examples: a full neo-soul track with verse-chorus-bridge structure and detailed lyrics, and a post-punk arrangement with layered electric guitars, live drums, and an angsty male vocal. Both show the level of specificity the model responds to well.
How do you use ACE-Step 1.5 for music generation?
Write a style prompt describing the sound, add lyrics with section labels, set BPM, key, language, and duration, and run. ACE-Step 1.5 generates full-length audio with coherent structure: the LM plans the arrangement and the diffusion model renders the audio. Everything runs locally, with no cloud API.
Tags (style prompt)
Describes the genre, instruments, feel, tempo, and production style of the track. The model reads this for everything that isn't lyrics: arrangement, texture, sound palette, and mood.
Prompting approach that works:
Lead with genre: "Neo-Soul:", "Post-Punk:", "Cinematic Orchestral:", "Lo-fi Hip-Hop:"
Name the core instruments and how they play: "a live drummer plays a loose, hip-hop influenced pocket," "layered electric guitars, one clean and arpeggiated, the other providing distorted chordal texture."
Describe the production feel: "organic," "driving," "warm," "angsty," "effortless groove," "strained quality that builds into an anthemic, shouted chorus."
Include arrangement dynamics: "builds from a sparse intro to a full-band chorus," "tension that never fully resolves," "drops to bass and vocals in the bridge."
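The four-part approach above can be sketched as a small helper that assembles the prompt in that order. This is an illustration only: `build_style_prompt` and its parameter names are hypothetical, not part of ACE-Step or ComfyUI, and the result is just a string to paste into the tags field.

```python
def build_style_prompt(genre, instruments, feel, dynamics):
    """Assemble a style prompt: genre first, then instruments, feel, dynamics.

    Hypothetical helper for illustration; the output is plain text.
    """
    # Keep the order the guide recommends and drop empty entries.
    clauses = list(instruments) + list(feel) + list(dynamics)
    clauses = [c.strip().rstrip(".") for c in clauses if c.strip()]
    lead = genre.strip().rstrip(":")
    if not clauses:
        return lead
    return f"{lead}: " + ", ".join(clauses) + "."


prompt = build_style_prompt(
    genre="Neo-Soul",
    instruments=["a live drummer plays a loose, hip-hop influenced pocket"],
    feel=["warm", "organic"],
    dynamics=["builds from a sparse intro to a full-band chorus"],
)
print(prompt)
```

Keeping the parts as separate lists makes it easy to swap only the instruments or only the dynamics between generations while the genre lead stays fixed.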
Lyrics
Enter full song lyrics with section labels in brackets: [Intro], [Verse 1], [Chorus], [Bridge], [Outro]. The model reads these labels to structure the arrangement. Instrumental sections can be labeled [Intro - Guitar Riff & Drums] or similar to guide the sound without lyrics.
Leave lyrics empty for purely instrumental generation.
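As a rough sketch of the bracketed-label format, the helper below turns a list of sections into lyric text. `format_lyrics` is a hypothetical name, the lyric lines are placeholders, and separating sections with a blank line is an assumption about layout rather than a documented requirement.

```python
def format_lyrics(sections):
    """Render (label, lines) pairs as bracket-labelled lyric text.

    A section with no lines becomes a label-only block, which is how an
    instrumental section would be written.
    """
    blocks = []
    for label, lines in sections:
        block = [f"[{label}]"] + [line.strip() for line in lines if line.strip()]
        blocks.append("\n".join(block))
    return "\n\n".join(blocks)


lyrics = format_lyrics([
    ("Intro - Guitar Riff & Drums", []),  # instrumental, label only
    ("Verse 1", ["Placeholder first line", "Placeholder second line"]),
    ("Chorus", ["Placeholder hook line"]),
])
print(lyrics)
```

Passing an empty list returns an empty string, which corresponds to leaving the lyrics field empty for an instrumental track.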
BPM (default: 190)
Tempo in beats per minute. The default of 190 matches the post-punk default prompt. Adjust for your genre: 60-80 for slow ballads, 80-100 for R&B and hip-hop, 120-140 for pop and dance, 160+ for punk and metal.
Duration (default: 120 seconds)
Length of the generated track in seconds. ACE-Step 1.5 supports up to ~600 seconds (10 minutes). 120 seconds covers a standard song structure with intro, two verses, chorus, and outro. Set higher for extended compositions or longer instrumental pieces.
Language (default: English)
Supports 50+ languages for lyrics and vocal generation. Set the language to match your lyrics for accurate phonetic rendering.
Key and scale (default: E minor)
Musical key and scale for the generated track. The model uses this for harmonic coherence across the arrangement.
Time signature (default: 4/4)
Standard 4/4 for most genres. Change for waltz (3/4), complex meter (7/8), or compound time (6/8).
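BPM, duration, and time signature together determine roughly how many bars the track has room for, which helps when deciding how many lyric sections to write. The arithmetic is bars = BPM x seconds / 60 / beats per bar. The sketch below assumes the BPM counts the beat named by the top number of the time signature, which is a simplification (6/8 is often counted in two beats per bar); `bars_in_track` is a hypothetical helper.

```python
def bars_in_track(bpm, duration_seconds, time_signature="4/4"):
    """Estimate how many bars fit in a track of the given length."""
    beats_per_bar = int(time_signature.split("/")[0])
    total_beats = bpm * duration_seconds / 60
    return total_beats / beats_per_bar


default_bars = bars_in_track(190, 120, "4/4")  # 380 beats / 4 = 95.0 bars
waltz_bars = bars_in_track(90, 120, "3/4")     # 180 beats / 3 = 60.0 bars
ballad_bars = bars_in_track(70, 120, "4/4")    # 140 beats / 4 = 35.0 bars
print(default_bars, waltz_bars, ballad_bars)
```

With the defaults, 120 seconds holds 95 bars, while the same duration at a 70 BPM ballad tempo holds only 35, so slower tracks usually need either fewer sections or a longer duration.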
CFG scale (default: 2)
Controls how closely the output follows the style prompt. Higher values tighten prompt adherence. Lower values allow more interpretive freedom. Start at 2 and increase if the output drifts from the described genre or instrumentation.
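For intuition, classifier-free guidance is commonly defined as pushing a prompt-conditioned prediction away from an unconditioned one by the scale factor. The sketch below shows that general formula on made-up numbers; how ACE-Step 1.5 applies its CFG scale internally is not described here, so treat the mapping to this setting as an assumption.

```python
def apply_cfg(cond, uncond, scale):
    """Generic classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 0.0]    # illustrative prediction with the prompt
uncond = [0.5, 0.5]  # illustrative prediction without the prompt

no_guidance = apply_cfg(cond, uncond, 1.0)  # [1.0, 0.0], same as cond
default_cfg = apply_cfg(cond, uncond, 2.0)  # [1.5, -0.5], pushed past cond
print(no_guidance, default_cfg)
```

At a scale of 1 the result equals the conditioned prediction; larger values exaggerate the difference the prompt makes, which is why raising the scale tightens adherence.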
Temperature (default: 0.85)
Controls output diversity. Lower temperature produces more predictable, consistent results. Higher temperature introduces more variation. 0.85 is a balanced starting point.
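The effect of temperature can be illustrated with a temperature-scaled softmax, the usual way sampling temperature is applied to a language model's scores. The scores below are invented, and applying this to ACE-Step's planner is an assumption; the point is only that lower values concentrate probability on the top choice.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.5)
balanced = softmax_with_temperature(logits, 0.85)
warm = softmax_with_temperature(logits, 1.5)
print(cool[0], balanced[0], warm[0])
```

The top candidate's probability shrinks as temperature rises, so the other candidates get sampled more often and outputs vary more between runs.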
Steps (default: 8)
The turbo model produces good output at 8 steps. Increase to 12-16 for more refined audio at the cost of generation time.
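The defaults listed above can be collected in one place with a few sanity checks. This is a minimal sketch: `SongSettings` and its field names are hypothetical and are not the input names of the ComfyUI nodes; only the default values and the ~600 second limit come from the descriptions above.

```python
from dataclasses import dataclass

MAX_DURATION_SECONDS = 600  # approximate upper limit stated above


@dataclass
class SongSettings:
    bpm: int = 190
    duration_seconds: int = 120
    language: str = "English"
    key_scale: str = "E minor"
    time_signature: str = "4/4"
    cfg_scale: float = 2.0
    temperature: float = 0.85
    steps: int = 8

    def problems(self):
        """Return a list of human-readable issues; empty means the values look sane."""
        issues = []
        if self.bpm <= 0:
            issues.append("bpm must be positive")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            issues.append("duration must be between 1 and 600 seconds")
        if self.temperature <= 0:
            issues.append("temperature must be positive")
        if self.steps < 1:
            issues.append("steps must be at least 1")
        return issues


defaults = SongSettings()
too_long = SongSettings(duration_seconds=900)
print(defaults.problems(), too_long.problems())
```

Checking values before queuing a run is cheap compared with waiting for a generation that was never going to fit the model's limits.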
What is ACE-Step 1.5 music generation good for?
ACE-Step 1.5 is strongest for generating full-length, royalty-free music with coherent song structure from a text prompt. Genre control, lyric integration, BPM and key settings, and local execution make it practical for background music, rapid production drafts, and localized song content across 50+ languages.
Royalty-free background music. Generate original tracks for videos, streams, games, and podcasts without licensing costs. Because the model runs locally and is released under Apache 2.0, there are no per-generation API costs and commercial use is permitted. Set the duration to match your timeline and generate as many variations as needed.
Rapid production drafts. Sketch a track in a target style within seconds, evaluate the arrangement and feel, then rework selected elements in a DAW. The style prompt controls instrumentation and production feel; the BPM and key settings lock the musical foundation. Treat ACE-Step 1.5 as a fast first-pass generator before investing time in production.
Covers and style transfers. Describe an existing song's feel in the style prompt and provide new lyrics to generate a reimagined version. Repaint sections by changing the prompt while keeping the structure. Extend tracks by increasing duration and continuing from where the previous generation ended.
Localized jingles and vocal content. Generate songs with lyrics in your target language for marketing, education, or localization. The 50+ language support covers non-English vocal generation with accurate phonetic output.
Honest notes: the model performs best when the style prompt and lyrics are consistent in genre and mood. Mismatched prompts (upbeat style prompt with melancholic lyrics) can produce incoherent output. For highly specific instrument arrangements, more detailed tagging produces better results. The turbo model is optimized for speed; for maximum audio quality at the cost of generation time, standard ACE-Step 1.5 is available.
How does ACE-Step 1.5 compare to commercial music generation services?
ACE-Step 1.5 generates locally with no per-generation cost, no API limits, and full commercial rights under Apache 2.0. Commercial services offer polished UIs and may have stronger prompt-following on simple inputs. ACE-Step 1.5 trades UI simplicity for full control over BPM, key, time signature, language, and generation parameters that commercial tools don't expose.
Commercial music generation services (Suno, Udio) handle prompting for casual users and produce strong results from short descriptions. For professional or high-volume production where you need precise musical control, local execution, no cost per generation, and the ability to integrate into a ComfyUI pipeline, ACE-Step 1.5 covers those requirements.
For teams running batch generation of background tracks or building automated music pipelines, local execution removes the API cost bottleneck. For solo creators wanting polished results from minimal input, commercial services offer a lower-friction starting point.
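A batch run of this kind could look roughly like the sketch below, which queues the same workflow several times with different seeds through ComfyUI's HTTP `/prompt` endpoint. It assumes a ComfyUI server running locally on the default address and a workflow exported in API format; the helper names and the file name in the usage comment are hypothetical. Only `KSampler` seeds are changed, so any other node with its own seed input is left as exported.

```python
import copy
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188/prompt"  # assumed default local address


def with_seed(workflow, seed):
    """Return a copy of an API-format workflow with every KSampler seed replaced."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["seed"] = seed
    return patched


def queue_prompt(workflow, url=COMFYUI_URL):
    """POST one workflow to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def queue_variations(workflow, seeds, send=queue_prompt):
    """Queue one generation per seed; `send` can be swapped out for testing."""
    return [send(with_seed(workflow, seed)) for seed in seeds]


# Usage (not executed here; needs a running server and an exported workflow):
# with open("ace_step_workflow_api.json") as f:  # hypothetical file name
#     workflow = json.load(f)
# queue_variations(workflow, seeds=[101, 102, 103])
```

Each queued job runs in order on the local GPU, so the only cost of a larger batch is generation time.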
FAQ
What is ACE-Step 1.5 and how does it generate music?
ACE-Step 1.5 is an open-source music foundation model with a hybrid architecture. A language model plans the song structure, arrangement, and lyric placement. A diffusion transformer renders the audio. Both run together in one generation pass, producing structured, full-length tracks from a text prompt and lyrics.
How do I write a good style prompt for ACE-Step 1.5?
Lead with genre, then describe instruments and how they play, then add production feel and arrangement dynamics. "Neo-Soul: a warm organic track with a live drummer playing a loose hip-hop pocket, brushed hi-hats, and fingerpicked bass" gives the model enough to build a coherent arrangement. Specific instrument descriptions consistently outperform vague mood words.
What BPM and key settings work for different genres?
Ballads and ambient: 60-80 BPM, minor keys. R&B and hip-hop: 80-100 BPM, minor or Dorian. Pop and indie: 100-130 BPM, major or Mixolydian. Dance and house: 120-140 BPM, minor or major. Punk and metal: 160+ BPM, minor keys. Set the key to match the mood: minor keys for tension and melancholy, major for uplift.
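The ranges above can be kept as a small lookup table for checking a planned tempo against a genre. This is a sketch only: `GENRE_GUIDE` and `bpm_in_range` are hypothetical names, and the values are starting points copied from the answer above, not limits enforced by the model.

```python
# (low BPM, high BPM or None for open-ended), suggested key or mode
GENRE_GUIDE = {
    "ballad": ((60, 80), "minor"),
    "ambient": ((60, 80), "minor"),
    "r&b": ((80, 100), "minor or Dorian"),
    "hip-hop": ((80, 100), "minor or Dorian"),
    "pop": ((100, 130), "major or Mixolydian"),
    "indie": ((100, 130), "major or Mixolydian"),
    "dance": ((120, 140), "minor or major"),
    "house": ((120, 140), "minor or major"),
    "punk": ((160, None), "minor"),
    "metal": ((160, None), "minor"),
}


def bpm_in_range(genre, bpm):
    """Check whether a tempo falls inside the suggested range for a genre."""
    (low, high), _mode = GENRE_GUIDE[genre.lower()]
    return bpm >= low and (high is None or bpm <= high)


print(bpm_in_range("Punk", 190))    # the workflow's default tempo
print(bpm_in_range("ballad", 190))
```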
Can ACE-Step 1.5 generate vocals and lyrics?
Yes. Enter lyrics with section labels ([Verse 1], [Chorus], [Bridge]) in the lyrics field. Set the language to match your lyrics. The model generates vocals singing the provided lyrics in the described style. Leave the lyrics field empty for purely instrumental generation.
What VRAM does ACE-Step 1.5 need to run?
Under 4 GB of VRAM with the turbo all-in-one model. The turbo model generates in seconds on consumer GPUs and maintains good audio quality at 8 steps. For maximum quality, the standard model requires more VRAM and takes longer per generation.
How do I run ACE-Step 1.5 online?
You can run ACE-Step 1.5 online through Floyo. No installation, no setup. Open the workflow in your browser, write your prompt and lyrics, and hit run. Free to try.