Happy Horse 1.1 · Text to Video

Describe a scene in plain language and Happy Horse 1.1 generates a cinematic video with synchronized audio, dialogue, and lip-sync at up to 1080p.

Tags: alibaba · dialogue · happy horse 1.1 · text to video

floyoofficial

Nodes & Models

Floyo Partner Nodes

HappyHorse11TextToVideo_floyo

VideoToFrames

ComfyUI-VideoHelperSuite

VHS_VideoCombine

ABOUT THE WORKFLOW

Generate a Video from a Prompt
Write a scene description covering the subject, action, camera movement, lighting, and dialogue. Happy Horse 1.1 generates a video with synchronized audio in a single pass. Choose your resolution, aspect ratio, and duration, then download the MP4.

Partner node. This workflow calls an external API, so each run uses credits from your API wallet. No API key needed. Floyo handles the connection.

Model

Happy Horse 1.1 by Alibaba. A unified text-to-video model that generates synchronized video and audio together, with native multi-shot storytelling, multi-language lip-sync, and strong motion consistency.

HOW IT WORKS

Step 1. Write your prompt
Describe the full scene: who is in it, what they do, how the camera moves, the lighting, the mood, and any dialogue. Be specific. "A detective pushes open a rain-soaked bar door, tracking shot from behind" is better than "a man walks into a bar."

Step 2. Choose your resolution and aspect ratio
Pick 720P for faster previews or 1080P for final output. Set the aspect ratio to match your platform: 16:9 for widescreen, 9:16 for vertical, 1:1 for square.

Step 3. Set the duration
Choose how long the clip runs, from 3 to 15 seconds. Longer durations give the model more time for scene transitions and dialogue.

Step 4. Hit run and download
Happy Horse 1.1 generates the video with native audio and returns an MP4.
Ready for: Premiere · DaVinci Resolve · After Effects · TikTok · YouTube

First time? Leave every setting as-is. The defaults (1080P, 16:9, 10 seconds) are the right starting point for almost everyone.
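
For anyone who keeps notes on their runs or scripts around the workflow, the three settings from Steps 2 and 3 can be written down as a small record. This is only an illustrative sketch: the `ClipSettings` class and its field names are hypothetical and are not the partner node's actual inputs. Only the allowed values (720P or 1080P; 16:9, 9:16, or 1:1; 3 to 15 seconds) and the defaults come from the steps above.

```python
from dataclasses import dataclass

# Allowed values come from the steps above; every name here is hypothetical.
RESOLUTIONS = ("720P", "1080P")
ASPECT_RATIOS = ("16:9", "9:16", "1:1")
MIN_SECONDS, MAX_SECONDS = 3, 15


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for resolution, aspect ratio, and duration."""
    resolution: str = "1080P"
    aspect_ratio: str = "16:9"
    duration_seconds: int = 10

    def __post_init__(self):
        # Reject anything outside the options the workflow describes.
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            raise ValueError(
                f"duration_seconds must be between {MIN_SECONDS} and {MAX_SECONDS}"
            )


defaults = ClipSettings()  # 1080P, 16:9, 10 seconds
preview = ClipSettings(resolution="720P", duration_seconds=5)  # cheaper test run
```

Validating up front mirrors the advice to test at 720P with a shorter duration before committing credits to a full-resolution run.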

RECOMMENDED SETTINGS

Quick-start guide. Find the goal that matches yours and copy the settings.

Standard cinematic clip (most people) — 1080P, 16:9, 10 seconds, random seed. The right starting point for almost everyone.
Quick test before committing credits — 720P, shorter duration. Cheaper and faster. Check the motion and framing before running at full resolution.
Vertical for social media — 1080P, 9:16, 5 to 10 seconds. Native portrait output for TikTok, Reels, and Shorts without cropping.
Dialogue-heavy scene — Describe each speaker's appearance distinctly and write dialogue in quotes with speaker attribution. "She turns and says: 'We leave at dawn.'" Keep speakers to two or three for clean lip-sync.
Multi-shot narrative — Describe each scene transition in the prompt using "Scene 1:", "Scene 2:" labels. The model handles shot continuity across cuts.
Reproduce or tweak a result — Lock the seed number. Keeping the same seed with small prompt edits lets you adjust one element at a time.
Audio not matching the scene — Add sound descriptions directly in the prompt. "Crackling fire, distant thunder, low ambient hum" steers the generated audio.

Prompt: Front-load the subject and action. Describe camera movement using cinematic terms ("slow dolly push-in," "handheld tracking shot," "wide establishing shot"). For dialogue, write exact quotes and name the speaker. "The older man says: 'No stops, no looking back'" gives cleaner lip-sync than a vague instruction to "have the characters talk."
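
The prompt advice above (subject and action first, then camera, lighting, quoted dialogue with a named speaker, and sound cues) can be sketched as a small helper. Treat this as one possible way to keep prompts consistent, not as a format the model requires: `build_prompt` and `label_scenes` are hypothetical names, and the lighting phrase in the example is made up for illustration.

```python
def build_prompt(subject_action, camera, lighting="", dialogue=(), sounds=()):
    """Assemble a prompt: subject and action first, then camera, lighting,
    quoted dialogue with speaker attribution, and sound descriptions."""
    sentences = [subject_action, camera]
    if lighting:
        sentences.append(lighting)
    for speaker, line in dialogue:
        # Exact quote plus a named speaker, as the guidance recommends.
        sentences.append(f"{speaker} says: '{line.strip().rstrip('.')}'")
    if sounds:
        sentences.append(", ".join(sounds))
    # Normalize each part so it ends with exactly one period.
    return " ".join(s.strip().rstrip(".") + "." for s in sentences)


def label_scenes(scene_prompts):
    """Prefix each shot with a 'Scene N:' label for multi-shot prompts."""
    return " ".join(
        f"Scene {i}: {p}" for i, p in enumerate(scene_prompts, start=1)
    )


prompt = build_prompt(
    "A detective pushes open a rain-soaked bar door",
    "Tracking shot from behind",
    lighting="Low neon lighting",  # illustrative value, not from the guide
    dialogue=[("The older man", "No stops, no looking back")],
    sounds=["distant thunder", "low ambient hum"],
)
print(prompt)
```

For a multi-shot narrative, build each shot with `build_prompt` and pass the list to `label_scenes` so every shot carries the same character description.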

LEARN

📹 Videos

Intro to Floyo
ComfyUI 101 Free Course ft. Sebastian Kamph
Floyo 101 for Team Collaboration

USE CASES

🎬 Short Film and Concept Scenes
Generate cinematic multi-character scenes with dialogue and synchronized audio to test story ideas before a full production.

📱 Social Media Content
Create scroll-stopping vertical or widescreen clips for TikTok, Reels, and Shorts from a single text prompt, complete with sound.

📺 Ad and Product Spots
Produce quick video drafts for campaigns, product reveals, and explainer content without filming or stock footage.

🎮 Game and Worldbuilding Previsualization
Generate atmosphere-driven scenes to pitch environments, characters, and narrative tone to a team before committing to full asset production.

🎤 Dialogue and Voice Scenes
Create talking-character sequences with native lip-sync in multiple languages for demos, pitches, and storyboard animatics.

WHAT WORKS BEST / WHAT TO AVOID

✅ Works great

Detailed prompts with subject, action, camera, lighting, and mood
Two to three characters with distinct visual descriptions
Cinematic camera terms (dolly, tracking, orbit, push-in)
Dialogue written as direct quotes with speaker attribution

⚠️ May produce softer results

Vague prompts like "a cool scene with people"
More than three speaking characters in a single clip
Rapid complex action with many moving objects
Prompts with no camera or lighting direction

FAQ

What is Happy Horse 1.1?
Happy Horse 1.1 is a unified text-to-video model by Alibaba. It generates video and audio in a single pass using one transformer that processes text, image, and audio together. It supports multi-shot storytelling, native lip-sync in seven languages, and output up to 1080p at durations from 3 to 15 seconds.

Does Happy Horse 1.1 generate audio and dialogue automatically?
Yes. The model generates synchronized audio alongside the video in one pass. This includes ambient sound, sound effects, and spoken dialogue with lip-sync. Describe the sounds and dialogue you want directly in the prompt and the model renders them into the clip.

What languages does the lip-sync support?
Seven: English, Mandarin, Cantonese, Japanese, Korean, German, and French. Write the dialogue in the language you want and the model matches the mouth movement to the speech.

How long can the generated videos be?
Clips run from 3 to 15 seconds per generation. For longer content, generate multiple clips with consistent character descriptions and edit them together in your timeline.
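
As a rough planning aid for longer pieces, the arithmetic of splitting a target runtime into clips that each fit the 3 to 15 second range can be sketched as follows. The `plan_clips` function is a hypothetical helper, not part of the workflow, and it assumes whole-second durations and roughly equal clip lengths.

```python
import math


def plan_clips(total_seconds, min_len=3, max_len=15):
    """Split a target runtime into the fewest near-equal clips,
    each between min_len and max_len seconds (whole seconds only)."""
    if total_seconds < min_len:
        raise ValueError(f"total_seconds must be at least {min_len}")
    count = math.ceil(total_seconds / max_len)
    base, extra = divmod(total_seconds, count)
    if base < min_len:
        raise ValueError("cannot split into clips within the allowed range")
    # The first `extra` clips get one additional second each.
    return [base + 1] * extra + [base] * (count - extra)


print(plan_clips(40))  # [14, 13, 13]
```

Each planned clip is then generated separately with the same character descriptions and joined in the editing timeline.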

Can Happy Horse 1.1 maintain characters across multiple shots in one clip?
Yes. The model supports multi-shot storytelling and maintains character appearance, lighting, and scene consistency across cuts within a single generation. Describe each shot as a separate scene in your prompt with consistent character details.

Is the output licensed for commercial use?
Happy Horse 1.1 is a proprietary Alibaba model, so commercial use is governed by the platform terms. Review the current terms of service to confirm rights for your specific use case.

How do I run Happy Horse 1.1 online?
You can run Happy Horse 1.1 online through Floyo. No installation, no setup, no API key to wire up. Open the workflow in your browser, write your prompt, and hit run. Free to try.

WHY FLOYO?

Floyo is the only platform with team collaboration for ComfyUI in the browser. You run workflows with no install. You share run history, assets, and models across your team. You pay only when you generate. Floyo supports open-source and closed-source models.

A designer runs an edit and likes the result. A teammate opens that exact run from shared history and keeps going. No file handoffs. No version confusion.

For studios and enterprise teams, Floyo adds private workspaces, pooled resources, and a team usage dashboard. Other ComfyUI cloud tools run for one person at a time. Floyo runs for the whole team, with transparent per-generation costs.

Ready to try it?
Write your scene, set the duration, and hit run. The settings are already dialled in.

→ Launch Workflow, Free

Questions? Watch the free course or check the FAQ above.