floyo logo
Powered by
ThinkDiffusion
Now Available on Floyo

Veo 3.1

Google DeepMind's latest video generation model with native audio. Run directly in your browser on Floyo - no installation, no setup, no API configuration required.

What is Veo 3.1?

Veo 3.1 is Google DeepMind's latest video generation model, released in October 2025. It generates videos with native audio, including dialogue, sound effects, and ambient noise, and delivers state-of-the-art text-to-video, image-to-video, and combined audio-video generation with strong physics simulation and prompt adherence. The model supports 8-second base clips extendable to 141 seconds, up to 3 reference images for character consistency, and camera controls for precise framing. You can run Veo 3.1 in your browser on Floyo without any installation.

8s
Base Length
141s
Max Extended
3
Reference Images
Native
Audio Generation

What's new in Veo 3.1?

Veo 3.1 improves on Veo 3 with richer audio, stronger prompt adherence, and enhanced realism. These upgrades focus on audiovisual quality - videos capture true-to-life textures, and you get better narrative control across extended scenes.

Native Audio

Generates dialogue, sound effects, and ambient audio synchronized with video. Lip sync works with spoken lines.

Physics Simulation

Improved real-world physics for natural motion. Water, fabric, and object interactions look more realistic.

Prompt Adherence

More accurate responses to your instructions. Complex prompts translate better to final output.

What are the technical specs?

Veo 3.1 generates 8-second clips by default, extendable to 141 seconds through iterative 7-second extensions. The model produces synchronized native audio including dialogue, ambient sound, and sound effects. It supports text-to-video, image-to-video, and multi-image reference inputs.
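The length limits work out to a fixed number of extension passes. As a quick sanity check on the arithmetic (8-second base, 7-second increments, 141-second cap):

```python
BASE_SECONDS = 8        # length of the initial generated clip
EXTENSION_SECONDS = 7   # seconds added per scene-extension pass
MAX_SECONDS = 141       # documented ceiling for an extended video

# How many extension passes does it take to reach the cap?
extensions = (MAX_SECONDS - BASE_SECONDS) // EXTENSION_SECONDS
total = BASE_SECONDS + extensions * EXTENSION_SECONDS

print(extensions, total)  # 19 passes: 8 + 19 * 7 = 141 seconds
```

So a maximum-length video is the base clip plus 19 extension passes.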

Developer Google DeepMind
Release Date October 2025
Base Video Length 8 seconds
Extended Video Length Up to 141 seconds (7s increments)
Audio Generation Native (dialogue, SFX, ambient)
Input Types Text, single image, multiple reference images
Reference Images Up to 3 (ingredients to video)
Camera Controls Zoom, pan, move (up/down/left/right)
Safety Features SynthID watermarking, content filtering
API Availability Gemini API, Vertex AI, Google AI Studio

What features does Veo 3.1 include?

Veo 3.1 introduces creative controls beyond basic text-to-video. You can provide reference images for characters, objects, or styles. The model supports scene extension, first and last frame control for transitions, camera movements, and object manipulation in existing footage.

Ingredients to Video

Use up to 3 reference images for characters, objects, or scenes. The model incorporates them into your final video.

Scene Extension

Extend videos by 7 seconds at a time, up to 141 seconds total. Audio stays consistent across extensions.

First and Last Frame

Provide starting and ending images. Veo generates a smooth video transition between them.

Camera Controls

Control zoom, pan, and camera movement direction for precise framing in your shots.

Style Reference

Upload a style reference image and Veo matches the visual aesthetic in your generated video.

Object Add/Remove

Insert new objects into existing videos or remove unwanted elements. Shadows and lighting adjust automatically.

What can you create with Veo 3.1?

Veo 3.1 handles cinematic scenes, animated sequences, and realistic footage with synchronized audio. You can generate videos from text prompts, transform images into video, extend existing clips, and control camera movements. The model generates dialogue, sound effects, and ambient audio natively.

Some things people are making with it:

  • Short films with spoken dialogue and ambient sound
  • Animated sequences in styles like origami, stop-motion, or 2D anime
  • Product videos and marketing content
  • Scene transitions using first and last frame control
  • Character-consistent video sequences

Why run Veo 3.1 through ComfyUI?

ComfyUI gives you node-based control over your generation pipeline. You can chain Veo 3.1 with other models, add preprocessing steps, or build custom workflows that match your specific needs. On Floyo, these workflows run in your browser with no local setup.

The visual workflow approach also makes it easier to reproduce results. You can save workflows, share them, or modify individual nodes without rewriting prompts from scratch.
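To make the node-based idea concrete, here is a minimal sketch of the JSON payload ComfyUI accepts on its `/prompt` HTTP endpoint. The node class names, IDs, and inputs below are hypothetical placeholders, not the actual Veo 3.1 nodes Floyo ships:

```python
import json

# Hypothetical two-node graph: a text prompt wired into a video generator.
# Real Floyo/ComfyUI workflows use their own node class names and inputs.
workflow = {
    "1": {
        "class_type": "TextPrompt",     # placeholder node name
        "inputs": {"text": "A rainy street at night, ambient thunder"},
    },
    "2": {
        "class_type": "Veo31Generate",  # placeholder node name
        "inputs": {
            "prompt": ["1", 0],          # wire: output 0 of node "1"
            "duration_seconds": 8,
        },
    },
}

# ComfyUI's HTTP API expects {"prompt": <graph>} posted to /prompt.
payload = json.dumps({"prompt": workflow})
print(payload[:40])
```

Because the whole pipeline is a serializable graph like this, saving, sharing, or swapping a single node means editing one entry rather than rewriting the prompt.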

How do you use Veo 3.1 on Floyo?

Open the Veo 3.1 workflow on Floyo, enter your text prompt, and run the workflow. The video generates on cloud GPUs, so your local machine specs do not matter. For image-to-video, upload your reference images to the appropriate nodes before running.

  1. Go to the Veo 3.1 workflow on Floyo
  2. Enter your prompt in the text input node. Be specific about camera angles, lighting, and audio you want.
  3. If using reference images, upload them to the image input nodes
  4. Click Run and wait for generation to complete
  5. Download your video or extend it with additional prompts

How does Veo 3.1 handle audio?

Veo 3.1 generates audio natively alongside the video. This includes spoken dialogue that syncs with character lip movements, ambient sounds that match the environment, and sound effects tied to on-screen actions. You describe the audio you want in your prompt.

According to Google DeepMind, generating natural spoken audio for shorter speech segments is still an area of active development. Longer dialogue tends to work better than quick one-liners.
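Since the audio is driven entirely by the prompt, it helps to spell out dialogue, sound effects, and ambience as separate clauses. A small sketch of assembling such a prompt (the section phrasing here is an illustrative convention, not an official prompt schema):

```python
# Compose a video prompt with explicit audio direction.
# The labeled clauses are an illustrative convention, not a required syntax.
visual = "Close-up of a barista steaming milk in a sunlit cafe"
dialogue = 'She says: "One flat white, coming right up."'
sfx = "Sound effects: milk frother hiss, cups clinking"
ambient = "Ambient audio: soft cafe chatter, light jazz in the background"

prompt = ". ".join([visual, dialogue, sfx, ambient])
print(prompt)
```

Putting the spoken line in quotes and naming the effects and ambience separately gives the model distinct cues for lip-synced dialogue versus background sound.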

Frequently Asked Questions

Is Veo 3.1 free to use?

Veo 3.1 access depends on your Floyo plan. Video generation uses compute credits based on video length and quality settings. Check Floyo's pricing page for current rates.

Who made Veo 3.1?

Veo 3.1 was developed by Google DeepMind. It was announced in October 2025 as an update to Veo 3, with improved audiovisual quality and new creative controls.

How does Veo 3.1 compare to Sora?

Both are state-of-the-art video models. Veo 3.1's main differentiator is native audio generation. Community feedback suggests Veo leads in fine detail and physics simulation, while Sora has different strengths in certain motion styles. Results vary by use case.

How long can Veo 3.1 videos be?

Base generation is 8 seconds. Using scene extension, you can create videos up to 141 seconds by extending in 7-second increments. Each extension continues from the last second of your previous clip.

Do I need a powerful GPU to run Veo 3.1 on Floyo?

No. Floyo runs ComfyUI workflows on cloud infrastructure. Your local machine just needs a browser. The actual video generation happens on remote GPUs.

Can I use my own images with Veo 3.1?

Yes. The ingredients-to-video feature accepts up to 3 reference images for characters, objects, or style. You can also use image-to-video to animate a single image, or provide first and last frames for transition effects.

Ready to try Veo 3.1?

Open the workflow on Floyo and start generating videos with native audio in your browser.

Open Veo 3.1 Workflow