
AI VIDEO GENERATION + EDITING

Run Wan 2.7 Video on Floyo

Four models. Text-to-video, image-to-video with first/last frame control, multi-reference generation, and instruction-based video editing. Native audio. Up to 1080p.

Run Alibaba's Wan 2.7 Video through ComfyUI workflows in your browser. No API key, no installs, no local GPU.

Resolution: Up to 1080p

Duration: 2 to 15 seconds

Video References: Up to 5

Audio: Native audio-visual sync

Create with Wan 2.7 Now →

No installation. Runs in browser. Updated April 2026.

What You Get

Wan 2.7 is a unified image generation, image editing, and video generation model family from Alibaba Tongyi Lab. The image models use a thinking mode that reasons about composition before generating. They support up to 4K resolution, accurate text rendering, hex color control, and multi-image editing with up to 9 references. The video models add first/last frame control, 9-grid I2V, and instruction-based video editing. Available as ComfyUI nodes on Floyo.

WAN 2.7 WORKFLOWS ON FLOYO

Wan 2.7 Text to Video

What is Wan 2.7 Image?

Wan 2.7 Image is a unified generation-and-editing model from Alibaba Tongyi Lab, released April 1, 2026. It introduces thinking mode, where the model reasons about composition, spatial relationships, and prompt logic before generating. Four variants are available: Text-to-Image (up to 2048x2048), Text-to-Image Pro (up to 4096x4096), Image Edit (up to 2048x2048), and Image Edit Pro (up to 2K enhanced).

The biggest change from previous Wan image capabilities is how the model processes your prompt. Most image models run a single forward pass. Wan 2.7 Image adds a reasoning step first. The model analyzes what you asked for (spatial layout, object relationships, text content) and plans the composition before generating pixels. The trade-off is slightly longer generation time. The payoff is better prompt adherence, especially for complex scenes with multiple elements.

Three persistent problems in AI image generation get direct fixes here. First: all AI faces look the same. Wan 2.7 Image lets you specify bone structure, face shape, and eye style in your prompt to create distinct, believable characters. Second: color drift. A new palette extraction feature lets you input hex codes or drop in a reference image to lock colors to your brand guide. Third: text rendering. Previous models could handle a short headline at best. Wan 2.7 Image renders readable text up to 3,000 tokens, enough to fill an A4 page.

On Floyo, you access Wan 2.7 Image through ComfyUI nodes. Floyo runs the model on H100 NVL GPUs with 94GB VRAM, so you get full-speed generation without needing your own hardware or managing model downloads.

What are Wan 2.7 Image's technical specifications?

Wan 2.7 Image uses a unified generation-and-understanding architecture that maps text and visual semantics into a shared latent space. Four model variants cover text-to-image and image editing at standard and pro tiers. The Pro tier supports 4K output (4096x4096). All variants include thinking mode for improved prompt adherence and composition planning.

Spec | Details
Developer | Alibaba Tongyi Lab
Model Variants | Text-to-Image, Text-to-Image Pro, Image Edit, Image Edit Pro
Max Resolution (Standard) | 2048x2048 (2K)
Max Resolution (Pro) | 4096x4096 (4K)
Thinking Mode | Built-in chain-of-thought reasoning (on by default for T2I)
Text Rendering | Up to 3,000 tokens of readable text per image
Reference Images | Up to 9 for editing, style transfer, and multi-reference fusion
Image Set Generation | Up to 12 coherent images per request
Prompt Length | Up to 5,000 characters
Color Control | Hex code input and palette extraction from reference images
ComfyUI Access | Wan image nodes in ComfyUI (search "Wan" in canvas)
Release Date | April 1, 2026

What is Wan 2.7's thinking mode?

Thinking mode is a reasoning step that runs before image generation. The model analyzes your prompt for spatial relationships, composition logic, and semantic intent, then plans the image layout before producing pixels. It is built into Wan 2.7 Text-to-Image and enabled by default. The result is better prompt adherence, especially for complex multi-element scenes.

This matters most for prompts that describe specific spatial arrangements ("three products arranged left to right in ascending size"), multi-element compositions ("a woman reading in a cafe with rain on the window and warm interior lighting"), and scenes requiring logical consistency ("a reflection in a mirror showing the back of the room"). Single-pass models often lose coherence on these kinds of prompts. Thinking mode reduces those failures.

The trade-off is generation time. Thinking mode adds a reasoning step, so each image takes slightly longer to produce. For simple prompts (a single subject on a plain background), the quality gain is minimal. For complex prompts, the improvement in composition and spatial accuracy is significant.

What are Wan 2.7 Image's key features?

Wan 2.7 Image combines generation and editing in a single model architecture. The feature set targets three long-standing problems in AI image generation: generic faces, unpredictable colors, and broken text rendering. Each feature below is confirmed from Alibaba's official documentation and third-party testing.

Thinking Mode

The model reasons about composition, spatial relationships, and prompt logic before generating. This produces more coherent images for complex prompts with multiple elements, specific layouts, or logical requirements like reflections and shadows. Enabled by default on Text-to-Image.

Text Rendering

Wan 2.7 Image renders readable text in generated images. Signs are legible. Product labels are accurate. Typography in posters and book covers looks designed rather than garbled. The model supports up to 3,000 tokens of text, enough to fill charts, formulas, and dense layouts. This has been the most persistent failure mode in AI image generation, and Wan 2.7 addresses it directly.

Face Personalization

You can specify bone structure, face shape (round, square, oblong), and eye style (narrow, deep-set, wide) directly in your prompt. The result is characters that look like specific, distinct individuals rather than a blended average. This is critical for storyboards, brand personas, and e-commerce models where character consistency matters.

Hex Color Control

Enter specific hex color codes and proportions in your prompt to lock the output to your brand palette. You can also drop in a reference image (a mood board, painting, or screenshot from your design system) and the model extracts the color distribution and applies it to the generated output. This removes a full round of post-production color correction for teams working under brand guidelines.
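To make the idea concrete, here is a minimal Python sketch of hex-pinned prompting. It pulls the dominant colors out of a local brand reference image and appends them to the prompt as hex codes. Treat it as illustrative only: Wan 2.7 can extract the palette from a reference image on its own, and the exact prompt phrasing it expects for color proportions is not documented here, so the wording below is an assumption.

```python
from PIL import Image

def palette_prompt(reference_path: str, base_prompt: str, num_colors: int = 3) -> str:
    """Extract dominant colors from a reference image and append them as hex codes.

    Illustrative sketch: Wan 2.7 can do this extraction itself from a reference
    image; this just shows one way to pin exact hex values in the prompt text.
    """
    img = Image.open(reference_path).convert("RGB")
    # Quantize down to a small palette and read the resulting palette entries.
    quantized = img.quantize(colors=num_colors)
    palette = quantized.getpalette()[: num_colors * 3]
    hex_codes = [
        "#{:02X}{:02X}{:02X}".format(*palette[i : i + 3])
        for i in range(0, len(palette), 3)
    ]
    return f"{base_prompt} Palette: {', '.join(hex_codes)}."

# Example (hypothetical file name):
print(palette_prompt("brand_moodboard.png",
                     "Minimal product hero shot of a ceramic mug on a studio table."))
```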

Multi-Image Editing

Upload up to 9 reference images alongside a text prompt. The model can apply style transfer, swap elements between images, and fuse multiple references into a single output. Identity is preserved where it should be: change a background to a beach sunset, and the face, pose, and clothing stay pixel-perfect while only the background transforms.

Image Set Generation

Generate up to 12 coherent images from a single prompt. The model maintains stylistic consistency across the full set. Use cases include the same character across different scenes (a cat through four seasons), product shots from different angles, storyboard sequences, and social media kits. Structured prompts that describe each image in the set produce the best results.
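A rough sketch of that structured-prompt approach is below. The per-image numbering and the overall set instruction are assumptions about formatting, not a documented prompt schema; the point is simply to describe each image in the set explicitly.

```python
# Hypothetical structure: one line per image in the set, plus a shared style line.
scenes = [
    "Image 1: the same orange tabby cat under cherry blossoms, spring morning light",
    "Image 2: the same cat on a sunlit windowsill, summer afternoon, warm tones",
    "Image 3: the same cat among fallen maple leaves, overcast autumn day",
    "Image 4: the same cat watching snowfall through a window, cool winter palette",
]
prompt = (
    "A coherent 4-image set, consistent character and illustration style across all images.\n"
    + "\n".join(scenes)
)
print(prompt)
```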

Click-to-Edit

Select specific areas of an image to add, move, or align elements with pixel-level accuracy. This interactive editing approach gives you precise control over individual parts of the composition without affecting the rest of the image. Available through the Image Edit variants.

What can you create with Wan 2.7 Image?

Wan 2.7 Image covers text-to-image generation, multi-reference image editing, style transfer, image set generation, and interactive click-to-edit. The combination of thinking mode, text rendering, and color control makes it suited for production workflows where brand consistency and prompt accuracy matter more than raw speed.

Capability | What It Does | Use Case
Text-to-Image | Generate images from text prompts with thinking mode reasoning. Up to 2K standard, 4K with Pro. | Marketing assets, concept art, social media content
Image Editing | Edit images with natural language instructions. Upload up to 9 references for style transfer and element fusion. | Client revisions, product variant generation, background replacement
Text Rendering | Generate readable text inside images. Supports up to 3,000 tokens including charts and formulas. | Product labels, signage, poster design, infographics
Image Set Generation | Generate up to 12 coherent images from one prompt. Maintains character and style consistency across the set. | Storyboards, product catalogs, social media kits, presentation decks
Color Control | Lock output to specific hex codes or extract palettes from reference images for brand-accurate colors. | Brand campaigns, design system compliance, product photography
Face Personalization | Specify bone structure, face shape, and eye style in prompts to generate distinct, individual characters. | Character design, e-commerce model generation, avatar creation

How does Wan 2.7 Image compare to other image models?

Wan 2.7 Image leads on instruction following, text rendering accuracy, and multi-reference editing. Midjourney V8 leads on artistic aesthetics. FLUX is faster for simple prompts with strong LoRA support. Seedream produces high visual quality but lacks thinking mode reasoning. Wan 2.7 is the only model in this group with built-in hex color control and image set generation up to 12 images.

Model | Max Resolution | Text Rendering | Thinking Mode | API Access
Wan 2.7 | 4K (4096x4096) | Accurate (3,000 tokens) | Yes (built-in) | Yes
Midjourney V8 | Up to 2K | Improving, but inconsistent | No | No
FLUX | Up to 2K | Moderate | No | Yes
Seedream | Up to 2K | Limited | No | Yes

Source: WaveSpeedAI comparison data, Alibaba official documentation, and third-party reviews as of April 2026. Midjourney does not offer API access. Aesthetic quality is subjective and not captured in this table.

What is Wan 2.7 Video?

Wan 2.7 Video is a video generation model from Alibaba Tongyi Lab that adds bidirectional frame control (first + last frame), 9-grid multi-image I2V, natural language video editing, combined subject and voice reference-to-video, and up to 5 simultaneous video references. It uses a Diffusion Transformer architecture with MoE routing and generates up to 1080p video with native audio.

The biggest change from Wan 2.6 is the number of control inputs the model accepts in a single generation call. Previous versions gave you a prompt and a starting image. Wan 2.7 adds last-frame anchoring, 9-grid image arrays, combined voice and appearance references, natural language editing, and up to 5 simultaneous video references.

Instruction-based video editing is the feature that makes 2.7 feel qualitatively different from a pure generation model. You pass an existing video alongside a natural language instruction ("change the background to a rain-soaked street," "swap the jacket to red") and receive an edited output rather than a full regeneration. Iteration cycles that required re-generating from scratch can now be handled as lightweight edits.

On Floyo, you access Wan 2.7 Video through ComfyUI nodes. Floyo runs the model on H100 NVL GPUs with 94GB VRAM, so you get full-speed generation without needing your own hardware or managing model downloads.

Fair warning: Wan 2.7 Image launched April 1, 2026. Wan 2.7 Video launched on select cloud platforms first. Open weights for local deployment have not been officially confirmed as of this writing. Based on the Wan family's consistent open-source pattern (2.1 and 2.2 were both released under Apache 2.0), open weights are expected within 4 to 8 weeks of cloud launch. Some features like instruction-based editing and video recreation are new and should be considered promising but still maturing.

What are Wan 2.7 Video's technical specifications?

Wan 2.7 Video uses a Diffusion Transformer with MoE (Mixture of Experts) routing and a T5 text encoder. It generates video at up to 1080p resolution for 2 to 15 seconds with native audio. New in 2.7: bidirectional frame control, 9-grid multi-image input, instruction-based editing, combined subject+voice R2V, and support for up to 5 simultaneous video references.

Spec | Details
Architecture | Diffusion Transformer with T5 encoder and MoE routing
Resolution | Up to 1080p (4K reported in some configurations)
Duration | 2 to 15 seconds (up to 20-30 seconds reported)
Frame Rate | 24fps
Audio | Native audio-visual generation with synchronization
Frame Control | First + last frame (bidirectional, new in 2.7)
Multi-Image Input | 9-grid layout (3x3 image array, new in 2.7)
Video References | Up to 5 simultaneous (up from 1-2 in 2.6)
Reference-to-Video | Combined subject appearance + voice in a single pass
Video Editing | Instruction-based editing via natural language (new in 2.7)
Aspect Ratios | 16:9 (landscape), 9:16 (portrait), 1:1 (square)
ComfyUI Access | Wan video nodes in ComfyUI

What can you create with Wan 2.7 Video?

Wan 2.7 Video covers text-to-video, image-to-video (single and 9-grid), first/last frame video, instruction-based video editing, reference-to-video with combined subject and voice, and video recreation. The editing and multi-reference features are new in 2.7 and change how you iterate on video content.

Capability | What It Does | Use Case
First/Last Frame | Define both start and end frames. The model generates motion between them, reducing generation cycles by constraining the output space. | Storyboarding, looping content, narrative sequences
9-Grid I2V | Provide a 3x3 array of 9 images. The model converts them into a continuous video with smooth transitions between panels. | Product showcases, character turnarounds, multi-scene storyboards
Video Editing | Edit existing video with natural language instructions. Change backgrounds, swap objects, adjust lighting without full regeneration. | Post-production adjustments, client revisions, scene variations
Reference-to-Video | Combine a subject appearance reference with a voice reference in one generation pass. No multi-pass pipeline needed. | Virtual presenters, character-led campaigns, educational content
Multi-Reference | Use up to 5 simultaneous video references (up from 1-2 in 2.6) for multi-character scenes and brand consistency. | Multi-person product demos, brand ambassador content
Video Recreation | Provide a reference video and describe changes. The model preserves the original motion structure while rebuilding the visual layer. | Adapting trending formats, converting footage to animation style

What changed from Wan 2.6 to Wan 2.7?

Wan 2.7 adds an entire image generation and editing suite (with thinking mode, 4K output, and text rendering) alongside five major video upgrades over 2.6: bidirectional frame control, 9-grid multi-image I2V, instruction-based video editing, combined subject+voice reference-to-video, and support for up to 5 simultaneous video references.

Feature | Wan 2.6 | Wan 2.7
Image Generation | Basic T2I and editing | Thinking mode, 4K, text rendering, hex color, 12-image sets
Frame Anchoring | First frame only | First + last frame
Multi-Image Input | Not supported | 9-grid (3x3 image array)
Video Editing | Not supported | Instruction-based editing via natural language
Reference Types | Voice OR subject (separate) | Voice + subject (combined, single pass)
Video References | 1 to 2 | Up to 5

Source: Alibaba official documentation, WaveSpeedAI feature comparisons, and Replicate model listings as of April 2026.

How does Wan 2.7 work?

Wan 2.7 Image uses a unified generation-and-understanding architecture that maps text and visual semantics into a shared latent space. Rather than treating image generation and image comprehension as separate tasks, the model couples them from the start. Wan 2.7 Video uses a Diffusion Transformer with MoE routing, where high-noise and low-noise experts specialize in different phases of the denoising process.

On the image side, the shared latent space means the model does not need to guess what your words mean. Text and image are tightly linked during training on multimodal instructions (text + image inputs together). This pushes the model beyond surface-level pixel fitting into layout planning and composition reasoning. The thinking mode adds an explicit reasoning step on top of this architecture.

On the video side, the bidirectional frame control works by accepting both a first and last frame as constraints. The model generates motion trajectories between the two endpoints, which reduces the number of generation attempts needed to get the right motion path. The 9-grid I2V mode takes a 3x3 arrangement of nine reference images and converts them into a continuous video. The grid reads left-to-right, top-to-bottom, so the panel sequence determines the scene order.
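If you want to assemble the 3x3 input yourself, a sketch like the one below shows that left-to-right, top-to-bottom ordering. It assumes the workflow node accepts a single composited grid image and that your nine panels share a size; if your workflow takes nine separate image inputs instead, wire them in directly and skip this step.

```python
from PIL import Image

def make_9_grid(paths: list[str], tile_size: tuple[int, int] = (512, 512)) -> Image.Image:
    """Composite nine reference images into a 3x3 grid, read left-to-right, top-to-bottom.

    Assumption: the Wan 2.7 workflow node takes one grid image. If it takes nine
    separate inputs, connect the images directly instead of compositing them.
    """
    assert len(paths) == 9, "9-grid I2V expects exactly nine reference images"
    w, h = tile_size
    grid = Image.new("RGB", (w * 3, h * 3))
    for idx, path in enumerate(paths):
        tile = Image.open(path).convert("RGB").resize(tile_size)
        row, col = divmod(idx, 3)  # index 0..8 maps to (row, col), left-to-right, top-to-bottom
        grid.paste(tile, (col * w, row * h))
    return grid

# Example with hypothetical panel file names:
make_9_grid([f"panel_{i}.png" for i in range(1, 10)]).save("wan_9_grid_input.png")
```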

On Floyo, both image and video models run through ComfyUI nodes on H100 NVL GPUs. You can chain Wan 2.7 Image with Wan 2.7 Video in the same workflow. Generate a character image with thinking mode, then animate it with first/last frame control, then upscale the result. All in one pipeline, all in your browser.
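For teams that automate ComfyUI outside the browser, here is a minimal sketch of queuing an exported workflow against a ComfyUI HTTP endpoint. On Floyo itself you just open the template and click Run, so treat this as optional automation rather than the required path; the endpoint URL, the exported file name, and the node id below are placeholders that vary per setup and per workflow.

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # placeholder: point at your ComfyUI endpoint

def queue_workflow(api_json_path: str, positive_prompt: str, prompt_node_id: str) -> dict:
    """Queue an exported ComfyUI workflow (API-format JSON) with an updated text prompt.

    prompt_node_id is the id of the text-encode node in your exported JSON; node ids
    differ between workflows, so inspect the file to find the right one.
    """
    with open(api_json_path) as f:
        workflow = json.load(f)
    # Overwrite the positive prompt in place before queuing.
    workflow[prompt_node_id]["inputs"]["text"] = positive_prompt

    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response includes the queued prompt_id

# Example with hypothetical file and node id:
print(queue_workflow("wan27_t2v_api.json",
                     "A slow dolly shot through a rain-soaked neon street", "6"))
```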

Frequently Asked Questions

Common questions about running Wan 2.7 on Floyo.

Is Wan 2.7 free to use on Floyo?

You can try it with Floyo's free tier, which gives you 20 minutes of GPU time per day. Previous Wan versions (2.1, 2.2) were open-source under Apache 2.0, so there was no additional API cost beyond GPU time. Pricing for 2.7 depends on how it is made available on Floyo (open-source model vs. API node). Floyo gives $1 in free API credits on signup for API-based models.

How do I run Wan 2.7 without installing anything?

Open Floyo in your browser, find a Wan 2.7 workflow (search "Wan 2.7" in the template library), and click Run. Floyo handles the GPU, the ComfyUI environment, and the model weights. No local install, no Python setup, no API key required.

Who made Wan 2.7?

Alibaba Tongyi Lab, the same team behind the entire Wan model series. Wan 2.7 Image was released April 1, 2026. Wan 2.7 Video launched on cloud platforms in late March 2026. Earlier versions in the series were open-sourced on GitHub and HuggingFace.

What is Wan 2.7's thinking mode?

A reasoning step where the model analyzes composition, spatial relationships, and prompt logic before generating an image. It produces more coherent compositions and better prompt adherence, especially for complex multi-element scenes. It is built into Text-to-Image and enabled by default. The trade-off is slightly longer generation time.

Can Wan 2.7 render readable text in images?

Yes. Wan 2.7 Image has significantly improved text rendering compared to previous generations and most competitors. Signs, labels, and typography are readable and accurate. The model supports up to 3,000 tokens of text per image, enough to fill charts, formulas, and dense page layouts.

How does Wan 2.7 Image compare to Midjourney V8?

Wan 2.7 leads on instruction following, text rendering accuracy, and multi-reference editing. Midjourney V8 leads on artistic aesthetics. Wan 2.7 supports 4K output (Pro tier), hex color control, and API access. Midjourney does not offer API access. If your workflow requires precise text in images or brand-accurate colors, Wan 2.7 has the edge. If you want the strongest artistic "vibe," Midjourney is still hard to beat.

Can I combine Wan 2.7 with other AI models in one workflow?

Yes. Floyo runs ComfyUI, which lets you chain multiple models in a single workflow. Generate a character with Wan 2.7 Image using thinking mode, animate it with Wan 2.7 Video using first/last frame control, then upscale the result. All in one pipeline, all in your browser.

Does Wan 2.7 Video generate audio with video?

Yes. Wan 2.7 Video includes native audio-visual generation. Audio and video are synchronized during generation. The enhanced reference-to-video mode can also combine a voice reference with a subject appearance reference in a single pass.

What is the maximum resolution for Wan 2.7?

For images: up to 2048x2048 (standard) or 4096x4096 (Pro tier). For video: up to 1080p at 24fps, with 4K reported in some configurations. Video duration ranges from 2 to 15 seconds, with up to 20-30 seconds reported in extended modes.

Are open weights available for Wan 2.7?

As of this writing, Wan 2.7 launched on cloud platforms and via API first. Open weights have not been officially confirmed for the 2.7 series. Based on the Wan family's pattern (2.1 and 2.2 were both open-sourced under Apache 2.0), open weights are expected within 4 to 8 weeks of the cloud launch. On Floyo, you can run Wan 2.7 through ComfyUI without needing local weights.

Wan 2.7 is Now Live on Floyo

Thinking mode image generation, 4K output, text rendering, first/last frame video, and native audio. Run it in your browser.

Explore Wan 2.7 Workflow →

Browse All Models

Related Reading

Film and Animation Workflows on Floyo

Setting Up an AI Production Pipeline for Your Studio

Top AI Models on Floyo

Last updated: April 2026. Specs from Alibaba Tongyi Lab official documentation, Alibaba Cloud press release, WaveSpeedAI model listings, Replicate model documentation, and third-party reviews.

Wan 2.7 - Text to Video

Alibaba

Audio

Text to Video

Wan 2.7

Generate video from a text prompt using Alibaba's Wan 2.7 model. Set your resolution, aspect ratio, and duration, then hit Run. Audio input supported.
