Fish Audio S2 TTS - Expressive Text to Speech


Generates in about 3 secs

floyoofficial

Nodes & Models

Floyo Partner Nodes

FishAudioTTSAdvanced_floyo

Ver Private

Comm Use

ComfyUI_Comfyroll_CustomNodes

CR Prompt Text

ComfyUI Official

WorkflowGraphics

SaveAudio

Fish Audio S2 text-to-speech with fine-grained emotion and tone control built into the prompt.

Write your script, drop in emotion tags like [happy], [whispering], or [professional broadcast tone], and the model reads it back with those qualities baked in. No post-production. No separate audio editing pass. What you write is what you hear.

Runs the S2 Pro FP8 model. One input field, one audio file out.

How do you control emotion and tone in Fish Audio S2 TTS?

Add emotion tags directly in your text prompt. Place a tag like [angry], [laughing], or [soft tone] before the line you want affected. The model adjusts delivery at that exact point. You can use multiple tags throughout a single script to shift tone sentence by sentence.

Text prompt: This is your script. Drop emotion tags anywhere in the line to change how it's read. "[happy] Welcome back!" sounds warm and upbeat. "[whispers] Don't tell anyone." drops to near-silent delivery. Mix them freely across sentences.

Here's a quick reference for what's available:

Basic emotions: [angry] [sad] [excited] [happy] [fearful] [surprised] [satisfied] [nervous] [confused] [curious]

Tone and delivery: [whispering] [soft tone] [shouting] [professional broadcast tone] [pitch up] [pitch down] [in a hurry tone]

Sound effects: [laughing] [sobbing] [sighing] [chuckling] [inhale] [exhale] [pause] [short pause] [clearing throat]

Volume: [loud] [low volume] [whisper in small voice]

Tags apply from where you place them until the next tag. Experiment. The same sentence reads completely differently with [excited] vs [serious] in front of it.
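The "tag applies until the next tag" rule can be sketched in a few lines of Python. This is only an illustration of how a script breaks into tagged spans, not part of Fish Audio S2 or Floyo: `split_by_tag`, `unknown_tags`, and `KNOWN_TAGS` are hypothetical names, and the tag set is copied from the quick reference above, which does not cover every tag the model accepts.

```python
import re
from typing import Optional

# Tags copied from the quick reference above. Not exhaustive: the model
# accepts more tags than this list, so "unknown" only means "not listed here".
KNOWN_TAGS = {
    "angry", "sad", "excited", "happy", "fearful", "surprised",
    "satisfied", "nervous", "confused", "curious",
    "whispering", "soft tone", "shouting", "professional broadcast tone",
    "pitch up", "pitch down", "in a hurry tone",
    "laughing", "sobbing", "sighing", "chuckling", "inhale", "exhale",
    "pause", "short pause", "clearing throat",
    "loud", "low volume", "whisper in small voice",
}

# Matches one bracketed tag such as [happy] or [soft tone].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_by_tag(script: str) -> list[tuple[Optional[str], str]]:
    """Return (active_tag, text) pairs; each tag applies until the next tag."""
    segments = []
    active_tag = None  # text before the first tag has no tag
    cursor = 0
    for match in TAG_PATTERN.finditer(script):
        text = script[cursor:match.start()].strip()
        if text:
            segments.append((active_tag, text))
        active_tag = match.group(1)
        cursor = match.end()
    tail = script[cursor:].strip()
    if tail:
        segments.append((active_tag, tail))
    return segments


def unknown_tags(script: str) -> list[str]:
    """List tags in the script that are not in the quick-reference set."""
    return sorted({t for t in TAG_PATTERN.findall(script) if t not in KNOWN_TAGS})


script = "[excited] This is great news! [soft tone] But here's the catch."
for tag, text in split_by_tag(script):
    print(f"{tag}: {text}")
```

A check like `unknown_tags` is useful mainly for catching typos before a run. A tag that is missing from the quick reference is not necessarily unsupported.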

Temperature / top-p: Default is 0.8 for both. Want more variation between runs? Push toward 1.0. Need consistent, predictable output for a repeating character voice? Drop toward 0.5. The tradeoff: higher values are more expressive, lower values are more stable.

Repetition penalty: Default is 1.1. If the output stutters or repeats a phrase, nudge this up slightly. If it sounds clipped or unnatural, pull it back toward 1.0.

Seed: Set to a fixed number to reproduce a specific output. Leave on randomize when exploring delivery options.
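If you keep settings alongside your scripts, the defaults and tradeoffs above could be recorded as a small Python structure. This is a sketch under assumptions: `TTSSettings` and its field names are hypothetical and do not reflect the actual input names of the FishAudioTTSAdvanced_floyo node. The range checks are generic sanity checks, not documented limits, and the seed value 42 is arbitrary.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical container for the four settings described above."""
    temperature: float = 0.8         # default from the text above
    top_p: float = 0.8               # default from the text above
    repetition_penalty: float = 1.1  # default from the text above
    seed: Optional[int] = None       # None stands for "randomize"

    def __post_init__(self) -> None:
        # Generic sanity checks (assumed, not taken from the node's docs).
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")


# Exploring delivery options: defaults with a randomized seed.
EXPLORE = TTSSettings()

# Repeating character voice: lower values plus a fixed seed for stability.
STABLE = TTSSettings(temperature=0.5, top_p=0.5, seed=42)

# More variation between runs: push both values toward 1.0.
EXPRESSIVE = TTSSettings(temperature=1.0, top_p=1.0)

# Output stutters or repeats a phrase: nudge the penalty up slightly.
less_repetitive = replace(STABLE, repetition_penalty=1.2)
print(less_repetitive)
```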

What is Fish Audio S2 TTS good for?

Fish Audio S2 is built for scripted speech that needs emotional range. Think character dialogue, narration with mood shifts, interactive fiction, social content, or any use case where flat monotone delivery breaks the experience.

Script work where tone carries the scene. A horror narration needs [fearful] delivery. A kids' story needs [excited] and [delighted]. Fish S2 handles the shift without needing multiple voice actors or a separate editing pass.

Podcast-style content where you want a polished, broadcast-quality voice with controlled pacing via [pause] and [short pause] tags.

Not the right choice for: long-form audiobooks where consistent voice identity matters across hours of output. For that, a cloning-based TTS with reference audio is a better fit.

FAQ

What emotion tags does Fish Audio S2 support?

Over 60 tags across six categories: basic emotions, advanced emotions, tone and delivery, sound effects, volume control, and dynamic effects. Drop any tag in brackets directly before the text it should affect. Tags apply inline, so you can shift tone multiple times within a single line.

How do I make Fish Audio S2 sound consistent across runs?

Set a fixed seed number and keep your temperature and top-p values stable. With the same seed and settings, the output is reproducible. Switch to randomize when you're exploring delivery options.

Can I combine multiple emotion tags in one script?

Yes. Tags carry forward from where you place them until the next tag. For example, "[excited] This is great news! [soft tone] But here's the catch." shifts delivery mid-script. Chain as many as your script needs.

What's the difference between temperature and top-p in TTS?

Both affect variation in output. Temperature controls how adventurous the model's choices are. Top-p limits which options are on the table. Lower values: stable, predictable delivery. Higher values: more expressive variation between runs.

How do I run Fish Audio S2 TTS online?

You can run Fish Audio S2 TTS online through Floyo. No installation, no setup. Open the workflow in your browser, type your script with emotion tags, and hit run. Free to try.
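For readers curious about the temperature and top-p question above, here is a generic Python illustration of how these two knobs usually work in sampling-based generation. It is not Fish Audio S2's actual code. The function names and the three example scores are made up for the demonstration.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the most likely options until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate options

# Temperature reshapes the whole distribution.
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.5))

# Top-p removes the unlikely tail before a choice is sampled.
print(top_p_filter(softmax_with_temperature(logits, 1.0), 0.8))
```

In this toy example, lowering the temperature concentrates probability on the top option, while lowering top-p drops the least likely options entirely. Both reduce run-to-run variation, which matches the stability tradeoff described above.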
