ComfyUI-Step_Audio_EditX_TTS

Author Saganaki22

https://github.com/Saganaki22/ComfyUI-Step_Audio_EditX_TTS

Last updated

2025-12-04

ComfyUI-Step_Audio_EditX_TTS provides native ComfyUI nodes for Step Audio EditX, offering zero-shot voice cloning and audio editing. Users can adjust voice characteristics such as emotion, style, and speed.

Zero-Shot Voice Cloning: Clone any voice using just a short audio reference, enabling diverse applications from gaming to voiceovers.
Advanced Audio Editing: Modify existing audio to adjust emotions, styles, and speeds, while also incorporating effects and noise reduction.
Modular Workflow: Cloning and editing are handled by separate nodes (StepAudio_VoiceClone and StepAudio_AudioEdit), so each process can be used on its own.

Context

This extension integrates Step Audio EditX voice cloning and audio editing into ComfyUI. Its aim is to let users produce high-quality audio by adjusting voice characteristics without leaving the ComfyUI environment.

Key Features & Benefits

The extension offers zero-shot voice cloning, which generates speech in a cloned voice from a brief audio sample. It also provides audio editing that adjusts the emotional tone, speaking style, and other characteristics of existing recordings. Because the nodes are native to ComfyUI, no additional programming or complex setup is needed.

Advanced Functionalities

Step Audio EditX supports advanced capabilities such as smart chunking for long-form content, allowing users to input extensive text while maintaining coherent audio output. It also features iterative editing, which enables multiple passes for stronger effects, enhancing the overall quality and expressiveness of the audio generated.
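The idea behind chunking long-form text can be sketched in Python. This is a minimal illustration that assumes chunks are split at sentence boundaries under a character limit. The function name chunk_text and the max_chars parameter are hypothetical and are not taken from the extension's code, which may chunk differently.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of roughly max_chars characters.

    Hypothetical helper for illustration only; not the extension's code.
    A single sentence longer than max_chars is kept whole rather than
    being cut mid-sentence.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = "Hello there. How are you? I am fine."
    for i, chunk in enumerate(chunk_text(script, max_chars=25), start=1):
        print(i, chunk)
```

Each chunk could then be synthesized separately and the resulting audio segments concatenated in order. Breaking only at sentence ends keeps each segment a complete unit of speech.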

Practical Benefits

Running cloning and editing inside ComfyUI gives users direct control over audio outputs. Adjustments can be made and re-run quickly, reducing the need for extensive manual editing.

Credits/Acknowledgments

The Step Audio EditX model is developed by StepFun AI and is available on Hugging Face. The ComfyUI integration is a community effort, and the project is licensed under the MIT license. Contributions and feedback are welcome.

Inner Nodes

StepAudio_AudioEdit

StepAudio_VoiceClone

Run workflows with this node

Community workflows using ComfyUI-Step_Audio_EditX_TTS.

floyoofficial

291

Audio2Audio

Audio Editing

Step Audio EditX

Voice Cloning

Step Audio EditX for Voice Cloning

Upload a voice sample, transcribe it automatically with Whisper, then use Step-Audio EditX to clone that voice speaking your custom script. No trigger word needed.

floyoofficial

218

Audio2Audio

Step Audio EditX

Voice Editing

Step Audio EditX for Voice Editing

Edit existing voice recordings with Step-Audio EditX. Change emotion, dialect, or style. Whisper transcribes your audio so you describe the edit, not the source.
