ComfyUI-Speaker-Isolation – ComfyUI Node

A custom ComfyUI node for speaker diarization: it takes a single audio input and extracts a separate audio track for each speaker. It uses pyannote.audio to identify and isolate up to four speakers, and it returns a summary of the diarization result.

Processes a single audio input to separate speaker tracks.
Outputs individual audio files for each detected speaker while preserving the original audio's duration.
Generates a summary detailing the number of speakers and their active durations.

Context

This node adds speaker diarization to ComfyUI audio workflows. It separates the voices in a mixed recording so that conversations and other multi-speaker audio are easier to analyze.

Key Features & Benefits

The node accepts a single audio input and uses pyannote.audio to identify the speakers. It outputs up to four audio tracks, one per recognized speaker. Each track keeps the original audio's duration, with silence wherever that speaker is inactive. The node also returns a summary of the diarization results: the number of speakers detected and how long each one was active.
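The track-splitting behavior described above can be illustrated with a minimal sketch. This is not the node's actual implementation. It assumes mono audio held as a plain list of float samples and diarization output already reduced to (start_seconds, end_seconds, speaker_label) tuples. The names isolate_speakers and summarize are hypothetical, and ignoring speakers beyond the cap of four is also an assumption.

```python
from collections import defaultdict


def isolate_speakers(samples, sample_rate, segments, max_speakers=4):
    """Build one track per speaker, each as long as `samples`.

    `segments` is an iterable of (start_s, end_s, speaker) tuples.
    Samples outside a speaker's own turns are left as silence (0.0).
    """
    total = len(samples)
    tracks = {}
    for start, end, speaker in segments:
        if speaker not in tracks:
            if len(tracks) >= max_speakers:
                continue  # assumption: speakers beyond the cap are ignored
            tracks[speaker] = [0.0] * total
        # Convert times to sample indices and clamp them to the audio length.
        lo = max(0, int(round(start * sample_rate)))
        hi = min(total, int(round(end * sample_rate)))
        if hi > lo:
            tracks[speaker][lo:hi] = samples[lo:hi]
    return tracks


def summarize(segments, tracks):
    """Report the speaker count and each kept speaker's total active time."""
    active = defaultdict(float)
    for start, end, speaker in segments:
        if speaker in tracks:
            active[speaker] += end - start
    lines = [f"Speakers detected: {len(active)}"]
    for speaker in sorted(active):
        lines.append(f"{speaker}: {active[speaker]:.2f} s active")
    return "\n".join(lines)


if __name__ == "__main__":
    audio = [1.0] * 8  # 2 seconds of audio at 4 samples per second
    turns = [(0.0, 0.5, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]
    per_speaker = isolate_speakers(audio, 4, turns)
    print(summarize(turns, per_speaker))
```

Starting every track as full-length silence and copying in only the active ranges keeps each output aligned with the original timeline, so the tracks can be mixed or compared without re-synchronizing.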

Advanced Functionalities

The node can run on CPU or GPU, depending on user preference. It also works offline: models are cached after the initial download, so no internet connection is needed once they are in place.
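As a rough illustration of the device choice and offline use described above, the sketch below picks a device string with a CPU fallback and sets the Hugging Face Hub offline flag. It rests on stated assumptions: the function names resolve_device and enable_offline_mode and the "auto" option are hypothetical, and the node's real settings may differ. HF_HUB_OFFLINE is a standard Hugging Face Hub environment variable, but whether this node relies on it is not stated in the source.

```python
import os


def resolve_device(preference="auto"):
    """Map a user preference to a device string, falling back to CPU."""
    try:
        import torch  # optional here; without it the sketch reports no GPU
        cuda_ok = torch.cuda.is_available()
    except ImportError:
        cuda_ok = False

    if preference == "cpu":
        return "cpu"
    if preference in ("auto", "cuda", "gpu"):
        # Use the GPU only when one is actually available.
        return "cuda" if cuda_ok else "cpu"
    raise ValueError(f"unknown device preference: {preference!r}")


def enable_offline_mode(env=None):
    """Ask Hugging Face Hub to use cached files only.

    The flag should be set before the Hugging Face libraries are imported.
    """
    env = os.environ if env is None else env
    env["HF_HUB_OFFLINE"] = "1"
    return env


if __name__ == "__main__":
    print("device:", resolve_device("auto"))
    enable_offline_mode()
```

Offline mode only helps after the models have been downloaded once; on a fresh machine the first run still needs network access.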

Practical Benefits

Adding this node to a workflow gives users finer control over audio content and simplifies the analysis of multi-speaker recordings. Isolated speaker tracks make the audio easier to analyze and speed up editing and processing tasks within ComfyUI.

Credits/Acknowledgments

This node builds on the pyannote.audio library; its original authors and contributors are acknowledged in that project's repository. The models it uses are hosted on Hugging Face, and users must comply with their licensing agreements.