k2-fsa

non-profit

https://github.com/k2-fsa/

AI & ML interests

Differentiable FSA/FST algorithms with PyTorch compatibility; automatic speech recognition

Recent Activity

yfyeung authored a paper 4 days ago

UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning

csukuangfj updated a Space 5 days ago

k2-fsa/automatic-speech-recognition

csukuangfj updated a Space 13 days ago

k2-fsa/text-to-speech


authored a paper 4 days ago

UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning

Paper • 2606.04939 • Published 5 days ago

updated a Space 5 days ago

Automatic Speech Recognition

Transcribe audio files to text with Next-gen Kaldi

updated a Space 13 days ago

Text To Speech

Text-to-speech (TTS) with Next-gen Kaldi

in k2-fsa/web-assembly-en-tts-pocket 21 days ago

stuck at 100% downloading data

#1 opened 21 days ago by

authored a paper 21 days ago

Evaluating the Expressive Appropriateness of Speech in Rich Contexts

Paper • 2605.09413 • Published 29 days ago • 5

updated 5 Spaces 26 days ago

Web Assembly Zh En Tts Zipvoice

ZipVoice voice cloning

Web Assembly En Tts Pocket

Pocket TTS English voice cloning

Wasm Speech Enhancement Gtcrn

WebAssembly speech enhancement

Web Assembly Vad Sherpa Onnx

Detect spoken segments from your microphone in real time

Web Assembly Ten Vad Sherpa Onnx

Detect speech activity in real time from your microphone

authored a paper 26 days ago

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

Paper • 2605.06407 • Published May 7

updated a model about 1 month ago

k2-fsa/OmniVoice

Text-to-Speech • 0.6B • Updated May 7 • 2.4M • 997

updated a dataset about 2 months ago

k2-fsa/TTS_eval_datasets

Viewer • Updated Apr 22 • 5.36k • 389 • 3

updated a model about 2 months ago

k2-fsa/TTS_eval_models

Text-to-Speech • Updated Apr 22 • 3

in k2-fsa/TTS_eval_models about 2 months ago

Add library_name, paper links, and citation

#1 opened about 2 months ago by

in k2-fsa/OpenDialog about 2 months ago

Add task categories, language tags, and project links

#2 opened about 2 months ago by

authored a paper about 2 months ago

Representation-Regularized Convolutional Audio Transformer for Audio Understanding

Paper • 2601.21612 • Published Jan 29 • 1

updated a Space about 2 months ago

OmniVoice

High-quality voice cloning TTS for 600+ languages

submitted a paper to Daily Papers 4 months ago

DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

Paper • 2601.21716 • Published Jan 29 • 13

authored a paper 5 months ago

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Paper • 2601.09385 • Published Jan 14