# LinaCodec

**Repository Path**: xiaotian666/LinaCodec

## Basic Information

- **Project Name**: LinaCodec
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-16
- **Last Updated**: 2026-04-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## Linacodec: Highly compressive audio tokenizer for speech models.

Linacodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48 kHz audio.

https://github.com/user-attachments/assets/77094afd-2d5d-490e-b283-9100e74a69a4

### Key benefits

* Compression: 12.5 tokens/sec (about 60x fewer tokens per second than DAC).
* Audio quality: 48 kHz output (much clearer than the 16 kHz/24 kHz output that is standard).
* Encoder speed: 200x realtime.
* Decoder speed: 400x realtime (even faster with batching).
* Many tasks: Indirectly supports voice conversion, audio super-resolution, and audio denoising!

### Why is this even useful?

Audio tokenizers directly affect the speed, quality, and capability of TTS/ASR models. LinaCodec massively improves upon previous codecs in these areas.

* Inference speed: Enables TTS models to run at 800x realtime, 8x faster than [MiraTTS](https://github.com/ysharma3501)!
* Fast training: High-quality TTS models can be trained in less than 1 day.
* Versatile: Works for both text-to-speech and speech-to-text, unlike most other codecs.

### Comparisons

| Model | Total Tokens/Sec | Sample Rate |
| :--- | :--- | :--- |
| Linacodec | 12.5 | 48 kHz |
| DAC | 774 | 44.1 kHz |
| EnCodec | 300 | 24 kHz |
| Xcodec2 | 50 | 16 kHz |
| Mimi | 200 | 24 kHz |

Lower tokens/sec means faster models, and a higher sample rate means more clarity.

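
To make the comparison concrete, here is a small sketch of the arithmetic behind it. The token rates are copied from the table above and the 171 bps figure from the introduction. The helper names and the 10-second clip length are only illustrative assumptions and are not part of the LinaCodec package.

```python
import math

# Token rates (tokens per second), copied from the comparison table.
TOKEN_RATES = {
    "Linacodec": 12.5,
    "DAC": 774,
    "EnCodec": 300,
    "Xcodec2": 50,
    "Mimi": 200,
}

# Bits per token implied by the quoted 171 bps at 12.5 tokens/sec.
BITS_PER_TOKEN = 171 / TOKEN_RATES["Linacodec"]


def tokens_for_clip(rate_per_sec: float, seconds: float) -> int:
    """Number of tokens a downstream model must process for one clip."""
    return math.ceil(rate_per_sec * seconds)


def reduction_vs_linacodec(model: str) -> float:
    """How many times more tokens per second `model` emits than Linacodec."""
    return TOKEN_RATES[model] / TOKEN_RATES["Linacodec"]


if __name__ == "__main__":
    clip_seconds = 10.0  # hypothetical clip length, chosen for illustration
    for name, rate in TOKEN_RATES.items():
        print(
            f"{name:10s} {tokens_for_clip(rate, clip_seconds):6d} tokens "
            f"({reduction_vs_linacodec(name):.1f}x Linacodec)"
        )
    print(f"Implied bits per Linacodec token: {BITS_PER_TOKEN:.2f}")
```

Because the sequence length a TTS or ASR model has to handle scales directly with the token rate, the ratio printed for each codec is a rough proxy for how much shorter Linacodec sequences are for the same audio.
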
### Usage

Simple one-line installation:

```
pip install git+https://github.com/ysharma3501/LinaCodec.git
```

Reconstruction

```python
from IPython.display import Audio, display
from linacodec.codec import LinaCodec

# Load the model (downloads YatharthS/LinaCodec from Hugging Face)
lina_tokenizer = LinaCodec()

# Get speech tokens and the global embedding
speech_tokens, global_embedding = lina_tokenizer.encode("your_audio_path.wav")

# Decode them into 48 kHz audio
audio = lina_tokenizer.decode(speech_tokens, global_embedding)

# Display the audio
display(Audio(audio.cpu(), rate=48000))
```

Voice conversion

```python
# Assuming you have already loaded the model as above
source_wav = "source_wav.wav"        # the content you want
reference_wav = "reference_wav.wav"  # the timbre (style) you want

# Convert the voice
audio = lina_tokenizer.convert_voice(source_wav, reference_wav)

# Display the audio
display(Audio(audio.cpu(), rate=48000))
```

Audio super-resolution

```python
# Assuming you have already loaded the model as above
# Get speech tokens and the global embedding from a 24 kHz wav
speech_tokens, global_embedding = lina_tokenizer.encode("your_audio_path.wav")

# Decode them into 48 kHz audio (upsamples from 24 kHz to 48 kHz)
audio = lina_tokenizer.decode(speech_tokens, global_embedding)

# Display the audio
display(Audio(audio.cpu(), rate=48000))
```

### Notes

This is heavily based on [kanade-tokenizer](https://github.com/frothywater/kanade-tokenizer), so massive thanks to them! The key novel parts I added are:

1. Dual-Path Vocos Decoder: Enables high-quality 48 kHz reconstruction from the original 24 kHz Vocos using only 30 hours of training data (compared to the typical hundreds of hours).
2. Distilled WavLM Base+: Increases encoder speed while keeping similar quality.
3. Snake-based upsampling: Uses a custom upsampling block, based on the snake activation from [BigVGAN](https://github.com/NVIDIA/BigVGAN), to upscale features.

## Next steps

- [x] Release code and model
- [ ] Release article on how kanade and Lina work so well at rates of 12.5 t/s compared to others.
- [ ] Possible paper on how these techniques can easily work on any codec.

Stars and likes would be appreciated if you find this helpful, thank you.

Model link: https://huggingface.co/YatharthS/LinaCodec

Email: yatharthsharma3501@gmail.com
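
For readers unfamiliar with the snake activation mentioned in the Notes, the sketch below shows the function itself, snake(x) = x + (1/alpha) * sin^2(alpha * x), as used in BigVGAN with one alpha per channel. This is not LinaCodec's upsampling block, which is not reproduced here. It is a plain-Python illustration with hypothetical helper names, and a real model would apply it to tensors with learnable alpha values.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + (1 / alpha) * sin^2(alpha * x)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return x + (math.sin(alpha * x) ** 2) / alpha


def snake_channels(features, alphas):
    """Apply snake channel by channel.

    `features[c]` is the list of samples for channel c and `alphas[c]` is
    that channel's frequency parameter (learnable in a real model).
    """
    return [
        [snake(value, alpha) for value in channel]
        for channel, alpha in zip(features, alphas)
    ]


if __name__ == "__main__":
    # Two hypothetical feature channels with different alpha values.
    feats = [[-1.0, 0.0, 1.0], [0.5, 1.5, 2.5]]
    print(snake_channels(feats, alphas=[1.0, 2.0]))
```

The periodic term gives the activation a bias toward periodic signals while the linear term keeps it monotonic, which is the reason BigVGAN gives for using it in audio generation.
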