# LinaCodec

**Repository Path**: xiaotian666/LinaCodec

- **Default Branch**: main
- **Created**: 2026-04-16
- **Last Updated**: 2026-04-16

## README

## Linacodec: Highly compressive audio tokenizer for speech models.


Linacodec is an audio tokenizer that compresses audio into just 12.5 tokens per second (171 bps) and decodes to 48kHz audio.

https://github.com/user-attachments/assets/77094afd-2d5d-490e-b283-9100e74a69a4

### Key benefits

* Compression: 12.5 tokens/sec (60x more compressed than DAC).
* Audio Quality: 48kHz output (much clearer than the standard 16kHz/24kHz).
* Encoder Speed: 200x realtime.
* Decoder Speed: 400x realtime (even faster with batching).
* Many Tasks: Indirectly supports voice conversion, audio super-resolution, and audio denoising!

### Why is this even useful?

Audio tokenizers directly contribute to the speed, quality, and capability of TTS/ASR models. LinaCodec massively improves upon previous codecs in these areas.

* Inference Speed: Enables TTS models to run at 800x realtime, 8x faster than [MiraTTS](https://github.com/ysharma3501)!
* Fast training: High-quality TTS models can be trained in less than 1 day.
* Versatile: Works for both Text-to-Speech and Speech-to-Text, unlike most other codecs.

### Comparisons

| Model | Total Tokens/Sec | Sample Rate |
| :--- | :--- | :--- |
| Linacodec | 12.5 | 48kHz |
| DAC | 774 | 44.1kHz |
| EnCodec | 300 | 24kHz |
| Xcodec2 | 50 | 16kHz |
| Mimi | 200 | 24kHz |

Lower tokens/sec means faster models, and a higher sample rate means more clarity.
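To make the token rates above more concrete, here is a small sketch that turns them into sequence lengths and relative compression. The token rates and the 171 bps figure are taken from this README; the helper names (`tokens_for_clip`, `compression_vs`) are made up for illustration, and the bits-per-token value is only what the two quoted numbers imply, assuming the 171 bps figure counts just the speech tokens. The actual codebook size is not stated here.

```python
import math

# Token rates (tokens per second) copied from the comparison table.
TOKENS_PER_SEC = {
    "Linacodec": 12.5,
    "DAC": 774,
    "EnCodec": 300,
    "Xcodec2": 50,
    "Mimi": 200,
}

# Bitrate quoted in the README for Linacodec.
BITRATE_BPS = 171


def tokens_for_clip(model: str, seconds: float) -> int:
    """Tokens a downstream model has to process for a clip of this length."""
    return math.ceil(TOKENS_PER_SEC[model] * seconds)


def compression_vs(model: str, baseline: str = "Linacodec") -> float:
    """How many times more tokens `model` emits than `baseline`."""
    return TOKENS_PER_SEC[model] / TOKENS_PER_SEC[baseline]


# Implied bits per token (assumption: bitrate covers speech tokens only).
bits_per_token = BITRATE_BPS / TOKENS_PER_SEC["Linacodec"]

if __name__ == "__main__":
    for name in TOKENS_PER_SEC:
        print(
            f"{name:10s} 10 s clip -> {tokens_for_clip(name, 10):5d} tokens, "
            f"{compression_vs(name):5.1f}x the tokens of Linacodec"
        )
    print(f"Implied bits per token: {bits_per_token:.2f}")
```

For a 10-second clip this gives 125 tokens for Linacodec against 7740 for DAC, which is where the roughly 60x figure in the key benefits comes from.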
### Usage

Simple 1 line installation:

```
pip install git+https://github.com/ysharma3501/LinaCodec.git
```

Reconstruction

```python
from IPython.display import Audio, display
from linacodec.codec import LinaCodec

# load model (will download YatharthS/LinaCodec from Hugging Face)
lina_tokenizer = LinaCodec()

# get speech tokens and global embedding
speech_tokens, global_embedding = lina_tokenizer.encode("your_audio_path.wav")

# decode them into 48kHz audio
audio = lina_tokenizer.decode(speech_tokens, global_embedding)

# display audio
display(Audio(audio.cpu(), rate=48000))
```

Voice conversion

```python
# assuming you have already loaded the model as lina_tokenizer
source_wav = "source_wav.wav"        # the content you want
reference_wav = "reference_wav.wav"  # the timbre (style) you want

# convert voice
audio = lina_tokenizer.convert_voice(source_wav, reference_wav)

# display audio
display(Audio(audio.cpu(), rate=48000))
```

Audio super resolution

```python
# get speech tokens and global embedding from a 24kHz wav
speech_tokens, global_embedding = lina_tokenizer.encode("your_audio_path.wav")

# decode them into 48kHz audio (upsamples from 24kHz --> 48kHz)
audio = lina_tokenizer.decode(speech_tokens, global_embedding)

# display audio
display(Audio(audio.cpu(), rate=48000))
```

### Notes

This is heavily based on [kanade-tokenizer](https://github.com/frothywater/kanade-tokenizer), so massive thanks to them! The key novel parts I added are:

1. Dual-Path Vocos Decoder: Enables high-quality 48kHz reconstruction from the original 24kHz Vocos using only 30 hours of training data (compared to the typical hundreds of hours).
2. Distilled WavLM Base+: Increases encoder speed while keeping similar quality.
3. Snake-based upsampling: Uses a custom upsampling block, based on the snake activation from [BigVGAN](https://github.com/NVIDIA/BigVGAN), to upscale features.

## Next steps

- [x] Release code and model
- [ ] Release article on how kanade and Lina work so well at rates of 12.5 t/s compared to others.
- [ ] Possible paper on how these techniques can easily work on any codec.

Stars and likes would be appreciated if you find this helpful, thank you.

Model link: https://huggingface.co/YatharthS/LinaCodec

Email: yatharthsharma3501@gmail.com