---
license: other
license_name: lfm1.0
license_link: LICENSE
tags:
- audio
- liquid
- lfm2
- edge
- llama.cpp
- gguf
base_model:
- LiquidAI/LFM2-Audio-1.5B
---

# LFM2-Audio-1.5B

This example demonstrates the **LFM2-Audio-1.5B** audio model.

Link to HF: [LiquidAI/LFM2-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B).

The model supports the following modes:
- ASR:
  - input `audio.wav`, output `text`
- TTS:
  - input `text`, output `audio.wav`
- interleaved:
  - input `text` or `audio.wav`, output `text` and `audio.wav`

## GGUFs

There are 3 GGUFs in total for this model.

Set `$CKPT` to the directory containing the downloaded GGUFs.
Set `$INPUT_WAV` to the path of the input WAV file.
Set `$OUTPUT_WAV` to the path where the generated audio should be written.

```console
export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav
```

```console
(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf LFM2-Audio-1.5B-Q8_0.gguf mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf
```

Optionally, float16 GGUFs can be downloaded and used by replacing `Q8_0` with `F16` in the file names.
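
Before running anything, it can help to confirm that all three files are actually present. The following is a minimal sketch, not part of llama.cpp: `check_ggufs` is a hypothetical helper name, and it assumes the files keep the names shown in the listing above.

```shell
# Sketch: verify that the three GGUF files exist in a directory.
# check_ggufs is a hypothetical helper, not a llama.cpp tool.
check_ggufs() {
    dir=$1
    quant=${2:-Q8_0}   # pass F16 to check the float16 files instead
    missing=0
    for f in \
        "LFM2-Audio-1.5B-$quant.gguf" \
        "mmproj-audioencoder-LFM2-Audio-1.5B-$quant.gguf" \
        "audiodecoder-LFM2-Audio-1.5B-$quant.gguf"
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Non-zero exit status when at least one file is absent.
    return "$missing"
}
```

For example, `check_ggufs "$CKPT" && echo "all GGUFs found"` reports success only when every file is in place; `check_ggufs "$CKPT" F16` checks the float16 variants.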

## Binaries

The `runners` folder contains runners for ubuntu-x64 and macos-arm64.
```console
runners
├── macos-arm64
│   └── bin
│       ├── llama-lfm2-audio
│       └── llama-mtmd-cli
└── ubuntu-x64
    └── bin
        ├── llama-lfm2-audio
        └── llama-mtmd-cli
```
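
Picking the matching runner directory can be scripted. This is only a sketch: `runner_dir` is a hypothetical helper, and mapping any `Linux/x86_64` host to `ubuntu-x64` is an assumption, since the binaries may not run on other distributions.

```shell
# Sketch: map the host OS/architecture to one of the bundled runner folders.
# runner_dir is a hypothetical helper; arguments default to the uname values.
runner_dir() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os/$arch" in
        Darwin/arm64) echo "runners/macos-arm64" ;;
        Linux/x86_64) echo "runners/ubuntu-x64" ;;   # assumption: Ubuntu-compatible host
        *) echo "no bundled runner for $os/$arch" >&2; return 1 ;;
    esac
}
```

After `cd "$(runner_dir)"`, the `bin/llama-lfm2-audio` and `bin/llama-mtmd-cli` paths used in the commands of this README resolve relative to the current directory.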

## Run using `llama-lfm2-audio`

There are 3 supported modes:
- ASR
- TTS
- interleaved

The mode is selected by the system prompt. The system prompt is subject to restrictions; the binary checks them and raises an error if they are violated.
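
Because the mode depends entirely on the system prompt string, a small lookup avoids typos. The sketch below uses the three prompts that appear in the commands of this README; `sys_prompt` is a hypothetical helper name, and whether the binary accepts any other wording is not stated here.

```shell
# Sketch: return the system prompt for a given mode name.
# sys_prompt is a hypothetical helper; the prompt strings are the ones
# used in this README's example commands.
sys_prompt() {
    case "$1" in
        asr)         echo "Perform ASR." ;;
        tts)         echo "Perform TTS." ;;
        interleaved) echo "Respond with interleaved text and audio." ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

It can then be passed as `-sys "$(sys_prompt asr)"` in place of the literal string.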

### ASR

ASR requires `-sys "Perform ASR."` and `--audio audio.wav` for input. The transcribed text is printed to the console.

```console
bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV
```

### TTS

TTS requires `-sys "Perform TTS."`, the text to synthesize via `-p` (for example `-p "What is this obsession people have with books?"`), and `--output output.wav` for output. The generated audio is saved to `output.wav`.
```console
bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV
```

### Interleaved

Interleaved mode produces both text and audio as output and accepts either text or audio as input.

```console
bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV
```
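
The command above uses audio input. For text input, a reasonable guess is that `-p` replaces `--audio`, as in the TTS example, but this README does not show that combination, so treat it as an assumption. The wrapper below is a sketch only: `interleaved` and the `LFM2_RUN` override are hypothetical names, and choosing the flag by the `.wav` suffix is a convenience heuristic.

```shell
# Sketch: run interleaved mode with either a WAV path or a text prompt.
# interleaved and LFM2_RUN are hypothetical; LFM2_RUN lets you substitute
# the binary (for example with `echo` for a dry run).
interleaved() {
    input=$1
    output=$2
    # Assumption: text input is passed with -p, as in the TTS example.
    case "$input" in
        *.wav) set -- --audio "$input" ;;
        *)     set -- -p "$input" ;;
    esac
    ${LFM2_RUN:-bin/llama-lfm2-audio} \
        -m "$CKPT/LFM2-Audio-1.5B-Q8_0.gguf" \
        --mmproj "$CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf" \
        -mv "$CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf" \
        -sys "Respond with interleaved text and audio." \
        "$@" --output "$output"
}
```

Setting `LFM2_RUN=echo` before calling `interleaved "Hello there" "$OUTPUT_WAV"` prints the command line that would be executed, which is a cheap way to inspect the arguments first.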

## Run ASR using `llama-mtmd-cli`

Build `llama-mtmd-cli` following the standard build procedure.

```console
bin/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV
```

### Debug

For reproducible results, set `--temp 0`.