LFM2-Audio-1.5B-GGUF / README.md

tarek-liquid

Upload LICENSE and README

4872567 verified 3 months ago

preview code

raw

history blame

3.14 kB

metadata

license: other
license_name: lfm1.0
license_link: LICENSE
tags:
  - audio
  - liquid
  - lfm2
  - edge
  - llama.cpp
  - gguf
base_model:
  - LiquidAI/LFM2-Audio-1.5B

LFM2-Audio-1.5B

This example demonstrates the LFM2-Audio-1.5B audio model.

Link to HF: LiquidAI/LFM2-Audio-1.5B.

The model supports following modes

ASR:
- input audio.wav, output text
TTS:
- input text, output audio.wav
interleaved:
- input text or audio.wav, output text and audio.wav

GGUFS

There are total 3 GGUFs for this model.

Set $CKPT to path to the path containing downloaded GGUFs. Set $INPUT_WAV to path to input wav file.

export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav

(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf  LFM2-Audio-1.5B-Q8_0.gguf  mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf

Optionally, float16 ggufs can be downloaded and used by replacing Q8_0 with F16.

Binaries

runners folder contain runners for ubuntu-x64 and macos-arm64.

runners
├── macos-arm64
│   └── bin
│       ├── llama-lfm2-audio
│       └── llama-mtmd-cli
└── ubuntu-x64
    └── bin
        ├── llama-lfm2-audio
        └── llama-mtmd-cli

Run using `llama-lfm2-audio`

There are 3 supported modes

ASR
TTS
interleaved

The mode is defined by system prompt. There are limitations on system prompt and binary will check for them and raise an error if needed.

ASR

ASR requires -sys "Perform ASR." and --audio audio.wav for input. It will print text to console

bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV

TTS

TTS requires -sys "Perform TTS.", -p "What is this obsession people have with books?" for input, and --output output.wav for output. It will save audio to output.wav.

bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV

Interleaved

Interleaved produces both, text and audio as output, and can consume text or audio as input.

bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV

Run ASR using `llama-mtmd-cli`

Build llama-mtmd-cli following the standard build procedure.

bin/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV

Debug

For reproducible results set --temp 0.