license: other
license_name: lfm1.0
license_link: LICENSE
tags:
  - audio
  - liquid
  - lfm2
  - edge
  - llama.cpp
  - gguf
base_model:
  - LiquidAI/LFM2-Audio-1.5B

LFM2-Audio-1.5B

This example demonstrates the LFM2-Audio-1.5B audio model.

Link to HF: LiquidAI/LFM2-Audio-1.5B.

The model supports the following modes:

  • ASR:
    • input audio.wav, output text
  • TTS:
    • input text, output audio.wav
  • interleaved:
    • input text or audio.wav, output text and audio.wav

GGUFs

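There are 3 GGUFs in total for this model: the main model, the audio encoder (mmproj), and the audio decoder.

Set $CKPT to the path of the directory containing the downloaded GGUFs. Set $INPUT_WAV to the path of the input WAV file and $OUTPUT_WAV to the path where generated audio should be written.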

export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
export INPUT_WAV=/tmp/input.wav
export OUTPUT_WAV=/tmp/output.wav
(cd $CKPT && ls *.gguf)
audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf  LFM2-Audio-1.5B-Q8_0.gguf  mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf

Optionally, float16 GGUFs can be downloaded and used instead by replacing Q8_0 with F16 in the file names.
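Before running anything, it can help to confirm that all three files are present. The following is a minimal sketch, not part of the shipped runners: `check_ggufs` is a hypothetical helper name, and it assumes the files follow the naming shown in the listing above, with the quantization suffix (Q8_0 or F16) as the only varying part.

```shell
# Hypothetical helper: verify the three GGUF files for one quantization
# exist in a directory. File names follow the listing in this document.
check_ggufs() {
  dir="$1"
  quant="${2:-Q8_0}"   # pass F16 to check the float16 files instead
  missing=0
  for f in \
    "LFM2-Audio-1.5B-$quant.gguf" \
    "mmproj-audioencoder-LFM2-Audio-1.5B-$quant.gguf" \
    "audiodecoder-LFM2-Audio-1.5B-$quant.gguf"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_ggufs "$CKPT"` checks the Q8_0 files and `check_ggufs "$CKPT" F16` checks the float16 ones; a non-zero exit status means at least one file was not found.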

Binaries

The runners folder contains prebuilt runners for ubuntu-x64 and macos-arm64.

runners
├── macos-arm64
│   └── bin
│       ├── llama-lfm2-audio
│       └── llama-mtmd-cli
└── ubuntu-x64
    └── bin
        ├── llama-lfm2-audio
        └── llama-mtmd-cli
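Picking the matching runner can be scripted. The sketch below is an illustration only: `runner_dir` is a hypothetical helper, and mapping a generic Linux x86_64 host to the ubuntu-x64 folder is an assumption (the binaries are only described as built for Ubuntu).

```shell
# Hypothetical helper: print the bundled runner directory for a platform.
# Arguments default to the current host as reported by uname.
runner_dir() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/x86_64) echo "runners/ubuntu-x64/bin" ;;   # assumption: any x64 Linux
    Darwin/arm64) echo "runners/macos-arm64/bin" ;;
    *) echo "no bundled runner for $os/$arch" >&2; return 1 ;;
  esac
}
```

The commands in this document invoke `bin/llama-lfm2-audio`, so they are presumably run from the platform folder, that is, the parent of the path printed by `runner_dir`.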

Run using llama-lfm2-audio

There are 3 supported modes:

  • ASR
  • TTS
  • interleaved

The mode is selected by the system prompt. The system prompt is restricted; the binary checks it and raises an error if it is not valid.

ASR

ASR requires -sys "Perform ASR." and --audio audio.wav for input. It prints the transcribed text to the console.

bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV

TTS

TTS requires -sys "Perform TTS." and a text prompt such as -p "What is this obsession people have with books?" for input, and --output output.wav for output. It saves the generated audio to output.wav.

bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV

Interleaved

Interleaved mode produces both text and audio as output and accepts either text or audio as input.

bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV
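Because each mode is tied to an exact system prompt, a small lookup avoids typos when scripting the runner. This is a sketch under the assumption that the three prompts used in the commands above are the only accepted ones; `sys_prompt_for_mode` and the short mode names are hypothetical and not part of the binary.

```shell
# Hypothetical helper: map a short mode name to the system prompt
# used in this document's example commands.
sys_prompt_for_mode() {
  case "$1" in
    asr)         printf '%s\n' "Perform ASR." ;;
    tts)         printf '%s\n' "Perform TTS." ;;
    interleaved) printf '%s\n' "Respond with interleaved text and audio." ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

The result can be passed straight to the runner, for example `-sys "$(sys_prompt_for_mode asr)"` in place of the literal `-sys "Perform ASR."`.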

Run ASR using llama-mtmd-cli

Build llama-mtmd-cli following the standard build procedure.

bin/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV

Debug

For reproducible results, set --temp 0.
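One way to check reproducibility is to run the same command twice and compare what it prints. The helper below is a generic sketch, not something shipped with the runners: `same_output` is a hypothetical name, and it assumes the text of interest is the only thing the command writes to standard output.

```shell
# Hypothetical helper: run a command twice and succeed only if both
# runs wrote identical text to stdout. Returns 2 if the command fails.
same_output() {
  first=$("$@") || return 2
  second=$("$@") || return 2
  [ "$first" = "$second" ]
}
```

For instance, prefixing the ASR command from this document with `same_output` and appending `--temp 0` should then exit with status 0 if the two transcripts match.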