	---
	license: other
	license_name: lfm1.0
	license_link: LICENSE
	tags:
	- audio
	- liquid
	- lfm2
	- edge
	- llama.cpp
	- gguf
	base_model:
	- LiquidAI/LFM2-Audio-1.5B
	---

	# LFM2-Audio-1.5B

	This example demonstrates the LFM2-Audio-1.5B audio model.

	Link to HF: [LiquidAI/LFM2-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B).

	The model supports the following modes:
	- ASR: input `audio.wav`, output `text`
	- TTS: input `text`, output `audio.wav`
	- interleaved: input `text` or `audio.wav`, output `text` and `audio.wav`

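	## GGUFs

	The model is split across three GGUF files: the language model, the audio encoder (`mmproj`), and the audio decoder.

	Set `$CKPT` to the directory containing the downloaded GGUFs.
	Set `$INPUT_WAV` to the path of the input WAV file.
	Set `$OUTPUT_WAV` to the path where generated audio should be written.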

	```console
	export CKPT=/data/playground/checkpoints/LFM2-Audio-1.5B-GGUF
	export INPUT_WAV=/tmp/input.wav
	export OUTPUT_WAV=/tmp/output.wav
	```

	```console
	(cd $CKPT && ls *.gguf)
	audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf LFM2-Audio-1.5B-Q8_0.gguf mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf
	```

	Optionally, download the float16 GGUFs and use them by replacing `Q8_0` with `F16` in the file names.
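Because every command below needs all three files for the same quantization, it can help to check that they are present before running anything. The following is a minimal sketch, not part of the shipped tooling: `lfm2_check_ggufs` is a hypothetical helper name, and it assumes the file naming shown in the listing above.

```shell
# Hypothetical helper (not shipped with the runners): verify that all three
# GGUFs for one quantization exist in a directory.
# Usage: lfm2_check_ggufs DIR [QUANT]   (QUANT defaults to Q8_0)
lfm2_check_ggufs() {
    dir=$1
    quant=${2:-Q8_0}
    missing=0
    for f in \
        "LFM2-Audio-1.5B-$quant.gguf" \
        "mmproj-audioencoder-LFM2-Audio-1.5B-$quant.gguf" \
        "audiodecoder-LFM2-Audio-1.5B-$quant.gguf"
    do
        if [ ! -f "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure.
            printf 'missing: %s\n' "$dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

`lfm2_check_ggufs "$CKPT"` checks the `Q8_0` files, and `lfm2_check_ggufs "$CKPT" F16` checks the float16 ones. The function returns a non-zero status if any file is missing.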

	## Binaries

	The `runners` folder contains prebuilt binaries for ubuntu-x64 and macos-arm64.
	```console
	runners
	├── macos-arm64
	│   └── bin
	│       ├── llama-lfm2-audio
	│       └── llama-mtmd-cli
	└── ubuntu-x64
	    └── bin
	        ├── llama-lfm2-audio
	        └── llama-mtmd-cli
	```

	## Run using `llama-lfm2-audio`

	Three modes are supported:
	- ASR
	- TTS
	- interleaved

	The mode is selected by the system prompt. The system prompt is subject to restrictions; the binary checks them and raises an error if they are violated.
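Since the mode is chosen purely by the system prompt, a small wrapper can keep the three prompt strings in one place. The sketch below is illustrative only: `lfm2_sys_prompt` is a hypothetical helper name, not part of the shipped binaries, and the prompt strings are the ones used in the example commands in this README.

```shell
# Hypothetical helper (not part of llama-lfm2-audio): map a mode name to the
# system prompt used in this README's example commands.
lfm2_sys_prompt() {
    case "$1" in
        asr)         printf '%s\n' "Perform ASR." ;;
        tts)         printf '%s\n' "Perform TTS." ;;
        interleaved) printf '%s\n' "Respond with interleaved text and audio." ;;
        *)           printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
    esac
}
```

With this defined, `-sys "$(lfm2_sys_prompt asr)"` can be passed in place of the literal prompt, which avoids typos that the binary would otherwise reject.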

	### ASR

	ASR requires `-sys "Perform ASR."` and `--audio audio.wav` as input. The transcribed text is printed to the console.

	```console
	bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform ASR." --audio $INPUT_WAV
	```

	### TTS

	TTS requires `-sys "Perform TTS."`, the text to synthesize passed with `-p` (for example `-p "What is this obsession people have with books?"`), and `--output output.wav`. The generated audio is saved to `output.wav`.
	```console
	bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Perform TTS." -p "What is this obsession people have with books?" --output $OUTPUT_WAV
	```

	### Interleaved

	Interleaved mode produces both text and audio as output and accepts either text or audio as input.

	```console
	bin/llama-lfm2-audio -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -mv $CKPT/audiodecoder-LFM2-Audio-1.5B-Q8_0.gguf -sys "Respond with interleaved text and audio." --audio $INPUT_WAV --output $OUTPUT_WAV
	```

	## Run ASR using `llama-mtmd-cli`

	Build `llama-mtmd-cli` following the standard llama.cpp build procedure, or use the binary from the `runners` folder.

	```console
	bin/llama-mtmd-cli -m $CKPT/LFM2-Audio-1.5B-Q8_0.gguf --mmproj $CKPT/mmproj-audioencoder-LFM2-Audio-1.5B-Q8_0.gguf -p "<__media__>" -sys "Perform ASR." --audio $INPUT_WAV
	```

	### Debug

	For reproducible results, pass `--temp 0`.
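One way to confirm that a run is actually reproducible is to execute the same command twice and compare what it prints. The helper below is a minimal sketch under that assumption: `same_output` is a hypothetical name, and it compares standard output only, so any log lines a binary writes to standard error are ignored.

```shell
# Hypothetical helper: run a command twice and succeed only if both runs
# print identical text to stdout. Returns 2 if the command itself fails.
same_output() {
    first=$("$@") || return 2
    second=$("$@") || return 2
    [ "$first" = "$second" ]
}
```

Prefixing the ASR command with `same_output` and appending `--temp 0` is expected to succeed if decoding is deterministic. A non-zero status means the two transcripts differed or the command failed.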