---
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- custom_code
- radiology
- medical
- MLLM
- RRG
- llava-rad
language:
- en
base_model:
- lmsys/vicuna-7b-v1.5
- microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
base_model_relation: merge
---
# LLaVA-Rad

LLaVA-Rad is a 7-billion-parameter small multimodal model trained to produce findings given an input chest X-ray. Its architecture follows that of [LLaVA](https://arxiv.org/abs/2310.03744) and [LLaVA-Med](https://arxiv.org/abs/2306.00890), differing in the use of a specialized chest X-ray image encoder, BiomedCLIP-CXR, built with the [BiomedCLIP](https://arxiv.org/abs/2303.00915) framework. LLaVA-Rad offers strong report-generation performance at a relatively small model size.

📌 Note: For the original model weights, refer to [microsoft/llava-rad](https://huggingface.co/microsoft/llava-rad).

📃 Original paper: [Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation](https://arxiv.org/abs/2403.08002).

***
# 🔬 Experimental Usage in the Libra Repo

This model checkpoint is intended for **experimental** use and can be tested directly within the [**Libra repository**](https://github.com/X-iZhang/Libra).

For better benchmarking, we recommend using the official test set from [X-iZhang/MIMIC-CXR-RRG](https://huggingface.co/datasets/X-iZhang/MIMIC-CXR-RRG).
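
A minimal loading sketch with the `datasets` library follows; the `test` split name and the column layout are assumptions, so check the dataset card for the actual schema before benchmarking:

```python
from datasets import load_dataset

# Split name and columns are assumptions; see the dataset card for the real schema.
rrg_test = load_dataset("X-iZhang/MIMIC-CXR-RRG", split="test")
print(rrg_test.column_names)  # inspect which fields hold image paths and reference findings
```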


### Use-case:

```python
# 🧪 Inference example following the official LLaVA-Rad setup
from libra.eval import libra_eval

image_file = "https://openi.nlm.nih.gov/imgs/512/253/253/CXR253_IM-1045-1001.png"
model_path = "X-iZhang/libra-llava-rad"

answer = libra_eval(
    model_path=model_path,
    image_file=image_file,
    query="Describe the findings of the chest x-ray.\n",
    conv_mode="v1",          # Use default version
    temperature=0.0,         # Use greedy decoding
    max_new_tokens=1024,
)

# ✅ Expected output
print(answer)
# > Frontal and lateral chest radiographs demonstrate a moderate left
# > pneumothorax.  The right lung is clear.  The cardiomediastinal and hilar
# > contours are normal.
```
> [!WARNING]
> - LLaVA-Rad outputs are formatted as structured report text with line breaks (`\n`) intentionally preserved.
> - When performing automatic evaluation (e.g., ROUGE, BLEU, RadGraph), make sure to normalise or flatten the text if required; see the sketch below.
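
A minimal normalisation sketch (plain Python standard library; the helper name `flatten_report` is only illustrative):

```python
import re

def flatten_report(text: str) -> str:
    """Collapse newlines and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

# Apply the same flattening to both predictions and references before scoring.
raw = "Frontal and lateral chest radiographs demonstrate a moderate left\npneumothorax.  The right lung is clear."
print(flatten_report(raw))
# > Frontal and lateral chest radiographs demonstrate a moderate left pneumothorax. The right lung is clear.
```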

## 📚 Learn More


For a deeper dive into the methodology, theoretical insights, and performance benchmarks of the Libra framework, please see the following resources:

- 🔗 **Project Website**: [Libra v1.0](https://x-izhang.github.io/Libra_v1.0/)  
- 📄 **Paper**: [arXiv:2411.19378](https://arxiv.org/abs/2411.19378)  
- 💻 **Code Repository**: [X-iZhang/Libra (GitHub)](https://github.com/X-iZhang/Libra)
- 📷 **Related Project**: [CCD – Clinical Change Detection](https://x-izhang.github.io/CCD/); see technical details in the paper [here](https://arxiv.org/abs/2509.23379).

---

### Disclaimer

This implementation is intended strictly for research and benchmarking purposes.
It is not validated for clinical use, and any application in real-world diagnosis or treatment is strongly discouraged.

If any use case is found to violate these intended purposes (e.g., clinical deployment, misleading medical claims),
the maintainers reserve the right to remove related code, models, or access permissions without prior notice.

### License
[MSRLA](https://huggingface.co/microsoft/llava-rad/blob/main/LICENSE) license.

---