
Qwen3-VL-4B-Instruct LoRA - FBKINGDOM Text Recognition

๋ณธ ๋ชจ๋ธ์€ Qwen/Qwen3-VL-4B-Instruct๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ FBKINGDOM ํ…์ŠคํŠธ(์ด๋ฏธ์ง€)๋ฅผ ํžˆ๋ผ๊ฐ€๋‚˜๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ํƒœ์Šคํฌ์— ํŠนํ™”๋˜๋„๋ก LoRA ๋ฏธ์„ธ ์กฐ์ •(Fine-tuning)์„ ๊ฑฐ์นœ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

You can try it out in Colab: link

📌 Key Characteristics & Limitations

  • ์ž์ฒด ์ƒ์„ฑ ๋ฐ์ดํ„ฐ์…‹ ํ™œ์šฉ: Font๋ฅผ ์ด์šฉํ•œ ์ž์ฒด ์ƒ์„ฑ ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
  • ๋ฌธ์žฅ ๊ธธ์ด์— ๋”ฐ๋ฅธ ์„ฑ๋Šฅ ํŽธ์ฐจ: ์งง์€ ๋ฌธ์žฅ์—์„œ๋Š” 100%์— ๋‹ฌํ•˜๋Š” ์ตœ๊ณ  ์„ฑ๋Šฅ์„ ๋ณด์ด๋‚˜, ๊ธด ๋ฌธ์žฅ(20์ž ์ด์ƒ)์—์„œ๋Š” ๋ฌธ๋งฅ ํŒŒ์•…์˜ ๋ณต์žก๋„๋กœ ์ธํ•ด ์ •ํ™•๋„๊ฐ€ ํ•˜๋ฝํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๋ฌธ๋งฅ ๊ธฐ๋ฐ˜ ๊ธฐํ˜ธ ์ธ์‹: ๋ชจ์–‘์ด ๋™์ผํ•œ ๊ธฐํ˜ธ(์˜ˆ: ใฏ๊ฐ€ ha, pa, wa๋กœ ์ฝํžˆ๋Š” ๊ฒฝ์šฐ)๋ฅผ ๋ฌธ๋งฅ์— ๋”ฐ๋ผ ๊ตฌ๋ถ„ํ•˜๋„๋ก ํ•™์Šต๋˜์—ˆ์œผ๋‚˜, ๋ชจํ˜ธ์„ฑ์ด ๋†’์€ ๋ฌธ์žฅ์—์„œ๋Š” ๊ฐ„ํ˜น ๋ณ€ํ™˜ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๐Ÿ“Š ๋ชจ๋ธ ์„ฑ๋Šฅ ํ‰๊ฐ€ (Evaluation Results)

์ด 391๊ฐœ์˜ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์…‹(Validation Set)์„ ๋Œ€์ƒ์œผ๋กœ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํ‰๊ฐ€ํ•œ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.

1. Overall Metrics

์ „์ฒด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ Exact Match(์ •ํ™•ํžˆ ์ผ์น˜ํ•œ ๋น„์œจ)๋Š” 59.8%, Character Accuracy(๊ธ€์ž ๋‹จ์œ„ ์ •ํ™•๋„)๋Š” **82.1%**๋ฅผ ๊ธฐ๋กํ–ˆ์Šต๋‹ˆ๋‹ค.

  • Total Samples: 391
  • Exact Match (100% agreement with the reference): 234 (59.85%)
  • Char Accuracy (character-level accuracy): 82.10%
  • Ambiguous Exact (accuracy on samples containing ambiguous characters): 196/345 (56.8%)
*(Figure: Overall Metrics)*
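As a rough illustration of what these metrics measure, here is a minimal sketch of one plausible way to compute Exact Match and character accuracy (the card does not publish its evaluation script; the `evaluate` helper and the character-accuracy formula based on edit distance are assumptions):

```python
# Hypothetical metric computation: Exact Match is the fraction of
# predictions identical to the reference; character accuracy is taken
# here as 1 - (edit distance / reference length), averaged over samples.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def evaluate(preds, refs):
    """Return (exact-match rate, character accuracy) over paired lists."""
    exact = sum(p == r for p, r in zip(preds, refs)) / len(refs)
    char_acc = sum(
        1 - levenshtein(p, r) / max(len(r), 1) for p, r in zip(preds, refs)
    ) / len(refs)
    return exact, char_acc

preds = ["こんにちは", "さよなら", "ありがとう"]
refs  = ["こんにちは", "さようなら", "ありがとう"]
exact, char_acc = evaluate(preds, refs)
```

In this toy run, two of three predictions match exactly, and the one miss is off by a single inserted character, so character accuracy stays high even though Exact Match drops.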

2. Performance by Sequence Length

๋ฌธ์žฅ์˜ ๊ธธ์ด์— ๋”ฐ๋ผ ๋ชจ๋ธ์˜ ์˜ˆ์ธก ์ •ํ™•๋„(Exact Match)๊ฐ€ ํฌ๊ฒŒ ๋‹ฌ๋ผ์ง€๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค. ์งง์€ ๋ฌธ์žฅ์—์„œ๋Š” ์˜ค๋‹ต์ด ์ „ํ˜€ ๋ฐœ์ƒํ•˜์ง€ ์•Š์•˜์œผ๋‚˜, ๋ฌธ์žฅ์ด ๊ธธ์–ด์งˆ์ˆ˜๋ก ์ •ํ™•๋„๊ฐ€ ์ ์ฐจ ๊ฐ์†Œํ•ฉ๋‹ˆ๋‹ค.

๋ฌธ์žฅ ๊ธธ์ด (Length) ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜ (Total) ์ •๋‹ต ๊ฐœ์ˆ˜ (Exact) ์ •ํ™•๋„ (Accuracy)
Short 50 50 100.0%
Medium 81 72 88.9%
Long (20์ž+) 260 112 43.1%
*(Figure: Performance by Length)*
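The per-length breakdown above could be reproduced with a simple grouping pass like the sketch below. Note that only the "Long" boundary (20+ characters) is stated on the card; the Short/Medium split used here is an assumed example:

```python
# Hypothetical bucketing of exact-match results by reference length.
# Only the Long boundary (20+ chars) comes from the card; the
# Short (<10) / Medium (<20) cutoffs below are assumptions.

def bucket(ref: str) -> str:
    if len(ref) < 10:
        return "Short"
    if len(ref) < 20:
        return "Medium"
    return "Long"

def accuracy_by_length(preds, refs):
    """Exact-match rate per length bucket of the reference string."""
    totals, hits = {}, {}
    for p, r in zip(preds, refs):
        b = bucket(r)
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + (p == r)
    return {b: hits[b] / totals[b] for b in totals}

preds = ["ねこ", "いぬとねこがだいすきです", "きょうはとてもてんきがよくてさんぽにいきました"]
refs  = ["ねこ", "いぬとねこがだいすきです", "きょうはとてもてんきがよくてさんぽにでかけました"]
acc = accuracy_by_length(preds, refs)
```

On this toy data the short and medium samples match exactly while the long one does not, mirroring the trend in the table.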

โš™๏ธ ํ•™์Šต ํ™˜๊ฒฝ (Training Configuration)

  • Base Model: Qwen/Qwen3-VL-4B-Instruct
  • Method: LoRA (Rank=64, Alpha=128, Dropout=0.05)
  • Max Sequence Length: 512
  • Epochs: 7
  • Learning Rate: 3e-5 (Cosine Scheduler with 10% Warmup)
  • Attention Implementation: Flash Attention 2
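The learning-rate schedule above (linear warmup over the first 10% of steps, then cosine decay) can be sketched in plain Python; the total step count below is illustrative, not taken from the card:

```python
import math

def lr_at_step(step, total_steps, base_lr=3e-5, warmup_frac=0.1):
    """Cosine learning-rate schedule with linear warmup."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(warmup_steps, 1)   # linear warmup
    # cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

total = 1000                     # illustrative step count
peak = lr_at_step(100, total)    # end of warmup -> base_lr (3e-5)
mid = lr_at_step(550, total)     # halfway through decay -> base_lr / 2
end = lr_at_step(1000, total)    # final step -> 0
```

This matches the behavior of the common `cosine` scheduler with warmup in training frameworks: the rate climbs linearly to its peak, then follows a half-cosine down to zero.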