🎭 Seoul Culture Event Recommendation RAG System (LoRA Adapter)

[Figure: LoRA architecture diagram]

📖 Model Overview

์ด ๋ชจ๋ธ์€ ์„œ์šธ์‹œ ๋ฌธํ™” ํ–‰์‚ฌ ์ •๋ณด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์‚ฌ์šฉ์ž์—๊ฒŒ ๋งž์ถคํ˜• ๋‹ต๋ณ€์„ ์ œ๊ณตํ•˜๋Š” RAG(Retrieval-Augmented Generation) ์‹œ์Šคํ…œ์„ ์œ„ํ•ด ๋ฏธ์„ธ์กฐ์ •(Fine-tuning)๋œ Llama-3 ๊ธฐ๋ฐ˜์˜ LoRA Adapter์ž…๋‹ˆ๋‹ค.

Rather than answering user questions from the LLM's internal knowledge alone, it generates answers based on accurate event information retrieved in real time from an external database (culture.csv).

  • Base Model: beomi/Llama-3-Open-Ko-8B
  • Fine-tuning Method: QLoRA (4-bit quantization + LoRA)
  • Target Task: Seoul cultural event recommendation and information guidance (instruction following)
  • Dataset Source: Seoul city cultural event information (4,275 records)

๐Ÿ› ๏ธ How to Use

์ด ๋ชจ๋ธ์€ LoRA Adapter์ด๋ฏ€๋กœ, ๋ฐ˜๋“œ์‹œ ๋ฒ ์ด์Šค ๋ชจ๋ธ๊ณผ ํ•จ๊ป˜ ๋กœ๋“œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

1. Install Dependencies

```bash
pip install torch transformers peft bitsandbytes pandas accelerate
```

2. Run Inference Code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the base model (4-bit quantization for efficiency)
base_model_id = "beomi/Llama-3-Open-Ko-8B"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 2. Load the LoRA adapter
# ⚠️ Replace YOUR_HUGGINGFACE_ID/YOUR_MODEL_NAME with your own model ID!
# (for this card: hushpond/llama-3-seoul-culture-lora-rag)
adapter_model_id = "YOUR_HUGGINGFACE_ID/YOUR_MODEL_NAME"
model = PeftModel.from_pretrained(base_model, adapter_model_id)

# 3. Define the inference function
def generate_response(context, persona, query):
    # Assemble the Input field in the same format used during training
    input_text = f"Context: {context}\nPersona: [{persona}]\n질문: [{query}]"
    prompt = f"""### Instruction:
{persona} 톤으로 답변해 주세요.

### Input:
{input_text}

### Output:
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=300,
            do_sample=True,  # required for temperature / top_p to take effect
            temperature=0.7,
            top_p=0.9,
            eos_token_id=tokenizer.eos_token_id,
        )
    # Decode the first generated sequence, keeping only the text after "### Output:"
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("### Output:")[-1].strip()

# 4. Test
# Example retrieved context (Seoul Handmade Fair 2025) and a Korean query
# asking about handmade events in Gangnam-gu
context_data = "[서울 핸드 메이드 페어 2025, 사전 예매: 8,000원, 서울 삼성동 코엑스 1층 B홀]"
print(generate_response(context_data, "친절한 문화 가이드", "강남구에서 열리는 핸드메이드 행사 알려줘."))
```
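In a full RAG pipeline, the context_data string would come from a retrieval step over culture.csv rather than being hardcoded. The sketch below is a minimal keyword-overlap retriever under stated assumptions: the culture.csv file name comes from this card, but the row layout and scoring are illustrative, and a production system would use an embedding-based retriever instead.

```python
import pandas as pd

def retrieve_context(query, csv_path="culture.csv", top_k=3):
    # Load the event table (same UTF-8 / CP949 fallback as in preprocessing)
    try:
        df = pd.read_csv(csv_path, encoding="utf-8")
    except UnicodeDecodeError:
        df = pd.read_csv(csv_path, encoding="cp949")
    # Naive relevance score: count query tokens that appear in the row text
    def score(row):
        text = " ".join(str(v) for v in row.values)
        return sum(token in text for token in query.split())
    top_rows = df.loc[df.apply(score, axis=1).nlargest(top_k).index]
    # Flatten the best rows into the bracketed Context string the model expects
    return "[" + "; ".join(
        ", ".join(str(v) for v in row.values) for _, row in top_rows.iterrows()
    ) + "]"

context_data = retrieve_context("강남구 핸드메이드 행사")
```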


📊 Training Details

Dataset

  • Source: Seoul Open Data Plaza (서울열린데이터광장) cultural event information
  • Size: 4,275 rows (original CSV), augmented for training
  • Format: Instruction (persona directive), Input (Context + Question), Output (Answer)
  • Preprocessing: automatic UTF-8 / CP949 encoding handling, missing-value removal, 512-token length limit
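A minimal sketch of that preprocessing, assuming the culture.csv file named above (the example prompt string is illustrative, not taken from the dataset):

```python
import pandas as pd
from transformers import AutoTokenizer

# Automatic encoding handling: try UTF-8 first, fall back to CP949
try:
    df = pd.read_csv("culture.csv", encoding="utf-8")
except UnicodeDecodeError:
    df = pd.read_csv("culture.csv", encoding="cp949")

df = df.dropna()  # remove rows with missing values

# Enforce the 512-token length limit when tokenizing training examples
tokenizer = AutoTokenizer.from_pretrained("beomi/Llama-3-Open-Ko-8B")
example_prompt = "### Instruction: ...\n### Input: ...\n### Output: ..."  # illustrative
tokens = tokenizer(example_prompt, truncation=True, max_length=512)
```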

Hyperparameters

| Parameter | Value | Reason |
| --- | --- | --- |
| LoRA Rank (r) | 32 | After initial experiments with ranks 8/16, rank 32 achieved the best loss (1.51) |
| LoRA Alpha | 32 | Scaling factor |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | Training all linear layers to maximize reasoning ability |
| Batch Size | 1 | Works within Colab T4 GPU memory limits |
| Gradient Accumulation | 8 | Effective batch size of 8 (stable convergence) |
| Learning Rate | 2e-4 | - |
| Epochs | 3 | - |
| Optimizer | paged_adamw_8bit | Memory-efficient optimization |
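For reference, these hyperparameters map onto peft / transformers configuration roughly as follows; this is a minimal sketch, not the exact training script (output_dir is a placeholder):

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,           # LoRA rank
    lora_alpha=32,  # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="outputs",           # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8
    learning_rate=2e-4,
    num_train_epochs=3,
    optim="paged_adamw_8bit",
)
```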

Performance Metrics

  • Validation Loss: 1.5152 (rank 32; see the table below)
  • ROUGE-L Score: 0.83 (indicating high answer accuracy and consistency)
  • Temperature Analysis: ROUGE scores were best in the 0.5–0.7 temperature range (too low causes repetition, too high causes hallucination)

| Rank | Validation Loss |
| --- | --- |
| 8 | 2.1565 |
| 16 | 1.7950 |
| 32 | 1.5152 (best) |
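The temperature analysis above can be reproduced with a sweep along these lines. This is a sketch assuming a held-out eval_set of (context, persona, query, reference) tuples and a generate_response variant that accepts a temperature argument; both are assumptions, not part of this card:

```python
import evaluate

rouge = evaluate.load("rouge")

# eval_set: list of (context, persona, query, reference_answer) tuples -- assumed
for temperature in [0.3, 0.5, 0.7, 0.9]:
    predictions = [
        generate_response(context, persona, query, temperature=temperature)
        for context, persona, query, _ in eval_set
    ]
    references = [reference for *_, reference in eval_set]
    scores = rouge.compute(predictions=predictions, references=references)
    print(f"temperature={temperature}: ROUGE-L={scores['rougeL']:.4f}")
```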

โš ๏ธ Limitations & Future Work

  1. Hallucination:
    • The model hallucinated on roughly 50% of queries about information it was never trained on (unknown data).
    • Additional training (negative sampling) is needed so the model answers "I don't know" when the RAG system retrieves no relevant information (see the sketch after this list).
  2. Hardware Constraints:
    • Batch size was limited to 1 by the free-tier Colab T4 environment. Full fine-tuning on hardware with more VRAM is expected to improve performance.
  3. Scope:
    • The model is specialized for Seoul data, so answer quality may be lower for events in other regions or for general-knowledge questions.
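One possible shape for such a negative training sample, following the Instruction / Input / Output format described under Training Details (the wording and the Busan query are illustrative, not drawn from the actual training set):

```python
# Hypothetical negative sample: the retriever found nothing relevant,
# so the target output is an explicit "cannot find it" refusal.
negative_sample = {
    "instruction": "친절한 문화 가이드 톤으로 답변해 주세요.",  # "Answer in a friendly culture-guide tone."
    "input": "Context: [검색된 행사 정보 없음]\n질문: [부산에서 열리는 불꽃축제 알려줘.]",  # no retrieved info; query about a Busan fireworks festival
    "output": "죄송하지만 현재 데이터에서는 해당 행사 정보를 찾을 수 없습니다.",  # "Sorry, I cannot find that event in the current data."
}
```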

📜 Citation & License
