PyTorch Book - Sentiment Analysis Model

πŸ“š λͺ¨λΈ μ„€λͺ… (Model Description)

이 λͺ¨λΈμ€ μ˜ν™” 리뷰에 λŒ€ν•œ 감성 뢄석(Sentiment Analysis)을 μˆ˜ν–‰ν•©λ‹ˆλ‹€. HuggingFace Transformers 라이브러리의 DistilBERT λͺ¨λΈμ„ 기반으둜 IMDb λ°μ΄ν„°μ…‹μ—μ„œ ν•™μŠ΅λ˜μ—ˆμŠ΅λ‹ˆλ‹€.

This model performs sentiment analysis on movie reviews. Based on DistilBERT from HuggingFace Transformers, fine-tuned on the IMDb dataset.

🎯 Training Data

  • Dataset: IMDb Movie Reviews
  • Size: 25,000 training samples
  • Classes: 2 (Positive / Negative)
  • Language: English

πŸš€ Usage

```python
from transformers import pipeline

# Create the pipeline
classifier = pipeline("sentiment-analysis", model="aiegoo/pytorch-book")

# Run sentiment analysis
result = classifier("This movie is amazing!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]
```

직접 λͺ¨λΈ λ‘œλ“œ

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("aiegoo/pytorch-book")
model = AutoModelForSequenceClassification.from_pretrained("aiegoo/pytorch-book")

# Tokenize and predict
inputs = tokenizer("I love this movie!", return_tensors="pt")
outputs = model(**inputs)
```
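The direct-load path returns raw logits rather than labels. A softmax maps them to class probabilities; the sketch below uses hypothetical logit values and a hardcoded label map for illustration, not the model's actual output.

```python
import torch

# Hypothetical logits from a 2-class sentiment head
logits = torch.tensor([[-1.2, 2.3]])

# Softmax converts logits to probabilities; argmax picks the class
probs = torch.softmax(logits, dim=-1)
predicted_class = probs.argmax(dim=-1).item()

labels = {0: "NEGATIVE", 1: "POSITIVE"}
print(labels[predicted_class], probs[0, predicted_class].item())
```

With a real model, the label map is available as `model.config.id2label`.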

πŸ“Š Performance

  • Accuracy: ~92% (on test set)
  • F1 Score: ~0.91
  • Model Size: 67M parameters (DistilBERT)
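For reference, accuracy and F1 are both derived from confusion-matrix counts. The sketch below uses hypothetical counts chosen to land near the figures above; it is not the actual evaluation result.

```python
# Hypothetical confusion-matrix counts for a 2,000-sample test set
tp, fp, fn, tn = 920, 80, 75, 925

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, f1)
```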

πŸ—οΈ λͺ¨λΈ μ•„ν‚€ν…μ²˜ (Model Architecture)

  • Base Model: distilbert-base-uncased-finetuned-sst-2-english
  • Type: Sequence Classification
  • Framework: PyTorch + Transformers

πŸ“ ν•™μŠ΅ κ³Όμ • (Training Process)

  1. ν† ν¬λ‚˜μ΄μ €: BERT WordPiece tokenizer
  2. μ „μ²˜λ¦¬: μ†Œλ¬Έμž λ³€ν™˜, μ΅œλŒ€ 512 토큰
  3. 데이터셋: IMDb 25,000 samples
  4. 배치 크기: 16
  5. μ΅œμ ν™”: AdamW

πŸŽ“ Educational Purpose

This model was created as part of the PyTorch Book learning curriculum:

  • Week 2, Day 6: HuggingFace Transformers
  • Topic: Tokenizer, Dataset, Pre-trained Models
  • Environment: Local Jupyter + Google Colab

⚠️ Limitations

  • Optimized for English text
  • Specialized for the movie review domain
  • Long texts are truncated to 512 tokens

πŸ“„ λΌμ΄μ„ μŠ€ (License)

MIT License - 자유둭게 μ‚¬μš© κ°€λŠ₯ν•©λ‹ˆλ‹€.


πŸ‘€ Created by

  • Author: aiegoo
  • Course: AI Track - Week 02, Day 6
  • Date: November 2025

Created with ❀️ for learning PyTorch and Transformers
