PredatorAlpha/my-QA-model

This model is a fine-tuned version of distilbert-base-uncased on the SQuAD v1.1 dataset.

Model description

This is a transformer-based extractive Question Answering (QA) model fine-tuned on the Stanford Question Answering Dataset (SQuAD v1.1).
It takes a context paragraph and a natural language question as input and returns the most probable span in the text that answers the question.

  • Architecture: DistilBERT
  • Dataset: SQuAD v1.1 (~100k question-answer pairs)
  • Task Type: Extractive Question Answering
  • Training Objective: Predict start and end token positions of the answer span
  • Evaluation Metrics: Exact Match (EM) and F1 Score
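
The start/end-position objective above is also what drives inference: the predicted answer is the span (s, e) that maximizes start_logits[s] + end_logits[e] with s ≤ e. A minimal greedy decoder over plain Python lists (a sketch; the function name and max_answer_len cap are illustrative, not from this model's code):

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick (start, end) maximizing start_logits[s] + end_logits[e], s <= e."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider ends within max_answer_len tokens of the start.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best
```

Production decoders additionally mask spans that fall in the question or in padding, but the core argmax is the same.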

Intended uses & limitations

This model is designed for extractive question answering where the answer exists within a provided context.
It can be applied in reading comprehension tasks, chatbots, document search, automated quiz generation, educational tools, and research on transformer-based QA systems.
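
For these use cases, the model can be queried through the transformers question-answering pipeline. A minimal usage sketch (repo id taken from this page; the answer and score returned depend on the checkpoint):

```python
from transformers import pipeline

# Load the fine-tuned extractive QA checkpoint from the Hub.
qa = pipeline("question-answering", model="PredatorAlpha/my-QA-model")

result = qa(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
)
# `result` is a dict with keys "score", "start", "end", and "answer",
# where "answer" is a span copied verbatim from the context.
print(result["answer"])
```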

However, the model has limitations:

  • It can only answer questions if the answer is present in the given text.
  • It struggles with multi-hop reasoning, abstract inference, and answers requiring outside knowledge.
  • Ambiguous or vague questions may result in incorrect spans.
  • Performance may degrade on domains that differ significantly from Wikipedia (SQuAD’s source).
  • It may reflect biases in the training data.

Training and evaluation data

The model was fine-tuned on the Stanford Question Answering Dataset (SQuAD v1.1), a large-scale reading comprehension dataset consisting of over 100,000 question–answer pairs on Wikipedia articles.

  • Training set: ~87,599 examples
  • Validation set: ~10,570 examples
  • Each example contains a context paragraph, a question, and the corresponding answer span within the paragraph.

Evaluation was performed on the SQuAD v1.1 validation set using Exact Match (EM) and F1 score metrics.
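
EM and F1 follow the official SQuAD evaluation: predicted and gold answers are normalized (lowercased, punctuation and the articles a/an/the removed, whitespace collapsed) before comparison, and F1 is computed over the token overlap. A sketch of the per-example metrics in plain Python:

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, drop punctuation/articles."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

The official script takes the maximum over all gold answers per question and averages over the dataset; the per-example functions above are the core of it.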

Training procedure

  1. Base Model: The pre-trained transformer distilbert-base-uncased from Hugging Face.
  2. Tokenization: Used the model's corresponding tokenizer with:
    • max_length=384
    • truncation='only_second'
    • stride=128 for sliding window over long contexts
  3. Optimization:
    • Optimizer: AdamW
    • Learning rate: 3e-5
    • Weight decay: 0.01
    • Batch size: 16–32 (depending on GPU memory)
    • Epochs: 2–3
  4. Loss Function: Cross-entropy loss over start and end token positions.
  5. Evaluation: Computed Exact Match (EM) and F1 score after each epoch.
  6. Checkpointing: Best model saved based on highest F1 score on validation set.
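
The sliding window in step 2 splits a long context into overlapping chunks: each window holds up to max_length tokens, and consecutive windows overlap by stride tokens (so the step between window starts is max_length − stride). A sketch of the window arithmetic (ignoring the question tokens that share each window in practice):

```python
def sliding_windows(num_tokens, max_length=384, stride=128):
    """Return (start, end) token ranges covering num_tokens,
    with consecutive windows overlapping by `stride` tokens."""
    step = max_length - stride
    windows = []
    start = 0
    while True:
        end = min(start + max_length, num_tokens)
        windows.append((start, end))
        if end == num_tokens:
            break
        start += step
    return windows
```

With the tokenizer itself, the same behavior comes from passing return_overflowing_tokens=True alongside truncation='only_second' and stride=128; each overflowing chunk is then scored independently at inference time.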

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
  • mixed_precision_training: Native AMP
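
With lr_scheduler_type: linear (and no warmup mentioned in the card), the learning rate decays linearly from its initial value to zero over the course of training. A sketch of that schedule:

```python
def linear_lr(step, total_steps, base_lr=2e-5):
    """Linearly decay from base_lr at step 0 to 0 at total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

Hugging Face's Trainer applies this per optimizer step; with warmup, a linear ramp-up phase would precede the decay.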

Training results

The model achieved the following results on the SQuAD v1.1 validation set:

  Metric                  Score
  Exact Match (EM)        51.0%
  F1 Score                70.2%
  Training Loss (final)   0.64

These results are below those typically reported for DistilBERT fine-tuned on SQuAD v1.1 (roughly 77 EM / 85 F1), but they demonstrate functional extractive question answering.

Framework versions

  • Transformers 4.55.0
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model size

  • 66.4M parameters (F32, stored in Safetensors format)