PredatorAlpha/my-QA-model

This model is a fine-tuned version of distilbert-base-uncased on the SQuAD v1.1 dataset.

Model description

This is a transformer-based extractive Question Answering (QA) model fine-tuned on the Stanford Question Answering Dataset (SQuAD v1.1).
It takes a context paragraph and a natural language question as input and returns the most probable span in the text that answers the question.

  • Architecture: DistilBERT
  • Dataset: SQuAD v1.1 (~100k question-answer pairs)
  • Task Type: Extractive Question Answering
  • Training Objective: Predict start and end token positions of the answer span
  • Evaluation Metrics: Exact Match (EM) and F1 Score
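
The start/end-position objective above is also what drives inference: the predicted answer is the span (s, e) that maximizes start_logits[s] + end_logits[e] with s ≤ e. A minimal greedy decoder over plain Python lists (a sketch; the function name and max_answer_len cap are illustrative, not from this model's code):

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick (start, end) maximizing start_logits[s] + end_logits[e], s <= e."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # Only consider ends within max_answer_len tokens of the start.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best
```

Production decoders additionally mask spans that fall in the question or in padding, but the core argmax is the same.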

Intended uses & limitations

This model is designed for extractive question answering where the answer exists within a provided context.
It can be applied in reading comprehension tasks, chatbots, document search, automated quiz generation, educational tools, and research on transformer-based QA systems.
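
For these use cases, the model can be queried through the transformers question-answering pipeline. A minimal usage sketch (repo id taken from this page; the answer and score returned depend on the checkpoint):

```python
from transformers import pipeline

# Load the fine-tuned extractive QA checkpoint from the Hub.
qa = pipeline("question-answering", model="PredatorAlpha/my-QA-model")

result = qa(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
)
# `result` is a dict with keys "score", "start", "end", and "answer",
# where "answer" is a span copied verbatim from the context.
print(result["answer"])
```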

However, the model has limitations:

  • It can only answer questions if the answer is present in the given text.
  • It struggles with multi-hop reasoning, abstract inference, and answers requiring outside knowledge.
  • Ambiguous or vague questions may result in incorrect spans.
  • Performance may degrade on domains that differ significantly from Wikipedia (SQuAD’s source).
  • It may reflect biases in the training data.

Training and evaluation data

The model was fine-tuned on the Stanford Question Answering Dataset (SQuAD v1.1), a large-scale reading comprehension dataset consisting of over 100,000 question–answer pairs on Wikipedia articles.

  • Training set: ~87,599 examples
  • Validation set: ~10,570 examples
  • Each example contains a context paragraph, a question, and the corresponding answer span within the paragraph.

Evaluation was performed on the SQuAD v1.1 validation set using Exact Match (EM) and F1 score metrics.
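
EM and F1 follow the official SQuAD evaluation: predicted and gold answers are normalized (lowercased, punctuation and the articles a/an/the removed, whitespace collapsed) before comparison, and F1 is computed over the token overlap. A sketch of the per-example metrics in plain Python:

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, drop punctuation/articles."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings match exactly, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

The official script takes the maximum over all gold answers per question and averages over the dataset; the per-example functions above are the core of it.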

Training procedure

  1. Base Model: The pre-trained transformer distilbert-base-uncased from Hugging Face.
  2. Tokenization: Used the model's corresponding tokenizer with:
    • max_length=384
    • truncation='only_second'
    • stride=128 for sliding window over long contexts
  3. Optimization:
    • Optimizer: AdamW
    • Learning rate: 3e-5
    • Weight decay: 0.01
    • Batch size: 16–32 (depending on GPU memory)
    • Epochs: 2–3
  4. Loss Function: Cross-entropy loss over start and end token positions.
  5. Evaluation: Computed Exact Match (EM) and F1 score after each epoch.
  6. Checkpointing: Best model saved based on highest F1 score on validation set.
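
The sliding window in step 2 splits a long context into overlapping chunks: each window holds up to max_length tokens, and consecutive windows overlap by stride tokens (so the step between window starts is max_length − stride). A sketch of the window arithmetic (ignoring the question tokens that share each window in practice):

```python
def sliding_windows(num_tokens, max_length=384, stride=128):
    """Return (start, end) token ranges covering num_tokens,
    with consecutive windows overlapping by `stride` tokens."""
    step = max_length - stride
    windows = []
    start = 0
    while True:
        end = min(start + max_length, num_tokens)
        windows.append((start, end))
        if end == num_tokens:
            break
        start += step
    return windows
```

With the tokenizer itself, the same behavior comes from passing return_overflowing_tokens=True alongside truncation='only_second' and stride=128; each overflowing chunk is then scored independently at inference time.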

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
  • mixed_precision_training: Native AMP
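
With lr_scheduler_type: linear (and no warmup mentioned in the card), the learning rate decays linearly from its initial value to zero over the course of training. A sketch of that schedule:

```python
def linear_lr(step, total_steps, base_lr=2e-5):
    """Linearly decay from base_lr at step 0 to 0 at total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

Hugging Face's Trainer applies this per optimizer step; with warmup, a linear ramp-up phase would precede the decay.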

Training results

The model achieved the following results on the SQuAD v1.1 validation set:

  Metric                  Score
  Exact Match (EM)        51.0%
  F1 Score                70.2%
  Training Loss (final)   0.64

These results are below those typically reported for DistilBERT fine-tuned on SQuAD v1.1 (roughly 77 EM / 85 F1), but they demonstrate functional extractive question answering.

Framework versions

  • Transformers 4.55.0
  • PyTorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model size

  • 66.4M parameters (F32, stored in Safetensors format)