SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 64 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
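
The Pooling module applies mean pooling over token embeddings (pooling_mode_mean_tokens: True), so a sentence embedding is the padding-masked average of its token vectors. The sketch below reproduces that step manually; it illustrates the pooling rule only, is not code from this card, and the function name is made up.

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 384); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # padding tokens contribute zero
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
    return summed / counts                         # (batch, 384)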

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("PrabalAryal/Sentence_Transformer_v0.0.3")
# Run inference
sentences = [
    'De verlichting van de gang werkt niet',  # "The hallway lighting does not work"
    'Donker',                                 # "Dark"
    'Het raamglas is gebroken',               # "The window glass is broken"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
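
Building on the block above, a minimal semantic-search sketch: rank the stored sentences against a new query by cosine similarity. The query string and the expected output are invented for illustration.

query_embedding = model.encode(['Het licht doet het niet'])  # "The light does not work" (hypothetical query)
scores = model.similarity(query_embedding, embeddings)       # shape: (1, 3)
best = scores.argmax().item()
print(sentences[best])  # the lighting complaint should rank highest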

Evaluation

Metrics

Binary Classification

Metric                     Value
cosine_accuracy            0.9692
cosine_accuracy_threshold  0.7461
cosine_f1                  0.969
cosine_f1_threshold        0.7461
cosine_precision           0.9773
cosine_recall              0.9608
cosine_ap                  0.9925
cosine_mcc                 0.9386
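
These figures come from a binary pair-classification evaluation over cosine similarity. As a hedged sketch (the sentence pair below is invented), the reported cosine_accuracy_threshold can be applied directly to decide whether two sentences describe the same issue:

pair = ['De kraan lekt', 'druppende kraan']  # "The tap is leaking" / "dripping tap" (hypothetical pair)
emb = model.encode(pair)
score = model.similarity(emb[0:1], emb[1:2]).item()
print(score >= 0.7461)  # True -> predicted to be the same issue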

Training Details

Training Dataset

Unnamed Dataset

  • Size: 33,054 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
             sentence_0         sentence_1         label
    type     string             string             float
    details  min: 3 tokens      min: 3 tokens      min: 0.0
             mean: 7.39 tokens  mean: 6.78 tokens  mean: 0.51
             max: 18 tokens     max: 16 tokens     max: 1.0
  • Samples:
    sentence_0                                    sentence_1                        label
    Defect ("Defect")                             Lek ("Leak")                      1.0
    Dakbedekking ("Roofing")                      Weggewaaid ("Blown away")         1.0
    Het slot werkt niet, de deur is geblokkeerd   druppende kraan ("dripping tap")  0.0
    ("The lock does not work, the door is blocked")
  • Loss: MultipleNegativesRankingLoss with these parameters (a construction sketch follows this list):
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
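A minimal sketch of constructing this loss with the Sentence Transformers library; model is the loaded SentenceTransformer, and the values mirror the parameter dump above:

from sentence_transformers import losses, util

loss = losses.MultipleNegativesRankingLoss(
    model=model,
    scale=20.0,                   # "scale" above
    similarity_fct=util.cos_sim,  # "cos_sim" above
)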

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 10
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
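
As a hedged sketch, these non-default values map onto SentenceTransformerTrainingArguments as follows; output_dir, train_dataset, and the trainer wiring are assumptions, not taken from this card:

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder; the actual output directory is not stated
    eval_strategy="steps",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=10,
    fp16=True,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
trainer = SentenceTransformerTrainer(
    model=model,                  # the loaded SentenceTransformer
    args=args,
    train_dataset=train_dataset,  # assumed to hold the pairs described above
    loss=loss,                    # the MultipleNegativesRankingLoss from earlier
)
trainer.train()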

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

In the table below, a dash means the value was not logged at that step.

Epoch   Step   Training Loss   cosine_ap
0.1992 103 - 0.7450
0.3985 206 - 0.8186
0.5977 309 - 0.8716
0.7969 412 - 0.9074
0.9671 500 4.302 -
0.9961 515 - 0.9307
1.0 517 - 0.9315
1.1954 618 - 0.9434
1.3946 721 - 0.9555
1.5938 824 - 0.9637
1.7930 927 - 0.9676
1.9342 1000 3.5031 -
1.9923 1030 - 0.9725
2.0 1034 - 0.9731
2.1915 1133 - 0.9741
2.3907 1236 - 0.9753
2.5899 1339 - 0.9802
2.7892 1442 - 0.9806
2.9014 1500 3.2728 -
2.9884 1545 - 0.9843
3.0 1551 - 0.9846
3.1876 1648 - 0.9839
3.3868 1751 - 0.9844
3.5861 1854 - 0.9847
3.7853 1957 - 0.9868
3.8685 2000 3.1567 -
3.9845 2060 - 0.9882
4.0 2068 - 0.9876
4.1838 2163 - 0.9880
4.3830 2266 - 0.9880
4.5822 2369 - 0.9887
4.7814 2472 - 0.9887
4.8356 2500 3.0525 -
4.9807 2575 - 0.9899
5.0 2585 - 0.9901
5.1799 2678 - 0.9896
5.3791 2781 - 0.9894
5.5783 2884 - 0.9904
5.7776 2987 - 0.9906
5.8027 3000 3.0061 -
5.9768 3090 - 0.9910
6.0 3102 - 0.9911
6.1760 3193 - 0.9904
6.3752 3296 - 0.9907
6.5745 3399 - 0.9915
6.7698 3500 2.9548 -
6.7737 3502 - 0.9915
6.9729 3605 - 0.9917
7.0 3619 - 0.9917
7.1721 3708 - 0.9912
7.3714 3811 - 0.9915
7.5706 3914 - 0.9916
7.7369 4000 2.9023 -
7.7698 4017 - 0.9917
7.9691 4120 - 0.9919
8.0 4136 - 0.9921
8.1683 4223 - 0.9919
8.3675 4326 - 0.9919
8.5667 4429 - 0.9925

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.4.1
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}