cs-job-resume-model / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:480
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
  - source_sentence: >-
      Backend Developer required. Looking for expertise in Python, Django, REST
      APIs, Databases, Caching.Python;Django;REST APIs;Databases;Caching
    sentences:
      - >-
        Summary: 2+ years experience. Skills: React, PostgreSQL, Docker,
        MongoDB, REST, Unit Testing. Projects: Worked on a project that
        implemented React and PostgreSQL to deliver production-ready features,
        collaborated in Agile teams. Experience: 2 years developing systems
        using React, PostgreSQL, Docker, MongoDB.React;PostgreSQL;Docker;MongoDB
      - >-
        Summary: 6+ years experience. Skills: Android SDK, Swift, iOS SDK,
        Kotlin, CI/CD, APIs. Projects: Worked on a project that implemented
        Android SDK and Swift to deliver production-ready features, collaborated
        in Agile teams. Experience: 6 years developing systems using Android
        SDK, Swift, iOS SDK, Kotlin.Android SDK;Swift;iOS SDK;Kotlin
      - >-
        Summary: Experience in Caching, Python, Django and related
        tools.Caching;Python;Django
  - source_sentence: >-
      Backend Developer required. Looking for expertise in Python, Django, REST
      APIs, Databases, Caching.Python;Django;REST APIs;Databases;Caching
    sentences:
      - >-
        Summary: Experience in Excel, ETL, PowerBI and related
        tools.Excel;ETL;PowerBI
      - >-
        Summary: 4+ years experience. Skills: Jenkins, Terraform, Grafana,
        Prometheus, TDD, Git. Projects: Worked on a project that implemented
        Jenkins and Terraform to deliver production-ready features, collaborated
        in Agile teams. Experience: 4 years developing systems using Jenkins,
        Terraform, Grafana, Prometheus.Jenkins;Terraform;Grafana;Prometheus
      - >-
        Summary: Experience in Python, REST APIs, Databases and related
        tools.Python;REST APIs;Databases
  - source_sentence: >-
      Mobile Engineer required. We are looking for an engineer with 1+ years of
      experience. Responsibilities include building and maintaining systems
      using REST APIs, Objective-C, Android SDK, iOS SDK. Familiarity with
      Linux, APIs is a plus. Experience with scalable systems and good
      engineering practices required.REST APIs;Objective-C;Android SDK;iOS SDK
    sentences:
      - >-
        Summary: 5+ years experience. Skills: Spark, ETL, TensorFlow,
        Kubernetes, CI/CD, Linux. Projects: Worked on a project that implemented
        Spark and ETL to deliver production-ready features, collaborated in
        Agile teams. Experience: 5 years developing systems using Spark, ETL,
        TensorFlow, Kubernetes.Spark;ETL;TensorFlow;Kubernetes
      - >-
        Summary: 1+ years experience. Skills: TensorFlow, Spark, Kubernetes,
        PyTorch, TDD, Unit Testing. Projects: Worked on a project that
        implemented TensorFlow and Spark to deliver production-ready features,
        collaborated in Agile teams. Experience: 1 years developing systems
        using TensorFlow, Spark, Kubernetes,
        PyTorch.TensorFlow;Spark;Kubernetes;PyTorch
      - >-
        Summary: 2+ years experience. Skills: Python, Django, CI/CD, Node.js,
        Agile, Linux. Projects: Worked on a project that implemented Python and
        Django to deliver production-ready features, collaborated in Agile
        teams. Experience: 2 years developing systems using Python, Django,
        CI/CD, Node.js.Python;Django;CI/CD;Node.js
  - source_sentence: >-
      DevOps Engineer required. We are looking for an engineer with 5+ years of
      experience. Responsibilities include building and maintaining systems
      using Grafana, Docker, Prometheus, Terraform. Familiarity with APIs, CI/CD
      is a plus. Experience with scalable systems and good engineering practices
      required.Grafana;Docker;Prometheus;Terraform
    sentences:
      - >-
        Summary: Experience in SQL, PostgreSQL, Optimization and related
        tools.SQL;PostgreSQL;Optimization
      - >-
        Summary: 5+ years experience. Skills: Java, React Native, Objective-C,
        Flutter, APIs, Unit Testing. Projects: Worked on a project that
        implemented Java and React Native to deliver production-ready features,
        collaborated in Agile teams. Experience: 5 years developing systems
        using Java, React Native, Objective-C, Flutter.Java;React
        Native;Objective-C;Flutter
      - >-
        Summary: 7+ years experience. Skills: CI/CD, Grafana, Ansible, GCP,
        APIs, REST. Projects: Worked on a project that implemented CI/CD and
        Grafana to deliver production-ready features, collaborated in Agile
        teams. Experience: 7 years developing systems using CI/CD, Grafana,
        Ansible, GCP.CI/CD;Grafana;Ansible;GCP
  - source_sentence: >-
      Full Stack Engineer required. We are looking for an engineer with 1+ years
      of experience. Responsibilities include building and maintaining systems
      using Python, Express, React, JavaScript. Familiarity with Unit Testing,
      Agile is a plus. Experience with scalable systems and good engineering
      practices required.Python;Express;React;JavaScript
    sentences:
      - >-
        Summary: 6+ years experience. Skills: SASS, TypeScript, Tailwind,
        JavaScript, REST, APIs. Projects: Worked on a project that implemented
        SASS and TypeScript to deliver production-ready features, collaborated
        in Agile teams. Experience: 6 years developing systems using SASS,
        TypeScript, Tailwind, JavaScript.SASS;TypeScript;Tailwind;JavaScript
      - >-
        Summary: 5+ years experience. Skills: Express, CI/CD, React, JavaScript,
        Git, APIs. Projects: Worked on a project that implemented Express and
        CI/CD to deliver production-ready features, collaborated in Agile teams.
        Experience: 5 years developing systems using Express, CI/CD, React,
        JavaScript.Express;CI/CD;React;JavaScript
      - >-
        Summary: Experience in Android Studio, Kotlin, Java and related
        tools.Android Studio;Kotlin;Java
datasets:
  - hetbhagatji09/job-resume-embedding-finetuning
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job validation
          type: ai-job-validation
        metrics:
          - type: cosine_accuracy
            value: 0.75
            name: Cosine Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job test
          type: ai-job-test
        metrics:
          - type: cosine_accuracy
            value: 0.7333333492279053
            name: Cosine Accuracy

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2 on the job-resume-embedding-finetuning dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space. Judging by the training samples, it targets matching job descriptions to resume summaries, and like other Sentence Transformers models it can also be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Training Dataset: job-resume-embedding-finetuning

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
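
The architecture above ends with mean pooling over token embeddings followed by L2 normalization. The following is a minimal pure-Python sketch of those two steps on toy data, not the library's implementation; the function names `mean_pool` and `l2_normalize` and the 4-dimensional vectors are illustrative assumptions (the real model uses 768 dimensions).

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask is 1, so padding is ignored."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    count = max(len(kept), 1)  # guard against an all-padding input
    return [sum(vec[i] for vec in kept) / count for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


# Toy input (hypothetical values): three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 2.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
print(pooled)
print(embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.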

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hetbhagatji09/cs-job-resume-model")
# Run inference
queries = [
    "Full Stack Engineer required. We are looking for an engineer with 1+ years of experience. Responsibilities include building and maintaining systems using Python, Express, React, JavaScript. Familiarity with Unit Testing, Agile is a plus. Experience with scalable systems and good engineering practices required.Python;Express;React;JavaScript",
]
documents = [
    'Summary: 5+ years experience. Skills: Express, CI/CD, React, JavaScript, Git, APIs. Projects: Worked on a project that implemented Express and CI/CD to deliver production-ready features, collaborated in Agile teams. Experience: 5 years developing systems using Express, CI/CD, React, JavaScript.Express;CI/CD;React;JavaScript',
    'Summary: Experience in Android Studio, Kotlin, Java and related tools.Android Studio;Kotlin;Java',
    'Summary: 6+ years experience. Skills: SASS, TypeScript, Tailwind, JavaScript, REST, APIs. Projects: Worked on a project that implemented SASS and TypeScript to deliver production-ready features, collaborated in Agile teams. Experience: 6 years developing systems using SASS, TypeScript, Tailwind, JavaScript.SASS;TypeScript;Tailwind;JavaScript',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.7931, 0.3914, 0.7911]])
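
To turn the similarity scores above into a ranked list of candidate resumes, one option is to sort the document indices by score. This is a small sketch that reuses the scores printed above rather than calling the model; the short document labels are abbreviations introduced here for readability.

```python
# Similarity scores copied from the example output above.
scores = [0.7931, 0.3914, 0.7911]

# Abbreviated labels for the three documents (not the original texts).
labels = [
    "Express / CI/CD / React / JavaScript resume",
    "Android Studio / Kotlin / Java resume",
    "SASS / TypeScript / Tailwind / JavaScript resume",
]

# Sort document indices from most to least similar to the query.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

for rank, idx in enumerate(ranking, start=1):
    print(f"{rank}. {labels[idx]} (score={scores[idx]:.4f})")
```

Note that the two top-ranked documents differ by only about 0.002 in this example, so their relative order may not be meaningful.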

Evaluation

Metrics

Triplet

| Metric | ai-job-validation | ai-job-test |
| --- | --- | --- |
| cosine_accuracy | 0.75 | 0.7333 |
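
For a triplet task, cosine accuracy is commonly understood as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below illustrates that definition in pure Python; it is not the evaluator's code, and the 2-dimensional toy vectors are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, pos) > sim(anchor, neg)."""
    correct = sum(
        1 for anchor, pos, neg in triplets if cosine(anchor, pos) > cosine(anchor, neg)
    )
    return correct / len(triplets)


# Hypothetical embeddings: three triplets are ranked correctly, one is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.8, 0.6], [0.6, 0.8]),
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),   # negative is closer: counted as wrong
    ([1.0, 1.0], [1.0, 0.9], [1.0, -1.0]),
]

accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```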

Training Details

Training Dataset

job-resume-embedding-finetuning

  • Dataset: job-resume-embedding-finetuning at d15c797
  • Size: 480 training samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 480 samples:
    | | query | job_description_pos | job_description_neg |
    | --- | --- | --- | --- |
    | type | string | string | string |
    | details | min: 33 tokens, mean: 67.54 tokens, max: 85 tokens | min: 22 tokens, mean: 76.39 tokens, max: 113 tokens | min: 22 tokens, mean: 76.92 tokens, max: 113 tokens |
  • Samples:
    | query | job_description_pos | job_description_neg |
    | --- | --- | --- |
    | Frontend Developer required. We are looking for an engineer with 5+ years of experience. Responsibilities include building and maintaining systems using CSS, SASS, Tailwind, React. Familiarity with APIs, Unit Testing is a plus. Experience with scalable systems and good engineering practices required.CSS;SASS;Tailwind;React | Summary: 2+ years experience. Skills: Flutter, Kotlin, REST APIs, iOS SDK, TDD, APIs. Projects: Worked on a project that implemented Flutter and Kotlin to deliver production-ready features, collaborated in Agile teams. Experience: 2 years developing systems using Flutter, Kotlin, REST APIs, iOS SDK.Flutter;Kotlin;REST APIs;iOS SDK | Summary: 2+ years experience. Skills: Spark, NumPy, ETL, PyTorch, Agile, Linux. Projects: Worked on a project that implemented Spark and NumPy to deliver production-ready features, collaborated in Agile teams. Experience: 2 years developing systems using Spark, NumPy, ETL, PyTorch.Spark;NumPy;ETL;PyTorch |
    | React Native Developer required. We are looking for an engineer with 4+ years of experience. Responsibilities include building and maintaining systems using Flutter, Android SDK, Objective-C, Kotlin. Familiarity with Unit Testing, REST is a plus. Experience with scalable systems and good engineering practices required.Flutter;Android SDK;Objective-C;Kotlin | Summary: 5+ years experience. Skills: Prometheus, Jenkins, CI/CD, Terraform, Git, CI/CD. Projects: Worked on a project that implemented Prometheus and Jenkins to deliver production-ready features, collaborated in Agile teams. Experience: 5 years developing systems using Prometheus, Jenkins, CI/CD, Terraform.Prometheus;Jenkins;CI/CD;Terraform | Summary: 5+ years experience. Skills: Flask, REST APIs, Python, SQL, Unit Testing, TDD. Projects: Worked on a project that implemented Flask and REST APIs to deliver production-ready features, collaborated in Agile teams. Experience: 5 years developing systems using Flask, REST APIs, Python, SQL.Flask;REST APIs;Python;SQL |
    | Data Analyst required. Looking for expertise in SQL, PowerBI, Excel, Visualization, ETL.SQL;PowerBI;Excel;Visualization;ETL | Summary: Experience in PowerBI, Excel, Visualization and related tools.PowerBI;Excel;Visualization | Summary: 1+ years experience. Skills: Docker, MySQL, Django, Kubernetes, TDD, Agile. Projects: Worked on a project that implemented Docker and MySQL to deliver production-ready features, collaborated in Agile teams. Experience: 1 years developing systems using Docker, MySQL, Django, Kubernetes.Docker;MySQL;Django;Kubernetes |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
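
The loss parameters above (scale 20.0, cosine similarity) can be read as a softmax cross-entropy over scaled similarities, where each anchor must pick its own positive out of every positive and negative in the batch. The following pure-Python sketch illustrates that idea under those assumptions; it is not the library's implementation, `mnrl_loss` is a name chosen here, and the toy vectors are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should select positive i."""
    candidates = positives + negatives  # in-batch positives plus explicit negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, cand) for cand in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Hypothetical batch of two triplets.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [0.1, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

good_loss = mnrl_loss(anchors, positives, negatives)
bad_loss = mnrl_loss(anchors, positives[::-1], negatives)  # positives swapped
print(good_loss, bad_loss)
```

The loss is close to zero when each anchor is nearest to its own positive and grows when the positives are mismatched, which is the behavior the swapped batch demonstrates.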
    

Evaluation Dataset

job-resume-embedding-finetuning

  • Dataset: job-resume-embedding-finetuning at d15c797
  • Size: 60 evaluation samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 60 samples:
    | | query | job_description_pos | job_description_neg |
    | --- | --- | --- | --- |
    | type | string | string | string |
    | details | min: 33 tokens, mean: 68.63 tokens, max: 83 tokens | min: 22 tokens, mean: 77.83 tokens, max: 105 tokens | min: 22 tokens, mean: 76.6 tokens, max: 100 tokens |
  • Samples:
    | query | job_description_pos | job_description_neg |
    | --- | --- | --- |
    | JavaScript Engineer required. We are looking for an engineer with 3+ years of experience. Responsibilities include building and maintaining systems using HTML, React, CSS, JavaScript. Familiarity with APIs, REST is a plus. Experience with scalable systems and good engineering practices required.HTML;React;CSS;JavaScript | Summary: 7+ years experience. Skills: React, Babel, HTML, Tailwind, Git, TDD. Projects: Worked on a project that implemented React and Babel to deliver production-ready features, collaborated in Agile teams. Experience: 7 years developing systems using React, Babel, HTML, Tailwind.React;Babel;HTML;Tailwind | Summary: 7+ years experience. Skills: Flask, Python, Django, PostgreSQL, APIs, Linux. Projects: Worked on a project that implemented Flask and Python to deliver production-ready features, collaborated in Agile teams. Experience: 7 years developing systems using Flask, Python, Django, PostgreSQL.Flask;Python;Django;PostgreSQL |
    | Android Developer required. Looking for expertise in Kotlin, Java, Android Studio, XML, Jetpack.Kotlin;Java;Android Studio;XML;Jetpack | Summary: Experience in Jetpack, XML, Android Studio and related tools.Jetpack;XML;Android Studio | Summary: 3+ years experience. Skills: Node.js, Python, PostgreSQL, Docker, Git, REST. Projects: Worked on a project that implemented Node.js and Python to deliver production-ready features, collaborated in Agile teams. Experience: 3 years developing systems using Node.js, Python, PostgreSQL, Docker.Node.js;Python;PostgreSQL;Docker |
    | Backend Developer required. Looking for expertise in Python, Django, REST APIs, Databases, Caching.Python;Django;REST APIs;Databases;Caching | Summary: Experience in Django, Caching, Databases and related tools.Django;Caching;Databases | Summary: 2+ years experience. Skills: Grafana, AWS, Docker, CI/CD, CI/CD, Linux. Projects: Worked on a project that implemented Grafana and AWS to deliver production-ready features, collaborated in Agile teams. Experience: 2 years developing systems using Grafana, AWS, Docker, CI/CD.Grafana;AWS;Docker;CI/CD |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
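
The training and evaluation samples above appear to share one text layout: a free-text description immediately followed, with no separator, by a semicolon-joined list of skills. The card does not document the preprocessing, so the helper below is only an inferred sketch; `build_input_text` is a hypothetical name, and matching this layout at inference time may or may not matter in practice.

```python
def build_input_text(description, skills):
    """Append a semicolon-joined skill list directly after the description.

    This mirrors the layout seen in the samples (inferred, not documented).
    """
    return description + ";".join(skills)


# Reproduces one of the widget examples from this card.
resume_text = build_input_text(
    "Summary: Experience in Caching, Python, Django and related tools.",
    ["Caching", "Python", "Django"],
)
print(resume_text)
```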
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
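
As a rough worked example, the hyperparameters above imply a short run. The sketch below estimates the step count and the linear warmup/decay learning-rate curve in pure Python. It assumes a single device, no gradient accumulation, and that the batch sampler does not drop any batches; the schedule formula is the usual linear-with-warmup shape rather than code taken from the trainer.

```python
import math

# Values taken from this card.
num_samples = 480
batch_size = 16
num_epochs = 1
peak_lr = 2e-05
warmup_ratio = 0.1

# Assumption: one optimizer step per batch on a single device.
total_steps = math.ceil(num_samples / batch_size) * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(total_steps, warmup_steps)
for step in (0, 1, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the run is 30 optimizer steps with 3 warmup steps, which is small enough that results may vary noticeably between seeds.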

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch | Step | ai-job-validation_cosine_accuracy | ai-job-test_cosine_accuracy |
| --- | --- | --- | --- |
| -1 | -1 | 0.75 | 0.7333 |

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.2
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}