Update README.md
### **Description**
The Llama Nemotron Embedding 1B model is optimized for **multilingual and cross-lingual** text question-answering retrieval with **support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka Embeddings)**. This model was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
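
As a rough illustration of how the dynamic (Matryoshka) embedding size is typically consumed, the sketch below truncates a full-size embedding to a smaller dimension and re-normalizes it before scoring. The 2048 and 384 dimensions and the random placeholder vectors are assumptions for illustration, not this model's documented API.

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings standing in for encoder output: one query and four
# passages at an assumed full size of 2048 dimensions (random values, for illustration).
query = F.normalize(torch.randn(1, 2048), dim=-1)
passages = F.normalize(torch.randn(4, 2048), dim=-1)

def shrink(emb: torch.Tensor, dim: int = 384) -> torch.Tensor:
    """Matryoshka-style truncation: keep the first `dim` components and
    re-normalize so that dot products remain cosine similarities."""
    return F.normalize(emb[..., :dim], dim=-1)

scores_full = (query @ passages.T).squeeze(0)                    # scores at full size
scores_small = (shrink(query) @ shrink(passages).T).squeeze(0)   # scores at reduced size
print(scores_full.tolist())
print(scores_small.tolist())
```
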
In addition to enabling multilingual and cross-lingual question-answering retrieval, this model reduces the data storage footprint by 35x through dynamic embedding sizing and support for longer token length, making it feasible to handle large-scale datasets efficiently.
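
As a back-of-the-envelope sketch of how these two levers combine (the card does not break the 35x figure down, so the numbers below are purely illustrative): longer inputs mean fewer chunks to index, smaller embeddings mean fewer numbers per chunk, and the two factors multiply.

```python
# Purely illustrative arithmetic; the chunk sizes and dimensions are assumptions,
# not figures from this card, and the exact 35x factor depends on the baseline setup.
baseline_chunk_tokens, long_chunk_tokens = 512, 8192   # longer inputs -> fewer chunks
baseline_dim, reduced_dim = 1024, 384                  # truncated embeddings -> smaller vectors

fewer_chunks = long_chunk_tokens / baseline_chunk_tokens   # 16x fewer vectors to store
smaller_vectors = baseline_dim / reduced_dim               # ~2.7x fewer floats per vector
print(f"combined storage reduction ≈ {fewer_chunks * smaller_vectors:.0f}x")
```
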
This model is ready for commercial use.

The Llama Nemotron Embedding 1B model is part of the NVIDIA NeMo Retriever collection of NIMs, which provide state-of-the-art, commercially ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for domain-specific use cases, such as information technology, human resources help assistants, and research & development assistants.

We are excited to announce the open sourcing of this commercial embedding model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at [llama-nemotron-embed-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2).
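
For readers who want to try the hosted NIM right away, here is a minimal sketch assuming the service exposes the OpenAI-compatible `/v1/embeddings` route that NIM endpoints generally provide; the base URL, model identifier, and `input_type` field below are assumptions to verify against the page linked above.

```python
from openai import OpenAI

# Sketch only: base_url, model name, and the input_type field are assumptions;
# confirm the exact values and authentication on the linked NIM page.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="YOUR_NVIDIA_API_KEY",
)

resp = client.embeddings.create(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    input=["How does dynamic embedding sizing reduce storage?"],
    extra_body={"input_type": "query"},  # passages would use input_type="passage"
)
print(len(resp.data[0].embedding))  # embedding dimensionality
```
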
### **Intended use**
The Llama Nemotron Embedding 1B model is most suitable for users who want to build a multilingual question-and-answer application over a large text corpus, leveraging the latest dense retrieval technologies.
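
To make the intended usage concrete, here is a small dense-retrieval sketch over an in-memory corpus. The `embed` helper is a hypothetical stand-in for the model's encoding call (see the usage example earlier in this card); the point is the cosine-similarity top-k ranking around it.

```python
import numpy as np

_rng = np.random.default_rng(0)

def embed(texts, input_type):
    """Hypothetical placeholder for the model's encoding call; returns one
    L2-normalized vector per input text (random here, for illustration)."""
    vecs = _rng.normal(size=(len(texts), 384))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

corpus = [
    "NeMo Retriever provides embedding and reranking microservices.",
    "Matryoshka embeddings can be truncated to a smaller dimension.",
    "The capital of France is Paris.",
]
doc_vecs = embed(corpus, input_type="passage")           # index the passages once
query_vec = embed(["Which embeddings can be truncated?"], input_type="query")[0]

scores = doc_vecs @ query_vec                            # cosine similarity (normalized vectors)
for i in np.argsort(-scores)[:2]:                        # top-2 passages
    print(f"{scores[i]:.3f}  {corpus[i]}")
```
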
### **License/Terms of use**
Use of this model is governed by the NVIDIA Open Model License Agreement.
**Architecture Type:** Transformer
**Network Architecture:** Fine-tuned Llama3.2 1B Retriever

This NeMo embedding model is a transformer encoder: a fine-tuned version of Llama3.2 1B with 16 layers and an embedding size of 2048, trained on public datasets. Training uses the AdamW optimizer with 100 warm-up steps, a learning rate of 5e-6, and a WarmupDecayLR scheduler. Embedding models for text retrieval are typically trained using a bi-encoder architecture: a pair of texts (for example, a query and a chunked passage) is encoded independently by the embedding model. Contrastive learning is then used to maximize the similarity between the query and the passage that contains the answer, while minimizing the similarity between the query and sampled negative passages that are not useful for answering the question.
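
The sketch below illustrates that contrastive objective with in-batch negatives (an InfoNCE-style cross-entropy over a similarity matrix); the tiny linear encoder, batch contents, and temperature are placeholder assumptions, not the actual training code for this model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
encoder = torch.nn.Linear(128, 2048)   # stand-in for the shared embedding model

queries = torch.randn(8, 128)          # a batch of query features
positives = torch.randn(8, 128)        # the passage that answers each query

q = F.normalize(encoder(queries), dim=-1)
p = F.normalize(encoder(positives), dim=-1)

# In-batch negatives: for each query, every other passage in the batch is a negative.
temperature = 0.05
logits = (q @ p.T) / temperature       # [8, 8] query-passage similarity matrix
labels = torch.arange(q.size(0))       # the diagonal holds the true (positive) pairs
loss = F.cross_entropy(logits, labels) # pull positives together, push negatives apart
loss.backward()
print(float(loss))
```
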
### **Input**
### **Software Integration**
**Runtime Engine:** Llama Nemotron embedding NIM
**Supported Hardware Microarchitecture Compatibility**: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace
**Supported Operating System(s):** Linux
### **Model Version(s)**
Llama Nemotron Embedding 1B v2
Short Name: llama-nemotron-embed-1b-v2
## **Training Dataset & Evaluation**