---
pipeline_tag: text-generation
inference: false
license: apache-2.0
datasets:
- public-qiskit
- synthetic-qiskit
metrics:
- code_eval
library_name: transformers
tags:
- code
- granite
- qiskit
base_model:
- ibm-granite/granite-3.2-8b-instruct
---

# granite-3.2-8b-qiskit

## Model Summary

**granite-3.2-8b-qiskit** is an 8B-parameter model, extended pretrained and fine-tuned on top of granite-3.1-8b-base using Qiskit code and instruction data to improve its ability to write high-quality, non-deprecated Qiskit code. We used only data with the following licenses: Apache 2.0, MIT, the Unlicense, Mulan PSL Version 2, BSD-2, BSD-3, and Creative Commons Attribution 4.0. The model has been trained with **Qiskit version 2.0**, ensuring compatibility with its APIs and syntax.

- **Developers:** IBM Quantum & IBM Research
- **GitHub Repository:** Pending
- **Related Papers:** [Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code](https://arxiv.org/abs/2405.19495) and [Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models](https://arxiv.org/abs/2406.14712)
- **Release Date:** 06-03-2025
- **License:** apache-2.0

## Usage

### Intended use

This model is designed for generating quantum computing code using Qiskit. Both quantum computing practitioners and new Qiskit users can use this model as an assistant for building Qiskit code or responding to Qiskit coding-related instructions and questions.

### Generation

This is a simple example of how to use the **granite-3.2-8b-qiskit** model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "qiskit/granite-3.2-8b-qiskit"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# change input text as desired
chat = [
    {"role": "user", "content": "Build a random circuit with 5 qubits"},
]
input_text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(input_text, return_tensors="pt")
# move tokenized inputs to device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=128)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```

### Comparison of Qiskit models across benchmarks
## Training Data

- **Data Collection and Filtering:** Our code data is sourced from a combination of publicly available datasets (e.g., Code available on
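The license allow-list described in the Model Summary implies a per-sample filtering step over the collected code data. A minimal sketch of such a filter is shown below; the `keep_sample` helper and the SPDX-style license identifiers are illustrative assumptions, not the actual training pipeline.

```python
# Hypothetical sketch: keep only samples whose license is on the
# permissive allow-list named in the Model Summary.
ALLOWED_LICENSES = {
    "Apache-2.0", "MIT", "Unlicense", "MulanPSL-2.0",
    "BSD-2-Clause", "BSD-3-Clause", "CC-BY-4.0",
}

def keep_sample(sample: dict) -> bool:
    """Return True if the sample carries an allowed license."""
    return sample.get("license") in ALLOWED_LICENSES

corpus = [
    {"path": "a.py", "license": "MIT"},
    {"path": "b.py", "license": "GPL-3.0"},
]
filtered = [s for s in corpus if keep_sample(s)]
print([s["path"] for s in filtered])  # → ['a.py']
```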