
Quantum Assistant: Specialization of Multimodal Models for Quantum Computing


The first Vision-Language Model specialized for quantum computing with Qiskit

Model Description

This model is a fine-tuned version of Qwen3-VL-8B-Instruct specialized for quantum computing tasks with Qiskit 2.0. It can interpret visual representations of quantum computing: circuit diagrams, Bloch spheres, and measurement histograms.

The model was trained using Rank-Stabilized Low-Rank Adaptation (rsLoRA) with rank 64 for 1 epoch on the Quantum Assistant Dataset, achieving significant improvements on multimodal quantum code generation tasks.

Key Capabilities

  • Code Generation: Generate complete Qiskit code from natural language descriptions
  • Function Completion: Complete function bodies from signatures and docstrings
  • Visual Understanding: Interpret quantum circuit diagrams, Bloch spheres, and histograms
  • Conceptual Explanations: Answer questions about quantum computing theory
  • Qiskit 2.0 Compliant: Uses modern APIs (SamplerV2, EstimatorV2, generate_preset_pass_manager); see the sketch below
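
For reference, a minimal sketch of the Qiskit 2.0 style the model is trained to produce, using `generate_preset_pass_manager` and the reference `StatevectorSampler` implementation of the SamplerV2 interface (circuit and shot count are illustrative):

```python
# Bell state with Qiskit 2.0 primitives (assumes qiskit >= 1.2 is installed).
from qiskit import QuantumCircuit, generate_preset_pass_manager
from qiskit.primitives import StatevectorSampler  # reference SamplerV2 implementation

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.measure_all()

# Preset pass manager replaces the legacy transpile() flow
pm = generate_preset_pass_manager(optimization_level=1)
isa_circuit = pm.run(qc)

# V2 primitives take a list of PUBs and expose counts per classical register
result = StatevectorSampler().run([isa_circuit], shots=1024).result()
print(result[0].data.meas.get_counts())
```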

Evaluation Results

Evaluation was conducted on three complementary benchmarks: Qiskit HumanEval (151 function completion problems), Qiskit HumanEval Hard (151 code generation problems), and the synthetic test set (1,290 samples). Models were served via vLLM on A100 80GB PCIe with greedy decoding (temperature 0).
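
Since greedy decoding yields a single sample per problem, Pass@1 reduces to a plain success rate over the benchmark; a trivial sketch (the helper name is ours, not from the evaluation harness):

```python
def pass_at_1(passed: list[bool]) -> float:
    """Percentage of problems whose single greedy completion passes all unit tests."""
    return 100.0 * sum(passed) / len(passed)
```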

Consolidated Results

| Model | QHE | QHE Hard | Func. Compl. | Code Gen. | QA | Text | Multimodal |
|---|---|---|---|---|---|---|---|
| **Fine-tuned** | | | | | | | |
| Qwen3-VL-FT (r32, 2ep) | 43.71% | 28.48% | 56.96% | 44.36% | 38.02% | 45.45% | 63.39% |
| Qwen3-VL-FT (r32, 1ep) | 40.40% | 29.14% | 51.55% | 41.91% | 37.31% | 42.49% | 57.14% |
| Qwen3-VL-FT (r64, 1ep) | 38.41% | 22.52% | 52.84% | 42.89% | 38.24% | 42.66% | 60.71% |
| **Specialized (IBM)** | | | | | | | |
| Qwen2.5-Coder-14B-Qiskit† | 49.01% | 25.17% | 47.48% | 25.51% | 19.46% | 36.19% | n/a |
| **Baseline** | | | | | | | |
| Qwen3-VL-8B-Instruct | 32.45% | 11.92% | 38.92% | 25.98% | 20.66% | 30.24% | 37.50% |
| InternVL3.5-8B-MPO | 20.53% | 9.27% | 32.47% | 19.61% | 25.81% | 21.85% | 36.16% |
| Ministral-3-8B-Instruct-2512 | 17.88% | 11.26% | 29.12% | 21.81% | 20.50% | 20.98% | 36.61% |

QHE: Qiskit HumanEval (function completion) · QHE Hard: Qiskit HumanEval Hard (code generation) · Func. Compl., Code Gen., QA, Text, and Multimodal: synthetic test set · QA reports ROUGE-L; all other columns report Pass@1 · †Qwen2.5-Coder-14B-Qiskit evaluated only on text samples (55% of the synthetic dataset), so no multimodal score is available

Key Improvements

| Metric | Improvement vs Baseline |
|---|---|
| Qiskit HumanEval Pass@1 | +11.26 pp (32.45% → 43.71%) |
| Qiskit HumanEval Hard Pass@1 | +16.56 pp (11.92% → 28.48%) |
| Multimodal Code Pass@1 | +25.89 pp (37.50% → 63.39%) |
| Text-only Code Pass@1 | +15.21 pp (30.24% → 45.45%) |

Multimodal Advantage

The largest differential appears on multimodal samples: the fine-tuned model achieves 63.39% Pass@1 on image-based code generation versus 45.45% on text-only code generation (+17.94 pp), supporting the claim that training on paired visual-textual samples develops domain-specific visual understanding.

Figure: Combined evaluation results: (a) Qiskit HumanEval benchmarks, (b) visual content impact, (c) synthetic dataset, (d) fine-tuning gains.

Performance by Category

Figure: Performance heatmap by thematic category (Pass@1 %); the red line separates fine-tuned models (left) from baselines (right).

| Category | Pass@1 | vs Baseline |
|---|---|---|
| quantum_info_and_operators | 65.76% | +37.74 pp |
| circuits_and_gates | 61.47% | +25.46 pp |
| hardware_and_providers | 57.94% | +50.00 pp |
| transpilation_and_compilation | 57.43% | +38.62 pp |
| algorithms_and_applications | 52.27% | +28.18 pp |
| noise_and_error_mitigation | 42.86% | +38.10 pp |
| primitives_and_execution | 32.18% | +21.84 pp |

Training Strategy

The experimental strategy was organized in two phases: PEFT technique selection and hyperparameter optimization.

Phase 1: PEFT Variant Comparison

Five LoRA variants were compared with controlled configuration (r=16, α=32, 1 epoch):

| Variant | Eval Loss ↓ | Eval Accuracy ↑ | Runtime (s) |
|---|---|---|---|
| rsLoRA | 0.622 | 0.818 | 1,060 |
| DoRA | 0.622 | 0.818 | 2,307 |
| rsLoRA (frozen aligner) | 0.623 | 0.817 | 1,057 |
| LoRA (vanilla) | 0.646 | 0.812 | 1,056 |
| PiSSA | 0.657 | 0.812 | 1,172 |
| OLoRA | 0.742 | 0.794 | 1,067 |

Figure: Comparison of PEFT variants: (a) validation loss, (b) token accuracy, (c) training time.

Key findings:

  • rsLoRA and DoRA achieved equivalent performance (Eval Loss 0.622)
  • DoRA incurs a 2.18× computational overhead (2,307 s vs 1,060 s) due to its magnitude-direction decomposition
  • rsLoRA was selected for its optimal performance-efficiency trade-off (see the configuration sketch below)
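
The only difference between vanilla LoRA and rsLoRA is the adapter scaling: LoRA scales updates by α/r, while rsLoRA uses α/√r, which keeps update magnitudes stable as rank grows. As an illustration, the Phase 1 controlled configuration expressed as a Hugging Face PEFT `LoraConfig` (training actually used ms-swift, so this mirrors rather than reproduces the exact setup):

```python
from peft import LoraConfig

# Phase 1 controlled configuration (r=16, alpha=32); dropout as in the final run.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules="all-linear",
    use_rslora=True,  # rank-stabilized scaling: alpha / sqrt(r) instead of alpha / r
)
```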

Figure: Convergence curves of validation loss for PEFT variants.

Phase 2: Rank and Epoch Optimization

With rsLoRA selected, the impact of adapter rank and training duration was investigated:

| Configuration | Eval Loss ↓ | Eval Accuracy ↑ | Notes |
|---|---|---|---|
| r=32, 1 epoch | 0.607 | 0.821 | Optimal trade-off |
| r=64, 1 epoch | 0.609 | 0.822 | Marginal improvement |
| r=16, 1 epoch | 0.622 | 0.818 | Baseline rsLoRA |
| r=32, 2 epochs | 0.638 | 0.825 | Slight overfitting |
| r=128, 3 epochs | 0.789 | 0.822 | Severe overfitting |

Figure: Impact of adapter rank on validation loss.

Figure: Overfitting analysis: (a) r32-2ep configuration, (b) r128-3ep configuration.

Conclusions: rsLoRA with r=32 and 1-2 epochs maximizes generalization while avoiding memorization of the synthetic dataset.

Model Collection

This model is part of the Quantum Assistant collection. All models are merged versions ready for inference:

| Model | Configuration | Description |
|---|---|---|
| Qwen3-VL-8B-rslora-r32-2 | rsLoRA r=32, 2 epochs | Best overall performance |
| Qwen3-VL-8B-rslora-r32 | rsLoRA r=32, 1 epoch | Best generalization |
| Qwen3-VL-8B-rslora-r64 | rsLoRA r=64, 1 epoch | Higher capacity |
| Qwen3-VL-8B-rslora-r128 | rsLoRA r=128, 1 epoch | Maximum capacity |
| Qwen3-VL-8B-lora | LoRA r=16, 1 epoch | Vanilla LoRA |
| Qwen3-VL-8B-dora | DoRA r=16, 1 epoch | Magnitude-direction decomposition |
| Qwen3-VL-8B-pissa | PiSSA r=16, 1 epoch | SVD initialization |
| Qwen3-VL-8B-olora | OLoRA r=16, 1 epoch | QR orthonormal initialization |
| Qwen3-VL-8B-rslora-frozen | rsLoRA r=16, frozen aligner | Ablation study |
| Qwen3-VL-8B-rslora | rsLoRA r=16, 1 epoch | Baseline rsLoRA |

Usage

With vLLM

```bash
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8000 \
    --model samuellimabraz/Qwen3-VL-8B-rslora-r64 \
    --gpu-memory-utilization 0.92 \
    --max-model-len 12288 \
    --max-num-seqs 16 \
    --max-num-batched-tokens 49152 \
    --enable-chunked-prefill \
    --enable-prefix-caching
```
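
Once the server is up, any OpenAI-compatible client can query it. A minimal sketch (the endpoint and API key are vLLM defaults; `temperature=0` matches the greedy decoding used in evaluation):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="samuellimabraz/Qwen3-VL-8B-rslora-r64",
    messages=[
        {"role": "system", "content": "You are a quantum computing expert assistant specializing in Qiskit."},
        {"role": "user", "content": "Create a function that builds a 3-qubit GHZ state and returns the circuit."},
    ],
    temperature=0,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```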

With Transformers

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "samuellimabraz/Qwen3-VL-8B-rslora-r64",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("samuellimabraz/Qwen3-VL-8B-rslora-r64")

# Text-only conversation
messages = [
    {"role": "system", "content": "You are a quantum computing expert assistant specializing in Qiskit."},
    {"role": "user", "content": "Create a function that builds a 3-qubit GHZ state and returns the circuit."}
]

# Multimodal conversation (see the snippet after this block for how to run it)
messages_with_image = [
    {"role": "system", "content": "You are a quantum computing expert assistant specializing in Qiskit."},
    {"role": "user", "content": [
        {"type": "image", "image": "path/to/circuit.png"},
        {"type": "text", "text": "Implement the quantum circuit shown in the image."}
    ]}
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)  # (None, None) for text-only input
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt
output = processor.batch_decode(
    generated_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)[0]
print(output)
```
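
To run the multimodal example instead, feed `messages_with_image` through the same pipeline:

```python
# Same pipeline as above, with the image-bearing conversation.
text = processor.apply_chat_template(messages_with_image, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages_with_image)
```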

Training Details

Dataset

  • Training Data: Quantum Assistant Dataset
  • Train Samples: 5,837 (45.1% multimodal)
  • Validation Samples: 1,239 (45.2% multimodal)
  • Task Distribution: 30% function completion, 32% code generation, 38% QA
  • Categories: 7 quantum computing domains

Training Configuration

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-VL-8B-Instruct |
| PEFT Method | rsLoRA (Rank-Stabilized LoRA) |
| Rank (r) | 64 |
| Alpha (α) | 128 |
| Dropout | 0.10 |
| Target Modules | all-linear |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine |
| Weight Decay | 0.05 |
| Warmup Steps | 10 |
| Epochs | 1 |
| Batch Size | 32 |
| Precision | bfloat16 |
| Framework | ms-swift |

Freezing Strategy

| Component | Status |
|---|---|
| Vision Encoder (ViT) | ❄️ Frozen |
| Vision-Language Aligner | 🔥 Trainable |
| Language Model (LLM) | 🔥 Trainable |
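
A hedged sketch of this split at the PyTorch level, assuming Qwen-VL-style module naming in which the vision tower lives under `visual` and the aligner under `visual.merger` (the actual run used ms-swift's built-in freezing options, and module names may differ across transformers versions):

```python
# Freeze the ViT; keep the aligner (merger) and the language model trainable.
for name, param in model.named_parameters():
    if "visual" in name and "merger" not in name:  # vision tower, excluding aligner
        param.requires_grad = False
```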

Training Infrastructure

  • GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition (96GB VRAM)
  • Training Time: ~18.0 minutes (1 epoch)
  • Tracking: Weights & Biases | TensorBoard

System Prompt

```text
You are a quantum computing expert assistant specializing in Qiskit.
Provide accurate, clear, and well-structured responses about quantum computing concepts,
algorithms, and code implementation. Use Qiskit 2.0 best practices.
```

Intended Uses & Limitations

Intended Uses

  • Educational assistance: Learning quantum computing concepts with Qiskit
  • Code generation: Creating Qiskit circuits from descriptions or diagrams
  • Documentation: Understanding quantum circuit visualizations
  • Research prototyping: Rapid development of quantum algorithms

Limitations

  1. Domain specificity: Optimized for Qiskit 2.0; may generate deprecated APIs for older versions
  2. Dataset size: Trained on 5,837 samples; may underperform on rare edge cases
  3. Category imbalance: Better performance on circuits_and_gates than primitives_and_execution
  4. Hardware specifics: Limited coverage of IBM Quantum hardware-specific optimizations
  5. Execution: Generated code requires verification before running on real quantum hardware

Bias and Risks

  • Model may perpetuate patterns from training data
  • Visual understanding limited to common diagram styles in Qiskit documentation
  • May generate syntactically correct but logically incorrect quantum algorithms
  • Should not be used for production quantum computing without human review

Citation

If you use this model in your research, please cite:

```bibtex
@misc{braz2025quantumassistant,
  title={Quantum Assistant: Especializa{\c{c}}{\~a}o de Modelos Multimodais para Computa{\c{c}}{\~a}o Qu{\^a}ntica},
  author={Braz, Samuel Lima and Leite, Jo{\~a}o Paulo Reus Rodrigues},
  year={2025},
  institution={Universidade Federal de Itajub{\'a} (UNIFEI)},
  url={https://github.com/samuellimabraz/quantum-assistant}
}
```


Acknowledgments

  • IBM Quantum and Qiskit team for open-source documentation
  • Qwen Team for the base model
  • UNIFEI (Universidade Federal de Itajubá) for academic support
  • Advisor: Prof. João Paulo Reus Rodrigues Leite

License

This model is released under the Apache 2.0 License.

