CrossEncoder based on cross-encoder/nli-deberta-v3-base

This is a Cross Encoder model fine-tuned from cross-encoder/nli-deberta-v3-base on the json dataset using the sentence-transformers library. For each pair of texts it outputs one score per label (three here), which can be used for text pair classification.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: cross-encoder/nli-deberta-v3-base
  • Maximum Sequence Length: 512 tokens
  • Number of Output Labels: 3 labels
  • Model Size: ~0.2B parameters (F32 safetensors)
  • Training Dataset:
    • json

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("software-si/kitchen-ita-nli-deberta")
# Get scores for pairs of texts
pairs = [
    ['piano cottura due zone operative, con forno a gas,', 'la cucina ha un forno integrato'],
    ['unità di cottura a induzione, sistema con forno a gas, ampiezza di novanta centimetri, con quattro piastre cottura,', 'la cucina è profonda 90 cm'],
    ['modulo cucina funzionamento a induzione, dimensione teglie di gn1/1 due moduli di cottura, dotata di forno a gas,', 'la teglie del forno hanno dimensione gn1/1'],
    ['unità di cottura modulo con forno a gas, con teglie di gn1/1 piano a induzione,', 'la cucina ha un vano aperto'],
    ['unità di cottura misura 90 centimetri di profondità, modulo disposto su vano chiuso, operativa a gas, sei bruciatori separati,', 'la cucina ha un forno integrato'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5, 3)
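
The five rows hold one raw logit per label. To turn them into class probabilities and a predicted class id, apply a softmax and an argmax over the label axis, as in the minimal sketch below. The id2label mapping shown is a placeholder, since this card does not document which index corresponds to which label; check the config of the loaded model before relying on label names.

import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder("software-si/kitchen-ita-nli-deberta")

pairs = [
    ['piano cottura due zone operative, con forno a gas,', 'la cucina ha un forno integrato'],
]
logits = model.predict(pairs)  # shape (1, 3): one raw logit per label

# Softmax over the label axis turns logits into class probabilities
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
pred_ids = probs.argmax(axis=1)

# Placeholder names: verify against model.model.config.id2label
id2label = {0: "label_0", 1: "label_1", 2: "label_2"}
for pair, pid, p in zip(pairs, pred_ids, probs):
    print(f"{pair[1]!r} -> {id2label[pid]} (p={p[pid]:.3f})")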

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 463,254 training samples
  • Columns: premises, hypothesis, and labels
  • Approximate statistics based on the first 1000 samples:
    • premises: string; min 37, mean 103.4, max 155 characters
    • hypothesis: string; min 12, mean 33.04, max 50 characters
    • labels: int; 0: ~48.70%, 1: ~47.10%, 2: ~4.20%
  • Samples (premises / hypothesis / label):
    • "modulo cucina vano sottostante con forno elettrico, due zone operative, con piastre tonde," / "le zone cottura disponibili sono due" / 1
    • "modulo cucina profondità utile 70 cm, vano sottostante con forno elettrico, dispositivo a induzione, due moduli di cottura," / "la cottura della cucina è a gas" / 0
    • "unità di cottura disposta su vano con ante, funziona a induzione, profondità utile 90 cm," / "la cucina misura novante centimetri di profondità" / 1
  • Loss: CrossEntropyLoss
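
Training consumes a datasets.Dataset whose column names match the ones listed above (premises and hypothesis as strings, labels as an int). A minimal sketch of assembling such a dataset follows; the two rows and their label values are illustrative.

from datasets import Dataset

# Two hypothetical rows in the (premises, hypothesis, labels) layout above
train_dataset = Dataset.from_dict({
    "premises": [
        "piano cottura due zone operative, con forno a gas,",
        "unità di cottura a induzione, ampiezza di novanta centimetri,",
    ],
    "hypothesis": [
        "la cucina ha un forno integrato",
        "la cucina è profonda 90 cm",
    ],
    "labels": [0, 1],
})
print(train_dataset)
# Dataset({features: ['premises', 'hypothesis', 'labels'], num_rows: 2})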

Evaluation Dataset

json

  • Dataset: json
  • Size: 138,976 evaluation samples
  • Columns: premises, hypothesis, and labels
  • Approximate statistics based on the first 1000 samples:
    • premises: string; min 46, mean 101.55, max 154 characters
    • hypothesis: string; min 12, mean 33.34, max 50 characters
    • labels: int; 0: ~45.90%, 1: ~47.90%, 2: ~6.20%
  • Samples (premises / hypothesis / label):
    • "piano cottura due zone operative, con forno a gas," / "la cucina ha un forno integrato" / 0
    • "unità di cottura a induzione, sistema con forno a gas, ampiezza di novanta centimetri, con quattro piastre cottura," / "la cucina è profonda 90 cm" / 1
    • "modulo cucina funzionamento a induzione, dimensione teglie di gn1/1 due moduli di cottura, dotata di forno a gas," / "la teglie del forno hanno dimensione gn1/1" / 1
  • Loss: CrossEntropyLoss
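
A quick sanity check on held-out pairs is to take the argmax of the predicted logits and compare it against the gold labels. The sketch below reuses two of the evaluation samples above, with their reported labels as gold; in practice you would load the full evaluation split.

import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder("software-si/kitchen-ita-nli-deberta")

# Two evaluation samples from above; in practice load the full split
eval_pairs = [
    ["piano cottura due zone operative, con forno a gas,",
     "la cucina ha un forno integrato"],
    ["unità di cottura a induzione, sistema con forno a gas, ampiezza di novanta centimetri, con quattro piastre cottura,",
     "la cucina è profonda 90 cm"],
]
gold = np.array([0, 1])  # labels reported for these samples

logits = model.predict(eval_pairs)  # shape (2, 3)
pred = logits.argmax(axis=1)        # predicted class id per pair
print(f"accuracy: {(pred == gold).mean():.3f}")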

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 1e-05
  • num_train_epochs: 1
  • warmup_steps: 46325
  • bf16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 46325
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
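
Put together, the run can be reproduced roughly as follows with the CrossEncoderTrainer API and the non-default hyperparameters listed above. This is a sketch rather than the exact training script: the output directory and the eval/save step counts are assumptions (the 500-step cadence is inferred from the training logs below), and the inline one-row datasets stand in for the real 463,254/138,976-sample splits.

from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import CrossEntropyLoss

# Base model with a 3-label classification head
model = CrossEncoder("cross-encoder/nli-deberta-v3-base", num_labels=3)

# Placeholder datasets; the real run used the splits described above
train_dataset = Dataset.from_dict({
    "premises": ["piano cottura due zone operative, con forno a gas,"],
    "hypothesis": ["la cucina ha un forno integrato"],
    "labels": [0],
})
eval_dataset = train_dataset

args = CrossEncoderTrainingArguments(
    output_dir="kitchen-ita-nli-deberta",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=1e-5,
    warmup_steps=46325,
    bf16=True,
    eval_strategy="steps",
    eval_steps=500,  # assumed from the 500-step logging cadence
    save_steps=500,
    load_best_model_at_end=True,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CrossEntropyLoss(model),  # the loss reported above
)
trainer.train()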

Training Logs

Epoch  | Step | Training Loss | Validation Loss
0.0345 | 500  | 1.9008        | 1.5107
0.0691 | 1000 | 0.8958        | 0.6929
0.1036 | 1500 | 0.5841        | 0.4844
0.1382 | 2000 | 0.4403        | 0.3719
0.1727 | 2500 | 0.3578        | 0.2772
0.2072 | 3000 | 0.2732        | 0.2048
0.2418 | 3500 | 0.2117        | 0.1658
0.2763 | 4000 | 0.1717        | 0.1290
0.3108 | 4500 | 0.1444        | 0.1118
0.3454 | 5000 | 0.1283        | 0.1053
0.3799 | 5500 | 0.1136        | 0.1067
0.4145 | 6000 | 0.1066        | 0.0932
0.4490 | 6500 | 0.0987        | 0.0774
0.4835 | 7000 | 0.0864        | 0.0848
0.5181 | 7500 | 0.0849        | 0.0744
0.5526 | 8000 | 0.0796        | 0.0578
0.5871 | 8500 | 0.0671        | 0.0604
0.6217 | 9000 | 0.0656        | 0.0514
0.6562 | 9500 | 0.0609        | 0.0473

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.1.1
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}