This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m-v2.0. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: GteModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
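The last two modules in the printout above (CLS-token pooling followed by `Normalize()`) can be sketched in a few lines of NumPy. This is only an illustration on random data, assuming the CLS token sits at position 0 of each sequence; `cls_pool_and_normalize` and `fake_token_embeddings` are hypothetical names, not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token embeddings to (batch, hidden) unit vectors."""
    # CLS pooling: keep only the first token's vector (assumed to be the CLS token).
    cls = token_embeddings[:, 0, :]
    # Normalize: scale each vector to unit L2 length, guarding against division by zero.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Random stand-in for the transformer output: 2 sentences, 5 tokens, 768 dimensions.
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(2, 5, 768))

sentence_embeddings = cls_pool_and_normalize(fake_token_embeddings)
print(sentence_embeddings.shape)
print(np.linalg.norm(sentence_embeddings, axis=1))
```

Because the outputs have unit length, the cosine similarity between two embeddings reduces to their dot product.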
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'How does the text suggest addressing the social aspects related to low- and middle-income transport users in the context of zero-emission vehicle initiatives?',
'(b)\n\nmeasures intended to accelerate the uptake of zero-emission vehicles or to provide financial support for the deployment of fully interoperable refuelling and recharging infrastructure for zero-emission vehicles, or measures to encourage a shift to public transport and improve multimodality, or to provide financial support in order to address social aspects concerning low- and middle-income transport users;\n\n(c)\n\nto finance their Social Climate Plan in accordance with Article 15 of Regulation (EU) 2023/955;\n\n(d)',
'If the planned change is implemented notwithstanding the first and second subparagraphs, or if an unplanned change has taken place pursuant to which the AIFM’s management of the AIF no longer complies with this Directive or the AIFM otherwise no longer complies with this Directive, the competent authorities of the Member State of reference of the AIFM shall take all due measures in accordance with Article 46, including, if necessary, the express prohibition of marketing of the AIF.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
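Since the model was trained with Matryoshka dimensions 768, 512, 256, 128 and 64 (see the loss parameters in this card), embeddings can plausibly be shortened to one of those sizes to save memory and search time. The sketch below shows the idea on random stand-in vectors: keep the leading dimensions, then re-normalize. `truncate_and_renormalize` is a hypothetical helper; with the library itself, passing `truncate_dim=256` when constructing `SentenceTransformer` should give a similar effect, though the retrieval quality at each size is not reported here.

```python
import numpy as np

def truncate_and_renormalize(embeddings, dim):
    """Keep the first `dim` dimensions of each embedding and rescale to unit length."""
    truncated = np.asarray(embeddings, dtype=np.float32)[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Random unit vectors standing in for the output of model.encode(sentences).
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_renormalize(full, 256)

# For unit vectors, the cosine similarity matrix is a plain matrix product.
similarities_256 = small @ small.T
print(small.shape)
print(similarities_256.shape)
```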
Evaluated with InformationRetrievalEvaluator:

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.7059 |
| cosine_accuracy@3 | 0.9068 |
| cosine_accuracy@5 | 0.9448 |
| cosine_accuracy@10 | 0.9731 |
| cosine_precision@1 | 0.7059 |
| cosine_precision@3 | 0.3023 |
| cosine_precision@5 | 0.189 |
| cosine_precision@10 | 0.0973 |
| cosine_recall@1 | 0.7059 |
| cosine_recall@3 | 0.9068 |
| cosine_recall@5 | 0.9448 |
| cosine_recall@10 | 0.9731 |
| cosine_ndcg@10 | 0.8513 |
| cosine_mrr@10 | 0.8109 |
| cosine_map@100 | 0.8123 |
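
The values above are consistent with each query having exactly one relevant passage: accuracy@k equals recall@k, and precision@k is close to accuracy@k divided by k (for example 0.9068 / 3 ≈ 0.3023). That is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched from the rank of the relevant passage for each query; `retrieval_metrics` and the example ranks are hypothetical.

```python
import math

def retrieval_metrics(ranks, k):
    """Compute retrieval metrics at cutoff k.

    ranks: 1-based rank of the single relevant passage for each query,
           or None when it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        # Only one of the k results can be relevant.
        "precision": accuracy / k,
        # The single relevant passage is either found or not.
        "recall": accuracy,
        # Reciprocal rank, counted as 0 when the passage is outside the top k.
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # With one relevant passage the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }

# Four made-up queries: relevant passage ranked 1st, 2nd, 4th, and not retrieved.
metrics = retrieval_metrics([1, 2, 4, None], k=10)
print(metrics)
```
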
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |

| sentence_0 | sentence_1 |
|---|---|
| What is the maximum allowable reduction in excise duty for mixtures used as motor fuels containing biodiesel in Italy until 30 June 2004? | for waste oils which are reused as fuel, either directly after recovery or following a recycling process for waste oils, and where the reuse is subject to duty. |
| What are the minimum indicative share percentages for the years 2023 to 2030, and how do these percentages relate to the interconnectivity levels of the Member States? | Such indicative shares may, in each year, amount to at least 5 % from 2023 to 2026 and at least 10 % from 2027 to 2030, or, where lower, to the level of interconnectivity of the Member State concerned in any given year. |
| What is the significance of the one-month period mentioned in the context? | one month after its notification, in accordance with the arrangements provided for in Article 23. |

MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
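To make the configuration above concrete, here is a minimal NumPy sketch of the idea: an in-batch-negatives ranking loss is computed on the embeddings truncated to each Matryoshka dimension, and the per-dimension losses are combined with the listed weights. This is not the library's implementation. The `scale=20.0` factor and cosine similarity are assumptions (the card does not list them), and `mnr_loss` and `matryoshka_loss` are hypothetical names.

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: row i of `positives` is the match for row i of `anchors`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (the true pair) as the target class.
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over embeddings truncated to each dimension."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

# Random stand-ins for one batch of 8 (sentence_0, sentence_1) embedding pairs.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
positives = anchors + 0.1 * rng.normal(size=(8, 768))

dims = [768, 512, 256, 128, 64]
weights = [1, 1, 1, 1, 1]
total = matryoshka_loss(anchors, positives, dims, weights)
print(total)
```

Training every prefix of the embedding against the same objective is what lets the shorter prefixes remain useful on their own.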
Non-default hyperparameters:

- eval_strategy: steps
- num_train_epochs: 4
- fp16: True
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|---|---|---|---|
| 0.0863 | 500 | 0.225 | - |
| 0.1726 | 1000 | 0.1337 | - |
| 0.2589 | 1500 | 0.1195 | - |
| 0.3452 | 2000 | 0.0803 | - |
| 0.4316 | 2500 | 0.0775 | - |
| 0.5179 | 3000 | 0.0714 | - |
| 0.6042 | 3500 | 0.0852 | - |
| 0.6905 | 4000 | 0.0718 | - |
| 0.7768 | 4500 | 0.0499 | - |
| 0.8631 | 5000 | 0.0665 | 0.8371 |
| 0.9494 | 5500 | 0.0674 | - |
| 1.0 | 5793 | - | 0.8416 |
| 1.0357 | 6000 | 0.0538 | - |
| 1.1220 | 6500 | 0.0606 | - |
| 1.2084 | 7000 | 0.0294 | - |
| 1.2947 | 7500 | 0.0129 | - |
| 1.3810 | 8000 | 0.0101 | - |
| 1.4673 | 8500 | 0.0072 | - |
| 1.5536 | 9000 | 0.0211 | - |
| 1.6399 | 9500 | 0.0133 | - |
| 1.7262 | 10000 | 0.0063 | 0.8513 |
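
The Epoch column in the log above follows directly from the step count: one epoch is 5793 optimizer steps, so the epoch at a given step is the step divided by 5793. The short sketch below reproduces two logged values and, assuming training ran on a single device (the card does not say), bounds the number of training pairs from the batch size of 8, `gradient_accumulation_steps: 1` and `dataloader_drop_last: False`. All variable names are illustrative.

```python
steps_per_epoch = 5793            # step at which the log reports epoch 1.0
per_device_train_batch_size = 8   # from the hyperparameters in this card
num_train_epochs = 4

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch

print(round(epoch_at(500), 4))    # 0.0863, matches the first logged row
print(round(epoch_at(10000), 4))  # 1.7262, matches the last logged row

# With one device and no gradient accumulation, every step consumes one batch.
# The final batch of an epoch may be partial because drop_last is False.
max_pairs = steps_per_epoch * per_device_train_batch_size
min_pairs = (steps_per_epoch - 1) * per_device_train_batch_size + 1
print(min_pairs, max_pairs)       # 46337 46344

# Steps a full 4-epoch run would take; the log shown here stops at step 10000.
planned_steps = steps_per_epoch * num_train_epochs
print(planned_steps)              # 23172
```
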
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}