---
library_name: transformers
tags:
- e-commerce
- query-generation
license: mit
datasets:
- smartcat/Amazon-2023-GenQ
language:
- en
metrics:
- rouge
base_model:
- BeIR/query-gen-msmarco-t5-base-v1
pipeline_tag: text2text-generation
---

# Model Card for T5-GenQ-T-v1

🤖 ✨ 🔍 Generate precise, realistic user-focused search queries from product text 🛒 🚀 📊

### Model Description

- **Model Name:** Fine-Tuned Query-Generation Model
- **Model type:** Text-to-Text Transformer
- **Finetuned from model:** [BeIR/query-gen-msmarco-t5-base-v1](https://huggingface.co/BeIR/query-gen-msmarco-t5-base-v1)
- **Dataset:** [smartcat/Amazon-2023-GenQ](https://huggingface.co/datasets/smartcat/Amazon-2023-GenQ)
- **Primary Use Case:** Generating accurate and relevant search queries from item descriptions
- **Repository:** [smartcat-labs/product2query](https://github.com/smartcat-labs/product2query)

### Model variations
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|
| T5-GenQ-T-v1 | 75.2151 | 54.8735 | 74.5142 | 74.5262 |
| T5-GenQ-TD-v1 | 78.2570 | 58.9586 | 77.5308 | 77.5466 |
| T5-GenQ-TDE-v1 | 76.9075 | 57.0980 | 76.1464 | 76.1502 |
| T5-GenQ-TDC-v1 (best) | 80.0754 | 61.5974 | 79.3557 | 79.3427 |
### Comparison with the base model

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|
| T5-GenQ-T-v1 | 73.11 | 52.27 | 72.51 | 72.51 |
| query-gen-msmarco-t5-base-v1 | 40.34 | 19.52 | 39.21 | 39.21 |
### Examples before and after fine-tuning

| Input Text | Target Query | Before Fine-tuning | After Fine-tuning |
|---|---|---|---|
| PANDORA Jewelry Crossover Pave Triple Band Ring for Women - Sterling Silver with Cubic Zirconia | PANDORA Crossover Triple Band Ring | what is pandora jewelry | Pandora crossover ring |
| SAYOYO Baby Sneakers Leather Baby Shoes Crib Shoes Toddler Soft Sole Sneakers | SAYOYO Baby Sneakers | what kind of shoes are baby sneakers | baby leather sneakers |
| 5 PCS Strap Replacement Compatible with Xiaomi Mi Band 3/4, Bands Xiaomi Mi Band 4 Smart Watch Wristbands Replacement Accessories Strap Bracelets for Mi Fit 3 Straps | Replacement Straps for Xiaomi Mi Band 3/4p | what is the strap on a xiaomi smartwatch | Xiaomi Mi Fit 3 replacement bands |
| Backpacker Ladies' Solid Flannel Shirt | ladies flannel shirt | what kind of shirt is a backpacker | women's flannel shirt |
### Training log

| Epoch | Step | Loss | Grad Norm | Learning Rate | Eval Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 4285 | 0.9465 | 6.7834 | 4.9e-05 | 0.7644 | 73.1872 | 52.2019 | 72.5199 | 72.5183 |
| 2.0 | 8570 | 0.8076 | 4.9071 | 4.2e-05 | 0.7268 | 73.9182 | 53.1365 | 73.2551 | 73.2570 |
| 3.0 | 12855 | 0.7485 | 4.4814 | 3.5e-05 | 0.7160 | 74.4752 | 53.8076 | 73.7712 | 73.7792 |
| 4.0 | 17140 | 0.7082 | 5.3145 | 2.8e-05 | 0.7023 | 74.7628 | 54.3316 | 74.0811 | 74.0790 |
| 5.0 | 21425 | 0.6788 | 4.4266 | 2.1e-05 | 0.7013 | 74.9437 | 54.5630 | 74.2637 | 74.2668 |
| 6.0 | 25710 | 0.6561 | 5.2897 | 1.4e-05 | 0.6998 | 75.0834 | 54.7163 | 74.3907 | 74.3977 |
| 7.0 | 29995 | 0.6396 | 3.5197 | 7.0e-06 | 0.7005 | 75.2151 | 54.8735 | 74.5142 | 74.5262 |
| 8.0 | 34280 | 0.6278 | 4.4625 | 0.0 | 0.7016 | 75.1899 | 54.8423 | 74.4695 | 74.4801 |
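The step count 29995 in the log corresponds to epoch 7, whose ROUGE values match the T5-GenQ-T-v1 row in the model variations table. The following is a minimal sketch of how such a checkpoint could be picked from the log. The selection rule (highest ROUGE on the evaluation set) is an assumption; the card does not state which criterion was actually used. The values are copied from the table, and the names `LOG`, `best_by_rouge`, and `best_by_eval_loss` are illustrative.

```python
# Per-epoch evaluation values copied from the training log table:
# (epoch, step, eval_loss, rouge1, rouge2, rougeL, rougeLsum)
LOG = [
    (1, 4285, 0.7644, 73.1872, 52.2019, 72.5199, 72.5183),
    (2, 8570, 0.7268, 73.9182, 53.1365, 73.2551, 73.2570),
    (3, 12855, 0.7160, 74.4752, 53.8076, 73.7712, 73.7792),
    (4, 17140, 0.7023, 74.7628, 54.3316, 74.0811, 74.0790),
    (5, 21425, 0.7013, 74.9437, 54.5630, 74.2637, 74.2668),
    (6, 25710, 0.6998, 75.0834, 54.7163, 74.3907, 74.3977),
    (7, 29995, 0.7005, 75.2151, 54.8735, 74.5142, 74.5262),
    (8, 34280, 0.7016, 75.1899, 54.8423, 74.4695, 74.4801),
]


def best_by_rouge(log, metric_index=5):
    """Return the row with the highest value in one column (default: ROUGE-L)."""
    return max(log, key=lambda row: row[metric_index])


def best_by_eval_loss(log):
    """Return the row with the lowest evaluation loss."""
    return min(log, key=lambda row: row[2])


best_rouge = best_by_rouge(LOG)
best_loss = best_by_eval_loss(LOG)
print(f"highest ROUGE-L:  epoch {best_rouge[0]}, step {best_rouge[1]}")
print(f"lowest eval loss: epoch {best_loss[0]}, step {best_loss[1]}")
```

On these numbers the two criteria disagree slightly: all four ROUGE columns peak at epoch 7 (step 29995), while the evaluation loss is lowest at epoch 6 (step 25710).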
### Average ROUGE scores by model

T5-GenQ-T-v1 (checkpoint-29995) outperforms query-gen-msmarco-t5-base-v1 on every ROUGE metric. The absolute gap is about 33 points on each metric. The largest relative gap is in ROUGE-2, where T5-GenQ-T-v1 scores 52.27 against 19.52 for the base model, roughly 2.7 times higher. On ROUGE-1, ROUGE-L, and ROUGE-Lsum, T5-GenQ-T-v1 scores above 72, while query-gen-msmarco-t5-base-v1 stays below 41.
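For readers less familiar with the metrics compared here, the sketch below computes ROUGE-1, ROUGE-2, and ROUGE-L F1 scores for one target/generated pair taken from the examples table. It is a simplified illustration, not the evaluation code behind the reported numbers: tokenization is lowercase whitespace splitting with no stemming, which is an assumption, so values can differ from those of a full ROUGE implementation.

```python
from collections import Counter


def tokenize(text):
    # Simplification (assumption): lowercase and split on whitespace, no stemming.
    return text.lower().split()


def ngrams(tokens, n):
    # Multiset of n-grams; empty when the text has fewer than n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    pred = ngrams(tokenize(prediction), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    # Longest common subsequence length, dynamic programming over two rows.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    pred, ref = tokenize(prediction), tokenize(reference)
    return f1(lcs_length(pred, ref), len(pred), len(ref))


reference = "ladies flannel shirt"    # target query from the examples table
prediction = "women's flannel shirt"  # query generated after fine-tuning

scores = {
    "rouge1": rouge_n(prediction, reference, 1),
    "rouge2": rouge_n(prediction, reference, 2),
    "rougeL": rouge_l(prediction, reference),
}
for name, value in scores.items():
    print(f"{name}: {100 * value:.2f}")
```

With this tokenization the pair scores 66.67 on ROUGE-1 and ROUGE-L and 50.00 on ROUGE-2: two of three unigrams match, but only one of two bigrams does, which is why ROUGE-2 is the strictest of the three.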
### ROUGE score density

- `T5-GenQ-T-v1`: scores concentrate at the high end, especially near 100%, indicating strong text overlap with the references.
- `query-gen-msmarco-t5-base-v1`: a more spread-out distribution with multiple peaks between 10% and 40%, indicating more variable and generally lower overlap.
- ROUGE-1 and ROUGE-L: `T5-GenQ-T-v1` peaks at 100%, while `query-gen-msmarco-t5-base-v1` has lower, broader peaks.
- ROUGE-2: `query-gen-msmarco-t5-base-v1` has a high density at 0%, meaning many of its outputs share no bigram with the reference.
### ROUGE score distribution

- `T5-GenQ-T-v1`: scores concentrate at the high end, especially near 100%, indicating strong text overlap with the references.
- `query-gen-msmarco-t5-base-v1`: a more spread-out distribution with peaks in the 10-40% range, indicating more variable and generally lower overlap.
- ROUGE-1 and ROUGE-L: `T5-GenQ-T-v1` rises steadily towards the highest scores, while `query-gen-msmarco-t5-base-v1` has multiple peaks at lower scores.
- ROUGE-2: `query-gen-msmarco-t5-base-v1` outputs concentrate at low scores, whereas `T5-GenQ-T-v1` produces more high-scoring outputs.
### ROUGE scores by query length

This plot shows average ROUGE scores and score differences across query lengths, measured in words.

- **High scores for most lengths (3-9 words):** ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum remain consistently high.
- **Sharp spike at 2 words:** a large positive score difference, suggesting strong alignment for very short queries.
- **Stable differences (3-9 words):** after the spike at 2 words, score differences stay close to zero, indicating consistent performance across query lengths.
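The per-length breakdown described here can be sketched as a group-by over the word count of each generated query. The example below is an illustration only: it uses the four target/generated pairs from the examples table rather than the evaluation set, scores them with a simplified ROUGE-1 F1 (lowercase whitespace tokens, no stemming, an assumption), and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def rouge1_f1(prediction, reference):
    # Simplified ROUGE-1 F1 on lowercase whitespace tokens (no stemming).
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# (target query, query generated after fine-tuning), from the examples table
PAIRS = [
    ("PANDORA Crossover Triple Band Ring", "Pandora crossover ring"),
    ("SAYOYO Baby Sneakers", "baby leather sneakers"),
    ("Replacement Straps for Xiaomi Mi Band 3/4p", "Xiaomi Mi Fit 3 replacement bands"),
    ("ladies flannel shirt", "women's flannel shirt"),
]


def mean_score_by_length(pairs):
    """Average ROUGE-1 F1 grouped by the word count of the generated query."""
    buckets = defaultdict(list)
    for reference, prediction in pairs:
        buckets[len(prediction.split())].append(rouge1_f1(prediction, reference))
    return {length: sum(vals) / len(vals) for length, vals in sorted(buckets.items())}


by_length = mean_score_by_length(PAIRS)
for length, score in by_length.items():
    print(f"{length} words: mean ROUGE-1 F1 = {100 * score:.2f}")
```

Four pairs are far too few to say anything about the trend in the plot; the point is only the grouping step, which produces one average per query length.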
### Semantic similarity distribution

This histogram shows the distribution of cosine similarity scores, which measure the semantic similarity between paired texts. Frequency rises gradually with similarity and peaks sharply at 1.0, so most text pairs are highly similar. Scores between 0.0 and 0.4 are rare, meaning few pairs are semantically dissimilar.
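As a reminder of what is being counted, the sketch below computes cosine similarity between vectors and bins the scores into a histogram. The card does not say which embedding model produced the text vectors, so the three short vectors here are hypothetical stand-ins for real sentence embeddings, and `cosine_similarity` and `histogram` are illustrative helpers.

```python
import math


def cosine_similarity(u, v):
    # Dot product divided by the product of the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def histogram(scores, bins=5, low=0.0, high=1.0):
    """Count scores per equal-width bin; out-of-range scores go to the edge bins."""
    counts = [0] * bins
    width = (high - low) / bins
    for s in scores:
        index = int((s - low) / width)
        counts[max(0, min(index, bins - 1))] += 1
    return counts


# Hypothetical 3-dimensional "embeddings", for illustration only.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, similarity close to 1
c = [3.0, 0.0, -1.0]  # orthogonal to a, similarity 0

scores = [cosine_similarity(a, b), cosine_similarity(a, c), cosine_similarity(a, a)]
counts = histogram(scores)
print(counts)
```

A sharp peak at 1.0, as in the figure, would show up here as most of the mass in the last bin.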
### Semantic similarity vs. ROUGE scores

This scatter plot matrix compares semantic similarity (cosine similarity) with ROUGE scores.

- Higher similarity goes with higher ROUGE scores: semantically similar texts also share many n-grams.
- ROUGE-1 and ROUGE-L correlate most strongly with similarity, while ROUGE-2 shows more variability.
- Low-similarity outliers exist, where texts share words but differ in meaning.