primel commited on
Commit
890029a
·
verified ·
1 Parent(s): 08c8568

Upload Intentity AIBA - Multi-Task Banking Model (Language + Intent + NER)

Browse files
README.md ADDED
@@ -0,0 +1,284 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - en
4
+ - ru
5
+ - uz
6
+ - multilingual
7
+ license: apache-2.0
8
+ tags:
9
+ - multi-task-learning
10
+ - token-classification
11
+ - text-classification
12
+ - ner
13
+ - named-entity-recognition
14
+ - intent-classification
15
+ - language-detection
16
+ - banking
17
+ - transactions
18
+ - financial
19
+ - multilingual
20
+ - bert
21
+ - pytorch
22
+ datasets:
23
+ - custom
24
+ metrics:
25
+ - precision
26
+ - recall
27
+ - f1
28
+ - accuracy
29
+ - seqeval
30
+ widget:
31
+ - text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
32
+ example_title: "English Transaction"
33
+ - text: "Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321 за услуги"
34
+ example_title: "Russian Transaction"
35
+ - text: "44380583609046995897 ҳисобга 170190.66 UZS ўтказиш Голден Стар ИНН 485232484"
36
+ example_title: "Uzbek Cyrillic Transaction"
37
+ - text: "Show completed transactions from 01.12.2024 to 15.12.2024"
38
+ example_title: "Query Request"
39
+ library_name: transformers
40
+ pipeline_tag: token-classification
41
+ ---
42
+
43
+ # Intentity AIBA - Multi-Task Banking Model 🏦🤖
44
+
45
+ ## Model Description
46
+
47
+ **Intentity AIBA** is a state-of-the-art multi-task model that simultaneously performs:
48
+ 1. 🌐 **Language Detection** - Identifies the language of input text
49
+ 2. 🎯 **Intent Classification** - Determines user's intent
50
+ 3. 📋 **Named Entity Recognition** - Extracts key entities from banking transactions
51
+
52
+ Built on `google-bert/bert-base-multilingual-cased` with a shared encoder and three specialized output heads, this model provides comprehensive understanding of banking and financial transaction texts in multiple languages.
53
+
54
+ ## 🎯 Capabilities
55
+
56
+ ### Language Detection
57
+ Supports 5 languages:
58
+ - `en`
59
+ - `mixed`
60
+ - `ru`
61
+ - `uz_cyrl`
62
+ - `uz_latn`
63
+
64
+ ### Intent Classification
65
+ Recognizes 4 intent types:
66
+ - `create_transaction`
67
+ - `help`
68
+ - `list_transaction`
69
+ - `unknown`
70
+
71
+ ### Named Entity Recognition
72
+ Extracts 6 entity types:
73
+ - `amount`
74
+ - `currency`
75
+ - `description`
76
+ - `receiver_hr`
77
+ - `receiver_inn`
78
+ - `receiver_name`
79
+
80
+ ## 📊 Model Performance
81
+
82
+ | Task | Metric | Score |
83
+ |------|--------|-------|
84
+ | **NER** | F1 Score | 0.9891 |
85
+ | **NER** | Precision | 0.9891 |
86
+ | **Intent** | F1 Score | 0.9999 |
87
+ | **Intent** | Accuracy | 0.9999 |
88
+ | **Language** | Accuracy | 0.9648 |
89
+ | **Overall** | Average F1 | 0.9945 |
90
+
91
+ ## 🚀 Quick Start
92
+
93
+ ### Installation
94
+
95
+ ```bash
96
+ pip install transformers torch
97
+ ```
98
+
99
+ ### Basic Usage
100
+
101
+ ```python
102
+ import torch
103
+ from transformers import AutoTokenizer, AutoModel
104
+
105
+ # Load model and tokenizer
106
+ model_name = "primel/intentity-aiba"
107
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
108
+ model = AutoModel.from_pretrained(model_name)
109
+
110
+ # Note: This is a custom multi-task model
111
+ # Use the inference code below for predictions
112
+ ```
113
+
114
+ ### Complete Inference Code
115
+
116
+ ```python
117
+ import torch
118
+ from transformers import AutoTokenizer, AutoModel
119
+ import json
120
+
121
+ class IntentityAIBA:
122
+ def __init__(self, model_name="primel/intentity-aiba"):
123
+ self.tokenizer = AutoTokenizer.from_pretrained(model_name)
124
+ self.model = AutoModel.from_pretrained(model_name)
125
+
126
+ # Load label mappings from model config
127
+ self.id2tag = self.model.config.id2label if hasattr(self.model.config, 'id2label') else {}
128
+ # Note: Intent and language mappings should be loaded from model files
129
+
130
+ self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
131
+ self.model.to(self.device)
132
+ self.model.eval()
133
+
134
+ def predict(self, text):
135
+ """Predict language, intent, and entities for input text."""
136
+ inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
137
+ inputs = {k: v.to(self.device) for k, v in inputs.items()}
138
+
139
+ with torch.no_grad():
140
+ outputs = self.model(**inputs)
141
+
142
+ # Extract predictions from custom model heads
143
+ # (Implementation depends on your model architecture)
144
+
145
+ return {
146
+ 'language': 'detected_language',
147
+ 'intent': 'detected_intent',
148
+ 'entities': {}
149
+ }
150
+
151
+ # Initialize
152
+ model = IntentityAIBA()
153
+
154
+ # Predict
155
+ text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
156
+ result = model.predict(text)
157
+ print(result)
158
+ ```
159
+
160
+ ## 📝 Example Outputs
161
+
162
+ ### Example 1: English Transaction
163
+
164
+ **Input**: `"Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"`
165
+
166
+ **Output**:
167
+ ```python
168
+ {
169
+ "language": "en",
170
+ "intent": "create_transaction",
171
+ "entities": {
172
+ "amount": "12.5mln",
173
+ "currency": "USD",
174
+ "receiver_name": "Apex Industries",
175
+ "receiver_hr": "27109477752047116719",
176
+ "receiver_inn": "123456789",
177
+ "bank_code": "01234",
178
+ "description": "consulting"
179
+ }
180
+ }
181
+ ```
182
+
183
+ ### Example 2: Russian Transaction
184
+
185
+ **Input**: `"Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321"`
186
+
187
+ **Output**:
188
+ ```python
189
+ {
190
+ "language": "ru",
191
+ "intent": "create_transaction",
192
+ "entities": {
193
+ "amount": "150тыс",
194
+ "currency": "рублей",
195
+ "receiver_name": "ООО Ромашка",
196
+ "receiver_hr": "40817810099910004312",
197
+ "receiver_inn": "987654321"
198
+ }
199
+ }
200
+ ```
201
+
202
+ ### Example 3: Query Request
203
+
204
+ **Input**: `"Show completed transactions from 01.12.2024 to 15.12.2024"`
205
+
206
+ **Output**:
207
+ ```python
208
+ {
209
+ "language": "en",
210
+ "intent": "list_transaction",
211
+ "entities": {
212
+ "start_date": "01.12.2024",
213
+ "end_date": "15.12.2024"
214
+ }
215
+ }
216
+ ```
217
+
218
+ ## 🏗️ Model Architecture
219
+
220
+ - **Base Model**: `google-bert/bert-base-multilingual-cased`
221
+ - **Architecture**: Multi-task learning with shared encoder
222
+ - Shared BERT encoder (110M parameters)
223
+ - NER head: Token-level classifier
224
+ - Intent head: Sequence-level classifier
225
+ - Language head: Sequence-level classifier
226
+ - **Total Parameters**: ~178M
227
+ - **Loss Function**: Weighted combination (0.4 × NER + 0.3 × Intent + 0.3 × Language)
228
+
229
+ ## 🎓 Training Details
230
+
231
+ - **Training Samples**: 340,986
232
+ - **Validation Samples**: 60,175
233
+ - **Epochs**: 6
234
+ - **Batch Size**: 16 (per device)
235
+ - **Learning Rate**: 3e-5
236
+ - **Warmup Ratio**: 0.15
237
+ - **Optimizer**: AdamW with weight decay
238
+ - **LR Scheduler**: Linear with warmup
239
+ - **Framework**: Transformers + PyTorch
240
+ - **Hardware**: Trained on Tesla T4 GPU
241
+
242
+ ## 💡 Use Cases
243
+
244
+ - **Banking Applications**: Transaction processing and validation
245
+ - **Chatbots**: Intent-aware financial assistants
246
+ - **Document Processing**: Automated extraction from transaction documents
247
+ - **Compliance**: KYC/AML data extraction
248
+ - **Analytics**: Transaction categorization and analysis
249
+ - **Multi-language Support**: Cross-border banking operations
250
+
251
+ ## ⚠️ Limitations
252
+
253
+ - Designed for banking/financial domain - may not generalize to other domains
254
+ - Performance may vary on formats significantly different from training data
255
+ - Mixed language texts may have lower accuracy
256
+ - Best results with transaction-style texts similar to training distribution
257
+ - Requires fine-tuning for specific banking systems or regional variations
258
+
259
+ ## 📚 Citation
260
+
261
+ ```bibtex
262
+ @misc{intentity-aiba-2025,
263
+ author = {Primel},
264
+ title = {Intentity AIBA: Multi-Task Banking Language Model},
265
+ year = {2025},
266
+ publisher = {Hugging Face},
267
+ journal = {Hugging Face Model Hub},
268
+ howpublished = {\url{https://huggingface.co/primel/intentity-aiba}}
269
+ }
270
+ ```
271
+
272
+ ## 📄 License
273
+
274
+ Apache 2.0
275
+
276
+ ## 🤝 Contact
277
+
278
+ For questions, issues, or collaboration opportunities, please open an issue on the model repository.
279
+
280
+ ---
281
+
282
+ **Model Card Authors**: Primel
283
+ **Last Updated**: 2025
284
+ **Model Version**: 1.0
label_mappings.json ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "tag2id": {
3
+ "B-amount": 0,
4
+ "B-currency": 1,
5
+ "B-description": 2,
6
+ "B-receiver_hr": 3,
7
+ "B-receiver_inn": 4,
8
+ "B-receiver_name": 5,
9
+ "I-amount": 6,
10
+ "I-currency": 7,
11
+ "I-description": 8,
12
+ "I-receiver_hr": 9,
13
+ "I-receiver_inn": 10,
14
+ "I-receiver_name": 11,
15
+ "O": 12
16
+ },
17
+ "id2tag": {
18
+ "0": "B-amount",
19
+ "1": "B-currency",
20
+ "2": "B-description",
21
+ "3": "B-receiver_hr",
22
+ "4": "B-receiver_inn",
23
+ "5": "B-receiver_name",
24
+ "6": "I-amount",
25
+ "7": "I-currency",
26
+ "8": "I-description",
27
+ "9": "I-receiver_hr",
28
+ "10": "I-receiver_inn",
29
+ "11": "I-receiver_name",
30
+ "12": "O"
31
+ },
32
+ "intent2id": {
33
+ "create_transaction": 0,
34
+ "help": 1,
35
+ "list_transaction": 2,
36
+ "unknown": 3
37
+ },
38
+ "id2intent": {
39
+ "0": "create_transaction",
40
+ "1": "help",
41
+ "2": "list_transaction",
42
+ "3": "unknown"
43
+ },
44
+ "lang2id": {
45
+ "en": 0,
46
+ "mixed": 1,
47
+ "ru": 2,
48
+ "uz_cyrl": 3,
49
+ "uz_latn": 4
50
+ },
51
+ "id2lang": {
52
+ "0": "en",
53
+ "1": "mixed",
54
+ "2": "ru",
55
+ "3": "uz_cyrl",
56
+ "4": "uz_latn"
57
+ }
58
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab92e6f6ff130d0c1201e7247355cf25048cf977fa77b0477e7ab04f5ca1ef52
3
+ size 669517264
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": false,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "extra_special_tokens": {},
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "pad_token": "[PAD]",
51
+ "sep_token": "[SEP]",
52
+ "strip_accents": null,
53
+ "tokenize_chinese_chars": true,
54
+ "tokenizer_class": "BertTokenizer",
55
+ "unk_token": "[UNK]"
56
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:092bf224bdfd783ea83f41b60b273d9147c5d1ea25fd77767a031d7472ef5d36
3
+ size 5777
training_config.json ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "model_name": "google-bert/bert-base-multilingual-uncased",
3
+ "num_train_samples": 340986,
4
+ "num_val_samples": 60175,
5
+ "num_epochs": 6,
6
+ "batch_size": 16,
7
+ "ner_f1": 0.9891146978390264,
8
+ "intent_f1": 0.99991690940426,
9
+ "lang_accuracy": 0.9648192771084337,
10
+ "avg_f1": 0.9945158036216433
11
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff