Ranjit0034 committed
Commit b46c0a4 · verified · 1 Parent(s): 879d473

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -3,3 +3,4 @@ adapters/adapters.safetensors filter=lfs diff=lfs merge=lfs -text
 model-00001-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
 model-00002-of-00002.safetensors filter=lfs diff=lfs merge=lfs -text
 tokenizer.model filter=lfs diff=lfs merge=lfs -text
+finance-extractor-v8-f16.gguf filter=lfs diff=lfs merge=lfs -text
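The added filter line tells git-lfs to store the new GGUF file as a pointer rather than committing the 7.6 GB blob. A minimal sketch of how such `.gitattributes` patterns map paths to LFS tracking (the helper and pattern list are illustrative, not part of this repo; `fnmatch` approximates git's attribute matching):

```python
from fnmatch import fnmatch

# First whitespace-separated field of each "filter=lfs" line in this
# repo's .gitattributes; the .gguf entry is the one added in this commit.
LFS_PATTERNS = [
    "adapters/adapters.safetensors",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "tokenizer.model",
    "finance-extractor-v8-f16.gguf",
]

def is_lfs_tracked(path: str) -> bool:
    """Return True if the path matches any LFS-tracked pattern."""
    return any(fnmatch(path, pat) for pat in LFS_PATTERNS)

print(is_lfs_tracked("finance-extractor-v8-f16.gguf"))  # True
print(is_lfs_tracked("README.md"))                      # False
```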
README.md CHANGED
@@ -10,40 +10,55 @@ tags:
 - email
 - bank-statement
 - payment-apps
+- pytorch
+- transformers
+- gguf
+- llama-cpp
 - mlx
 - lora
 - phi-3
 - indian-banking
 - multi-bank
 - structured-output
-- pytorch
-- transformers
 library_name: transformers
 ---
 
 # 🧠 Finance Entity Extractor v0.8.0 (Universal)
 
-> **Now supports Linux/NVIDIA/PyTorch!** Production-ready LLM for structured financial extraction.
+> **Production-ready LLM** for structured financial entity extraction. Works on **Linux/NVIDIA**, **macOS/MLX**, and **any platform via GGUF**.
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Hugging%20Face-blue)](https://huggingface.co/Ranjit0034/finance-entity-extractor)
 [![PyTorch](https://img.shields.io/badge/PyTorch-Supported-red)](https://pytorch.org/)
+[![GGUF](https://img.shields.io/badge/GGUF-llama.cpp-green)](https://github.com/ggerganov/llama.cpp)
+
+## 🌟 Platform Support
 
-## 🌟 Features
+| Platform | Framework | Status |
+|----------|-----------|--------|
+| Linux + NVIDIA GPU | PyTorch/Transformers | ✅ Full Support |
+| Linux + CPU | PyTorch/GGUF | ✅ Full Support |
+| Windows | GGUF/llama.cpp | ✅ Full Support |
+| macOS Apple Silicon | MLX | ✅ Full Support |
+| Cloud (AWS/GCP/Azure) | PyTorch/Transformers | ✅ Full Support |
 
-- **Universal Support**: Runs on Linux (NVIDIA/CPU) and Mac (MLX).
-- **Multi-Bank**: HDFC, ICICI, SBI, Axis, Kotak.
+## 🎯 Features
+
+- **Universal Support**: Runs on Linux, Windows, macOS (any hardware).
+- **Multi-Bank**: HDFC, ICICI, SBI, Axis, Kotak + PhonePe, GPay, Paytm.
 - **Structured JSON**: Validated, parseable output.
-- **Accuracy**: 94.5% (Multi-bank), 100% (Real HDFC).
+- **Accuracy**: 94.5% (Multi-bank), 100% (Real HDFC emails).
+
+---
 
-## 📦 Installation
+## ⚡ Quick Start (PyTorch / Linux / NVIDIA)
+
+Recommended for production servers with NVIDIA GPUs.
 
 ```bash
-pip install transformers torch
+pip install transformers torch accelerate
 ```
 
-## ⚡ Quick Start (PyTorch / Linux / NVIDIA)
-
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
@@ -54,7 +69,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     torch_dtype=torch.float16,
-    device_map="auto"
+    device_map="auto"  # Automatically uses GPU
 )
 
 prompt = """Extract financial entities from this email:
@@ -69,18 +84,56 @@ outputs = model.generate(**inputs, max_new_tokens=200)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+---
+
+## 🦙 Quick Start (GGUF / llama.cpp)
+
+Recommended for CPU inference / cross-platform deployment / edge devices.
+
+```bash
+pip install llama-cpp-python
+```
+
+```python
+from llama_cpp import Llama
+
+# Download the GGUF file from this repo first
+llm = Llama(model_path="finance-extractor-v8-f16.gguf")
+
+output = llm(
+    "Extract financial entities from: Rs.500 debited from A/c 1234 on 01-01-25\nOutput JSON:",
+    max_tokens=200
+)
+print(output["choices"][0]["text"])
+```
+
+**llama.cpp CLI:**
+```bash
+./main -m finance-extractor-v8-f16.gguf \
+  -p "Extract financial entities from: Rs.500 debited from A/c 1234 on 01-01-25"
+```
+
+---
+
 ## 🍏 Quick Start (Apple Silicon / MLX)
 
+Recommended for Mac developers using Apple Silicon.
+
 ```bash
 pip install mlx-lm
 ```
 
 ```python
 from mlx_lm import load, generate
+
 model, tokenizer = load("Ranjit0034/finance-entity-extractor", adapter_path="adapters")
-# ... use as normal
+prompt = "Extract financial entities from: Rs.500 debited from A/c 1234 on 01-01-25"
+response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
+print(response)
 ```
 
+---
+
 ## 📊 Evaluation
 
 | Bank | Accuracy | Status |
@@ -88,14 +141,19 @@ model, tokenizer = load("Ranjit0034/finance-entity-extractor", adapter_path="ada
 | ICICI | 100% | ✅ |
 | HDFC | 95% | ✅ |
 | SBI | 93.3% | ✅ |
+| Axis | 93.3% | ✅ |
+| Kotak | 92% | ✅ |
 | **Overall** | **94.5%** | 🏆 |
 
-## 📁 Files
+## 📁 Repository Files
 
-- `model.safetensors`: Full 7GB PyTorch model (Dequantized).
-- `adapters/`: LoRA adapters (25MB) for bandwidth constrained users.
-- `inference.py`: Production API wrapper.
-- `train.py`: Reproducible training script.
+| File | Size | Description |
+|------|------|-------------|
+| `model-*.safetensors` | ~7.1GB | Full PyTorch model (bfloat16) |
+| `finance-extractor-v8-f16.gguf` | ~7.1GB | GGUF for llama.cpp (F16) |
+| `adapters/` | ~24MB | LoRA adapters for MLX |
+| `inference.py` | - | Production API wrapper |
+| `train.py` | - | Reproducible training script |
 
 ---
 **Made with ❤️ by Ranjit Behera**
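The README promises "Structured JSON: Validated, parseable output", which in practice means the caller must pull a JSON object out of the raw generated text before validating it. A minimal sketch of that step (the field names in the sample completion are illustrative assumptions, not the model's documented schema):

```python
import json

def extract_json(generated: str) -> dict:
    """Parse the first balanced {...} object found in model output."""
    start = generated.find("{")
    if start == -1:
        raise ValueError("no JSON object in output")
    depth = 0
    for i, ch in enumerate(generated[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # matching close brace found
                return json.loads(generated[start : i + 1])
    raise ValueError("unbalanced JSON object in output")

# Illustrative completion; the real schema may differ.
sample = 'Output JSON: {"amount": 500, "account": "1234", "date": "01-01-25", "type": "debit"}'
print(extract_json(sample)["amount"])  # 500
```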
adapters/adapter_config.json CHANGED
@@ -1,12 +1,12 @@
 {
-  "adapter_path": "models/adapters/finance-lora-v8",
+  "adapter_path": "models/adapters/finance-lora-v6",
   "batch_size": 1,
   "config": null,
   "data": "data/training",
   "fine_tune_type": "lora",
   "grad_accumulation_steps": 1,
   "grad_checkpoint": false,
-  "iters": 800,
+  "iters": 1500,
   "learning_rate": 1e-05,
   "lora_parameters": {
     "rank": 8,
adapters/adapters.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:af742ba0c9de0119ea1f0a667a8dca8f42d105d67f52401bb514e79f6e59937c
+oid sha256:1c98bb44acb9bdf50180c215020c6b93ffa42e1b9626bc3b2eac8522b2e6bf03
 size 25179794
finance-extractor-v8-f16.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3a130047e6928fe3bd2eedc2ff9fc556263325d1bc9ee7de59634bd7a9b601b2
+size 7643296448
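What this commit actually stores for the GGUF is a git-lfs pointer file, not the 7.6 GB weights; the pointer is just key-value lines (`version`, `oid`, `size`). A minimal sketch of reading one back:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a git-lfs pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # key and value separated by one space
        fields[key] = value
    return fields

# The pointer committed for the GGUF file in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:3a130047e6928fe3bd2eedc2ff9fc556263325d1bc9ee7de59634bd7a9b601b2
size 7643296448
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 7643296448
print(int(info["size"]) / 1e9, "GB")
```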