add sample for usage with vLLM to Readme
README.md
@@ -131,6 +131,47 @@ print(prediction_text)
This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.

### Usage with vLLM Server
Starting the vLLM Server:
``` shell
vllm serve openGPT-X/Teuken-7B-instruct-research-v0.4 --trust-remote-code
```
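Once the server is up, it exposes an OpenAI-compatible API. As a quick check that the model is being served, you can list the available models; a minimal sketch, assuming the server runs locally on the default port 8000:
``` python
from openai import OpenAI

# The locally started vLLM server does not check the API key by default.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# The Teuken model should appear in the list of served models.
for model in client.models.list():
    print(model.id)
```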
Use the Chat API with vLLM and pass the language of the chat template via `extra_body`:
``` python
from openai import OpenAI

client = OpenAI(
    # The locally started vLLM server does not check the API key by default.
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-research-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
    # Select the German ("DE") chat template for this request.
    extra_body={"chat_template": "DE"}
)
print(f"Assistant: {completion.choices[0].message.content}")
```
The default language of the chat template can also be set when starting the vLLM server. To do this, create a new file named `lang` with the content `DE` and start the vLLM server as follows:
``` shell
vllm serve openGPT-X/Teuken-7B-instruct-research-v0.4 --trust-remote-code --chat-template lang
```
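With the default chat template set on the server, requests no longer need to pass the language in `extra_body`; a minimal sketch, assuming the server was started with `--chat-template lang` as shown above:
``` python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
# No extra_body needed: the server applies its default ("DE") chat template.
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-research-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
)
print(f"Assistant: {completion.choices[0].message.content}")
```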

### Usage with vLLM Offline Batched Inference
``` python
from vllm import LLM, SamplingParams

# Near-greedy sampling; generation stops at the end-of-sequence token.
sampling_params = SamplingParams(temperature=0.01, max_tokens=1024, stop=["</s>"])
llm = LLM(model="openGPT-X/Teuken-7B-instruct-research-v0.4", trust_remote_code=True, dtype="bfloat16")
outputs = llm.chat(
    messages=[{"role": "User", "content": "Hallo"}],
    sampling_params=sampling_params,
    # Select the German ("DE") chat template.
    chat_template="DE"
)
print(f"Prompt: {outputs[0].prompt}")
print(f"Assistant: {outputs[0].outputs[0].text}")
```
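Because the offline engine batches requests internally, several conversations can be processed in a single call. The following is a sketch rather than the model card's own example: it assumes a recent vLLM version in which `llm.chat` also accepts a list of conversations, and it reuses the `llm` and `sampling_params` objects created above.
``` python
# Illustrative batch of two conversations; each entry is its own message list.
conversations = [
    [{"role": "User", "content": "Hallo"}],
    [{"role": "User", "content": "Wie heißt die Hauptstadt von Deutschland?"}],
]
outputs = llm.chat(
    messages=conversations,
    sampling_params=sampling_params,
    chat_template="DE"
)
# One output per conversation, in the same order as the input.
for output in outputs:
    print(f"Assistant: {output.outputs[0].text}")
```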
## Training Details
### Pre-Training Data