Instructions for using togethercomputer/GPT-JT-6B-v1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use togethercomputer/GPT-JT-6B-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="togethercomputer/GPT-JT-6B-v1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1")

Local Apps

vLLM

How to use togethercomputer/GPT-JT-6B-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "togethercomputer/GPT-JT-6B-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "togethercomputer/GPT-JT-6B-v1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
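The same request can be sent from Python. This is a minimal sketch that uses only the standard library and mirrors the curl call above; `build_completion_request` and `send_completion_request` are hypothetical helper names, and the default URL assumes the vLLM server is listening on port 8000 as in the example.

```python
import json
import urllib.request

# Default address taken from the curl example above; adjust if you changed host/port.
VLLM_COMPLETIONS_URL = "http://localhost:8000/v1/completions"


def build_completion_request(
    prompt: str,
    url: str = VLLM_COMPLETIONS_URL,
    model: str = "togethercomputer/GPT-JT-6B-v1",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST request with the same JSON body as the curl call."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion_request(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `result = send_completion_request(build_completion_request("Once upon a time,"))` returns the parsed response. Passing `url="http://localhost:30000/v1/completions"` targets the SGLang server described below instead.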

Use Docker

docker model run hf.co/togethercomputer/GPT-JT-6B-v1

SGLang

How to use togethercomputer/GPT-JT-6B-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "togethercomputer/GPT-JT-6B-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "togethercomputer/GPT-JT-6B-v1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
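Both servers answer in the OpenAI completions format, where the generated text sits under `choices[i]["text"]`. A minimal sketch for pulling it out of the decoded JSON; `extract_completion_texts` is a hypothetical helper, and treating a body without `"choices"` as an error is an assumption about how failures are reported.

```python
from typing import Any, Dict, List


def extract_completion_texts(response: Dict[str, Any]) -> List[str]:
    """Return the generated text of every choice, ordered by its "index" field."""
    if "choices" not in response:
        # Error payloads carry no "choices"; surface whatever the server sent back.
        raise ValueError(f"No 'choices' in response: {response}")
    # Each choice in an OpenAI-style completions response has an "index" and a "text".
    ordered = sorted(response["choices"], key=lambda choice: choice.get("index", 0))
    return [choice.get("text", "") for choice in ordered]
```

Applied to the parsed JSON from either server, `extract_completion_texts(result)[0]` is the continuation of the prompt (the prompt itself is not echoed unless the server is asked to).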

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "togethercomputer/GPT-JT-6B-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "togethercomputer/GPT-JT-6B-v1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'


Complete noob question - cloned the repository, now what?

#17

by hansintheair - opened Dec 29, 2022

Discussion

hansintheair

Dec 29, 2022

I'm a complete newcomer to Hugging Face and would like to try running this model on my machine.

I created a new environment with:

conda create -n model-env python=3.10 transformers pytorch

I activated the environment, read through the README, and copied the quick start code from it into a new .py file.
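Before downloading several gigabytes of weights, it can help to confirm that the environment really has both libraries. A minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and it assumes the conda `pytorch` package registers its Python metadata under the distribution name `torch` (if it reports NOT INSTALLED, try `import torch` directly before concluding it is missing).

```python
import importlib.metadata
import sys
from typing import Dict, Iterable, Optional

# Distribution names to look up; "torch" is assumed to be what conda's "pytorch" provides.
REQUIRED_PACKAGES = ("torch", "transformers")


def installed_versions(packages: Iterable[str] = REQUIRED_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Save it as, say, `check_env.py` (a hypothetical name) and run it with the `model-env` environment activated.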

It seems the cloned GPT-JT-6B-v1 folder contains all the files necessary to run this model, so I modified the path to point to my local directory:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(r"C:\<path>\test\GPT-JT-6B-v1")
tokenizer = AutoTokenizer.from_pretrained(r"C:\<path>\test\GPT-JT-6B-v1")

However, when I run this I get the following error:

OSError: Unable to load weights from pytorch checkpoint file for 'C:\<path>\test\GPT-JT-6B-v1\pytorch_model.bin' at 'C:\<path>\test\GPT-JT-6B-v1\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Am I doing something wrong?

DarwinAnim8or

Jan 5, 2023

Have you tried letting the transformers library handle downloading the model?

For example:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v1")

model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1")

This code downloads all the files itself and manages them in a local cache.

juewang

Together org Jan 23, 2023

Hello, as @DarwinAnim8or mentioned, the transformers library can handle loading the model for you.
However, if you prefer to load the model from a local folder, please ensure that the folder C:\<path>\test\GPT-JT-6B-v1 directly contains all of the necessary files from this repository, and that there are no nested folders within it. Thank you 😊
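The advice above can be turned into a quick pre-flight check before calling `from_pretrained` on a local path. This is a minimal sketch; `check_model_folder` is a hypothetical helper, and apart from `config.json` and `pytorch_model.bin` (the file named in the error above) the expected file list is an assumption to compare against the repository's file listing.

```python
from pathlib import Path
from typing import List, Sequence, Union

# config.json describes the model; pytorch_model.bin holds the weights.
# The tokenizer file name is an assumption: compare with the repo's file listing.
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "tokenizer_config.json")


def check_model_folder(folder: Union[str, Path], expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return a list of problems found in a local model folder; an empty list means OK."""
    root = Path(folder)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # The files must sit directly inside the folder passed to from_pretrained.
    for name in expected:
        if not (root / name).is_file():
            problems.append(f"missing {name} directly inside {root}")
    # A copy of the repo one level down (e.g. <folder>/GPT-JT-6B-v1/config.json)
    # is the kind of nesting the reply above warns about.
    for nested_config in sorted(root.glob("*/config.json")):
        problems.append(f"nested model folder found: {nested_config.parent}")
    return problems
```

For example, `print(check_model_folder(r"C:\<path>\test\GPT-JT-6B-v1"))` should print `[]` when the folder layout looks right.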

hansintheair changed discussion status to closed Feb 4, 2023

hansintheair

Feb 4, 2023

Thank you @juewang, it turns out the pytorch_model.bin file was corrupted. After I downloaded it separately and replaced the corrupted file, I got it running!
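Since the culprit was a corrupted pytorch_model.bin, here is a minimal sketch of two checks that could catch this before loading. The first detects a Git LFS pointer file, the small text stub that cloning without Git LFS installed leaves in place of large files; whether that happened here is an assumption, not something the thread confirms. The second streams the file through SHA-256 so the digest can be compared with the SHA256 listed on the file's page on the Hub. The helper names are hypothetical.

```python
import hashlib

# First line of every Git LFS pointer file (Git LFS pointer format, spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """True if the file starts like a Git LFS pointer instead of real weight data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a multi-gigabyte checkpoint never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `is_lfs_pointer(...)` returns True, the weights were never actually downloaded; otherwise a mismatch between `sha256_of_file(...)` and the hash shown on the Hub points to a damaged or incomplete download.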

However, I got a bunch of weird gibberish in response to my prompt "once upon a time":

once upon a timeonceonceonceonlyonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonce onceonce once onceonce once once a timeonceonce
 once a timeonceonce once once a timeonceonce once a timeonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonce

Along with a warning:

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Looking into this now, but any pointers are much appreciated =]
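For a single, unpadded prompt the attention mask would be all ones, so the warning on its own probably does not explain the repetition. A minimal sketch, assuming the model and tokenizer are loaded with `AutoModelForCausalLM`/`AutoTokenizer` as in the snippets above, that passes `attention_mask` and `pad_token_id` explicitly and turns on sampling, since `generate()` decodes greedily by default (unless the model's generation config says otherwise) and greedy decoding is prone to loops. The helper name `generate_text` and the sampling values are hypothetical choices, not settings from the model card.

```python
def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt`, passing the attention mask explicitly."""
    # tokenizer(...) returns both input_ids and attention_mask; unpacking the whole
    # encoding into generate() supplies the mask the warning asks for.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # This tokenizer defines no pad token, so reuse EOS explicitly (transformers
        # otherwise does the same thing for you, with the warning above).
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        # Illustrative sampling settings, not tuned values.
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
    )
    # The returned sequence includes the prompt tokens followed by the new ones.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage: `print(generate_text(model, tokenizer, "Once upon a time,"))`. Because sampling is random, outputs differ between runs; if the text is still garbled with sampling enabled, the weights file is worth re-checking.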

hansintheair changed discussion status to open Feb 4, 2023
