Complete noob question - cloned the repository, now what?
I'm a complete newcomer to Hugging Face and would like to try running this model on my machine.
I created a new environment with:
conda create -n model-env python=3.10 transformers pytorch
I then activated the environment, read through the README, and copied the quick-start code given there into a new .py file.
It seems the GPT-JT-6B-v1 folder contains all the files necessary to run this model, so I modified the path to point to my local directory:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(r"C:\<path>\test\GPT-JT-6B-v1")
tokenizer = AutoTokenizer.from_pretrained(r"C:\<path>\test\GPT-JT-6B-v1")
However, when I run this I get the following error:
OSError: Unable to load weights from pytorch checkpoint file for 'C:\<path>\test\GPT-JT-6B-v1\pytorch_model.bin' at 'C:\<path>\test\GPT-JT-6B-v1\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Am I doing it wrong??
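One thing worth ruling out for this kind of OSError is a weights file that is incomplete, or that is only a Git LFS pointer (a small text file left behind when a repository is cloned without LFS). The sketch below is a minimal check under that assumption; the helper names `is_lfs_pointer` and `sha256_of` are made up for illustration and are not part of transformers.

```python
import hashlib

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not real weights."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `is_lfs_pointer` returns True for pytorch_model.bin, the real weights were never downloaded. Otherwise, the digest from `sha256_of` can be compared by hand with the checksum listed for that file on the model's file page; a mismatch suggests a corrupted download.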
Have you tried letting the transformers library handle downloading the model?
For example:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1")
This code will download all the files itself and manage them.
Hello, as @DarwinAnim8or mentioned, the transformers library can handle loading the model for you.
However, if you prefer to load the model from a local folder, please ensure that the folder C:\<path>\test\GPT-JT-6B-v1 directly contains all of the necessary files from this repository, and that there are no nested folders within it. Thank you.
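The folder check described above could be sketched roughly as follows. The file names in `EXPECTED_FILES` are an assumption for illustration (compare them against the repository's actual file listing), and both helper functions are hypothetical, not part of transformers.

```python
from pathlib import Path

# Assumed file names; check the repository's file listing for the real set.
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "tokenizer_config.json")


def missing_model_files(folder, expected=EXPECTED_FILES):
    """List expected files that are not directly inside `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


def nested_copies(folder, name="config.json"):
    """Find copies of `name` in subfolders, a hint that the files sit one level too deep."""
    folder = Path(folder)
    return [p for p in folder.rglob(name) if p.parent != folder]
```

An empty list from `missing_model_files` means every expected file sits directly in the folder. A non-empty list from `nested_copies` points to the subfolder whose path should be passed to `from_pretrained` instead.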
Thank you @juewang , it turns out the pytorch_model.bin file was corrupted. After I downloaded it independently and replaced the corrupted file, I got it running!
However, I ended up getting a bunch of weird gibberish in response to my prompt "once upon a time":
once upon a timeonceonceonceonlyonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonce onceonce once onceonce once once a timeonceonce
once a timeonceonce once once a timeonceonce once a timeonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonceonce
Along with a warning:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Looking into this now, but any pointers are much appreciated =]
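As a starting point for the warning, here is a minimal sketch of a generation call that passes both the attention mask and a pad token id explicitly. The `generate_text` wrapper and its `max_new_tokens` default are made up for illustration; `model` and `tokenizer` are assumed to be the objects loaded with `from_pretrained` earlier in this thread.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50):
    """Generate a continuation while passing attention_mask and pad_token_id."""
    # Calling the tokenizer directly returns both input_ids and attention_mask.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        # Reuse the end-of-sequence token for padding, as the warning suggests.
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
    )
    # Decode the first (and only) sequence in the batch back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate_text(model, tokenizer, "once upon a time"))` should run without that warning. Whether it also stops the repetition is not certain, since the repeated output may have a different cause; generation options such as `do_sample=True`, `temperature`, or `repetition_penalty` are worth experimenting with.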