It's a DELLA! 👼
First vibe test positive, the magic is there... ✨
Glad to hear it, I have not tested this one much yet.
Can't wait to finetune 24B models eventually
Townfolk was talking about some kind of repetition bug, prompting development of Goetia v1.4 -- what's that all about? 👀
Not sure but v1.4 is probably a ways off. I'm trying to save up for a 3090, working long hours. I have made some new datasets and found a way to improve Cthulhu/Raven/Morpheus with longer context Q&A Pairs (~2k tokens instead of 500). My initial test set for this long-form style performed quite well.
So, a proper Cthulhu/Goetia 24B dataset is under construction, but we'll likely see the smaller versions first.
Attempts to finetune 12B on an 8GB card have failed, but I can now merge MOE for Mistral 7B and Llama 8B. These seem promising. I have more ideas for MOE finetunes/merges coming up.
Also was working on a custom moe_karcher method but lost the script in a BSOD
Update: The script has been recovered! Running a test of it now
What's your total VRAM? At least 16-24GB is enough to help with finetuning.
Also, you need plenty of system RAM (32-64GB) and a pagefile on an SSD that has enough write cycles left (I recommend at least 50-100GB)
I don't use mac but here is a supposed way to install it.
If you can get it to run a simple SLERP merge let me know, and I can send you more advanced scripts to test.
Save this as config.yaml and try to run it (recreate kekthulhu)
Some of these commands won't work on Mac, so you may have to ask a 'smart' LLM for help
mergekit-yaml C:\mergekit-main\config.yaml C:\mergekit-main\merged_model_output --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code --lazy-unpickle --random-seed 420 --cuda
base_model: SicariusSicariiStuff\Assistant_Pepe_8B
architecture: MistralForCausalLM
merge_method: slerp
dtype: float32
out_dtype: bfloat16
slices:
  - sources:
      - model: Naphula\Cthulhu-8B-v1.4
        layer_range: [0, 32]
      - model: SicariusSicariiStuff\Assistant_Pepe_8B
        layer_range: [0, 32]
parameters:
  t: 0.5
tokenizer:
  source: union
#chat_template: auto
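For anyone wondering what `t: 0.5` means in the config above, here is a rough sketch of spherical linear interpolation (SLERP) between two weight vectors. This is only an illustration of the general technique in plain Python, not mergekit's actual code. The function name `slerp` and the `eps` threshold are made up for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t=0 returns v0, t=1 returns v1, t=0.5 lands halfway along the arc.
    Illustrative only; mergekit's own implementation may differ.
    """
    linear = [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # A zero-length vector has no direction, so fall back to linear blending.
    if norm0 < eps or norm1 < eps:
        return linear

    # Cosine of the angle between the two directions, clamped for acos.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    # Nearly parallel (or opposite) vectors: the arc formula is unstable.
    if sin_theta < eps:
        return linear

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # Two orthogonal unit vectors blended at the midpoint.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Unlike a plain average, the midpoint of two unit vectors stays on the unit circle, which is the usual argument for SLERP over linear blending when merging two models.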
To run Arcee AI's MergeKit on a Macintosh, first clone the repository using git clone https://github.com/arcee-ai/mergekit.git, then navigate to the directory with cd mergekit and install the package using pip install -e .. After installation, you can use the main script mergekit-yaml to perform merges by specifying a YAML configuration file and an output path.
Setting Up MergeKit on macOS
Prerequisites
Python: Ensure you have Python installed. You can download it from the official Python website.
Git: Install Git to clone the MergeKit repository. You can download it from the Git website.
Installation Steps
Clone the Repository:
Open your terminal and run the following command:
bash
git clone https://github.com/arcee-ai/mergekit.git
Navigate to the Directory:
Change to the MergeKit directory:
bash
cd mergekit
Install MergeKit:
Use pip to install MergeKit:
bash
pip install -e .
Running MergeKit
Prepare Your Configuration:
Create a YAML configuration file for your merge. You can use a template or create one from scratch.
Run the Merge Command:
Execute the merge command using the following syntax:
bash
mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]
Replace path/to/your/config.yml with the path to your configuration file and ./output-model-directory with your desired output directory.
Thanks for sharing these steps, I will try them out when I’m back home. I should be able to use around 50 GB as VRAM, though I don’t know if that translates well with this use case. Will keep you posted, once I’m at it 👍
If you manage to get Mergekit running then you will want to replace the graph.py with this version
https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py
(you would want to edit some of the settings to recalibrate it for your vram ceiling)
Not sure if it works on Mac but if so then you should be able to breeze through big merges in minutes instead of hours with 50GB VRAM (assuming you have storage space on the SSD for them, it takes forever to transfer from HDD)
I’m still on sick leave, but beginning next week, I hope to finally be able to try it out! I'd love to be able to contribute in a meaningful way, even if I don’t really know what I’m doing 🤪
Ran into a little stumbling block... git clone worked fine, but pip install -e requires an argument that I don't seem to have. I tried something, but got this error:
ERROR: file:///Users/xxx/mergekit/mergekit does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
Edit: Never mind, had to copy my local path as content for the -e argument... it's doing things now! 🫡
Any idea how I can suppress the check?
error: the configured Python interpreter version (3.14) is newer than PyO3's maximum supported version (3.13)
= help: please check if an updated version of PyO3 is available. Current version: 0.22.6
= help: set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 to suppress this check and build anyway using the stable ABI
warning: build failed, waiting for other jobs to finish...
💥 maturin failed
Caused by: Failed to build a native library through cargo
Not sure but I'll try to look into this.
What part is it failing at, during the merge or before?
Also, I narrowed down Mephisto to 10 models, but now instead of early terminations I'm having the opposite problem of endless prompts. High creativity, but something in the ChatML unifier script apparently broke the tokenizers.
Morpheus and Meme Trix v2 coming soon. The new dataset is much cleaner and more advanced than v1.
v1
========================================
SCAN RESULTS
========================================
Total Entries: 665
Max Characters: 3560
Max Tokens (Exact): 868
----------------------------------------
Average Tokens: 224
95th Percentile: 441 tokens (Covers 95% of data)
99th Percentile: 610 tokens (Covers 99% of data)
========================================
v2a (comparing to v2b)
========================================
SCAN RESULTS
========================================
Total Entries: 76
Max Characters: 11219
Max Tokens (Exact): 2012
----------------------------------------
Average Tokens: 1044
95th Percentile: 1560 tokens (Covers 95% of data)
99th Percentile: 1792 tokens (Covers 99% of data)
========================================
v2b
========================================
SCAN RESULTS
========================================
Total Entries: 104
Max Characters: 11219
Max Tokens (Exact): 2012
----------------------------------------
Average Tokens: 961
95th Percentile: 1250 tokens (Covers 95% of data)
99th Percentile: 1717 tokens (Covers 99% of data)
========================================
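In case it helps anyone reproduce numbers like these, here is a minimal sketch of how such a summary could be computed from a list of per-entry token counts. It is not the scan script used above. The name `scan_stats`, the nearest-rank percentile method, and the rounding of the average are all assumptions, and counting the tokens themselves (which needs the model's tokenizer) is left out.

```python
def scan_stats(token_counts):
    """Summarise per-entry token counts: size, max, mean, p95 and p99."""
    if not token_counts:
        raise ValueError("token_counts must not be empty")

    ordered = sorted(token_counts)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank method with integer math: rank = ceil(p * n / 100).
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {
        "total_entries": n,
        "max_tokens": ordered[-1],
        "average_tokens": round(sum(ordered) / n),
        "p95": percentile(95),
        "p99": percentile(99),
    }


if __name__ == "__main__":
    # Hypothetical token counts, not taken from any of the datasets above.
    print(scan_stats([120, 340, 560, 980, 1500]))
```

The 95th percentile is the handy one for picking a training sequence length: anything longer gets truncated, but only about 5% of entries are affected.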
The 77 dataset is doing best so far. Now I'm testing a della merge of 3 different epochs
I'm using Python 3.11. I asked Gemini and it said this about your Mac mergekit install (not sure if it is correct, back up anything important first):
The error you are seeing is because you are using Python 3.14, which is currently an experimental/developer version. Most machine learning libraries (like tokenizers, which mergekit depends on) are not yet compatible with 3.14 because it hasn't been officially released as stable.
Here is the advice for the specific "suppress" fix you asked for, followed by the "correct" way to handle this on a Mac to avoid future headaches.
1. The Quick Fix (Suppressing the check)
To follow the suggestion in your error message and force the installation to proceed, run these two commands in your terminal:
export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
pip install -e .
Note: This might allow the installation to finish, but because Python 3.14 is so new, you might run into "Segmentation Faults" or crashes later when actually trying to merge models.
2. The "Idiot-Proof" Stable Fix (Recommended)
In the AI/ML world, Python 3.10, 3.11, or 3.12 are the "sweet spots." Python 3.14 is too "bleeding edge" and will break many of the tools you need.
Since you are on a Mac, the best way to handle this without messing up your system is to create a Virtual Environment using a stable version of Python.
Check if you have a stable version installed:
Run python3.12 --version. If it returns a version number, use that. If not, try python3.11.
Create a clean environment (inside your mergekit folder):
# Create the environment (replace 3.12 with 3.11 if needed)
python3.12 -m venv venv
# Activate it
source venv/bin/activate
Your terminal prompt should now show (venv) at the beginning.
Install MergeKit inside that clean environment:
pip install --upgrade pip
pip install -e .
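A small sketch of a pre-install check based on the version advice above. The 3.10 to 3.12 bounds are taken from that advice and are an assumption rather than anything mergekit documents, and `python_is_supported` is a made-up helper name.

```python
import sys

# Assumed "sweet spot" range from the advice above, not an official requirement.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 12)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is inside the assumed range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "%d.%d" % tuple(sys.version_info[:2])
    if python_is_supported():
        print("Python " + current + " is inside the assumed range.")
    else:
        print("Python " + current + " is outside the assumed range; consider a venv with 3.11 or 3.12.")
```

Running it with the interpreter inside the activated venv shows which Python `pip install -e .` will actually build against.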
3. Mac-Specific Tips for your 50GB VRAM
Since you mentioned having 50GB of "VRAM" (likely a Mac Studio or a high-spec MacBook Pro with Unified Memory):
- Unified Memory: On Mac, your "VRAM" is actually your System RAM. If you have 64GB of RAM, macOS usually limits a single process (like Python) to about 70-80% of that.
- The --cuda flag: The command Naphula gave you includes --cuda. This will not work on Mac. Macs use Metal (MPS), not CUDA.
- The Command to use on Mac: Instead of --cuda, you should try running it without that flag, or check if your version of mergekit supports the device flag. Usually, for Mac, you just run:
mergekit-yaml config.yaml ./output_folder --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code
- Accelerated Merging: MergeKit performs most operations in a way that is very efficient on CPU/RAM. On a Mac with Unified Memory, the "CPU" merge is actually extremely fast because the data doesn't have to travel between a dedicated GPU and RAM.
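The tips above boil down to "same command, minus --cuda on a Mac". Here is a hedged sketch that builds the argument list accordingly. It only uses flags that already appear in this thread. `build_merge_command` is a hypothetical helper, and it assumes any non-macOS machine has a CUDA-capable GPU, which may not be true for yours.

```python
import platform


def build_merge_command(config_path, output_dir, system=None):
    """Build a mergekit-yaml argument list, skipping --cuda on macOS."""
    if system is None:
        system = platform.system()  # "Darwin" on macOS

    cmd = [
        "mergekit-yaml", config_path, output_dir,
        "--copy-tokenizer",
        "--allow-crimes",
        "--out-shard-size", "5B",
        "--trust-remote-code",
    ]
    # Assumption: everything that is not macOS has a CUDA GPU available.
    if system != "Darwin":
        cmd.append("--cuda")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is merged by accident.
    print(" ".join(build_merge_command("config.yaml", "./output_folder")))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the merge. Printing it first makes it easy to eyeball the flags before committing to a long run.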
Summary for McG-221: Try the export command first if you want to stay on Python 3.14, but if the program crashes when you start the merge, you must downgrade to Python 3.12 using the Virtual Environment (Step 2) to get it to work.
Thanks for looking into this, thought as much... hope to be able to upgrade my hardware soon, until then it's "never touch a running system"... 😅