It's a DELLA! 👼

by McG-221 - opened Feb 10

Discussion

McG-221

Feb 10

First vibe test positive, the magic is there... ✨

Naphula

Owner Feb 10

Glad to hear it, I haven't tested this one much yet.

Can't wait to finetune 24B models eventually.

McG-221

Feb 14

Townfolk was talking about some kind of repetition bug, prompting development of Goetia v1.4 -- what's that all about? 👀

Naphula

Owner Feb 21

•

edited Feb 21

Not sure but v1.4 is probably a ways off. I'm trying to save up for a 3090, working long hours. I have made some new datasets and found a way to improve Cthulhu/Raven/Morpheus with longer context Q&A Pairs (~2k tokens instead of 500). My initial test set for this long-form style performed quite well.

So, a proper Cthulhu/Goetia 24B dataset is under construction, but we'll likely see the smaller versions first.

Attempts to finetune 12B on an 8GB card have failed, but I can now merge MoE for Mistral 7B and Llama 8B. These seem promising. I have more ideas for MoE finetunes/merges coming up.

Also was working on a custom moe_karcher method ~~but lost the script in a BSOD~~

Update: The script has been recovered! Running a test of it now

McG-221

Feb 21

@Naphula Here's the thing: I have good hardware, but am not proficient in the software magic side of things. If you could provide me with an idiot-proof script that runs on macOS, I could certainly try to help you with merging, or whatever needed. Hit me up, if you think this could work…

Naphula

Owner Feb 21

•

edited Feb 21

What's your total VRAM? 16-24GB or more is enough to help with finetuning.

You also need plenty of system RAM (32-64GB) and a pagefile on an SSD that has enough write cycles left. I recommend at least 50-100GB for the pagefile.

I don't use mac but here is a supposed way to install it.

If you can get it to run a simple SLERP merge let me know, and I can send you more advanced scripts to test.

Save this as config.yaml and try to run it (recreate kekthulhu)

Some of these commands won't work on Mac, so you may have to ask a 'smart' LLM for help:

mergekit-yaml C:\mergekit-main\config.yaml C:\mergekit-main\merged_model_output --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code --lazy-unpickle --random-seed 420 --cuda

base_model: SicariusSicariiStuff/Assistant_Pepe_8B
architecture: MistralForCausalLM
merge_method: slerp
dtype: float32
out_dtype: bfloat16
slices:
  - sources:
      - model: Naphula/Cthulhu-8B-v1.4
        layer_range: [0, 32]
      - model: SicariusSicariiStuff/Assistant_Pepe_8B
        layer_range: [0, 32]
parameters:
  t: 0.5
tokenizer:
  source: union
#chat_template: auto
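
For anyone wondering what merge_method: slerp with t: 0.5 means in principle, here is a minimal sketch of spherical linear interpolation on two small vectors. It is not mergekit's actual code: the function name slerp, the eps threshold, and the use of plain Python lists instead of weight tensors are all assumptions made for illustration.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t=0 returns v0, t=1 returns v1, t=0.5 lands halfway along the arc.
    """
    linear = [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # The angle is undefined for a zero vector, so fall back to lerp.
        return linear

    # Angle between the two directions, clamped against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel vectors: slerp degenerates, use lerp instead.
        return linear

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


halfway = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

With two perpendicular unit vectors, the halfway point keeps unit length, which a plain average would not. That is the usual argument for slerp over simple weight averaging.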

To run Arcee AI's MergeKit on a Mac, first clone the repository using git clone https://github.com/arcee-ai/mergekit.git, then navigate to the directory with cd mergekit and install the package using pip install -e . (the trailing dot is the path argument and means the current directory). After installation, you can use the main script mergekit-yaml to perform merges by specifying a YAML configuration file and an output path.

Setting Up MergeKit on macOS

Prerequisites

Python: Ensure you have Python installed. You can download it from the official Python website.
Git: Install Git to clone the MergeKit repository. You can download it from the Git website.

Installation Steps

Clone the Repository:

Open your terminal and run the following command:

git clone https://github.com/arcee-ai/mergekit.git

Navigate to the Directory:

Change to the MergeKit directory:

cd mergekit

Install MergeKit:

Use pip to install MergeKit:

pip install -e .

Running MergeKit

Prepare Your Configuration:

Create a YAML configuration file for your merge. You can use a template or create one from scratch.

Run the Merge Command:

Execute the merge command using the following syntax:

mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]

Replace path/to/your/config.yml with the path to your configuration file and ./output-model-directory with your desired output directory.

McG-221

Feb 21

Thanks for sharing these steps, I will try them out when I’m back home. I should be able to use around 50 GB as VRAM, though I don’t know if that translates well with this use case. Will keep you posted, once I’m at it 👍

Naphula

Owner Feb 22

•

edited Feb 22

If you manage to get Mergekit running then you will want to replace the graph.py with this version

https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py

(you would want to edit some of the settings to recalibrate it for your vram ceiling)

Not sure if it works on Mac but if so then you should be able to breeze through big merges in minutes instead of hours with 50GB VRAM (assuming you have storage space on the SSD for them, it takes forever to transfer from HDD)

McG-221

Feb 22

I’m still on sick leave, but beginning next week, I hope to finally be able to try it out! I'd love to be able to contribute in a meaningful way, even if I don’t really know what I’m doing 🤪

McG-221

Feb 23

•

edited Feb 23

Ran into a little stumbling block... git clone worked fine, but pip install -e requires an argument I don't seem to have. I tried something, but got this error:

ERROR: file:///Users/xxx/mergekit/mergekit does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

Edit: Never mind, had to copy my local path as content for the -e argument... it's doing things now! 🫡
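
The error above names the two files pip looks for, so the check can be approximated before running pip install -e at all. The sketch below is only an illustration of that rule, not pip's own logic, and the helper name looks_like_python_project is hypothetical.

```python
from pathlib import Path


def looks_like_python_project(path):
    """Return True if the directory holds setup.py or pyproject.toml.

    These are the two files the pip error message above says it could
    not find; this helper does not validate their contents.
    """
    root = Path(path)
    return any((root / name).is_file() for name in ("setup.py", "pyproject.toml"))


# Check the current directory, which is what "pip install -e ." targets.
current_dir_ok = looks_like_python_project(".")
```

If this returns False for the path passed to -e, the path is probably one level too deep or too shallow, as with the /mergekit/mergekit path in the error.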

McG-221

Feb 23

Any idea how I can suppress the check?

error: the configured Python interpreter version (3.14) is newer than PyO3's maximum supported version (3.13)
= help: please check if an updated version of PyO3 is available. Current version: 0.22.6
= help: set PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 to suppress this check and build anyway using the stable ABI
warning: build failed, waiting for other jobs to finish...
💥 maturin failed
Caused by: Failed to build a native library through cargo

Naphula

Owner Feb 25

Not sure but I'll try to look into this.

What part is it failing at, during the merge or before?

Naphula

Owner Feb 25

Also, I narrowed down Mephisto to 10 models, but now instead of early terminations I'm having the opposite problem of endless prompts. High creativity, but something in the ChatML unifier script apparently broke the tokenizers.

Naphula

Owner Feb 25

•

edited Feb 25

Morpheus and Meme Trix v2 coming soon. The new dataset is much cleaner and more advanced than v1.

========================================
       SCAN RESULTS
========================================
Total Entries:      665
Max Characters:     3560
Max Tokens (Exact): 868
----------------------------------------
Average Tokens:     224
95th Percentile:    441 tokens (Covers 95% of data)
99th Percentile:    610 tokens (Covers 99% of data)
========================================

v2a (comparing to v2b)

========================================
       SCAN RESULTS
========================================
Total Entries:      76
Max Characters:     11219
Max Tokens (Exact): 2012
----------------------------------------
Average Tokens:     1044
95th Percentile:    1560 tokens (Covers 95% of data)
99th Percentile:    1792 tokens (Covers 99% of data)
========================================

v2b

========================================
       SCAN RESULTS
========================================
Total Entries:      104
Max Characters:     11219
Max Tokens (Exact): 2012
----------------------------------------
Average Tokens:     961
95th Percentile:    1250 tokens (Covers 95% of data)
99th Percentile:    1717 tokens (Covers 99% of data)
========================================
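
For readers who want to produce similar numbers for their own dataset, here is a rough sketch of how such a scan could be computed. It is not the script used for the results above. The whitespace-based token counter is a stand-in for a real tokenizer, the nearest-rank percentile method is an assumption, and the names scan and nearest_rank_percentile are hypothetical.

```python
import math


def nearest_rank_percentile(sorted_values, pct):
    """Smallest value such that at least pct% of the data is <= it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def scan(entries, count_tokens=lambda text: len(text.split())):
    """Summarize a non-empty list of text entries.

    count_tokens defaults to a whitespace split, which is only a crude
    placeholder; pass a real tokenizer's counting function for exact
    token counts.
    """
    token_counts = sorted(count_tokens(entry) for entry in entries)
    return {
        "total_entries": len(entries),
        "max_characters": max(len(entry) for entry in entries),
        "max_tokens": token_counts[-1],
        "average_tokens": round(sum(token_counts) / len(token_counts)),
        "p95_tokens": nearest_rank_percentile(token_counts, 95),
        "p99_tokens": nearest_rank_percentile(token_counts, 99),
    }


report = scan(["a b c", "a", "a b"])
```

The 95th and 99th percentile values are the useful ones when choosing a maximum sequence length for finetuning, since a cutoff there only truncates a small tail of entries.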

Naphula

Owner Feb 25

The 77 dataset is doing best so far. Now I'm testing a DELLA merge of 3 different epochs.

Naphula

Owner Feb 25

I'm using Python 3.11. I asked Gemini and it said this about your Mac mergekit setup (not sure if it is correct, back up anything important first):

The error you are seeing is because you are using Python 3.14, which is newer than the maximum version (3.13) supported by the PyO3 release in this build (0.22.6). Machine learning libraries with native components (like tokenizers, which mergekit depends on) often lag behind the newest Python version.

Here is the advice for the specific "suppress" fix you asked for, followed by the "correct" way to handle this on a Mac to avoid future headaches.

1. The Quick Fix (Suppressing the check)

To follow the suggestion in your error message and force the installation to proceed, run these two commands in your terminal:

export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1
pip install -e .

Note: This might allow the installation to finish, but because Python 3.14 is so new, you might run into "Segmentation Faults" or crashes later when actually trying to merge models.

2. The "Idiot-Proof" Stable Fix (Recommended)

In the AI/ML world, Python 3.10, 3.11, or 3.12 are the "sweet spots." Python 3.14 is too "bleeding edge" and will break many of the tools you need.

Since you are on a Mac, the best way to handle this without messing up your system is to create a Virtual Environment using a stable version of Python.

Check if you have a stable version installed:
Run python3.12 --version. If it returns a version number, use that. If not, try python3.11.

Create a clean environment (inside your mergekit folder):

# Create the environment (replace 3.12 with 3.11 if needed)
python3.12 -m venv venv

# Activate it
source venv/bin/activate

Your terminal prompt should now show (venv) at the beginning.

Install MergeKit inside that clean environment:
```
pip install --upgrade pip
pip install -e .
```
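
A quick way to confirm that the new environment picked up a suitable interpreter is to check the version from inside Python. This is a minimal sketch: the lower bound of 3.10 comes from the advice above, the upper bound of 3.13 comes from the PyO3 error message, and the function name python_version_ok is hypothetical.

```python
import sys

# Assumed bounds: 3.10 from the "sweet spot" advice above,
# 3.13 from the PyO3 0.22.6 maximum reported in the build error.
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 13)


def python_version_ok(version=None):
    """Return True if (major, minor) falls within the assumed bounds.

    With no argument, the running interpreter's version is checked.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not python_version_ok():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
          "is outside the assumed supported range")
```

Run it with the venv activated. If it prints the warning, the environment was probably created with the wrong interpreter.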

3. Mac-Specific Tips for your 50GB VRAM

Since you mentioned having 50GB of "VRAM" (likely a Mac Studio or a high-spec MacBook Pro with Unified Memory):

Unified Memory: On Mac, your "VRAM" is actually shared with your System RAM. If you have 64GB of RAM, macOS by default usually lets the GPU use only about 70-80% of that.
The --cuda flag: The command Naphula gave you includes --cuda. This will not work on Mac. Macs use Metal (MPS), not CUDA.
The Command to use on Mac:
Instead of --cuda, you should try running it without that flag, or check if your version of mergekit supports the device flag. Usually, for Mac, you just run:
```
mergekit-yaml config.yaml ./output_folder --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code
```
Accelerated Merging: MergeKit performs most operations in a way that is very efficient on CPU/RAM. On a Mac with Unified Memory, the "CPU" merge is actually extremely fast because the data doesn't have to travel between a dedicated GPU and RAM.
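
If the same merge is to be run on both a CUDA machine and a Mac, the command can be assembled in a small script so that --cuda is only added where it applies. The sketch below only builds the argument list from the flags already quoted in this thread. The helper name build_merge_command is hypothetical, and whether every flag behaves the same on macOS is untested here.

```python
def build_merge_command(config_path, output_dir, use_cuda=False):
    """Build the mergekit-yaml argument list used in this thread.

    use_cuda should stay False on a Mac, since Macs have no CUDA.
    """
    command = [
        "mergekit-yaml",
        config_path,
        output_dir,
        "--copy-tokenizer",
        "--allow-crimes",
        "--out-shard-size", "5B",
        "--trust-remote-code",
    ]
    if use_cuda:
        command.append("--cuda")
    return command


mac_command = build_merge_command("config.yaml", "./output_folder")
print(" ".join(mac_command))
```

The list form can be passed directly to subprocess.run(mac_command, check=True) once mergekit is installed in the active environment.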

Summary for McG-221: Try the export command first if you want to stay on Python 3.14, but if the program crashes when you start the merge, you must downgrade to Python 3.12 using the Virtual Environment (Step 2) to get it to work.

McG-221

Feb 25

Thanks for looking into this, thought as much... hope to be able to upgrade my hardware soon, until then it's "never touch a running system"... 😅
