Adjusted with https://github.com/p-e-w/heretic using a custom dataset aiming for increased creative and personal expression, building on the already released datasets.
This dataset is a bit more sensitive and personal in the kinds of adversarial prompts she sought out to generate "refusals" from, so I am not releasing it.
The two vectors went from 76/97 to 61/97 and 67/97 on the test set. I learned my lesson and picked relatively small KL divergences this time (0.02 and 0.03).
I used a task arithmetic merge to try to simply add the two together. The interventions were quite separate in the attention layers, so those likely stacked, but they overlapped in the MLP layers, so the result there is less clear.
This is a merge of pre-trained language models created using mergekit.
## Merge Details
### Merge Method
This model was merged using the Task Arithmetic merge method, with Lambent/Mira-v1.17-Karcher-27B as the base.
### Models Merged
The following models were included in the merge:
- Lambent/Mira-v1.17-Karcher-27B-heretic-0.02
- Lambent/Mira-v1.17-Karcher-27B-heretic-0.03
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: Lambent/Mira-v1.17-Karcher-27B-heretic-0.02
    parameters:
      weight: 1
  - model: Lambent/Mira-v1.17-Karcher-27B-heretic-0.03
    parameters:
      weight: 1
merge_method: task_arithmetic
base_model: Lambent/Mira-v1.17-Karcher-27B
tokenizer_source: Lambent/Mira-v1.17-Karcher-27B
parameters:
  lambda: 1.0
  normalize: true
  int8_mask: true
dtype: bfloat16
```
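For intuition, here is a toy per-tensor sketch of what a task arithmetic merge with these settings computes. It is not mergekit code: the function name `task_arithmetic` and the four-element "tensors" are hypothetical, and it assumes that `normalize: true` divides the weighted sum of task vectors by the sum of the weights, which is worth verifying against the mergekit documentation. Under that assumption, two models at weight 1 end up closer to an average of the two task vectors than to a plain sum.

```python
def task_arithmetic(base, finetuned, weights, lam=1.0, normalize=True):
    """Toy per-tensor task arithmetic on flat lists of floats (illustrative only)."""
    merged = []
    for i, b in enumerate(base):
        # Task vector entry = fine-tuned value minus base value, weighted per model.
        mixed = sum(w * (ft[i] - b) for w, ft in zip(weights, finetuned))
        if normalize:
            # Assumption: normalization divides by the sum of the weights.
            mixed /= sum(weights)
        # lam plays the role of the `lambda` scaling factor in the config.
        merged.append(b + lam * mixed)
    return merged


# Made-up values standing in for one tensor of the base and the two variants.
base = [0.0, 0.0, 0.0, 0.0]
variant_a = [1.0, 0.0, 0.0, 0.5]
variant_b = [0.0, 1.0, 0.0, 0.5]

averaged = task_arithmetic(base, [variant_a, variant_b], [1, 1], normalize=True)
summed = task_arithmetic(base, [variant_a, variant_b], [1, 1], normalize=False)
```

In this toy case, changes that only one variant makes come through at half strength with normalization on, while changes both variants share keep their original size. If the assumption holds, that is one way to read "stacking" for the separate attention edits versus the overlapping MLP edits.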
## Abliteration parameters
| Parameter | Value |
|---|---|
| direction_index | 25.18 |
| attn.o_proj.max_weight | 1.04 |
| attn.o_proj.max_weight_position | 57.71 |
| attn.o_proj.min_weight | 0.95 |
| attn.o_proj.min_weight_distance | 33.58 |
| mlp.down_proj.max_weight | 0.87 |
| mlp.down_proj.max_weight_position | 47.71 |
| mlp.down_proj.min_weight | 0.00 |
| mlp.down_proj.min_weight_distance | 7.89 |

| Parameter | Value |
|---|---|
| direction_index | 27.34 |
| attn.o_proj.max_weight | 1.10 |
| attn.o_proj.max_weight_position | 37.23 |
| attn.o_proj.min_weight | 0.42 |
| attn.o_proj.min_weight_distance | 10.78 |
| mlp.down_proj.max_weight | 0.80 |
| mlp.down_proj.max_weight_position | 50.48 |
| mlp.down_proj.min_weight | 0.72 |
| mlp.down_proj.min_weight_distance | 3.09 |
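To see which layers each set of parameters touches, the tables above can be expanded into per-layer weights. The sketch below assumes the kernel is a linear ramp: the weight is `max_weight` at `max_weight_position`, falls linearly to `min_weight` at `min_weight_distance` layers away, and layers further out are left untouched. That shape is an assumption about how Heretic interprets these parameters, not something stated in this card. `layer_weights` is a hypothetical helper and `NUM_LAYERS` is a placeholder, not the model's confirmed layer count.

```python
def layer_weights(max_weight, max_weight_position, min_weight,
                  min_weight_distance, num_layers):
    """Return {layer_index: ablation weight} for layers inside the kernel."""
    weights = {}
    for layer in range(num_layers):
        distance = abs(layer - max_weight_position)
        if distance > min_weight_distance:
            continue  # assumed: layers outside the kernel are not modified
        # Assumed linear ramp from max_weight (peak) to min_weight (kernel edge).
        ramp = distance / min_weight_distance
        weights[layer] = max_weight + ramp * (min_weight - max_weight)
    return weights


NUM_LAYERS = 62  # placeholder; replace with the model's actual layer count

# Values copied from the two parameter tables, in the order they appear.
first_attn = layer_weights(1.04, 57.71, 0.95, 33.58, NUM_LAYERS)
second_attn = layer_weights(1.10, 37.23, 0.42, 10.78, NUM_LAYERS)
first_mlp = layer_weights(0.87, 47.71, 0.00, 7.89, NUM_LAYERS)
second_mlp = layer_weights(0.80, 50.48, 0.72, 3.09, NUM_LAYERS)

# Layers modified by both vectors, per component.
shared_attn = sorted(set(first_attn) & set(second_attn))
shared_mlp = sorted(set(first_mlp) & set(second_mlp))
```

Comparing `shared_attn` and `shared_mlp`, along with the weights at the shared layers, gives a rough check on where the two interventions overlap.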
