A series of useful models expert-trained using the GEOLIP distillation and constellation process.
AbstractPhila PRO
AI & ML interests
datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.
Recent Activity
replied to their post about 3 hours ago
The transformer prototype v2 is operational. It takes the behavior of the H2 battery and directly forces a projected rigid behavior into a multiscale structure. This grows the preliminary version from roughly 57k params to around 90k params, and with this behavior the model converges SEMI-CLOSE to the SVAE current spectrum in considerably fewer epochs. So stay tuned on that one; the transformer did converge. The behavior itself is validated and convergent in the H2 protocol spectrum.
The transformer operates with the "single" setting.
https://huggingface.co/AbstractPhil/geolip-svae-transformer/blob/main/transformer_v2.py
I've implanted a rigid formula that allows this direct behavior from the H2 battery to superimpose onto adjacent structural boundaries, and with that I've built aleph and void into the system as well. These are guarantees.
As for the centrifuge concept, the optimization on the centrifuge was quite lackluster. The hardware doesn't support such behavior. You can access the current operating version of the centrifuge by using the "stacked" configuration. Four lenses were too much when running a quaternion bank to handle such complex interactions reasonably, so I will need to work something out in the future to get a full centrifuge system working.
Crusher is ready, transformer_v3.
You might be curious WHY these converge at such low raw MSE in the later stages. The reasoning is kind of difficult to explain, so I'll try to make it simple. The direction is very subtle in the later stages of training with AdamW, so the curves start to make much more accurate shifts towards the goals. This allows the model to rapidly converge after the earlier, heavier training. You can't simply train it low, because that takes too long. The earlier training gets everything KIND OF NEAR where it's supposed to be, which allows the really small twitches of MSE to provide massive corrections without needing hard logits or features that are more difficult to finetune.
updated a model about 3 hours ago
AbstractPhil/geolip-svae-transformer
updated a dataset about 11 hours ago
AbstractPhil/diffusion-pretrain-set-ft1
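
The late-stage convergence argument in the post above (heavier AdamW training first, then very small corrections once everything is near its target) can be illustrated with a toy sketch. Everything here is an assumption for illustration only: the two-phase learning-rate split, the specific learning rates and step counts, the hand-written AdamW update, and the one-dimensional linear regression stand in for the real model and are not taken from the geolip-svae-transformer code.

```python
import math

def mse_and_grads(w, b, xs, ys):
    """Return the MSE of y = w*x + b and its gradients w.r.t. w and b."""
    n = len(xs)
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
    grad_b = 2.0 * sum(errs) / n
    return loss, grad_w, grad_b

def adamw_step(params, grads, state, lr,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update: decoupled weight decay, then the Adam step."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        p = p - lr * weight_decay * p  # decoupled weight decay
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

def train_two_phase(coarse_lr=0.1, fine_lr=0.001,
                    coarse_steps=200, fine_steps=200):
    """Hypothetical schedule: a coarse phase, then a fine phase with a small lr."""
    xs = [i / 10.0 - 1.0 for i in range(21)]  # 21 points in [-1, 1]
    ys = [3.0 * x + 1.0 for x in xs]          # toy target: y = 3x + 1
    params = [0.0, 0.0]                        # [w, b]
    state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
    history = []
    for step in range(coarse_steps + fine_steps):
        lr = coarse_lr if step < coarse_steps else fine_lr
        loss, gw, gb = mse_and_grads(params[0], params[1], xs, ys)
        history.append(loss)
        params = adamw_step(params, [gw, gb], state, lr)
    final_loss, _, _ = mse_and_grads(params[0], params[1], xs, ys)
    history.append(final_loss)
    return params, history

if __name__ == "__main__":
    params, history = train_two_phase()
    print("MSE at start:        ", history[0])
    print("MSE after coarse run:", history[200])
    print("MSE after fine run:  ", history[-1])
```

Running it prints the MSE at the start, at the switch between phases, and at the end, so the effect of the small late-stage steps can be inspected directly. Setting `coarse_steps=0` gives a rough feel for the "train it low" case on the same toy problem.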