Active filters: rlhf
8B • Updated
• 53
• 2
TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF
7B • Updated
• 5.46k
• 125
VijayShinde1996/vrs-sft-Qwen2_5-7B-it
Text Generation
• 8B • Updated
• 37
• 1
sileod/deberta-v3-base-tasksource-nli
Zero-Shot Classification
• 0.2B • Updated
• 8.85k
• 133
stanfordnlp/SteamSHP-flan-t5-xl
Updated
• 6
• 43
stanfordnlp/SteamSHP-flan-t5-large
Updated
• 17
• 33
sileod/deberta-v3-large-tasksource-nli
Zero-Shot Classification
• 0.4B • Updated
• 4.28k
• 37
sileod/deberta-v3-large-tasksource-rlhf-reward-model
Text Classification
• Updated
• 322
• 11
trl-lib/llama-7b-se-rl-peft
Updated
• 103
trl-lib/llama-7b-se-rm-peft
toloka/gpt2-large-rl-prompt-writing
Text Generation
• 0.8B • Updated
• 16
• 3
AdamG012/chat-opt-1.3b-rlhf-actor-deepspeed
Text Generation
• Updated
• 7
• 5
AdamG012/chat-opt-1.3b-rlhf-critic-deepspeed
Text Generation
• Updated
• 9
• 3
AdamG012/chat-opt-1.3b-rlhf-actor-ema-deepspeed
Text Generation
• Updated
• 4
• 8
sileod/mdeberta-v3-base-tasksource-nli
Zero-Shot Classification
• 0.3B • Updated
• 20
• 18
Text Generation
• Updated
• 9
• 5
Text Generation
• Updated
• 4
• 3
Text Generation
• Updated
• 5
• 6
argilla/roberta-base-reward-model-falcon-dolly
Text Classification
• Updated
• 7
• 4
Text Generation
• Updated
PKU-Alignment/beaver-7b-v1.0
Reinforcement Learning
• 7B • Updated
• 16
• 13
lyogavin/Anima33B-DPO-Belle-1k
Text Generation
• Updated
• 1
lyogavin/Anima33B-DPO-Belle-1k-merged
Text Generation
• Updated
• 11
• 12
PKU-Alignment/beaver-7b-v1.0-reward
Reinforcement Learning
• 7B • Updated
• 5.85k
• 17
PKU-Alignment/beaver-dam-7b
Updated
• 2.94k
• 17
PKU-Alignment/beaver-7b-v1.0-cost
Reinforcement Learning
• 7B • Updated
• 6.06k
• 10
Ablustrund/moss-rlhf-reward-model-7B-zh
Updated
• 3
• 23
OpenMOSS-Team/moss-rlhf-reward-model-7B-en
OpenMOSS-Team/moss-rlhf-sft-model-7B-en