Gyanateet Dutta
Ryukijano
AI & ML interests
Computer Vision, Robotics, Generative modelling, AI for Sciences.
Recent Activity
upvoted a paper 1 day ago
World Action Models: The Next Frontier in Embodied AI
updated a Space 4 days ago
Ryukijano/CatCon-One-Shot-Controlnet-SD-1-5-b2
updated a model 4 days ago
Ryukijano/parameter-golf-models
Organizations
Learning
Vision_transformer_robotics
Midi-composer
Neural Rendering
This collection focuses on using neural networks for photorealistic rendering and image synthesis. It features models capable of text-to-image gen.
- NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
  Paper • 2307.14620 • Published • 15
- LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs
  Paper • 2306.05410 • Published • 4
- ashawkey/nerf2mesh
  Updated • 14
- NeRF
  Build error • Featured • 🔮 25
Own Work
LLMs
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 264
- 3D-LFM: Lifting Foundation Model
  Paper • 2312.11894 • Published • 15
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 61
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 31
Audio
- EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
  Paper • 2402.00892 • Published • 13
- MusicGen Streaming
  Running on Zero • MCP • Featured • 🔥 294
  Generate music from text descriptions in real-time
- Whisper JAX
  Runtime error • Agents • 👀 145
  Transcribe or translate audio from microphone, file, or YouTube
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 22
Text_to_video diffusion
Text-3D
- Stable Fast 3D
  Running on L4 • Agents • Featured • 🎮 1.17k
  Generate a 3D mesh from a single image
- Roblox 3D Assets Generator v1
  Runtime error • Agents • Featured • 🪄 184
  Create a 3D model from an image in 10 seconds!
- LLaMA Mesh
  Running on Zero • Agents • Featured • 👀 148
  Create 3D mesh by chatting.
- stabilityai/stable-point-aware-3d
  Image-to-3D • 2B • Updated • 1.62k • 346
Audio->3D
AI-4-Sciences
STEM
VILA
Diffusion models
Explore the capabilities of diffusion models for natural language processing. This collection features a diverse set of models trained using diffusion
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
  Paper • 2309.05793 • Published • 51
- 3D Gaussian Splatting for Real-Time Radiance Field Rendering
  Paper • 2308.04079 • Published • 202
- stabilityai/stable-diffusion-xl-base-1.0
  Text-to-Image • Updated • 2.03M • 7.71k
- Ryukijano/lora-trained-xl-kaggle-p100
  Text-to-Image • Updated • 4 • 1
Deep Reinforcement Learning
Features implementations and paces of popular RL algorithms and new paradigms on a variety of environments.
- Ryukijano/rl_course_vizdoom_health_gathering_supreme
  Reinforcement Learning • Updated
- Ryukijano/Mujoco_rl_halfcheetah_Decision_Trasformer
  Reinforcement Learning • Updated • 7
- Ryukijano/poca-SoccerTwos
  Reinforcement Learning • Updated • 28
- AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
  Paper • 2308.03526 • Published • 29
Deep learning
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
  Paper • 2311.12229 • Published • 25
- IP-Adapter-FaceID
  Running on Zero • Agents • Featured • 🧑 1.01k
  Generate AI images that blend your face with any prompt
- Design2Code: How Far Are We From Automating Front-End Engineering?
  Paper • 2403.03163 • Published • 98
Computer vision
- Unsupervised Universal Image Segmentation
  Paper • 2312.17243 • Published • 20
- Denoising Vision Transformers
  Paper • 2401.02957 • Published • 31
- timm/ViT-B-16-SigLIP
  Zero-Shot Image Classification • Updated • 72.7k • 37
- Slimsam
  Running on Zero • Agents • 🌖 19
  Small yet powerful mask generation application ⚡️
Multi modal foundational models
Vision_language_models
2D->3D
Segmentation
AI-For-Quantum Computing