# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Hunyuan-GameCraft is a high-dynamic interactive game video generation system that creates gameplay videos with controllable camera movements and actions. The system uses diffusion models and action-controlled generation to synthesize realistic game footage from reference images and keyboard/mouse input controls.

## Key Commands

### Installation

```bash
# Create and activate conda environment
conda create -n HYGameCraft python==3.10
conda activate HYGameCraft

# Install PyTorch and dependencies
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install requirements
python -m pip install -r requirements.txt

# Install flash attention (optional, for acceleration)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git  # pin a release tag if required
```
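
Before downloading weights, a quick check can confirm the environment (a minimal sketch; the second line only matters if you installed the optional flash-attn package):

```bash
# Verify PyTorch sees the GPU; optionally confirm flash-attn imports
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn" 2>/dev/null && echo "flash-attn OK" || echo "flash-attn not installed (optional)"
```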

### Download Models

```bash
cd weights
huggingface-cli download tencent/Hunyuan-GameCraft-1.0 --local-dir ./
```
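
The Development Notes below expect `MODEL_BASE` to point at `weights/stdmodels`. A minimal sketch, assuming you return to the repository root after the download above:

```bash
# Back at the repository root; adjust the path if your weights live elsewhere
export MODEL_BASE="$(pwd)/weights/stdmodels"
```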

### Run Inference

**Multi-GPU (8 GPUs) - Standard Model:**

```bash
torchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp/sample_batch.py \
    --image-path "asset/village.png" \
    --prompt "YOUR_PROMPT" \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --video-size 704 1216 \
    --cfg-scale 2.0 \
    --image-start \
    --action-list w s d a \
    --action-speed-list 0.2 0.2 0.2 0.2 \
    --seed 250160 \
    --infer-steps 50 \
    --save-path './results/'
```

**Single GPU with Low VRAM (24GB minimum):**

```bash
export DISABLE_SP=1
export CPU_OFFLOAD=1
torchrun --nnodes=1 --nproc_per_node=1 --master_port 29605 hymm_sp/sample_batch.py \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --cpu-offload \
    --use-fp8 \
    [other parameters...]
```
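
For reference, one way the placeholder could be filled in, reusing the illustrative flags from the multi-GPU example above (a sketch, not a verified command):

```bash
export DISABLE_SP=1
export CPU_OFFLOAD=1
torchrun --nnodes=1 --nproc_per_node=1 --master_port 29605 hymm_sp/sample_batch.py \
    --image-path "asset/village.png" \
    --prompt "YOUR_PROMPT" \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --video-size 704 1216 \
    --cfg-scale 2.0 \
    --image-start \
    --action-list w s d a \
    --action-speed-list 0.2 0.2 0.2 0.2 \
    --seed 250160 \
    --infer-steps 50 \
    --cpu-offload \
    --use-fp8 \
    --save-path './results/'
```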

**Distilled Model (faster, 8 inference steps):**

```bash
torchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp/sample_batch.py \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states_distill.pt \
    --cfg-scale 1.0 \
    --infer-steps 8 \
    --use-fp8 \
    [other parameters...]
```

## Architecture Overview

### Core Components

1. **Main Entry Points**
   - `hymm_sp/sample_batch.py`: Main script for batch video generation with distributed processing
   - `hymm_sp/sample_inference.py`: Core inference logic and model sampling
   - `hymm_sp/config.py`: Configuration parsing and argument handling
2. **Model Architecture (`hymm_sp/modules/`)**
   - `models.py`: Core diffusion model implementation
   - `cameranet.py`: Camera control and action encoding for game interactions
   - `token_refiner.py`: Text token refinement for prompt conditioning
   - `parallel_states.py`: Distributed training/inference state management
   - `fp8_optimization.py`: FP8 quantization for memory/speed optimization
3. **VAE Module (`hymm_sp/vae/`)**
   - `autoencoder_kl_causal_3d.py`: 3D causal VAE for video encoding/decoding
   - Handles latent space conversion for video frames
4. **Diffusion Pipeline (`hymm_sp/diffusion/`)**
   - `pipeline_hunyuan_video_game.py`: Custom pipeline for game video generation
   - `scheduling_flow_match_discrete.py`: Flow matching scheduler for denoising
5. **Data Processing (`hymm_sp/data_kits/`)**
   - `video_dataset.py`: Dataset handling for video inputs
   - `data_tools.py`: Video saving and processing utilities

### Key Features

- **Action Control**: Maps keyboard inputs (w/a/s/d) to continuous camera space for smooth transitions
- **Hybrid History Conditioning**: Extends video sequences autoregressively while preserving scene context
- **Model Distillation**: Accelerated inference model (8 steps vs 50 steps)
- **Memory Optimization**: FP8 quantization, CPU offloading, and SageAttention support
- **Distributed Processing**: Multi-GPU support with sequence parallelism

### Important Parameters

- `--action-list`: Sequence of keyboard actions (w/a/s/d)
- `--action-speed-list`: Movement speed for each action (0.0-3.0)
- `--video-size`: Output resolution (height width)
- `--cfg-scale`: Classifier-free guidance scale (1.0 for distilled, 2.0 for standard)
- `--infer-steps`: Denoising steps (8 for distilled, 50 for standard)
- `--use-fp8`: Enable FP8 optimization for memory reduction
- `--cpu-offload`: Offload model to CPU for low VRAM scenarios
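
Each action in `--action-list` contributes 33 frames at 25 FPS (see Development Notes), so the list length fixes the clip duration, and the examples above pair one `--action-speed-list` entry with each action. A quick back-of-the-envelope check:

```bash
# 4 actions x 33 frames/action = 132 frames; 132 / 25 FPS ~= 5.3 seconds
ACTIONS=(w s d a)
FRAMES=$(( ${#ACTIONS[@]} * 33 ))
awk "BEGIN { printf \"%d frames = %.1f seconds at 25 FPS\n\", $FRAMES, $FRAMES / 25 }"
```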

### Model Weights Structure

```
weights/
├── gamecraft_models/
│   ├── mp_rank_00_model_states.pt             # Standard model
│   └── mp_rank_00_model_states_distill.pt     # Distilled model
└── stdmodels/
    ├── vae_3d/                                # 3D VAE model
    ├── llava-llama-3-8b-v1_1-transformers/    # Text encoder
    └── openai_clip-vit-large-patch14/         # CLIP encoder
```
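
A pre-flight check that the expected files landed where the commands above look for them (a minimal sketch based on the layout shown):

```bash
# Fail fast if a required checkpoint or encoder directory is missing
for p in weights/gamecraft_models/mp_rank_00_model_states.pt \
         weights/gamecraft_models/mp_rank_00_model_states_distill.pt \
         weights/stdmodels/vae_3d \
         weights/stdmodels/llava-llama-3-8b-v1_1-transformers \
         weights/stdmodels/openai_clip-vit-large-patch14; do
  [ -e "$p" ] || echo "missing: $p"
done
```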

## Development Notes

- Environment variable `MODEL_BASE` should point to `weights/stdmodels`
- Use `export DISABLE_SP=1` and `export CPU_OFFLOAD=1` for single GPU inference
- Minimum GPU memory: 24GB (very slow); recommended: 80GB per GPU
- Action length determines video duration (1 action = 33 frames at 25 FPS)
- SageAttention can be installed for additional acceleration (see the sketch below)
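
A hedged install sketch for SageAttention (the PyPI package name `sageattention` is an assumption here; check the SageAttention project's own instructions for the build matching your CUDA/PyTorch stack):

```bash
# Assumption: the package is published on PyPI as "sageattention"
python -m pip install sageattention
```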