Quick Start Guide
Get from zero to your first fine-tuned model in minutes.
Before You Begin
Hardware Requirements
- NVIDIA GPU with 8GB+ VRAM (for GPU acceleration)
- 32GB+ System RAM
- 64GB+ free disk space
Software Requirements
- Docker Installation: Docker Desktop (Windows/macOS) or Docker + NVIDIA Container Toolkit (Linux)
- Native Installation: Python 3.11+ and CUDA 12.8+
Not sure if your system is ready? Check the Prerequisites.
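If you'd rather script a rough check, the sketch below (my own convenience snippet, assuming nvidia-smi is on your PATH) flags the most common gaps:
# readiness_check.py - rough pre-install check (assumes nvidia-smi is on PATH)
import shutil
import subprocess

# Free disk space in the current directory (requirement: 64GB+)
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk: {free_gb:.0f} GB ({'OK' if free_gb >= 64 else 'below 64GB'})")

# GPU name and VRAM via nvidia-smi (requirement: 8GB+ VRAM)
try:
    gpus = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        text=True,
    )
    print("GPU(s):", gpus.strip())
except (FileNotFoundError, subprocess.CalledProcessError):
    print("nvidia-smi not found or failed - install/check the NVIDIA driver")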
Choose Your Installation Method
Docker (Recommended)
Best for: Quick setup, Windows users, isolated environments
- Zero dependency management
- Works on Windows (WSL2), Linux, macOS
- Automatic GPU detection
- 5-minute setup
Native Installation
Best for: Direct system access, development, maximum control
- Faster startup times
- Full system integration
- Easier debugging
- No container overhead
Docker Installation
Prerequisites
- Docker 20.10+ and Docker Compose 2.0+
- NVIDIA Driver 535+ installed on host
- For Windows: WSL2 enabled with Docker Desktop
- For Linux: NVIDIA Container Toolkit installed
Linux NVIDIA Container Toolkit Setup:
# Ubuntu/Debian
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Windows setup: Docker Desktop with WSL2 includes GPU support automatically—just install the NVIDIA driver on Windows.
Installation Steps
# 1. Clone repository
git clone https://github.com/jwest33/lora_craft.git
cd lora_craft
# 2. Start application (builds image on first run)
docker compose up -d
# 3. View logs to verify startup
docker compose logs -f
# Wait for "Starting LoRA Craft Flask Application" message
# Press Ctrl+C to exit logs
First startup takes 5-15 minutes to download the base image and install dependencies. Subsequent starts are much quicker.
Verify Installation
# Check container is running
docker compose ps
# Verify GPU is detected
docker compose logs | grep "CUDA Available"
# Should show: CUDA Available: True
# Open browser to http://localhost:5000
Docker Management
# Stop application
docker compose down
# Restart application
docker compose restart
# View live logs
docker compose logs -f
# Access container shell
docker compose exec lora-craft bash
# Check GPU inside container
docker compose exec lora-craft nvidia-smi
# Update to latest version
git pull
docker compose build
docker compose up -d
Skip to Training Your First Model
Native Installation
1. Clone the Repository
git clone https://github.com/jwest33/lora_craft.git
cd lora_craft
2. Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
3. Install Dependencies
pip install -r requirements.txt
4. Verify GPU Access
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
You should see CUDA available: True.
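To also confirm which device and how much VRAM PyTorch sees, a quick follow-up check:
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name}")
    print(f"VRAM: {props.total_memory / 2**30:.1f} GiB")
    print(f"CUDA version: {torch.version.cuda}")
else:
    print("CUDA not available - check your driver and PyTorch build")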
Starting the Application
For Docker users: Your application is already running! Skip to Training Your First Model.
For native installation:
1. Launch the Server
python server.py
2. Open the Interface
Navigate to http://localhost:5000 in your web browser.
You should see the LoRA Craft interface with tabs for Model, Dataset, Config, Reward, and Training.
Training Your First Model
Follow this 7-step workflow to train a math reasoning model.
Step 1: Select Your Model
- Click the Model tab
- Choose Recommended preset
- Select the Qwen2.5 model family
- Choose Qwen/Qwen2.5-1.5B-Instruct
- Click Load Model
Why Qwen2.5 1.5B?
- Small enough for most GPUs
- Fast training (minutes, not hours)
- Strong baseline performance
Step 2: Choose a Dataset
- Click the Dataset tab
- Select Public Datasets
- Filter by Math category
- Choose GSM8K (8,500 grade school math problems)
- Click Load Dataset
- Preview samples to verify data format
What is GSM8K?
- Grade school math word problems
- Requires multi-step reasoning
- Perfect for testing GRPO training
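The Preview step in the UI shows the same data, but if you want to poke at it yourself, here is a minimal sketch using the Hugging Face datasets library (the openai/gsm8k id and its question/answer columns are the standard ones):
from datasets import load_dataset

# GSM8K has "question" and "answer" columns; reference answers end in "#### <number>"
ds = load_dataset("openai/gsm8k", "main", split="train")
print(f"{len(ds)} training problems")
print(ds[0]["question"])
print(ds[0]["answer"])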
Step 3: Configure Training
- Click the Config tab
- Use these beginner-friendly settings:
Training Duration:
- Epochs: 1
- Samples per epoch: 500 (subset for a quick test)
Batch Settings:
- Batch size: 1
- Gradient accumulation: 4
Learning Rate:
- Learning rate: 0.0002
- Warmup steps: 10
- Scheduler: constant
Generation:
- Max sequence length: 2048
- Max new tokens: 512
- Temperature: 0.7
Pre-training:
- Enabled: Yes
- Epochs: 1
- Max samples: 100
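For readers who know the underlying libraries: these fields line up with standard GRPO trainer arguments. The sketch below shows a rough equivalent in TRL's GRPOConfig; this is an assumption about how LoRA Craft maps its UI, not its actual code:
from trl import GRPOConfig

# Assumed mapping of the UI settings above onto TRL's GRPOConfig;
# LoRA Craft may wrap or name these differently.
config = GRPOConfig(
    output_dir="outputs",
    num_train_epochs=1,
    per_device_train_batch_size=1,   # batch size 1...
    gradient_accumulation_steps=4,   # ...x 4 accumulation = effective batch of 4
    learning_rate=2e-4,              # 0.0002
    warmup_steps=10,
    lr_scheduler_type="constant",
    max_completion_length=512,       # "max new tokens"
    temperature=0.7,
)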
Step 4: Select Reward Function
- Click the Reward tab
- Choose Preset Library
- Select Math & Science category
- Pick Math Problem Solver reward
- Verify field mappings:
  - Instruction → question
  - Response → answer
- Click Test Reward with a sample to verify
How Rewards Work: The reward function checks if the model’s answer matches the expected solution, rewarding correct answers with 1.0 and incorrect with 0.0.
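As a concrete illustration, a binary exact-match reward for GSM8K-style answers could look like the sketch below. This is a minimal stand-in, not LoRA Craft's actual Math Problem Solver preset; it relies on GSM8K reference answers ending in "#### <number>":
import re

def final_number(text: str) -> str | None:
    """Return the last number in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

def math_reward(completion: str, reference_answer: str) -> float:
    """1.0 if the model's final number matches the reference answer, else 0.0."""
    predicted, expected = final_number(completion), final_number(reference_answer)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if float(predicted) == float(expected) else 0.0

print(math_reward("5 + 3 = 8, minus 2 leaves 6. The answer is 6.", "#### 6"))  # 1.0
print(math_reward("The answer is 7.", "#### 6"))                               # 0.0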
Step 5: Start Training
- Click the Training tab
- Review your configuration summary
- Click Start Training
- Watch the real-time metrics appear
What to Watch:
- Mean Reward: Should increase over time (target: 0.5+)
- Training Loss: Should decrease
- KL Divergence: Should stay relatively stable (< 0.1)
Training 500 samples on a 1.5B model takes approximately 10-15 minutes on a modern GPU.
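For intuition on why Mean Reward is the headline metric: GRPO samples several completions per prompt and scores each one relative to the rest of its group, so a rising mean reward means more completions are passing the reward check. A minimal sketch of the standard group normalization (generic GRPO math, not LoRA Craft-specific code):
# Standard GRPO advantage computation for one prompt's group of completions
rewards = [0.0, 1.0, 1.0, 0.0]  # one reward per sampled completion

mean = sum(rewards) / len(rewards)  # this is the "Mean Reward" you watch
std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
advantages = [(r - mean) / (std + 1e-8) for r in rewards]

print(f"mean reward: {mean:.2f}")          # 0.50
print([f"{a:+.2f}" for a in advantages])   # correct completions get positive advantage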
Step 6: Export Your Model
Once training completes:
- Navigate to the Export section
- Choose format:
- HuggingFace: For Python/API use
- GGUF (Q4_K_M): For llama.cpp/Ollama/LM Studio
- Click Export Model
- Wait for conversion (1-2 minutes)
Your model is saved in exports/<session_id>/
Step 7: Test Your Model
- Click the Test tab
- Select your newly trained model
- Enter a test problem:
Sarah has 5 apples. She buys 3 more apples. Then she gives 2 apples to her friend. How many apples does Sarah have now?
- Click Generate
- Compare the output to the base model
Expected Improvement: Your fine-tuned model should show structured reasoning and correct answers more consistently than the base model.
Quick Workflow Summary
1. Select Model → Qwen2.5 1.5B
2. Load Dataset → GSM8K (Math)
3. Configure Training → 1 epoch, 500 samples
4. Choose Reward → Math & Science
5. Start Training → ~15 minutes
6. Export Model → GGUF format
7. Test Output → Verify improvement
Common First-Time Issues
Docker: GPU Not Detected
Symptom: Container logs show “CUDA Available: False”
Solutions:
# 1. Test GPU access works
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
# 2. If test fails on Linux, install/configure NVIDIA Container Toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# 3. If test fails on Windows, restart Docker Desktop
# Docker Desktop → Restart
# 4. Rebuild and restart container
docker compose down
docker compose up -d
Docker: Container Won’t Start
Symptom: “exec /app/src/entrypoint.sh: no such file or directory”
Solution:
# Rebuild image without cache
docker compose build --no-cache
docker compose up -d
Training is too slow
- Reduce samples per epoch to 200
- Check GPU is being used (metrics should show GPU memory usage)
- Reduce max sequence length to 1024
Out of memory errors
- Reduce batch size to 1
- Increase gradient accumulation to 8
- Use a smaller model (Qwen3 0.6B)
Rewards stay at 0.0
- Check field mappings match your dataset
- Verify reward function with test button
- Try a different reward function from presets
Model outputs are gibberish
- Enable pre-training (helps model learn format)
- Increase pre-training epochs to 2
- Check system prompt matches expected output format
Next Steps
Try Different Tasks
Code Generation
- Dataset: Code Alpaca
- Reward: Code Generation
- Model: Qwen2.5 1.5B or Phi-2
Question Answering
- Dataset: SQuAD v2
- Reward: Question Answering
- Model: Llama 3.2 3B
Creative Writing
- Dataset: Alpaca
- Reward: Creative Writing
- Model: Mistral 7B
Scale Up
Once comfortable with the basics:
- Train on full datasets (remove sample limits)
- Increase epochs to 3-5 for better results
- Try larger models (3B-7B parameters)
- Experiment with custom reward functions
Deploy Your Models
Use with Ollama:
ollama create math-tutor -f exports/<session_id>/Modelfile
ollama run math-tutor "Solve: 15 × 12 ="
Use with llama.cpp:
./main -m exports/<session_id>/model-q4_k_m.gguf \
-p "Calculate the area of a circle with radius 7"
# Note: newer llama.cpp builds name this binary llama-cli instead of main
Integrate via API: Load your HuggingFace format model in any Python application with the Transformers library.
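A minimal loading-and-generation sketch (the exports/<session_id> path is the placeholder from Step 6; substitute your actual session directory):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "exports/<session_id>"  # placeholder from Step 6; use your real path

tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

messages = [{"role": "user", "content": "Sarah has 5 apples and buys 3 more. "
             "She gives 2 away. How many does she have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))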
Learn More
Need Help?
- Documentation: Full technical guide
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share tips
Happy fine-tuning!