
Documentation

Complete technical guide for installing, configuring, and using LoRA Craft.


Table of Contents

  1. Prerequisites
  2. Installation
  3. User Guide
  4. Key Concepts
  5. Troubleshooting
  6. Technical Reference
  7. Glossary

Prerequisites

Hardware Recommendations

Software Requirements


Installation

LoRA Craft supports two installation methods: Docker (recommended for most users) and Native (for development or advanced users).

Docker vs Native Installation

Feature                 Docker                          Native
Setup Time              5-10 minutes                    15-30 minutes
Dependency Management   Automatic                       Manual
GPU Support             Automatic detection             Requires CUDA setup
Platform Support        Windows (WSL2), Linux, macOS*   Linux, Windows*
Updates                 Simple rebuild                  Manual package updates
Isolation               Fully isolated                  System-wide install
Best For                Production, Windows users       Development, debugging

*macOS Docker runs without GPU; Windows native requires WSL2 for GPU support.

See DOCKER-QUICKSTART.md for platform-specific Docker setup guides.


Docker Installation

Docker provides a pre-configured environment with all dependencies, CUDA runtime, and automatic GPU detection.

Prerequisites

Quick Setup

# Clone repository
git clone https://github.com/jwest33/lora_craft.git
cd lora_craft

# Optional: Configure environment
cp .env.example .env

# Start application (builds on first run)
docker compose up -d

# View logs
docker compose logs -f

# Access at http://localhost:5000

The first startup takes 5-10 minutes to download the base image (~5 GB) and install PyTorch.

What’s Included

Docker Commands Reference

# Check status
docker compose ps

# Stop application
docker compose down

# Restart
docker compose restart

# View logs
docker compose logs -f

# Check GPU
docker compose exec lora-craft nvidia-smi

# Access shell
docker compose exec lora-craft bash

# Update to latest
git pull && docker compose build && docker compose up -d

# Clean rebuild
docker compose down
docker compose build --no-cache
docker compose up -d

Volume Management

Docker automatically mounts these directories:

Local         Container      Purpose
./outputs/    /app/outputs   Model checkpoints
./exports/    /app/exports   GGUF exports
./configs/    /app/configs   Configurations
./uploads/    /app/uploads   Dataset uploads
./logs/       /app/logs      Application logs

Named volumes (in Docker):

To back up your data, copy the local directories listed above. Named volumes persist across container restarts.
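
A minimal backup sketch using tar, covering the mounted directories from the table above (run from the repository root; adjust paths as needed):

# Archive the mounted data directories
tar czf lora_craft_backup.tar.gz outputs/ exports/ configs/ uploads/ logs/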


Native Installation

Step 1: Clone the Repository

git clone https://github.com/jwest33/lora_craft.git
cd lora_craft

Step 2: Install PyTorch with CUDA Support

Install PyTorch with CUDA 12.8 support:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

For other CUDA versions, visit PyTorch’s installation page.

Step 3: Install Dependencies

pip install -r requirements.txt

This will install all required packages including:

Step 4: Verify Installation

Check that your GPU is accessible:

python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

You should see CUDA available: True.
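
To also confirm which GPU PyTorch sees, you can optionally run:

python -c "import torch; print(torch.cuda.get_device_name(0))"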

Step 5: Start Application

python server.py

Access the web interface at http://localhost:5000.


User Guide

Step 1: Model Selection


The Model Configuration page allows you to select the base model for fine-tuning.

Quick Setup Options

Model Family

Choose from several model families:

Model Size Selection

Select a model size based on your available VRAM:

LoRA Configuration (Custom/Advanced)

Step 2: Dataset Configuration


Configure the training data for your model.

Dataset Source Options

  1. Public Datasets: Browse curated datasets from HuggingFace
    • Filter by category: Math, Coding, General, Q&A
    • View dataset size and sample count
    • Preview dataset samples before training
  2. Custom HF Dataset: Enter any HuggingFace dataset path
    • Format: username/dataset-name
    • Specify split (train, test, validation)
  3. Upload File: Use your own data
    • Supported formats: JSON, JSONL, CSV, Parquet
    • Maximum size: 10GB

Field Mapping

Map your dataset columns to expected fields:

The system auto-detects common field names (question, answer, prompt, completion, etc.).

System Prompt Configuration


Define the instruction format for your model:

Step 3: Training Configuration

Configure hyperparameters for the training process.

Essential Parameters

Training Duration

Batch Settings

Learning Rate

Optimization

GRPO-Specific Parameters

Generation Parameters

Pre-training Phase

An optional supervised fine-tuning (SFT) phase can run before GRPO.

Pre-training helps the model learn output formatting before reinforcement learning.

Step 4: Reward Functions


Reward functions evaluate model outputs and guide training. Choose functions that match your task.

Reward Function Categories

Algorithm Implementation

Chain of Thought

Citation Format

Code Generation

Concise Summarization

Creative Writing

Math & Science

Programming

Reasoning

Question Answering

Configuring Reward Functions


  1. Select Algorithm Type: GRPO (standard), GSPO (sequence-level), or OR-GRPO (robust variant)

  2. Choose Reward Source:
    • Quick Start: Auto-configured based on dataset
    • Preset Library: Browse categorized reward functions
    • Custom Builder: Create custom reward logic (advanced)
  3. Map Dataset Fields:
    • Instruction: Field containing the input prompt
    • Response: Field containing the expected output
    • Additional fields may be required depending on the reward function
  4. Test Reward: Verify reward function works with sample data before training

Step 5: Training & Monitoring


Once training starts, monitor progress through real-time metrics.

Training Metrics Dashboard

Top Metrics Bar

Reward Metrics Chart

Training Loss Chart

KL Divergence Chart

Completion Length Statistics

Policy Clip Ratios

Learning Rate Schedule

Training Controls

Training Sessions

The left sidebar shows all training sessions:

Step 6: Model Export

After training completes, export your model for deployment.

Export Formats

HuggingFace Format

GGUF Format

Quantization Options

Quantization reduces model size for deployment:

Using Exported Models

With llama.cpp

./main -m exports/<session_id>/model-q4_k_m.gguf -p "Your prompt here"

With Ollama

ollama create my-model -f exports/<session_id>/Modelfile
ollama run my-model

With LM Studio

Step 7: Testing Models


Test your fine-tuned model with custom prompts.

Interactive Testing

  1. Select Model: Choose from trained models or active training sessions
  2. Enter Prompt: Type or paste your test question
  3. Configure Generation:
    • Temperature: Control randomness (0.1 = deterministic, 1.0 = creative)
    • Max Tokens: Maximum response length
    • Top-P: Nucleus sampling threshold
  4. Generate: Click “Test Model” to generate response
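
The same interactive test can be scripted against the /api/models/test endpoint described under Technical Reference. Below is a minimal sketch using the Python requests package (not part of the steps above; install it separately if needed, and treat the values as illustrative):

import requests

# Illustrative values; point model_path at one of your own session directories
payload = {
    "model_path": "outputs/session-id/",
    "prompt": "What is 2+2?",
    "temperature": 0.7,
    "max_tokens": 256,
}
response = requests.post("http://localhost:5000/api/models/test", json=payload)
print(response.json())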

Batch Testing

Test multiple prompts at once:

  1. Upload a file with test prompts (one per line)
  2. Configure generation parameters
  3. Run batch test
  4. Export results to JSON or CSV

Evaluation with Reward Functions

Evaluate model outputs using the same reward functions from training:

  1. Select reward function
  2. Enter prompt and expected response
  3. Generate model output
  4. View reward score and feedback

This helps quantify model improvement on your specific task.


Key Concepts

What is GRPO (Group Relative Policy Optimization)?

GRPO is a reinforcement learning algorithm for training language models. Unlike supervised learning (which simply teaches the model to imitate examples), GRPO teaches the model to maximize rewards.

How GRPO Works:

  1. Model generates multiple responses for each prompt
  2. Reward function scores each response
  3. Model learns to increase probability of high-reward responses
  4. Model learns to decrease probability of low-reward responses
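
As a rough illustration of the "group relative" idea (a simplified sketch, not LoRA Craft's actual trainer): each prompt's group of sampled responses is scored by the reward function, and every score is compared against the group's average, so above-average responses receive a positive learning signal and below-average responses a negative one.

def group_relative_advantages(rewards):
    """Toy example: normalize rewards within one prompt's group of responses."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1e-8
    return [(r - mean) / std for r in rewards]

# Four sampled responses to one prompt, scored 0 or 1 by a reward function
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # the two correct answers get positive advantages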

Benefits:

GRPO vs Other Algorithms:

What are LoRA Adapters?

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method.
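
Conceptually, LoRA keeps the base weight matrix W frozen and learns a low-rank update, so the effective weight becomes W + (alpha / r) * B A, where A and B are small matrices of rank r. The following is a minimal PyTorch sketch of that idea only; LoRA Craft itself applies adapters through Unsloth rather than a hand-rolled layer like this:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank update (alpha / rank) * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)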

Key Concepts:

Benefits:

LoRA Parameters:

Understanding Reward Functions

Reward functions are Python functions that evaluate model outputs and return scores.

Components of a Reward Function:

  1. Input: Model’s generated response + reference data
  2. Evaluation Logic: Checks correctness, format, quality
  3. Output: Numerical score (typically 0.0 to 1.0)

Example: Math Reward Function

import re

def math_reward(response, expected_answer):
    # Extract the answer from the response: here, the text between
    # <SOLUTION> tags, matching the default system prompt shown below
    match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", response, re.DOTALL)
    model_answer = match.group(1).strip() if match else None

    # Check correctness
    if model_answer == str(expected_answer).strip():
        return 1.0  # Correct
    return 0.0  # Incorrect or no parsable answer
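
With the default system prompt shown below, a response containing <SOLUTION>4</SOLUTION> and an expected answer of "4" scores 1.0; a missing or different answer scores 0.0.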

Types of Reward Functions:

Best Practices:

Understanding System Prompts

System prompts define the instruction format and expected output structure.

Components:

Example System Prompt (GRPO Default):

You are given a problem.
Think about the problem and provide your working out.
Place it between <start_working_out> and <end_working_out>.
Then, provide your solution between <SOLUTION></SOLUTION>

Why Use Structured Outputs?


Troubleshooting

Docker-Specific Issues

GPU Not Detected in Container

Symptom: Container logs show “CUDA Available: False” or “GPU Count: 0”

Solutions:

  1. Verify GPU works with Docker:
    docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
    

    If this fails, your Docker GPU setup needs configuration.

  2. Check docker-compose.yml has correct GPU configuration:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility

  3. For Docker Desktop (Windows/macOS):
    • Restart Docker Desktop
    • Settings → Resources → WSL Integration (ensure enabled)
    • Verify NVIDIA driver installed on Windows host
  4. For Linux:
    • Ensure NVIDIA Container Toolkit installed
    • Run: sudo nvidia-ctk runtime configure --runtime=docker
    • Restart Docker: sudo systemctl restart docker
  5. Rebuild container:
    docker compose down
    docker compose up -d
    

Container Won’t Start - Entrypoint Error

Symptom: “exec /app/src/entrypoint.sh: no such file or directory”

Cause: Line ending issues when building on Windows

Solution:

# Rebuild without cache
docker compose build --no-cache
docker compose up -d

The Dockerfile automatically fixes line endings, so rebuilding should resolve this.

Port 5000 Already in Use

Symptom: “Error starting userland proxy: listen tcp4 0.0.0.0:5000: bind: address already in use”

Solutions:

  1. Change port in docker-compose.yml:
    ports:
      - "5001:5000"  # Use port 5001 on host
  2. Or set in .env file:
    echo "PORT=5001" >> .env
    
  3. Or stop conflicting service:
    # Find process using port 5000
    # Linux:
    sudo lsof -i :5000
    # Windows:
    netstat -ano | findstr :5000
    

GPU Memory Issues

Problem: “CUDA out of memory” error during training

Solutions:

  1. Reduce batch size to 1
  2. Increase gradient accumulation steps (maintains the effective batch size; see the example below)
  3. Reduce max sequence length (e.g., 2048 → 1024)
  4. Use smaller model (e.g., 1.7B instead of 4B)
  5. Enable gradient checkpointing (trades compute for memory)
  6. Use 8-bit or 4-bit quantization (reduces memory usage)
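
For example, a batch size of 1 with 8 gradient accumulation steps still yields an effective batch size of 1 × 8 = 8, while only one sample's activations are held in memory at a time.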

Training Not Starting

Problem: Training session created but doesn’t start

Solutions:

  1. Check logs folder for error messages (logs/)
  2. Verify dataset downloaded successfully (check cache/ folder)
  3. Ensure reward function is properly configured
  4. Check that all required fields are mapped
  5. Restart the Flask server and try again

Dataset Loading Errors

Problem: “Failed to load dataset” error

Solutions:

  1. Verify dataset name is correct (case-sensitive)
  2. Check internet connection for HuggingFace downloads
  3. For uploaded files, verify format:
    • JSON: Must be list of objects or object with data field
    • JSONL: One JSON object per line
    • CSV: Must have column headers
    • Parquet: Standard Apache Parquet format
  4. Ensure instruction and response fields exist in dataset

Slow Training Speed

Problem: Training is slower than expected

Solutions:

  1. Verify GPU is being used: Check system monitoring (top bar should show GPU usage)
  2. Reduce gradient accumulation steps (increases update frequency)
  3. Enable flash attention if using supported model (Llama, Mistral)
  4. Disable gradient checkpointing if memory allows
  5. Use larger batch size if VRAM permits
  6. Check that CUDA and PyTorch are properly installed

Model Generation Quality Issues

Problem: Model outputs are nonsensical or low quality

Solutions:

  1. Check reward signal: Ensure rewards are varying (not all 0.0 or 1.0)
  2. Increase pre-training epochs: Model needs to learn format first
  3. Adjust KL penalty: Lower values allow more deviation from base model
  4. Verify dataset quality: Check that training data is clean and relevant
  5. Increase training epochs: Model may need more training time
  6. Check system prompt: Ensure it clearly describes expected output format
  7. Test with different temperatures: Lower temperature (0.3-0.5) for more deterministic outputs

WebSocket Connection Issues

Problem: Real-time metrics not updating

Solutions:

  1. Refresh browser page
  2. Check browser console for WebSocket errors (F12)
  3. Verify Flask server is running
  4. Check firewall settings (port 5000 must be accessible)
  5. Try a different browser (Chrome/Firefox recommended)

Export Failures

Problem: GGUF export fails or produces invalid files

Solutions:

  1. Ensure training completed successfully
  2. Check that model checkpoint exists (outputs/<session_id>/)
  3. Verify sufficient disk space for export
  4. Check logs for llama.cpp converter errors
  5. Try exporting with different quantization level

Technical Reference

API Endpoints

The Flask server provides RESTful API endpoints for programmatic access.

Training Endpoints

Start Training

POST /api/training/start
Content-Type: application/json

{
  "session_id": "unique-id",
  "config": { ... training configuration ... }
}
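
As a hedged sketch of calling this endpoint from Python (the requests package and the reuse of a saved configuration file as the config object are assumptions; the file name is illustrative):

import json
import requests

# Load a saved configuration (see Configuration File Format below)
with open("configs/my-config.json") as f:
    config = json.load(f)

resp = requests.post(
    "http://localhost:5000/api/training/start",
    json={"session_id": "my-session-1", "config": config},
)
print(resp.status_code, resp.json())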

Stop Training

POST /api/training/stop
Content-Type: application/json

{
  "session_id": "session-id-to-stop"
}

List Training Sessions

GET /api/training/sessions

Dataset Endpoints

List Datasets

GET /api/datasets/list

Upload Dataset

POST /api/datasets/upload
Content-Type: multipart/form-data

file=@dataset.json

Preview Dataset

POST /api/datasets/preview
Content-Type: application/json

{
  "path": "tatsu-lab/alpaca",
  "samples": 5
}

Model Endpoints

Test Model

POST /api/models/test
Content-Type: application/json

{
  "model_path": "outputs/session-id/",
  "prompt": "What is 2+2?",
  "temperature": 0.7,
  "max_tokens": 256
}

List Trained Models

GET /api/models/list

Export Model

POST /api/exports/create
Content-Type: application/json

{
  "session_id": "session-id",
  "format": "gguf",
  "quantization": "q4_k_m"
}

Configuration Endpoints

Save Configuration

POST /api/configs/save
Content-Type: application/json

{
  "name": "my-config",
  "config": { ... configuration object ... }
}

Load Configuration

GET /api/configs/load?name=my-config

List Configurations

GET /api/configs/list

WebSocket Events

Real-time updates are delivered via Socket.IO.

Connect to Socket

const socket = io('http://localhost:5000');

Subscribe to Training Updates

socket.on('training_update', (data) => {
  console.log('Step:', data.step);
  console.log('Loss:', data.loss);
  console.log('Reward:', data.reward);
});

Subscribe to System Updates

socket.on('system_update', (data) => {
  console.log('GPU Memory:', data.gpu_memory);
  console.log('GPU Utilization:', data.gpu_utilization);
});

Configuration File Format

Saved configurations are stored as JSON in the configs/ directory.

Example Configuration:

{
  "name": "math-reasoning-config",
  "model": {
    "name": "unsloth/Qwen3-1.7B",
    "lora_rank": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.0
  },
  "dataset": {
    "source": "openai/gsm8k",
    "split": "train",
    "instruction_field": "question",
    "response_field": "answer",
    "max_samples": null
  },
  "training": {
    "num_epochs": 3,
    "batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 0.0002,
    "warmup_steps": 10,
    "weight_decay": 0.001,
    "max_grad_norm": 0.3,
    "lr_scheduler_type": "constant",
    "optim": "paged_adamw_32bit",
    "max_sequence_length": 2048,
    "max_new_tokens": 512,
    "temperature": 0.7
  },
  "grpo": {
    "kl_penalty": 0.05,
    "clip_range": 0.2,
    "importance_sampling_level": "token"
  },
  "reward": {
    "type": "preset",
    "preset_name": "math"
  },
  "pre_training": {
    "enabled": true,
    "epochs": 2,
    "max_samples": 100,
    "learning_rate": 0.00005
  }
}

Supported Dataset Formats

JSON Format

[
  {
    "instruction": "What is the capital of France?",
    "response": "The capital of France is Paris."
  },
  {
    "instruction": "Solve 2+2",
    "response": "2+2 = 4"
  }
]

JSONL Format

{"instruction": "What is the capital of France?", "response": "The capital of France is Paris."}
{"instruction": "Solve 2+2", "response": "2+2 = 4"}

CSV Format

instruction,response
"What is the capital of France?","The capital of France is Paris."
"Solve 2+2","2+2 = 4"

Parquet Format

Standard Apache Parquet files with the same instruction and response columns as the CSV example above.

Directory Structure

lora_craft/
├── cache/              # Cached datasets from HuggingFace
├── configs/            # Saved training configurations
├── core/               # Core training logic
├── docs/               # Documentation and example images
├── exports/            # Exported models (GGUF, etc.)
├── logs/               # Application and training logs
├── outputs/            # Training outputs (model checkpoints)
├── routes/             # Flask API routes
├── services/           # Business logic services
├── static/             # Static web assets (CSS, JS, images)
├── templates/          # HTML templates
├── uploads/            # Uploaded dataset files
├── utils/              # Utility functions
├── websockets/         # WebSocket handlers
├── server.py           # Application entry point
├── app_factory.py      # Flask application factory
├── constants.py        # Application constants
└── requirements.txt    # Python dependencies

Glossary

Adapter: Small trainable module added to a frozen base model (see LoRA)

Base Model: Pre-trained language model before fine-tuning

Batch Size: Number of samples processed simultaneously during training

CUDA: NVIDIA’s parallel computing platform for GPU acceleration

Epoch: One complete pass through the entire training dataset

Fine-tuning: Training a pre-trained model on new data for a specific task

GGUF: File format for quantized models (used by llama.cpp ecosystem)

Gradient Accumulation: Technique to simulate larger batch sizes with limited memory

Gradient Clipping: Technique to prevent exploding gradients by limiting their magnitude

GRPO: Group Relative Policy Optimization (reinforcement learning algorithm)

KL Divergence: Measure of how much the fine-tuned model differs from the base model

Learning Rate: Step size for model parameter updates

LoRA: Low-Rank Adaptation (parameter-efficient fine-tuning method)

Quantization: Reducing model precision (e.g., from 16-bit to 4-bit) to save memory

Reinforcement Learning: Training paradigm where model learns from reward signals

Reward Function: Function that evaluates model outputs and assigns scores

System Prompt: Instructions that define expected model behavior and output format

Token: Smallest unit of text processed by language models (roughly 3/4 of a word)

VRAM: Video RAM (GPU memory)

Warmup: Gradual increase of learning rate at training start


Additional Resources

Documentation

Model Sources

Dataset Sources

Deployment Tools

Community & Support


License: MIT

Acknowledgments: Built with Unsloth, HuggingFace Transformers, and Flask.