Dataset & Captioning Guide

Clean folders, clear captions, and quick checks — better models with fewer headaches.

Why clean datasets matter

Training quality lives or dies on data hygiene. Consistent folders, sensible filenames, and clear captions help Kohya learn the right patterns in fewer steps, and consistent, moderate image sizes keep VRAM use predictable.

Folder structure & naming

Keep one subject/style per subfolder. Avoid spaces/odd symbols in names.

/your-dataset/
  character_a/
    0001.jpg
    0002.jpg
    0003.jpg
    0001.txt        # optional caption sidecars
  character_b/
    0001.jpg
    0002.jpg
    ...

  • Filenames: stick to a-z, 0-9, -, _. Avoid spaces.
  • Images: keep them reasonably sharp; remove near-duplicates.
  • Resolution: aim for a consistent long side (e.g., 640–1024 px) before training.

[Figure: good folder structure example] Good: clear folders, normalized names, optional .txt captions.
[Figure: bad folder structure example] Bad: mixed subjects, random names, weird characters.
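
The naming rule above is easy to check mechanically. The following is a minimal Python sketch, not part of Training Center: the names `is_safe_name` and `find_unsafe_names` are illustrative, and requiring a lowercase extension is an assumption layered on top of the a-z, 0-9, -, _ rule.

```python
import re
from pathlib import Path

# Allowed per the guide: a-z, 0-9, '-' and '_'.
# Assumption: exactly one extension, also lowercase alphanumeric.
SAFE_NAME = re.compile(r"[a-z0-9_-]+\.[a-z0-9]+")


def is_safe_name(filename: str) -> bool:
    """Return True if the whole filename matches the naming rule."""
    return SAFE_NAME.fullmatch(filename) is not None


def find_unsafe_names(root: str) -> list[Path]:
    """List every file under root (recursively) whose name breaks the rule."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and not is_safe_name(p.name)
    )
```

Calling `find_unsafe_names("/your-dataset")` would return the files to rename before training. The sketch checks file names only; folder names would need a similar check without the extension part.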

Auto path-fix

If you moved or renamed folders, Training Center attempts an auto path-fix when you choose the dataset root.

[Figure: auto path-fix banner in the dataset picker] Path-fix detects common layouts and proposes a sane root.

Captions: manual or auto

You can provide .txt sidecars or let the app auto-caption. Auto-captioning requires A1111 to be running: start A1111 from the Launcher, then click Auto Caption. Stop A1111 after captioning to free VRAM.

[Figure: manual captions alongside images] Manual: keep captions short, factual, and consistent.
[Figure: auto-caption panel with A1111 requirement] Auto: start A1111 → run auto-caption → stop A1111 to save VRAM.

Caption tips

  • Style: concise nouns/adjectives, avoid story prose.
  • Content: describe pose/clothes/background if relevant.
  • Consistency: use similar wording across the same subject.

Caption example (comma-separated tags, with the subject token first):

sks_waifu, 1girl, bangs, bare legs, bare shoulders, barefoot, beach, blonde hair, blue dress, blue eyes, blue sky, boat, branch, breasts, cherry blossoms, cloud, cloudy sky, day, dress, falling petals, flower, gradient sky, horizon, knees up, lake, landscape, long hair, looking at viewer, mount fuji, mountain, mountainous horizon, ocean, outdoors, petals, petals on liquid, pink flower, planet, pond, river, rock, sand, scenery, seashell, shell, shore, sitting, sky, sleeveless, smile, solo, spring \(season\), starfish, sunrise, sunset, tree, water, watercraft, waves 
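
One way to act on the consistency tip is to count how often each tag appears across a subject's captions; tags that show up only once are often typos or alternate wordings. The sketch below is a rough heuristic under that assumption, and the helper names (`split_tags`, `tag_counts`, `rare_tags`) are made up for illustration.

```python
from collections import Counter


def split_tags(caption: str) -> list[str]:
    """Split a comma-separated caption into stripped, non-empty tags."""
    return [tag.strip() for tag in caption.split(",") if tag.strip()]


def tag_counts(captions: list[str]) -> Counter:
    """Count how many captions contain each tag (once per caption)."""
    counts: Counter = Counter()
    for caption in captions:
        counts.update(set(split_tags(caption)))
    return counts


def rare_tags(captions: list[str], max_count: int = 1) -> list[str]:
    """Return tags that appear in at most max_count captions, sorted."""
    counts = tag_counts(captions)
    return sorted(tag for tag, n in counts.items() if n <= max_count)


captions = [
    "sks_waifu, blonde hair, blue dress",
    "sks_waifu, blond hair, blue dress",
]
print(rare_tags(captions))  # ['blond hair', 'blonde hair']
```

Here the two spellings of the hair tag surface as candidates to unify. A rare tag is not necessarily wrong (a one-off background is legitimate), so treat the list as a review queue rather than a list of errors.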

Quick quality checks

  1. Scan thumbnails — remove obvious blurs/dupes.
  2. Ensure per-folder consistency (one subject/style each).
  3. Verify captions exist (if you use them) and aren’t contradictory.
  4. Normalize long side (e.g., 768) to stabilize training.

[Figure: thumbnail grid QA view] A quick grid spot-check catches most issues early.
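
Check 3 (captions exist) can be scripted when you use sidecars. This is a minimal sketch that assumes each caption shares its image's file stem and ends in .txt, as in the folder layout shown earlier; the extension set and the name `missing_captions` are illustrative choices.

```python
from pathlib import Path

# Assumption: these are the image types in the dataset; adjust as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def missing_captions(folder: str) -> list[str]:
    """Return image filenames in folder that have no same-stem .txt sidecar."""
    root = Path(folder)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTS
        and not p.with_suffix(".txt").exists()
    )
```

Run it once per subject folder, for example `missing_captions("/your-dataset/character_a")`. It only confirms that a caption file exists; contradictory captions still need a read-through.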

System checks (before training)

Open the Launcher → System panel to confirm VRAM headroom and disk space before you hit Start. If VRAM is close to the limit, lower the resolution or batch size first.

[Figure: Launcher System panel showing VRAM and disk usage] System: VRAM and disk usage at a glance, for safer starts and fewer OOMs.
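
The System panel is the intended place for these checks. Outside the app, free disk space can be read with Python's standard library; the sketch below uses `shutil.disk_usage`, and the 10 GiB default threshold is an arbitrary placeholder, not a Training Center requirement. VRAM is not covered here, since reading it needs a GPU-specific library.

```python
import shutil


def free_gib(path: str = ".") -> float:
    """Free disk space on the volume containing path, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def has_disk_headroom(path: str = ".", needed_gib: float = 10.0) -> bool:
    """Return True if at least needed_gib GiB are free at path.

    Assumption: needed_gib is a guess at what a run will write
    (checkpoints, samples, logs); tune it to your own setup.
    """
    return free_gib(path) >= needed_gib
```

Point `path` at the drive where outputs are written, since that is the one that fills up during training.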

Starter sizes (good defaults)

  • Portrait LoRA: long side ~640–768 px
  • Full-body LoRA: taller canvas, around 832×1216
  • DreamBooth: match the base model family (512-class sizes for SD 1.5, 1024-class for SDXL); start with moderate sizes
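
To see what a given long-side target means for a particular image, the arithmetic can be sketched as below. This only computes dimensions; it does not resize anything. Snapping each edge down to a multiple of 64 is an assumption borrowed from common Stable Diffusion practice, not something this guide specifies, and `fit_long_side` is a hypothetical helper name.

```python
def fit_long_side(width: int, height: int, long_side: int = 768,
                  multiple: int = 64) -> tuple[int, int]:
    """Scale (width, height) so the longer edge becomes long_side.

    Both edges are snapped down to a multiple of `multiple`
    (an assumption; pass multiple=1 to disable snapping).
    """
    if width <= 0 or height <= 0 or long_side <= 0 or multiple <= 0:
        raise ValueError("all arguments must be positive")
    longest = max(width, height)

    def snap(edge: int) -> int:
        # Integer math avoids floating-point rounding surprises.
        scaled = edge * long_side // longest
        return max(multiple, scaled // multiple * multiple)

    return snap(width), snap(height)


print(fit_long_side(3000, 2000, 768))  # (768, 512)
print(fit_long_side(1080, 1920, 768))  # (384, 768)
```

Snapping can shift the aspect ratio slightly (1080×1920 scales to 432×768 before snapping), so crop rather than stretch when you apply the result in an image editor or library.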

Next up

Clean in → Good out.