Kohya SS NSFW LoRA Training: Full Setup Guide (2026)


To train an NSFW LoRA in Kohya SS, install the kohya_ss GUI, organize your captioned images into an img/N_triggerword class folder where N is the repeat count, point the GUI at an SDXL or Pony base checkpoint, set the key fields, launch the run, and test the .safetensors output in your generator. Keep all subjects adult, fictional, and AI-generated.

Kohya SS is the most widely used local LoRA trainer, and its GUI front-end (kohya_ss) makes the whole process approachable without touching a single command flag. This guide walks through installation, the exact folder structure Kohya expects, pointing at a base checkpoint, the GUI fields that matter, launching your first run, where the output lands, and how to test it. By the end you will have a working .safetensors LoRA you can load in your generator.

Before you start

Kohya SS trains locally on your own GPU, so hardware matters. For SDXL and Pony LoRA training you want a CUDA GPU with at least 8GB of VRAM, and 12GB or more makes life much easier. If your card is short on memory, read the GPU hardware requirements guide and consider the low-VRAM checkpoint approach to keep batch size and resolution in a trainable range. No suitable GPU at all? A cloud GPU rental runs Kohya just as well, and the cost breakdown helps you budget.

You also need a finished, captioned dataset. If you have not built one yet, start with the dataset guide and the captioning guide. Kohya will not fix a weak dataset.


Installing kohya_ss

The kohya_ss GUI is distributed on GitHub. The install is straightforward on Windows and Linux.

# Windows / Linux install (run in a terminal)
# --recursive also pulls in the sd-scripts submodule that the GUI wraps
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Windows
.\setup.bat

# Linux
./setup.sh

The setup script creates a Python virtual environment, installs PyTorch with CUDA, installs the dependencies, and prompts you through accelerate configuration. Accept the defaults for a single local GPU unless you know you need otherwise. When it finishes, launch the GUI:

# Windows
.\gui.bat

# Linux
./gui.sh

This opens a local web interface in your browser, usually at http://127.0.0.1:7860. Everything from here is point-and-click.

The folder structure Kohya expects

This is the part that trips up first-timers. Kohya does not just want a folder of images. It wants a specific three-level structure, and the number prefix on the inner folder controls how many times each image is repeated per epoch.

MyLora_project/
  img/
    15_aria_nsfwchar woman/      # 15 = repeats, then trigger, then class
      img_001.png
      img_001.txt
      img_002.png
      img_002.txt
      ...
  reg/                            # optional regularization images
  model/                          # trained LoRA outputs land here
  log/                            # training logs for TensorBoard

The critical folder is 15_aria_nsfwchar woman. Breaking it down:

  • 15 is the repeat count. Each image is seen 15 times per epoch. More repeats on a small dataset, fewer on a large one.
  • aria_nsfwchar is your trigger word, the activation tag baked into every caption.
  • woman is the class (the general category). It helps with regularization and prior preservation.

The three top-level folders (img, model, log) are pointed at separately in the GUI. The reg folder is optional and only used if you supply regularization images.
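A quick way to catch naming mistakes before launching is to parse the subfolder names yourself. The sketch below is not part of kohya_ss: parse_concept_folder and check_dataset are hypothetical helpers, and the sketch assumes the N_trigger class pattern described above, a short list of image extensions, and captions stored as same-named .txt files.

```python
import re
from pathlib import Path

# Image extensions this sketch looks for (an assumption, adjust as needed).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

# "<repeats>_<trigger> <class>", e.g. "15_aria_nsfwchar woman"; class is optional here.
FOLDER_RE = re.compile(r"^(\d+)_(\S+)(?:\s+(.+))?$")


def parse_concept_folder(name: str):
    """Split a concept folder name into (repeats, trigger, class_name or None)."""
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"folder name does not match N_trigger class: {name!r}")
    repeats, trigger, class_name = match.groups()
    return int(repeats), trigger, class_name


def check_dataset(img_dir: str):
    """Report image counts and images missing a caption file for each concept folder."""
    report = {}
    for folder in sorted(Path(img_dir).iterdir()):
        if not folder.is_dir():
            continue
        repeats, trigger, class_name = parse_concept_folder(folder.name)
        images = [p for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTS]
        missing = sorted(p.name for p in images if not p.with_suffix(".txt").exists())
        report[folder.name] = {
            "repeats": repeats,
            "trigger": trigger,
            "class": class_name,
            "images": len(images),
            "missing_captions": missing,
        }
    return report
```

Calling check_dataset("MyLora_project/img") returns one entry per concept folder. A ValueError means a subfolder name does not follow the pattern, and a non-empty missing_captions list names the images that still need a .txt file.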

Setting repeats correctly

Repeats interact directly with your total step count. The rough formula:

total_steps = (num_images * repeats * epochs) / batch_size

For a 20-image character set, 15 repeats over 10 epochs at batch size 1 gives 3,000 steps, which is on the high side; drop repeats to 8 to 10 for roughly 1,600 to 2,000 steps. Aim for 1,200 to 2,500 total steps for a character LoRA. A large style set with 120 images needs far fewer repeats to hit the same window: 1 to 2 at batch size 1 over 10 epochs, or 2 to 4 at batch size 2. The training settings guide covers the math in more detail.
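The formula is easy to turn into a small calculator. The sketch below uses hypothetical helper names (total_steps, repeats_for_window) and rounds up the steps per epoch, which is an assumption; bucketing can shift the real count slightly, so treat the output as an estimate.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Rough step count: images * repeats * epochs / batch_size, rounded up per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def repeats_for_window(num_images, epochs, batch_size=1, low=1200, high=2500):
    """Return every whole repeat count whose total lands inside [low, high]."""
    return [
        r for r in range(1, 101)
        if low <= total_steps(num_images, r, epochs, batch_size) <= high
    ]


print(total_steps(20, 15, 10))      # 3000
print(repeats_for_window(20, 10))   # [6, 7, 8, 9, 10, 11, 12]
```

Run repeats_for_window with your own image count, epochs, and batch size before renaming the folder, then pick a repeat count from the returned list.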

Pointing at a base checkpoint

In the kohya_ss GUI, open the LoRA tab, then the Training parameters and Folders sub-sections.

  1. Under Model, set Pretrained model name or path to your base checkpoint file. For NSFW work this is usually a Pony or Illustrious SDXL checkpoint you already use for generation. Browse the best NSFW checkpoints if you need to pick one, and the install guide if you need to get it onto your machine.
  2. Check SDXL Model if you are training on an SDXL-based checkpoint (Pony and Illustrious both are).
  3. Under Folders, set Image folder to your img directory (the parent, not the 15_... subfolder), Output folder to model, and Logging folder to log.
  4. Set Model output name to something descriptive like aria_nsfwchar_v1.

Train on the same base you intend to generate with. A LoRA trained on Pony behaves best when loaded on Pony.

The key GUI fields

Kohya exposes dozens of fields, but only a handful change your result meaningfully. Here are the must-set ones.

| Field | Where | Typical SDXL/Pony value | What it does |
| --- | --- | --- | --- |
| Pretrained model | Model | your Pony/Illustrious .safetensors | The base to fine-tune. |
| SDXL Model | Model | checked | Tells Kohya the base is SDXL. |
| Train batch size | Parameters | 1 to 2 | Images per step; higher needs more VRAM. |
| Epochs | Parameters | 8 to 12 | Full passes over the dataset. |
| Network Rank (dim) | Network | 16 to 32 | LoRA capacity. |
| Network Alpha | Network | half of dim (8 to 16) | Scales the learned weights. |
| Learning rate | Parameters | 1e-4 (unet) | How fast it learns. |
| LR Scheduler | Parameters | cosine | Decay curve of the LR. |
| Optimizer | Parameters | AdamW8bit | The optimization algorithm. |
| Mixed precision | Parameters | bf16 (or fp16) | Saves VRAM, speeds training. |
| Max resolution | Parameters | 1024,1024 | Training resolution. |
| Enable buckets | Parameters | checked | Mixed aspect ratios. |

If you are unsure on dim, alpha, learning rate, and scheduler, the dedicated training settings guide gives safe starting numbers for each base model and explains the tradeoffs.

A representative config

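Kohya can save and load your settings as a JSON file. Here is a representative SDXL/Pony character setup you can adapt, written as shorthand key = value pairs for readability rather than as the literal JSON the GUI saves, so key names in your own file may differ. These are sane starting values, not magic numbers.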

# Representative kohya_ss SDXL/Pony LoRA config (key fields)
pretrained_model = "PonyXL.safetensors"
sdxl = true
train_batch_size = 1
max_train_epochs = 10
resolution = "1024,1024"
enable_bucket = true
min_bucket_reso = 512
max_bucket_reso = 1536

network_module = "networks.lora"
network_dim = 24
network_alpha = 12

learning_rate = 1e-4
unet_lr = 1e-4
text_encoder_lr = 5e-5
lr_scheduler = "cosine"
lr_warmup_steps = 0
optimizer_type = "AdamW8bit"

mixed_precision = "bf16"
save_precision = "fp16"
save_every_n_epochs = 2
clip_skip = 2
seed = 42

Saving a checkpoint every 2 epochs (save_every_n_epochs = 2) is deliberate. It gives you multiple checkpoints to compare so you can pick the one that is trained just right rather than over-baked.

Regularization images: optional but useful

The reg folder holds regularization images, which are generic examples of your class (for a woman class, generic AI-generated women that are not your character). Their job is prior preservation: they stop the LoRA from overwriting the model’s general understanding of the class, which reduces bleed where every woman you generate starts looking like your trained character.

Regularization is optional. For a single tightly scoped character LoRA you can skip it and rely on good captioning and pruning instead. It becomes more valuable when you train aggressively, with high repeats and many epochs, where overfitting risk is higher. If you use it, the reg subfolder follows the same naming pattern with a repeat count of 1, like 1_woman, and the images should be the same class but clearly not your subject. Keep these synthetic and adult as well; the safety rules apply to every image in the project, regularization included.


Training multiple concepts at once

Kohya supports multiple concept subfolders inside img, each with its own repeat count and trigger. You might have 15_aria_nsfwchar woman and 15_blake_nsfwchar man side by side to train two characters into one LoRA. This works, but it splits the network’s capacity, so each concept learns a little less cleanly than it would alone. For your first few LoRAs, train one concept at a time. Once you understand how the model converges, multi-concept training becomes a useful way to bundle related subjects.

When you do go multi-concept, balance the repeat counts so each subfolder contributes a similar number of total steps. If one character has 40 images and another has 15, give the smaller set more repeats so neither dominates the training signal.
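One way to balance the folders is to pick a target number of image views per epoch and derive each repeat count from it. This is a minimal sketch with a hypothetical helper (balanced_repeats) and an arbitrary target of 240 views, not a rule Kohya enforces.

```python
def balanced_repeats(image_counts: dict, target_per_epoch: int) -> dict:
    """Pick a repeat count per concept so images * repeats is close to the target."""
    result = {}
    for name, count in image_counts.items():
        if count <= 0:
            raise ValueError(f"concept {name!r} has no images")
        result[name] = max(1, round(target_per_epoch / count))
    return result


counts = {"aria_nsfwchar": 40, "blake_nsfwchar": 15}
repeats = balanced_repeats(counts, target_per_epoch=240)
for name, r in repeats.items():
    print(f"{r}_{name}: {counts[name] * r} image views per epoch")
```

With 40 and 15 images and a target of 240, this gives 6 and 16 repeats, so both folders contribute 240 image views per epoch.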

Launching the first run

With folders set and parameters filled, scroll to the bottom of the GUI and click Start training. The console window (the terminal that launched the GUI) shows the live log: it will load the base model, build buckets, then begin stepping. Watch for:

  • Bucket sizes printed at the start. If a bucket has only one image, your aspect ratios are too scattered.
  • Loss values ticking down. Loss is noisy, so do not panic at jumps; look at the trend.
  • Saved checkpoints appearing in your model folder every 2 epochs.

A 20-image SDXL character LoRA on a 12GB card typically takes 20 to 60 minutes depending on steps and resolution. Let it finish.

Monitoring with TensorBoard

Kohya writes training logs to the log folder, and you can watch them live with TensorBoard. With the kohya_ss virtual environment active, launch it pointing at your log folder (for example, tensorboard --logdir path/to/MyLora_project/log), then open the URL it prints in your browser. The loss curve gives you a rough sense of progress: a curve that flattens early may mean the learning rate is too low, while one that is still dropping steeply at the final epoch may mean you could train a touch longer. Loss is noisy and not a perfect proxy for image quality, so always confirm with actual sample generations, but it is a helpful second signal during long runs.

Kohya can also generate sample images during training if you fill the sample prompt field. Add a prompt with your trigger word and the safety negatives, and Kohya will render a preview every few epochs so you can literally watch the concept form. This is the fastest way to catch a run going wrong before it finishes.

Where the output lands

When training completes, your model folder contains one or more .safetensors files, named like aria_nsfwchar_v1.safetensors plus epoch checkpoints (aria_nsfwchar_v1-000008.safetensors). These are your LoRAs. Each is small, usually 20MB to 200MB depending on network dim.
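To know what to expect in the folder, you can list the filenames ahead of time. The sketch below uses a hypothetical helper (expected_checkpoints) and rests on two assumptions: intermediate saves carry the six-digit epoch suffix shown above, and the last epoch is written under the plain output name.

```python
def expected_checkpoints(output_name: str, epochs: int, save_every: int) -> list:
    """List the filenames a run should leave behind, under the stated assumptions."""
    names = [
        f"{output_name}-{epoch:06d}.safetensors"
        for epoch in range(save_every, epochs, save_every)  # intermediate saves only
    ]
    names.append(f"{output_name}.safetensors")  # final epoch, plain name
    return names


for name in expected_checkpoints("aria_nsfwchar_v1", epochs=10, save_every=2):
    print(name)
```

For 10 epochs saved every 2, this lists four intermediate files (epochs 2, 4, 6, and 8) plus the final aria_nsfwchar_v1.safetensors. If your folder differs, go by what is on disk.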

Testing the .safetensors

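Copy the LoRA into your generator’s LoRA folder. In a local setup that is models/Lora in Forge or models/loras in ComfyUI. Then generate with the trigger word in the prompt and the LoRA loaded. The <lora:name:weight> tag in the test prompt below is Forge syntax; in ComfyUI, load the file through a Load LoRA node instead.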

# Test prompt (SDXL / Pony)
score_9, score_8_up, aria_nsfwchar, 1girl, solo, standing, soft lighting, detailed background, <lora:aria_nsfwchar_v1:0.8>

# Negative prompt (always include the safety baseline)
child, minor, underage, loli, shota, low quality, blurry, deformed, extra fingers, watermark

Generate a grid across LoRA weights (0.6, 0.8, 1.0) and across the saved epoch checkpoints. The best result is usually a middle epoch at weight 0.7 to 0.9. If the concept is weak, train longer or raise repeats; if outputs are rigid and overcooked, use an earlier epoch or lower the weight. You can run quick comparison generations in our free NSFW AI image generator to sanity-check the concept before fine-tuning weights locally.
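Writing out the grid by hand gets tedious, so a few lines of Python can produce every prompt variant for you. This is a sketch with a hypothetical helper (grid_prompts); it assumes the <lora:name:weight> tag syntax from the test prompt, and the checkpoint names are examples only. Keep the seed and negative prompt fixed across all variants so the LoRA is the only thing that changes.

```python
from itertools import product

BASE_PROMPT = (
    "score_9, score_8_up, aria_nsfwchar, 1girl, solo, standing, "
    "soft lighting, detailed background"
)


def grid_prompts(lora_names, weights, base_prompt=BASE_PROMPT):
    """Yield (lora_name, weight, prompt) for every checkpoint and weight combination."""
    for name, weight in product(lora_names, weights):
        yield name, weight, f"{base_prompt}, <lora:{name}:{weight}>"


# Example checkpoint names (file names without the .safetensors extension).
checkpoints = ["aria_nsfwchar_v1-000004", "aria_nsfwchar_v1-000008", "aria_nsfwchar_v1"]
for name, weight, prompt in grid_prompts(checkpoints, [0.6, 0.8, 1.0]):
    print(prompt)
```

Three checkpoints across three weights gives nine prompts, which is a manageable grid to compare side by side.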

Troubleshooting common Kohya errors

  • CUDA out of memory. Lower batch size to 1, drop resolution to 768, enable gradient checkpointing, and use AdamW8bit. The troubleshooting guide has more memory fixes.
  • No images found. Your folder name is wrong. It must be N_trigger class inside img, and you must point the GUI at img, not the subfolder.
  • LoRA does nothing at inference. You forgot the trigger word in the prompt, or you loaded it on a different base than you trained on.
  • Output is overcooked. Too many steps. Use an earlier epoch checkpoint or reduce repeats and epochs.

Naming and versioning your LoRAs

Give every run a clear, versioned output name. A scheme like aria_nsfwchar_v1, aria_nsfwchar_v2_morepose, and so on saves you from a folder full of mystery files when you iterate. Keep a short text note per version recording the dataset size, repeats, dim, alpha, and learning rate you used, plus a one-line verdict after testing. After three or four LoRAs you will have a personal reference of what settings work for your hardware and base model, which is more valuable than any generic guide because it is tuned to your exact setup.
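If you would rather keep those notes in a consistent shape, a small script can write one file per version. The sketch below is only one possible layout: RunNote and save_note are hypothetical names, and storing the note as JSON next to the LoRA is an assumption, not something Kohya does for you.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class RunNote:
    """One training run's settings plus a verdict after testing."""
    name: str
    num_images: int
    repeats: int
    network_dim: int
    network_alpha: int
    learning_rate: float
    verdict: str = ""


def save_note(note: RunNote, folder: str) -> Path:
    """Write the note as <name>.json inside folder and return the path."""
    path = Path(folder) / f"{note.name}.json"
    path.write_text(json.dumps(asdict(note), indent=2), encoding="utf-8")
    return path
```

For example, save_note(RunNote("aria_nsfwchar_v1", 20, 10, 24, 12, 1e-4, "best at epoch 6, weight 0.8"), "MyLora_project/model") drops aria_nsfwchar_v1.json beside the trained files; the verdict text here is only a placeholder.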

When you publish or archive a finished LoRA, embed the trigger word and recommended weight in the filename or an accompanying note. Future you, and anyone you share it with, needs to know the activation tag to use it at all. A LoRA without its documented trigger is nearly useless, so treat that metadata as part of the deliverable.

Safety and consent

Kohya trains whatever you feed it without judgment, so the responsibility is entirely yours. Every subject must be adult, fictional, and AI-generated or fully consented. Never train minors or minor-appearing subjects, and never a real identifiable person without explicit consent. The US TAKE IT DOWN Act makes non-consensual intimate imagery of real people a serious legal matter, and a trained LoRA can produce it at scale. This is not legal advice; use synthetic or consented data only.

With Kohya installed, your folders structured, and a base checkpoint selected, you have everything needed to ship your first LoRA. Tune the numbers using the settings guide, and when you move on to a specific person or persona, follow the character LoRA guide. Then load it up and generate in our free generator to see your own model in action.

Frequently asked questions

What does the number in the Kohya folder name mean?

The number is the repeat count. A folder named 15_aria_nsfwchar woman tells Kohya to show each image 15 times per epoch. Repeats combine with epochs and batch size to set total training steps. Use more repeats for small datasets and fewer for large ones, aiming for roughly 1,200 to 2,500 total steps for a character LoRA.

Where do I point Kohya, at the img folder or the subfolder?

Point the Image folder field at the img parent directory, not the inner N_trigger class subfolder. Kohya reads the repeat count and trigger from the subfolder name automatically when it scans img. Pointing at the subfolder directly is the most common reason for a no images found error during launch.

What base checkpoint should I train an NSFW LoRA on?

Train on the same checkpoint you intend to generate with, usually a Pony or Illustrious SDXL model for NSFW work. A LoRA learns relative to its base and behaves best when loaded on that same base. Set the checkpoint in the Pretrained model field and check the SDXL Model box if it is an SDXL-derived model.

How long does a Kohya LoRA run take?

A 20-image SDXL or Pony character LoRA on a 12GB GPU typically takes 20 to 60 minutes, depending on total steps and resolution. Larger style datasets, higher resolution, or higher step counts extend that. Cloud GPUs with more memory can train faster at higher batch sizes. Save checkpoints every couple of epochs so you can pick the best one.

Where do the trained LoRA files go?

Kohya writes the .safetensors files to the Output folder you set, typically named model. You will get a final file plus epoch checkpoints if you enabled periodic saving. Copy these into your generator’s LoRA directory, such as models/Lora in Forge or models/loras in ComfyUI, then load one and include your trigger word in the prompt to test it.

My LoRA does nothing when I generate. What is wrong?

Two usual causes. First, you left the trigger word out of the prompt, so nothing activates the concept. Second, you loaded the LoRA on a different base model than you trained on, which weakens or breaks it. Confirm the trigger is present, the LoRA tag weight is around 0.8, and the base matches your training checkpoint.

How do I fix CUDA out of memory in Kohya?

Lower train batch size to 1, reduce max resolution to 768, enable gradient checkpointing, and use the AdamW8bit optimizer, which is memory efficient. Closing other GPU applications helps. If you still cannot fit it, drop network dim or rent a larger cloud GPU. These changes trade a little speed for the ability to train on limited VRAM.

How do I know which epoch checkpoint is best?

Generate a comparison grid using each saved epoch at a fixed seed and a LoRA weight around 0.8, with the trigger word in the prompt. The best checkpoint captures the concept cleanly without looking rigid or overcooked, usually a middle epoch. Earlier epochs may be undertrained and later ones overbaked, so testing several is the reliable way to choose.