Best NSFW LoRA Training Settings Explained (2026)


Safe NSFW LoRA starting settings for SDXL and Pony: network dim 24, alpha 12, unet learning rate 1e-4, text encoder 5e-5, cosine scheduler, AdamW8bit optimizer, batch size 1, 1024 resolution, clip skip 2, around 2,000 total steps. Flux needs lower rates and no clip skip. Keep all subjects adult, fictional, and AI-generated.

Settings are where people burn the most time chasing perfect numbers. The truth is that a clean dataset with mediocre settings beats a messy dataset with perfect settings, so make sure your dataset and captions are solid first. Once they are, this guide gives you safe starting values for every major setting, explains what each one does, and notes how SDXL, Pony, and Flux differ. These are starting points, not commandments. Train, test, adjust.

The mental model

LoRA training balances two failure modes. Undertrain and the concept is weak: the trigger word barely changes anything. Overtrain and the concept is rigid and fried: every output looks identical, anatomy breaks, and flexibility disappears. Almost every setting below is a dial between those two extremes. Your goal is the middle: a LoRA that reliably produces your concept while still responding to the rest of the prompt.

The two biggest levers are total steps and learning rate. Get those in the right zone and everything else is fine-tuning. The Kohya setup guide shows where each of these fields lives in the GUI.


Network dim and alpha

Network dim (rank) is the capacity of the LoRA: how much it can learn. Higher dim captures more detail but produces larger files and overfits faster on small datasets.

  • Character: dim 16 to 32. Start at 24.
  • Style: dim 32 to 64. Styles need more capacity to hold their range.
  • Simple concept or low VRAM: dim 8 to 16.

Network alpha scales the learned weights. In Kohya, the multiplier applied to the LoRA update is alpha divided by dim. The common, safe convention is to set alpha to half of dim, so with dim 24, use alpha 12. Setting alpha equal to dim makes the effect stronger and learning effectively faster; setting it lower dampens the effect. Half-dim is a reliable default that rarely surprises you.

More dim is not better. A 128-dim character LoRA on 20 images will overfit and bloat to hundreds of megabytes for no quality gain. Match capacity to the concept’s complexity.
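To make the dim and alpha relationship concrete, here is a minimal pure-Python sketch of the standard LoRA formulation, assuming the learned update is multiplied by alpha / dim. For one adapted linear layer, a rank-r LoRA stores a down-projection (r × in_features) and an up-projection (out_features × r), so its size grows linearly with dim. The 1280 × 1280 layer shape is a hypothetical value chosen for illustration, not a measured SDXL figure.

```python
def lora_scale(dim: int, alpha: float) -> float:
    """Multiplier applied to the LoRA update: alpha / dim."""
    return alpha / dim


def lora_params(dim: int, in_features: int, out_features: int) -> int:
    """Trainable parameters for one adapted linear layer.

    LoRA adds a down-projection A (dim x in_features) and an
    up-projection B (out_features x dim), so the count is
    dim * (in_features + out_features).
    """
    return dim * (in_features + out_features)


# Hypothetical 1280 x 1280 projection, used only for illustration.
IN_F, OUT_F = 1280, 1280

for dim, alpha in [(16, 8), (24, 12), (24, 24), (128, 64)]:
    print(
        f"dim={dim:>3} alpha={alpha:>3} "
        f"scale={lora_scale(dim, alpha):.2f} "
        f"params/layer={lora_params(dim, IN_F, OUT_F):,}"
    )
```

Half-dim alpha keeps the scale at 0.5 at every dim, while alpha equal to dim doubles it to 1.0. The 128-dim row carries more than five times the parameters of the 24-dim row, and that multiplies across every adapted layer, which is where the file bloat comes from.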

Learning rate: unet versus text encoder

Learning rate controls how big each update step is. Too high and training is unstable and fries fast; too low and it never really learns.

Kohya lets you set separate rates for the unet and the text encoder, and you should.

  • Unet learning rate: 1e-4 is the safe default for SDXL and Pony. This is where most of the visual learning happens.
  • Text encoder learning rate: 5e-5, roughly half the unet rate. The text encoder learns the association between your trigger word and the concept. Training it too hard makes the trigger overly dominant and can corrupt nearby tokens.

If your concept comes out weak, nudge the unet rate up toward 1.5e-4. If it fries, drop toward 5e-5. Make one change at a time.

Scheduler

The learning rate scheduler shapes how the rate changes over the run.

  • Cosine: smoothly decays the rate to near zero by the end. This is the safe, recommended default. It learns aggressively early and settles gently, which reduces overfitting at the tail.
  • Constant: holds the rate flat. Predictable but more prone to overcooking if you run long.
  • Cosine with restarts: periodic rate bumps. Useful for longer style runs, overkill for a character.

Start with cosine and a warmup of 0 steps. You rarely need anything else for a first LoRA.
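The difference between cosine and constant is easiest to see as plain functions of the step number. This sketch follows the usual half-cosine decay (the shape produced by the `get_cosine_schedule_with_warmup` helper in Hugging Face transformers and diffusers) with an optional linear warmup. It leaves out cosine with restarts for brevity, and it is an illustration rather than Kohya's own code.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float,
          schedule: str = "cosine", warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    # Optional linear warmup from 0 up to base_lr.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    if schedule == "constant":
        return base_lr
    if schedule == "cosine":
        # Fraction of the post-warmup run completed, in [0, 1].
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule}")


total = 2000
for step in (0, 500, 1000, 1500, 2000):
    print(step,
          f"cosine={lr_at(step, total, 1e-4):.2e}",
          f"constant={lr_at(step, total, 1e-4, 'constant'):.2e}")
```

With a 2,000-step run at a base rate of 1e-4, cosine is at half the rate by step 1,000 and reaches zero at the end, while constant stays at 1e-4 throughout. That gentle tail is why cosine is less prone to overcooking late in the run.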

Optimizer

The optimizer is the algorithm that applies the updates.

  • AdamW8bit: the standard. Memory efficient (the 8bit part), stable, and well understood. Use this as your default, especially on limited VRAM.
  • Prodigy: adaptive, tunes its own learning rate. Great when you do not want to hand-tune rates; set the rate to 1.0 and let it adapt. Uses more memory.
  • Adafactor: very memory efficient, good for large runs or tight VRAM, but needs its own rate handling and is less beginner-friendly.

For your first LoRA, AdamW8bit with the rates above is the path of least surprise. If you want a more hands-off run, Prodigy with rate 1.0 and a constant scheduler is a popular combination.
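For intuition about what "applies the updates" means, here is a single-parameter AdamW step in plain Python. It uses PyTorch's default AdamW hyperparameters (betas 0.9 and 0.999, eps 1e-8, weight decay 0.01) with the guide's 1e-4 rate. AdamW8bit follows broadly the same update rule but stores the two moment estimates in quantized 8-bit form to save memory, which this sketch does not model.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamWState:
    m: float = 0.0  # running mean of gradients (first moment)
    v: float = 0.0  # running mean of squared gradients (second moment)
    t: int = 0      # number of updates applied so far


def adamw_step(param: float, grad: float, state: AdamWState,
               lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-8, weight_decay: float = 0.01) -> float:
    """Apply one AdamW update to a scalar parameter and return the new value."""
    state.t += 1
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = state.m / (1 - beta1 ** state.t)
    v_hat = state.v / (1 - beta2 ** state.t)
    # Decoupled weight decay: shrink the weight directly, not via the gradient.
    param -= lr * weight_decay * param
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = AdamWState()
p = adamw_step(0.5, grad=2.0, state=state)
print(p, state.t)
```

On the first step the weight moves by roughly the learning rate whether the gradient is 2 or 200, because the update is normalized by the gradient's own magnitude. That is why the learning rate, not the raw gradient scale, is the dial that controls how big each step is.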

Steps, epochs, and repeats

These three together set how much total training happens. The formula:

total_steps = (num_images * repeats * epochs) / batch_size

Target total steps:

  • Character: 1,500 to 2,500 steps.
  • Style: 2,000 to 4,000 steps (more range to learn).
  • Simple concept: 1,000 to 1,800 steps.

Work backward from the target. For a 20-image character at batch size 1, pick 10 epochs and solve for repeats: 2,000 / (20 * 10) = 10 repeats. Save every 2 epochs so you can pick the best checkpoint rather than guessing the exact step count in advance.
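The formula and the work-backward step fit in two small helpers. This sketch assumes each epoch's step count rounds up when images times repeats does not divide evenly by the batch size; with bucketing enabled, Kohya batches each bucket separately, so real counts can run slightly higher. The function names are hypothetical.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int,
                batch_size: int = 1) -> int:
    """Optimizer steps for a run: ceil(images * repeats / batch) per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def repeats_for_target(target_steps: int, num_images: int, epochs: int,
                       batch_size: int = 1) -> int:
    """Nearest whole number of repeats that lands near target_steps (at least 1)."""
    exact = target_steps * batch_size / (num_images * epochs)
    return max(1, round(exact))


# The worked example from the text: 20 images, 10 epochs, target ~2,000 steps.
r = repeats_for_target(2000, num_images=20, epochs=10)
print(r, total_steps(20, r, 10))             # batch size 1
print(total_steps(20, r, 10, batch_size=2))  # same data, batch size 2
```

Raising the batch size to 2 on the same data halves the run to 1,000 steps, which is why the batch size advice below suggests raising repeats to compensate: `repeats_for_target(2000, 20, 10, batch_size=2)` returns 20, which restores 2,000 steps.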

Batch size and resolution

Batch size is how many images are processed per step. A higher batch size is more stable and faster in wall-clock terms but uses more VRAM. On 8GB, use batch size 1. On 12GB or more you can often run 2. Note that raising batch size effectively lowers your step count, so adjust repeats to compensate.

Resolution should match your base model. SDXL, Pony, and Illustrious train at 1024,1024. SD1.5 trains at 512 or 768. Higher resolution captures more detail but costs significant VRAM. Enable bucketing so mixed aspect ratios train cleanly without forced square crops. If memory is tight, the low-VRAM checkpoint guide and hardware guide help you find a workable resolution.
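Bucketing is easiest to understand in code. This simplified sketch, loosely modeled on how Kohya builds aspect-ratio buckets, enumerates resolutions in 64-pixel steps whose area stays within 1024 × 1024, then assigns each image to the bucket with the closest aspect ratio. The 512 to 2048 side limits and the 64-pixel step are illustrative assumptions; the real trainer's bucket list and resize rules differ in detail.

```python
def make_buckets(max_side: int = 2048, min_side: int = 512, step: int = 64,
                 max_area: int = 1024 * 1024) -> list[tuple[int, int]]:
    """(width, height) pairs, multiples of `step`, with width * height <= max_area."""
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Tallest height that keeps the area within budget, snapped down to `step`.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # the portrait/landscape mirror
    return sorted(buckets)


def assign_bucket(img_w: int, img_h: int,
                  buckets: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


buckets = make_buckets()
for size in [(1024, 1024), (832, 1216), (3000, 2000)]:
    print(size, "->", assign_bucket(*size, buckets))
```

A 3000 × 2000 landscape image lands in a 1216 × 832 bucket instead of being center-cropped to a square, which is the point of enabling bucketing.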

Clip skip and noise settings

Clip skip controls how many layers from the end of the text encoder are used. Anime and Pony-family models are trained with clip skip 2, so match that for Pony and Illustrious. Photoreal SDXL base models often use clip skip 1. Flux does not use clip skip at all.

Noise offset and related noise settings can improve contrast and dynamic range. A small noise offset (0.05 to 0.1) helps darker, moodier outputs render with better contrast. It is optional; leave it at 0 for a first run and add it later if your outputs look flat. Min SNR gamma of 5 is a mild, widely used stabilizer that can improve convergence; it is safe to enable.
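Min SNR gamma is a per-timestep loss weight, and a short sketch shows what gamma 5 does. It assumes the scaled-linear noise schedule used by Stable Diffusion and SDXL (1,000 timesteps, beta from 0.00085 to 0.012) and the epsilon-prediction form of the weight, min(SNR, gamma) / SNR, where SNR = alpha_bar / (1 - alpha_bar). This is an illustration of the weighting, not the trainer's own implementation.

```python
import math


def scaled_linear_alphas_cumprod(num_steps: int = 1000, beta_start: float = 0.00085,
                                 beta_end: float = 0.012) -> list[float]:
    """alpha_bar_t for the scaled-linear beta schedule (linear in sqrt(beta))."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, running = [], 1.0
    for t in range(num_steps):
        beta = (lo + (hi - lo) * t / (num_steps - 1)) ** 2
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def min_snr_weight(alpha_bar: float, gamma: float = 5.0) -> float:
    """Loss weight for epsilon prediction: min(SNR, gamma) / SNR."""
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, gamma) / snr


alphas_cumprod = scaled_linear_alphas_cumprod()
for t in (0, 100, 250, 500, 999):
    print(t, round(min_snr_weight(alphas_cumprod[t]), 4))
```

Nearly noise-free timesteps (very high SNR, near t = 0) have their loss scaled down sharply, while any timestep with SNR below 5 keeps full weight of 1.0. Capping the weight this way evens out the training signal across timesteps, which is the stabilizing effect the setting is used for.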

The settings reference table

Setting | What it does | Safe start (SDXL/Pony)
--- | --- | ---
Network dim | LoRA capacity | 24 (character), 48 (style)
Network alpha | Weight scaling | Half of dim (12 / 24)
Unet LR | Visual learning speed | 1e-4
Text encoder LR | Trigger association speed | 5e-5
Scheduler | LR decay curve | cosine
Optimizer | Update algorithm | AdamW8bit
Total steps | Amount of training | 1,500 to 2,500 (character)
Batch size | Images per step | 1 (8GB) to 2 (12GB+)
Resolution | Training size | 1024,1024
Clip skip | Text encoder layers | 2 (Pony/Illustrious)
Min SNR gamma | Convergence stabilizer | 5
Noise offset | Contrast/dynamic range | 0 to 0.1

A baseline config you can copy

Here is a complete, conservative SDXL/Pony character baseline. Start here, train, test, then adjust one dial at a time.

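# Baseline NSFW character LoRA settings (SDXL / Pony), Kohya sd-scripts key names
network_module = "networks.lora"
network_dim = 24
network_alpha = 12
unet_lr = 1e-4
text_encoder_lr = 5e-5
lr_scheduler = "cosine"
lr_warmup_steps = 0
optimizer_type = "AdamW8bit"
train_batch_size = 1
max_train_epochs = 10
# Repeats are not set in this file: they come from the dataset folder name
# (e.g. 10_aria_nsfwchar) or num_repeats in a dataset config.
# ~20 images * 10 repeats * 10 epochs / batch 1 -> ~2,000 steps
resolution = "1024,1024"
enable_bucket = true
clip_skip = 2
min_snr_gamma = 5
noise_offset = 0.0
mixed_precision = "bf16"  # use "fp16" on GPUs without bf16 support
save_every_n_epochs = 2
seed = 42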

Test the result with the trigger word and the safety baseline in your negatives:

# Test prompt
score_9, score_8_up, aria_nsfwchar, 1girl, solo, standing, soft lighting, <lora:aria_nsfwchar_v1:0.8>

# Negative prompt
child, minor, underage, loli, shota, low quality, blurry, deformed, extra fingers, watermark

Generate across weights 0.6 to 1.0 and across saved epochs to find the sweet spot. You can run quick checks in our free NSFW AI image generator before committing to local weight sweeps.
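A weight-by-epoch sweep is easy to script as a list of prompts. This sketch builds the grid with the `<lora:name:weight>` syntax used in the test prompt. The checkpoint names are an assumption: they follow Kohya's usual pattern of a zero-padded epoch suffix on intermediate saves (for example `aria_nsfwchar_v1-000002`) and no suffix on the final file, so check the actual names in your output folder. Pair every prompt with the same negative prompt.

```python
BASE_PROMPT = "score_9, score_8_up, aria_nsfwchar, 1girl, solo, standing, soft lighting"
OUTPUT_NAME = "aria_nsfwchar_v1"


def checkpoint_names(output_name: str, epochs: int, every: int) -> list[str]:
    """Assumed naming: intermediate saves get a -NNNNNN epoch suffix, the final one none."""
    names = [f"{output_name}-{e:06d}" for e in range(every, epochs, every)]
    return names + [output_name]


def sweep_prompts(base: str, names: list[str], weights: list[float]) -> list[str]:
    """One prompt per (checkpoint, weight) pair, using <lora:name:weight> syntax."""
    return [f"{base}, <lora:{name}:{w:g}>" for name in names for w in weights]


weights = [w / 10 for w in range(6, 11)]  # 0.6, 0.7, 0.8, 0.9, 1.0
names = checkpoint_names(OUTPUT_NAME, epochs=10, every=2)
prompts = sweep_prompts(BASE_PROMPT, names, weights)
print(len(prompts))
print(prompts[0])
```

Keeping the seed fixed across the whole grid makes the images comparable, so differences come from the checkpoint and weight rather than from sampling noise.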

Flux differences

Flux is a different architecture and needs different handling.

  • Learning rate: keep it conservative. Quoted starting points range from about 1e-4 to 4e-4 depending on the optimizer, but many Flux LoRA recipes run at lower effective rates, so start at the low end, watch for frying, and back off.
  • No clip skip. Flux does not use it; leave it out.
  • Captions: natural language, not booru tags. See the captioning guide.
  • VRAM: Flux is heavier. You will likely need 16GB or more, or a cloud GPU, and the cost guide helps you plan spend.
  • Dim: Flux LoRAs often work well at dim 16 to 32; capacity behaves differently than in SDXL.

Treat Flux as its own recipe rather than porting SDXL numbers directly.

Settings for low-VRAM training

If you are on an 8GB card, several settings shift to keep the run inside memory. Use batch size 1, the AdamW8bit or Adafactor optimizer, mixed precision bf16 or fp16, and enable gradient checkpointing, which trades a little speed for a large memory saving. Drop dim to 16 if you are still tight, and consider training at 768 resolution rather than 1024, accepting slightly less fine detail. Cache latents to disk so the VAE does not sit in VRAM during the run.
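One way to keep the low-VRAM variant from drifting away from the baseline is to express it as a handful of overrides. This sketch reuses key names from the baseline config and adds `gradient_checkpointing`, `cache_latents`, and `cache_latents_to_disk`, which are Kohya sd-scripts options. Alpha 8 simply follows the half-of-dim convention; treat the values as a starting point, not a tested recipe.

```python
BASELINE = {
    "network_dim": 24,
    "network_alpha": 12,
    "unet_lr": 1e-4,
    "text_encoder_lr": 5e-5,
    "optimizer_type": "AdamW8bit",
    "train_batch_size": 1,
    "resolution": "1024,1024",
    "mixed_precision": "bf16",
}

LOW_VRAM_OVERRIDES = {
    "network_dim": 16,
    "network_alpha": 8,              # half of dim, per the guide's convention
    "resolution": "768,768",         # trades some fine detail for memory
    "gradient_checkpointing": True,  # recompute activations instead of storing them
    "cache_latents": True,           # encode images once so the VAE is not needed while training
    "cache_latents_to_disk": True,   # keep cached latents on disk instead of in RAM
}


def with_overrides(base: dict, overrides: dict) -> dict:
    """Return a new config; the baseline dict is left untouched."""
    return {**base, **overrides}


def to_toml_lines(config: dict) -> list[str]:
    """Render flat key/value pairs in the same style as the baseline config."""
    def fmt(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, str):
            return f'"{value}"'
        return repr(value)
    return [f"{key} = {fmt(value)}" for key, value in config.items()]


low_vram = with_overrides(BASELINE, LOW_VRAM_OVERRIDES)
print("\n".join(to_toml_lines(low_vram)))
```

Because the overrides sit next to the baseline, the difference between the two configs is visible at a glance, and the learning rates and optimizer stay identical unless you change them on purpose.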

These tradeoffs are real but modest. An 8GB LoRA trained carefully is still very usable, especially for a single character where dim 16 is plenty. If you keep hitting out-of-memory errors even after these changes, a short cloud GPU rental is often cheaper than the time spent fighting your card, and you can run the full 1024 settings without compromise.

Settings for style versus character

The biggest setting differences come down to what you are training. A character is a narrow, specific concept that overfits easily, so you want moderate dim, fewer steps, and tight captioning. A style is a broad concept that needs to generalize across many subjects, so it tolerates and benefits from higher dim, more steps, and a larger, more varied dataset.

Concretely, bump a style LoRA to dim 48 with alpha 24, push total steps toward 3,000 to 4,000, and feed it 60 or more images. Keep the same learning rates and optimizer. The character baseline above is deliberately conservative because the most common beginner mistake is overcooking a character, while the most common style mistake is undertraining a thin dataset. Knowing which way each concept tends to fail tells you which direction to adjust first.

How to diagnose and adjust

After your first run, read the symptoms:

  • Concept too weak (trigger barely works): raise unet LR slightly, add repeats or epochs, or raise dim a little.
  • Overcooked (rigid, fried, broken anatomy): use an earlier epoch, lower the LoRA weight at inference, reduce steps, or lower the LR.
  • Identity bleeds into everything: lower text encoder LR, add regularization images, or improve tag pruning in captions.
  • Inflexible poses: your dataset lacked pose variety; fix the data, not the settings.

Change one variable per run. Chasing several at once makes it impossible to know what helped. If outputs are weak in ways settings cannot fix, the problem is usually upstream in the dataset or captions, and the troubleshooting guide covers the generation-side issues.
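The one-variable-per-run rule is easy to enforce mechanically. These hypothetical helpers compare two run configs (plain dicts, such as the settings you log for each version) and report which keys changed, so you can refuse to launch a run that changes more than one thing.

```python
def changed_settings(previous: dict, current: dict) -> dict:
    """Map each key whose value differs between runs to (old, new)."""
    keys = previous.keys() | current.keys()
    return {
        key: (previous.get(key), current.get(key))
        for key in sorted(keys)
        if previous.get(key) != current.get(key)
    }


def check_single_change(previous: dict, current: dict) -> None:
    """Raise if more than one setting changed between two runs."""
    diff = changed_settings(previous, current)
    if len(diff) > 1:
        raise ValueError(f"{len(diff)} settings changed: {sorted(diff)}")


v1 = {"network_dim": 24, "unet_lr": 1e-4, "text_encoder_lr": 5e-5, "epochs": 10}
v2 = {**v1, "unet_lr": 1.5e-4}                 # concept was weak: one nudge
v3 = {**v2, "network_dim": 32, "epochs": 12}  # two changes at once

check_single_change(v1, v2)  # passes
print(changed_settings(v1, v2))
try:
    check_single_change(v2, v3)
except ValueError as err:
    print("rejected:", err)
```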

Common settings myths

A few persistent myths waste people’s time, so they are worth clearing up.

  • “Higher dim always means better quality.” False. Dim is capacity, and excess capacity overfits a small dataset and bloats the file. A 24-dim character LoRA usually beats a 128-dim one on the same 20 images.
  • “More steps is always safer.” False. Past a point, more steps fries the LoRA. The right amount is a window, not a floor.
  • “You need an exotic optimizer.” False. AdamW8bit handles the vast majority of LoRAs well. Exotic optimizers solve specific problems you probably do not have yet.
  • “Copy a famous creator’s exact config.” Risky. Their config is tuned to their dataset, base, and hardware. Use it as a reference, but expect to adjust for your own data.

The through-line is that settings are contextual. The same numbers behave differently on different datasets, which is exactly why the iterate-and-test loop matters more than any single config.


Keeping a settings log

The fastest way to get good at this is to write down what you did. After each run, log the dataset size, dim, alpha, learning rates, scheduler, optimizer, total steps, and a one-line verdict on the result. Within a handful of LoRAs you will have a personal cheat sheet that beats any generic guide because it is calibrated to your base model and your GPU. This habit turns trial and error into a deliberate, improving process rather than random guessing.

Pair the log with your saved epoch checkpoints. When a run lands well, note which epoch and which inference weight produced the best output. That single data point, repeated across runs, is how you build genuine intuition for where the sweet spot lives.
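A settings log needs nothing fancier than a CSV file. This sketch appends one row per run with the fields listed in this section, plus the best epoch and inference weight. The file name, column names, and the example row's values are placeholders, not a standard format or real results.

```python
import csv
import os

LOG_FIELDS = [
    "version", "dataset_size", "network_dim", "network_alpha", "unet_lr",
    "text_encoder_lr", "scheduler", "optimizer", "total_steps",
    "best_epoch", "best_weight", "verdict",
]


def log_run(path: str, row: dict) -> None:
    """Append one run to a CSV log, writing the header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_runs(path: str) -> list[dict]:
    """Load every logged run back as a list of dicts (values come back as strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Placeholder values for illustration only.
log_run("lora_runs.csv", {
    "version": "aria_nsfwchar_v1", "dataset_size": 20, "network_dim": 24,
    "network_alpha": 12, "unet_lr": 1e-4, "text_encoder_lr": 5e-5,
    "scheduler": "cosine", "optimizer": "AdamW8bit", "total_steps": 2000,
    "best_epoch": 8, "best_weight": 0.8, "verdict": "placeholder: one-line verdict",
})
print(read_runs("lora_runs.csv")[-1]["verdict"])
```

A spreadsheet can open the same file, which makes it easy to sort past runs by verdict or compare best epochs across dataset sizes.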

Safety and consent

These settings apply only to ethically sourced data. Every subject must be adult, fictional, and AI-generated or fully consented. Never train on minors or minor-appearing subjects, and never on a real, identifiable person without explicit consent. The US TAKE IT DOWN Act treats non-consensual intimate imagery of real people as a serious offense, and a tuned LoRA can mass-produce it. This is not legal advice; stick to synthetic or consented datasets.

With a clean dataset, good captions, and these starting settings, your first LoRA should land in a usable range. Iterate one dial at a time, keep notes per version, and when you are ready to train a specific person or persona, follow the character LoRA guide. Then load your model and put it to work in our free generator.

Frequently asked questions

What network dim and alpha should I use for an NSFW character LoRA?

Start with dim 24 and alpha 12 for a character, following the convention that alpha is half the dim. Styles need more capacity, so use dim 32 to 64 with alpha at half. Higher dim is not automatically better; it overfits faster on small datasets and bloats file size. Match capacity to how complex the concept actually is.

What learning rate works for SDXL and Pony LoRAs?

Use a unet learning rate of 1e-4 and a text encoder rate of 5e-5, about half the unet value. The unet handles visual learning while the text encoder ties the trigger word to the concept. If the concept is weak, nudge the unet rate up toward 1.5e-4. If it fries, drop it. Change one rate at a time.

Which optimizer is best for beginners?

AdamW8bit is the safest default. It is memory efficient and stable, and it pairs well with the standard learning rates. If you prefer a hands-off run, Prodigy adapts its own rate; set the rate to 1.0 and use a constant scheduler. Adafactor is very memory efficient for tight VRAM but is less beginner-friendly and needs special rate handling.

How many total steps should an NSFW LoRA train for?

Aim for 1,500 to 2,500 total steps for a character, 2,000 to 4,000 for a style, and 1,000 to 1,800 for a simple concept. Total steps equal images times repeats times epochs divided by batch size. Save a checkpoint every couple of epochs so you can pick the best one instead of guessing the exact step count in advance.

What scheduler should I pick?

Cosine is the recommended default. It decays the learning rate smoothly toward zero by the end of training, learning aggressively early and settling gently, which reduces overfitting at the tail. Constant holds the rate flat and is more prone to overcooking on long runs. Cosine with restarts suits long style runs but is overkill for a first character LoRA.

What is clip skip and when do I change it?

Clip skip sets how many layers from the end of the text encoder are used. Pony and Illustrious are trained with clip skip 2, so match that for them. SDXL photoreal bases often use clip skip 1. Flux does not use clip skip at all, so leave it out entirely when training a Flux LoRA. Match the base model’s convention.

How are Flux training settings different from SDXL?

Flux is a separate architecture. It skips clip skip entirely, expects natural-language captions rather than booru tags, and generally needs lower effective learning rates with careful watching for frying. It is heavier on VRAM, often needing 16GB or more or a cloud GPU, and tends to work well at dim 16 to 32. Treat Flux as its own recipe rather than porting SDXL numbers.

My LoRA is overcooked. How do I fix it?

Overcooking means too much training. First, try an earlier saved epoch checkpoint, which is often the quickest fix. You can also lower the LoRA weight at inference to around 0.6, reduce total steps, or lower the learning rate on the next run. Change one variable at a time so you can tell which adjustment actually solved the problem.