How to Build a Dataset for NSFW LoRA Training (2026)


A strong NSFW LoRA dataset is small but clean: roughly 15 to 40 images for a character, 50 to 150 for a style, shot at 1024px or higher, deduplicated, and tightly curated. Variety in pose and lighting paired with consistency in the core concept beats raw image count every time. Keep all subjects adult, fictional, and AI-generated.

Most failed LoRAs do not fail in training. They fail in the dataset. If you already generate images and want to train your own model, the single highest-leverage skill you can build is dataset construction. This guide covers exactly how many images you need by LoRA type, the quality bar, how to balance variety against consistency, cropping, aspect-ratio bucketing, deduplication, and where to source material ethically. Get this stage right and everything downstream gets easier.

Why the dataset matters more than the settings

LoRA training is a compression problem. The network has a limited capacity, set mainly by your network dim (rank), with alpha scaling how strongly the learned weights are applied. It spends that capacity learning whatever is most consistent across your images. If your dataset is consistent in the thing you want (a face, a style, a pose) and varied in everything else (background, lighting, framing), the model learns the right thing. If your dataset is messy, the model faithfully learns the mess.

That is why throwing 500 random images at a trainer almost never works. You get a blurry average that looks vaguely like your concept but reproduces none of it cleanly. A curated set of 25 clean images, captioned well, will outperform a bloated set of 300 nearly every time. Beginners obsess over learning rate and optimizer choice, but those only matter once the data is solid. Once your dataset is clean, the best NSFW LoRA training settings become a matter of fine-tuning rather than rescue.

Think of it this way: the trainer is an extremely literal student. It will memorize the patterns you give it without judgment. Your job in the dataset stage is to make sure the only strong pattern in the data is the concept you actually want.


How many images by LoRA type

There is no single right number. It depends on what you are teaching. A single face is a narrow concept and needs few images. A broad art style needs many. Here is a working reference you can use as a starting point.

| LoRA type | Recommended image count | Variety notes |
|---|---|---|
| Single character (face + body) | 15 to 40 | Vary pose, angle, expression, lighting, outfit. Keep identity constant. |
| Single character (face only) | 12 to 25 | Many head angles, a few expressions, neutral backgrounds. |
| Clothing / outfit | 20 to 40 | Same garment on different bodies and poses so it binds to the item, not the person. |
| Pose or position | 25 to 50 | Different subjects in the same pose, varied camera angles. |
| Art style | 60 to 150 | Wide subject variety, single consistent rendering style. |
| Concept (object, prop, effect) | 30 to 80 | Show the concept in many contexts and scales. |
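If you script parts of your pipeline, the table can live as data so a quick check warns you when a folder drifts outside the usual range. This is a minimal sketch; the dictionary keys are hypothetical names chosen for illustration, and the ranges are copied straight from the table as starting points, not hard limits.

```python
# Ranges copied from the table; keys are hypothetical labels for each LoRA type.
RECOMMENDED_COUNTS = {
    "character_face_body": (15, 40),
    "character_face_only": (12, 25),
    "clothing": (20, 40),
    "pose": (25, 50),
    "style": (60, 150),
    "concept": (30, 80),
}


def check_count(lora_type: str, image_count: int) -> str:
    """Return "ok" or a short warning comparing image_count to the usual range."""
    low, high = RECOMMENDED_COUNTS[lora_type]
    if image_count < low:
        return f"{image_count} images is below the usual {low}-{high} for {lora_type}"
    if image_count > high:
        return f"{image_count} images is above the usual {low}-{high} for {lora_type}"
    return "ok"


print(check_count("style", 40))
print(check_count("character_face_body", 25))
```

A warning is a prompt to look again, not a verdict: a tight 12-image character set can still work if every frame is excellent.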

These are ranges, not hard limits. Quality dictates the number more than the table does: twenty excellent images beat sixty mediocre ones. If you are training a consistent character LoRA, bias toward the lower end with tighter curation. If you are training a sprawling style, you genuinely need the higher count to capture its range; otherwise the LoRA only knows how to render the handful of subjects it saw.

A useful mental model: the more abstract and broad the concept, the more images it needs to generalize. The more specific and narrow, the fewer.

The quality bar

Every image in your set should clear a basic floor before it earns a spot.

  • Resolution. Aim for at least 1024px on the short side for SDXL, Pony, and Illustrious bases. For Flux, 1024px and up is the norm. The trainer will downscale and bucket, but it cannot invent detail that was never there. Upscaling soft images first does not help and often bakes in artifacts.
  • Sharpness. No motion blur, no heavy JPEG artifacts, no out-of-focus subjects unless the focus itself is the concept you are teaching.
  • Clean subject. The thing you are training should be clearly visible, well lit, and not occluded by watermarks, text overlays, UI elements, or other people.
  • Correct exposure. Avoid crushed blacks and blown highlights. The model learns from what it can actually see in the pixels.
  • No duplicates. Near-identical frames teach the model to overfit to that exact composition.
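The resolution floor is easy to check automatically. The sketch below assumes your final images are PNG files (matching the img_001.png naming used later) and reads width and height straight from the PNG header with the standard library, so no imaging package is needed. The 1024px default mirrors the guideline above; JPEG or WebP files would need a different reader, such as an imaging library, which is not shown here.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file.

    A PNG starts with an 8-byte signature, then the IHDR chunk: a 4-byte
    length, the ASCII tag b"IHDR", and big-endian 4-byte width and height.
    """
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def short_side_ok(width: int, height: int, min_short_side: int = 1024) -> bool:
    """True if the shorter edge meets the minimum."""
    return min(width, height) >= min_short_side


def find_undersized(folder: str, min_short_side: int = 1024) -> list[tuple[str, int, int]]:
    """List PNGs in folder whose short side falls below min_short_side."""
    undersized = []
    for path in sorted(Path(folder).glob("*.png")):
        with path.open("rb") as f:
            width, height = png_dimensions(f.read(24))
        if not short_side_ok(width, height, min_short_side):
            undersized.append((path.name, width, height))
    return undersized
```

Running find_undersized on your dataset folder gives a cull list. Note that the glob pattern is case-sensitive on most Linux filesystems, so files ending in .PNG would be skipped.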

Ready to test concepts before you commit a dataset? You can prototype looks with our free NSFW AI image generator and use the outputs that nail the concept as training candidates. This is also a fast way to see whether your base checkpoint can even render the concept before you invest hours in curation.

Variety versus consistency: the core tradeoff

This is the part people get wrong. The rule is simple to state and harder to execute.

Keep the concept constant. Vary everything else.

For a character LoRA, the identity (face structure, body type, defining marks) must be consistent across every image. The pose, camera distance, lighting, background, and outfit should change as much as possible. That variety tells the network “these things are not part of the identity, ignore them,” which forces the face and body to bind to your trigger word.

For a style LoRA, flip it. The rendering style is the constant. The subjects should vary wildly: different people, objects, scenes, and compositions. That teaches the model the style is independent of any single subject, so it can apply the look to anything you prompt later.

A quick gut check: if every image in your character set has the same lighting and pose, the LoRA will only produce that lighting and pose. If every image in your style set is the same subject, the LoRA learns the subject, not the style. Audit your set for accidental consistency in the wrong dimension before you train.
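One way to make that audit systematic is to jot down a few labels per image for the dimensions that should vary, then flag any dimension where a single value dominates. This is a sketch under assumptions: the labels are hand-written by you, the dimension names are hypothetical, and the 50 percent cutoff is an arbitrary starting point rather than an established rule.

```python
from collections import Counter


def find_accidental_consistency(labels, max_share=0.5):
    """Flag any non-concept dimension where one value dominates.

    labels: one dict per image, e.g. {"lighting": "studio", "pose": "standing"}.
    Returns {dimension: (most_common_value, share)} for shares above max_share.
    """
    flagged = {}
    dimensions = sorted({dim for item in labels for dim in item})
    for dim in dimensions:
        counts = Counter(item[dim] for item in labels if dim in item)
        value, count = counts.most_common(1)[0]
        share = count / sum(counts.values())
        if share > max_share:
            flagged[dim] = (value, round(share, 2))
    return flagged


labels = [
    {"lighting": "studio", "pose": "standing"},
    {"lighting": "studio", "pose": "sitting"},
    {"lighting": "studio", "pose": "walking"},
    {"lighting": "window", "pose": "standing"},
]
print(find_accidental_consistency(labels))
```

For a character set you would label only the things that should change (lighting, pose, background, outfit), never the identity itself; for a style set you would label subjects and scenes instead.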

Cropping and framing

Do not crop everything to tight headshots. The model learns what it sees. If you only feed faces, the LoRA struggles with full-body shots and may smear anatomy when asked for one.

  • Include a mix: close-ups, mid-shots, and full-body frames.
  • For a character, roughly 40 percent face-forward portraits, 40 percent waist-up and three-quarter, 20 percent full-body works as a starting split.
  • Crop out distracting clutter, but keep enough context that the subject reads naturally.
  • Never crop so tightly that the subject is partially cut in a way you do not want reproduced later.
  • Keep the subject off-center sometimes. Centered-only datasets produce centered-only outputs.
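The 40/40/20 starting split turns into concrete targets with a little arithmetic. This sketch uses integer percentages and largest-remainder rounding (a choice made here so the counts always add up to your total); the shot-type names are hypothetical labels.

```python
def shot_split(total, shares=(("portrait", 40), ("waist_up", 40), ("full_body", 20))):
    """Split total images into shot types by integer percentages summing to 100.

    Largest-remainder rounding: take the floor of each share, then hand the
    leftover images to the shot types with the biggest remainders.
    """
    counts, remainders = {}, {}
    for name, percent in shares:
        counts[name], remainders[name] = divmod(total * percent, 100)
    leftover = total - sum(counts.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[name] += 1
    return counts


print(shot_split(25))  # {'portrait': 10, 'waist_up': 10, 'full_body': 5}
print(shot_split(18))  # {'portrait': 7, 'waist_up': 7, 'full_body': 4}
```

Treat the result as a target to curate toward, not a quota to force: if only three strong full-body frames exist, three good ones beat five weak ones.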

Aspect-ratio bucketing

Modern trainers, Kohya SS among them, support aspect-ratio bucketing, which lets you train on mixed aspect ratios without forcing everything into a square. Enable it and you do not need to crop every image to 1:1.

The trainer groups images into resolution buckets and processes each at a compatible size. Practical guidance:

# Kohya SS training config (TOML): bucketing for SDXL / Pony / Illustrious
enable_bucket = true
min_bucket_reso = 512
max_bucket_reso = 1536
bucket_reso_steps = 64
resolution = "1024,1024"   # target area; buckets vary width and height around it

Keep your source aspect ratios reasonable. Extreme panoramas or slivers waste capacity. Stick to common ratios (1:1, 3:4, 4:3, 9:16, 16:9) and the buckets will line up cleanly. When you move on to captioning your dataset, the cropped and bucketed images are exactly what you will write captions for, so settle the crops first.
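To build intuition for what those settings do, here is a simplified illustration of bucketing using the same numbers as the config: enumerate width/height pairs in steps of 64 that stay within a 1024×1024 pixel area, then send each image to the bucket with the closest aspect ratio. This is not Kohya's exact code, only a sketch of the general idea, and the 2.0 ratio cutoff in is_extreme_ratio is a hypothetical threshold for spotting panoramas and slivers.

```python
def make_buckets(max_area=1024 * 1024, min_reso=512, max_reso=1536, step=64):
    """Enumerate (width, height) buckets whose area stays within max_area."""
    buckets = set()
    for width in range(min_reso, max_reso + 1, step):
        # Tallest height (a multiple of step) that keeps width * height <= max_area.
        height = min(max_reso, (max_area // width) // step * step)
        if height >= min_reso:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored bucket for the other orientation
    return sorted(buckets)


def assign_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


def is_extreme_ratio(width, height, limit=2.0):
    """Flag images whose long side exceeds limit times the short side."""
    return max(width, height) / min(width, height) > limit


buckets = make_buckets()
print(assign_bucket(2000, 2000, buckets))  # (1024, 1024)
print(assign_bucket(1500, 2000, buckets))  # (896, 1152)
print(assign_bucket(1920, 1080, buckets))  # (1344, 768)
```

The output shows why common ratios bucket cleanly: a 3:4 portrait lands in 896×1152 and a 16:9 frame in 1344×768, each close to its source shape, while an extreme sliver would be squeezed into a bucket that distorts or crops it heavily.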


Removing duplicates and bad samples

Deduplication is non-negotiable. There are two practical ways to do it:

  1. Visual scan. For small sets under 50 images, just look. Pull anything that is a near-twin of another frame.
  2. Hash or similarity tools. For larger sets, use a perceptual-hash dedup tool or a CLIP-similarity script to flag pairs above a similarity threshold, then review and cull by hand.
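To show how perceptual-hash dedup works under the hood, here is a sketch of one common perceptual hash, the difference hash (dHash): shrink the image to 9×8 grayscale, record whether each pixel is brighter than its right-hand neighbour, and compare the resulting 64-bit fingerprints by counting differing bits. It assumes images are already decoded to grayscale grids (lists of pixel rows), which in practice you might do with an imaging library such as Pillow; that step is not shown. The 5-bit threshold is a hypothetical starting point to tune by reviewing what gets flagged.

```python
from itertools import combinations


def downsample(gray, rows=8, cols=9):
    """Block-average a grayscale grid (list of pixel rows) down to rows x cols."""
    height, width = len(gray), len(gray[0])
    if height < rows or width < cols:
        raise ValueError("image too small to hash")
    small = []
    for r in range(rows):
        r0, r1 = r * height // rows, (r + 1) * height // rows
        row = []
        for c in range(cols):
            c0, c1 = c * width // cols, (c + 1) * width // cols
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(gray):
    """64-bit difference hash: one bit per horizontally adjacent pixel pair."""
    value = 0
    for row in downsample(gray):
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if left > right else 0)
    return value


def hamming(a, b):
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicates(hashes, max_distance=5):
    """Return (name_a, name_b, distance) for every pair within max_distance."""
    pairs = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(hashes.items(), 2):
        distance = hamming(hash_a, hash_b)
        if distance <= max_distance:
            pairs.append((name_a, name_b, distance))
    return pairs
```

Because dHash compares neighbouring brightness rather than absolute values, a uniformly brightened copy of a frame hashes identically, while a mirrored frame does not. CLIP similarity catches a different kind of near-twin (same pose and composition, slightly different pixels), so the two approaches complement each other. Either way, the tool only flags candidates; the final cull is by hand.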

Beyond exact duplicates, cut these:

  • Images where the subject is inconsistent with the rest (wrong identity for a character set).
  • Anything with a visible watermark, logo, or signature.
  • Blurry, dark, or low-resolution frames.
  • Compositions that repeat too often (three near-identical poses become one).
  • Anything where the concept you are teaching is barely visible.

It is normal to start with 60 candidates and ship 25. Cutting aggressively is a feature, not a loss. Each weak image you remove raises the average quality the network learns from.

Where to source images ethically

This is the most important section. Read it carefully.

Subjects must be adult (18+), fictional, and either AI-generated or fully owned and consented. Do not train on a real, identifiable person without their explicit, documented consent. Never train on minors or minor-appearing subjects, period. There is no version of that which is acceptable, and the consequences are severe.

The cleanest, safest, and most repeatable source is synthetic data: generate a base concept, then build your dataset from AI outputs of that concept. You own the outputs, no real person is involved, and you have full control over variety. This is the recommended default for every NSFW LoRA, and it sidesteps the entire consent problem.

A practical synthetic workflow:

  1. Generate a batch of a fictional character from a strong base checkpoint. Browse the best NSFW checkpoints for a base that already renders your target style well.
  2. Hand-pick the outputs where the identity is consistent across frames.
  3. Use those as your seed set, generating variations in pose and lighting from them.
  4. Curate down to your final count using the quality bar above.

The US TAKE IT DOWN Act and related laws make non-consensual intimate imagery of real people a serious legal matter. Training a LoRA on someone without consent can produce exactly that kind of harmful output at scale. Avoid it entirely. This is not legal advice; when in doubt, consult a qualified attorney and stick to synthetic or fully consented material.

For a deeper end-to-end walkthrough that ties the dataset into the full pipeline, see the complete guide to training an NSFW LoRA.

Organizing the dataset folder

Before captioning, get your files in order. A clean folder structure saves headaches later and prevents the trainer from choking on stray files. Put all final images in one directory, name them sequentially (img_001.png, img_002.png) so they sort predictably, and keep only the images you intend to ship. Resist the urge to leave “maybe” candidates in the folder; if it is not training-ready, it does not belong there.

Keep your raw candidate pool in a separate backup folder. You will often want to swap an image in or out after a first test run reveals a weakness, and having the originals handy makes that a two-minute job instead of a re-curation. Once the folder is final, every image gets a matching caption file in the same directory, which is the step covered in the captioning guide.
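The naming and pairing rules are simple to script. This sketch only plans changes and reports gaps; it does not touch the disk. It assumes the img_ prefix and three-digit numbering from above, a small hypothetical set of image extensions, and .txt caption files sharing each image's base name (a common convention; your trainer's caption extension may be configured differently).

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # hypothetical set; adjust to your files


def plan_sequential_names(filenames, prefix="img_", width=3):
    """Map each image filename to img_001.png style names, in sorted order."""
    images = sorted(n for n in filenames if Path(n).suffix.lower() in IMAGE_EXTS)
    return {
        old: f"{prefix}{i:0{width}d}{Path(old).suffix.lower()}"
        for i, old in enumerate(images, start=1)
    }


def missing_captions(filenames, caption_ext=".txt"):
    """List images that have no caption file with the same base name."""
    names = set(filenames)
    return sorted(
        n for n in filenames
        if Path(n).suffix.lower() in IMAGE_EXTS
        and Path(n).with_suffix(caption_ext).name not in names
    )


print(plan_sequential_names(["b.png", "a.PNG", "notes.md", "c.jpg"]))
print(missing_captions(["img_001.png", "img_001.txt", "img_002.png"]))
```

To apply a plan you could loop over it with pathlib's Path.rename; if any new name might collide with an existing file, rename everything to temporary names first and then to the final names. Running missing_captions after the captioning step confirms every image has its partner file.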

How dataset size interacts with training steps

Dataset size and total training steps are linked. A common formula is total steps equals image count multiplied by repeats multiplied by epochs, divided by batch size. A small 20-image dataset needs more repeats to reach a healthy step count, while a 120-image style dataset reaches the same count with fewer repeats. The target is usually somewhere between 1,000 and 2,500 total steps for a character, more for a complex style.

This is why you cannot reason about image count in isolation. A 15-image character set with 10 repeats over 10 epochs gives 1,500 steps at batch size 1, which is reasonable. The same set at 2 repeats gives only 300 steps and would undertrain badly. Plan the count and the repeats together, and lean on the training settings guide to match them. If you are working on limited hardware, the low-VRAM training approach influences batch size, which in turn changes your step math.
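The step formula and its inverse fit in a few lines, which makes it easy to plan repeats before a run. This sketch rounds each epoch's steps up when the batch size does not divide evenly (an assumption about how a partial final batch is counted) and picks the smallest repeat count that reaches a target; the 1,500-step target below is just a value from the 1,000 to 2,500 range mentioned above.

```python
import math


def total_steps(images, repeats, epochs, batch_size=1):
    """Total steps = ceil(images * repeats / batch_size) per epoch, times epochs."""
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs


def repeats_for_target(images, target_steps, epochs, batch_size=1):
    """Smallest repeat count whose total steps reach at least target_steps."""
    return max(1, math.ceil(target_steps * batch_size / (images * epochs)))


print(total_steps(15, 10, 10))            # 1500: the example above
print(total_steps(15, 2, 10))             # 300: undertrained
print(repeats_for_target(20, 1500, 10))   # 8 repeats for a 20-image character set
print(repeats_for_target(120, 1500, 10))  # 2 repeats for a 120-image style set
```

Doubling the batch size halves the step count for the same data, which is why a low-VRAM setup with a smaller batch shifts the numbers.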


A simple dataset checklist

Before you start training, run through this:

  • [ ] Image count matches the LoRA type (see table)
  • [ ] All images 1024px or higher on the short side
  • [ ] Subject is consistent (character) or style is consistent (style)
  • [ ] Wide variety in the non-concept dimensions
  • [ ] Mix of close, mid, and full-body crops
  • [ ] Aspect-ratio bucketing enabled, sane ratios only
  • [ ] Duplicates and near-twins removed
  • [ ] No watermarks, no text overlays
  • [ ] Every subject adult, fictional, AI-generated or consented

With a dataset this clean, you can spin up a test in minutes. If you want to compare your trained LoRA against existing options, the best NSFW LoRAs roundup is a good benchmark, and you can keep prototyping concepts in our free generator while your training run finishes.

Common dataset mistakes

  • Too many images. More is not better. Capacity gets spread thin and the result is mushy.
  • All the same composition. The LoRA can only reproduce the framing it saw.
  • Soft or low-res source. Detail you do not feed in will not appear in outputs.
  • Mixed identities in a character set. The face comes out as a blend of everyone.
  • Captioning before cropping. Settle crops and bucketing first, then caption.
  • Ignoring consent and age. The fastest way to a legal and ethical disaster. Synthetic-only is the safe path.

Get the dataset right and the rest of the pipeline gets dramatically easier. Move next to captioning and tagging your images, then dial in your training settings and run it through Kohya SS. A clean dataset is the foundation that makes all three of those steps work as intended.

Frequently asked questions

How many images do I really need for an NSFW character LoRA?

For a single character covering face and body, 15 to 40 clean, varied images is the sweet spot. Face-only concepts can work with 12 to 25. Quality matters far more than quantity. Twenty sharp, consistent images with varied poses and lighting will outperform sixty soft or repetitive ones, because the network learns whatever is most consistent across the set.

Can I use real photos for training?

Only if every subject is an adult and has given explicit, documented consent, or the images are fully owned by you. Never use a real identifiable person without consent, and never minors or minor-appearing subjects. The safest and most repeatable approach is synthetic data: generate a fictional character and build your dataset from AI outputs you own outright.

What resolution should my training images be?

Aim for at least 1024px on the short side for SDXL, Pony, Illustrious, and Flux bases. The trainer downscales and buckets images, but it cannot create detail that was never captured. Upscaling soft images before training does not help and can introduce artifacts. Start with genuinely sharp, high-resolution source material for the best result.

Should I crop everything to square?

No. Enable aspect-ratio bucketing in your trainer and keep a mix of natural ratios like 1:1, 3:4, and 16:9. Bucketing groups images with similar aspect ratios into resolution buckets and trains each at a compatible size, so you avoid forced square crops that cut off important detail. Just keep ratios reasonable and avoid extreme panoramas or slivers that waste capacity.

How do I balance variety and consistency?

Keep the concept constant and vary everything else. For a character, hold the identity steady while changing pose, lighting, background, and outfit. For a style, hold the rendering style steady while varying subjects widely. This teaches the network which features belong to your concept and which are incidental, so the LoRA binds to the right thing instead of memorizing noise.

Do I need to remove duplicate images?

Yes. Near-identical frames cause the LoRA to overfit to that exact composition, which limits flexibility. For small sets, scan visually and cut twins. For larger sets, use a perceptual-hash or CLIP-similarity tool to flag pairs above a threshold, then review and cull. Also remove watermarked, blurry, dark, or off-identity images during the same pass.

Is it better to have more images or cleaner images?

Cleaner, almost always. LoRA training is a compression problem with limited capacity, and that capacity goes toward whatever is most consistent. A curated set of 25 sharp, well-varied images beats 300 noisy ones, which produce a blurry average. Start with many candidates, then curate down aggressively to the strongest, most consistent images you have.

What is aspect-ratio bucketing and should I use it?

Aspect-ratio bucketing lets a trainer handle mixed aspect ratios by grouping images into resolution buckets and processing each at a compatible size, so you do not have to crop everything to square. Enable it in Kohya SS with sensible min and max bucket resolutions. It preserves composition and lets you use natural framing without wasting detail on forced crops.