NSFW img2img takes an existing image as a starting point and regenerates it through your prompt, with denoising strength deciding how much it changes: low values (0.2 to 0.4) keep the original almost intact for subtle restyles, while high values (0.7 to 0.95) treat the source as loose inspiration and produce a nearly new image. It powers photo-to-AI conversion, sketch-to-image, batch processing, and upscaling.
How img2img works
Text-to-image starts from pure random noise and denoises it into a picture guided by your prompt. Img2img starts from your input image instead: it adds a controlled amount of noise to that image, then denoises it back toward a picture, guided by your prompt. The amount of noise added is set by denoising strength. A little noise leaves most of the original structure intact, so the output closely resembles the input. A lot of noise erases most of the original, so the output is mostly driven by the prompt and only loosely echoes the source.
That single slider, denoising strength, is the entire mental model. Everything else (resize mode, sampler, steps) is supporting cast. Once you internalize that denoise means how far the output moves from the original, img2img becomes predictable instead of a guessing game. In A1111 and Forge it lives in the img2img tab; in ComfyUI it is a VAE Encode plus KSampler chain, where the KSampler’s denoise input plays the same role as the denoising strength slider in A1111. If you want a source image to start from, our free NSFW AI image generator can produce a base you then refine locally with img2img.
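To put rough numbers on "a little noise" versus "a lot of noise", the sketch below computes how strongly the source latent is weighted right after noising. It rests on two assumptions that are not from this guide: the beta values are the "scaled linear" schedule commonly quoted for Stable Diffusion 1.x, and denoise is mapped linearly onto the 1000 training timesteps, which real samplers only approximate. The function names are made up for illustration.

```python
import math


def sd_alphas_cumprod(num_train_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative signal-retention factors for a 'scaled linear' beta schedule.

    The defaults are the values commonly quoted for Stable Diffusion 1.x;
    treat them as an assumption for any other model family.
    """
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    factors, running = [], 1.0
    for i in range(num_train_steps):
        sqrt_beta = lo + (hi - lo) * i / (num_train_steps - 1)
        running *= 1.0 - sqrt_beta ** 2
        factors.append(running)
    return factors


def source_signal_share(strength, alphas_cumprod):
    """Weight on the source latent after noising it to the given strength.

    Assumes strength maps linearly onto the training timesteps, which is a
    simplification of how samplers actually space their steps.
    """
    noised_steps = min(int(strength * len(alphas_cumprod)), len(alphas_cumprod))
    if noised_steps <= 0:
        return 1.0  # no noise added, the source passes through untouched
    return math.sqrt(alphas_cumprod[noised_steps - 1])


if __name__ == "__main__":
    schedule = sd_alphas_cumprod()
    for strength in (0.1, 0.3, 0.5, 0.7, 0.95):
        share = source_signal_share(strength, schedule)
        print(f"denoise {strength:.2f}: source weight {share:.3f}")
```

Running it prints one line per denoise value, which makes it easy to see how quickly the source fades as the slider rises under these assumptions.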

Denoising strength explained
Denoising strength runs from 0 to 1 and is the most important img2img control by a wide margin.
- 0.1 to 0.3 (subtle): the output is nearly identical to the input, with minor style or detail shifts. Use for gentle restyles, light cleanup, or adding a touch of a different aesthetic while keeping the exact composition and identity.
- 0.35 to 0.55 (moderate): real change while respecting the original layout, pose, and rough colors. This is the band for converting a photo to an AI render, refining a rough image, or shifting style meaningfully without losing the subject.
- 0.6 to 0.75 (strong): the prompt takes over much of the content. The composition is loosely guided by the input but faces, details, and textures are largely regenerated. Use for turning a sketch into a finished image or substantially reworking a picture.
- 0.8 to 0.95 (near-new): the input is barely more than a color and composition hint. The result is essentially a fresh generation that happens to share the input’s broad layout.
| Denoise range | Output relationship to source | Typical use |
|---|---|---|
| 0.10 to 0.30 | Nearly identical | Subtle restyle, light cleanup |
| 0.35 to 0.55 | Same layout, new render | Photo to AI, refine rough image |
| 0.60 to 0.75 | Loose composition only | Sketch to image, big rework |
| 0.80 to 0.95 | Color/layout hint only | Near-fresh generation |
The practical habit: start lower than you think and raise denoise in steps of 0.05 to 0.1 until the change is enough. Jumping to 0.8 usually loses the very thing you wanted to keep from the source.
It helps to think of denoise as a slider between two extremes you already understand. At 0.0 you get the input back essentially unchanged (no noise added, nothing to denoise). At 1.0 you get what is effectively a pure text-to-image generation that ignores the input. Every value in between trades off how much the source survives against how much the prompt rewrites. The relationship is smooth rather than strictly linear, and it shifts somewhat with the sampler and checkpoint, but as a rule of thumb 0.3 keeps most of the source character, 0.5 keeps roughly half, and 0.7 keeps little. Building that intuition means you can usually predict the right value for a task before you run a single generation, and then fine-tune with one or two test runs rather than blindly sweeping the whole range.
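The table and the start-low habit can be written down as a small lookup plus a sweep generator. This is only a convenience sketch: the band values are copied from the table above, while the intent labels and the function name are invented for the example.

```python
# Bands copied from the table above; the intent labels are made-up names.
DENOISE_BANDS = {
    "subtle": (0.10, 0.30),
    "moderate": (0.35, 0.55),
    "strong": (0.60, 0.75),
    "near_new": (0.80, 0.95),
}


def denoise_sweep(intent, step=0.05):
    """Return candidate denoise values for an intent, lowest first.

    Values never exceed the top of the band, so a coarse step simply
    yields fewer candidates.
    """
    low, high = DENOISE_BANDS[intent]
    # The small epsilon guards against floating-point error in the division.
    count = int((high - low) / step + 1e-9)
    return [round(low + i * step, 2) for i in range(count + 1)]


if __name__ == "__main__":
    print(denoise_sweep("moderate"))
    print(denoise_sweep("strong", step=0.1))
```

Work through the returned list from the front and stop at the first value where the change is enough, rather than rendering every candidate.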
Resize modes
When your input image and your output dimensions differ, img2img has to reconcile them, and the resize mode decides how.
- Just resize stretches the input to the output dimensions, distorting it if the aspect ratios differ.
- Crop and resize scales the input to fill the output and crops the overflow, keeping proportions but losing edges. This is the safe default for most work.
- Resize and fill scales the input to fit inside the output and fills the empty margins, useful when you want to keep the whole image and add space (a step toward outpainting).
- Just resize (latent upscale) resizes in latent space, mainly used in upscaling workflows.
For most NSFW img2img, match your output aspect ratio to the input and use Crop and resize so nothing important gets stretched or chopped. If you must change aspect ratio, decide whether keeping the whole frame (resize and fill) or filling the new frame (crop and resize) matters more.
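The difference between the three pixel-space modes comes down to which scale factor is chosen. The sketch below works out the arithmetic only, with no image library involved; the mode strings and the `plan_resize` name are hypothetical, and real UIs may round or center slightly differently.

```python
def plan_resize(src_w, src_h, dst_w, dst_h, mode):
    """Describe what a resize mode does to a source of src_w x src_h.

    Returns the scaled size plus how many pixels are cropped away or
    padded in on each axis to reach dst_w x dst_h.
    """
    if mode == "just_resize":
        # Stretch straight to the target; aspect ratio is not preserved.
        return {"scaled": (dst_w, dst_h), "cropped": (0, 0), "padded": (0, 0)}

    scale_x, scale_y = dst_w / src_w, dst_h / src_h
    if mode == "crop_and_resize":
        scale = max(scale_x, scale_y)  # fill the frame, overflow is cropped
    elif mode == "resize_and_fill":
        scale = min(scale_x, scale_y)  # fit inside the frame, margins are filled
    else:
        raise ValueError(f"unknown mode: {mode}")

    w, h = round(src_w * scale), round(src_h * scale)
    return {
        "scaled": (w, h),
        "cropped": (max(w - dst_w, 0), max(h - dst_h, 0)),
        "padded": (max(dst_w - w, 0), max(dst_h - h, 0)),
    }


if __name__ == "__main__":
    for mode in ("just_resize", "crop_and_resize", "resize_and_fill"):
        print(mode, plan_resize(1000, 500, 512, 512, mode))
```

When the source and output share an aspect ratio, all three modes report zero cropping and zero padding, which is why matching the ratio first sidesteps the choice.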
Photo-to-AI conversion
A common use is turning a real photo into an AI-rendered image in your checkpoint’s style. Load the photo, write a prompt describing the subject and the style you want, and set denoise in the 0.4 to 0.6 range. Too low and it stays photographic; too high and it loses the subject’s likeness and pose. The 0.45 to 0.55 sweet spot keeps the composition and rough identity while rendering everything in your model’s aesthetic.
# Photo to AI render starting point:
Denoising strength: 0.5
Resize mode: Crop and resize
Sampling steps: 25 to 30
Sampler: DPM++ 2M Karras
Prompt: describe subject + desired style + quality tags
For likeness-preserving conversions, pairing img2img with a ControlNet (depth or canny from the photo) locks the structure while the prompt restyles it, which lets you push denoise higher without losing the pose. That combination is covered below and in the ControlNet NSFW guide.
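If you drive A1111 through its HTTP API rather than the tab, the starting-point settings above translate into a request body along these lines. Treat this as a sketch: the field names follow the web UI's `/sdapi/v1/img2img` endpoint as commonly documented, but you should confirm them on your own install's `/docs` page, and newer builds may expect the sampler and scheduler as separate fields. The 512 by 512 default size and the function name are assumptions for the example.

```python
import base64
import json

# Order of the resize modes as listed in the img2img tab (assumed to match
# the integer the API expects).
RESIZE_MODES = {
    "Just resize": 0,
    "Crop and resize": 1,
    "Resize and fill": 2,
    "Just resize (latent upscale)": 3,
}


def photo_to_ai_payload(image_bytes, prompt, negative_prompt="",
                        denoise=0.5, width=512, height=512):
    """Build a JSON-serializable img2img request body for a photo restyle."""
    return {
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "denoising_strength": denoise,
        "resize_mode": RESIZE_MODES["Crop and resize"],
        "steps": 28,
        "sampler_name": "DPM++ 2M Karras",  # naming varies between versions
        "width": width,
        "height": height,
    }


if __name__ == "__main__":
    body = photo_to_ai_payload(b"placeholder image bytes", "subject, style, quality tags")
    print(json.dumps(body)[:120])
```

The function only builds the body; sending it (for example with an HTTP POST to a locally running web UI started with its API enabled) is left out so the sketch stays free of network assumptions.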
Sketch-to-image
Img2img turns rough sketches and crude paint-overs into finished images, which is one of its most useful tricks. Make a rough sketch or block in colors and shapes (crude is fine), load it, write a prompt for the finished result, and use a high denoise of 0.6 to 0.8 so the model fills in detail while following your composition. The sketch supplies layout and rough color; the prompt and checkpoint supply realism and detail.
This is also the basis of the manual-fix trick for broken anatomy: roughly paint the corrected shape over a problem area, then run img2img (or inpaint) at moderate denoise so the model refines your rough paint into clean detail. It gives the model correct structure to build on rather than asking it to invent structure. The same idea underpins fixing hands, covered in the fix hands guide, and overlaps heavily with inpainting when you only want to change part of the image.
Batch img2img
The Batch sub-tab processes a whole folder of images through the same img2img settings, which is powerful for consistent restyling across many frames or applying one treatment to a set. Point it at an input directory and an output directory, set your prompt and denoise, and it runs every image. It is the basis of simple video stylization (process extracted frames) and of bulk restyling a gallery into one aesthetic.
# Batch img2img:
Input directory: /path/to/source_frames
Output directory: /path/to/output
Denoising strength: 0.3 to 0.45
# Lower denoise = more temporal consistency frame to frame
Keep denoise low (0.3 to 0.45) when processing video frames, because higher denoise makes each frame diverge and the result flickers. For a uniform gallery restyle where flicker does not matter, you have more freedom. A fixed seed across the batch also improves consistency, since it removes one source of frame-to-frame variation. Batch processing is also handy for generating many variations of a single source at once: point it at a folder of copies and leave the seed random (-1) so each copy comes out different, then pick the best result without babysitting each run.
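For scripted runs outside the Batch sub-tab, the same idea can be sketched as a job planner: pair every frame with an output path and give all of them the same denoise and seed. Nothing here calls a real img2img backend; `plan_batch`, the suffix list, and the default seed are illustrative assumptions, and the returned jobs would be handed to whatever generation call you use.

```python
from pathlib import Path

# File types treated as frames; adjust to match your extraction tool.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def plan_batch(input_dir, output_dir, denoise=0.4, seed=12345):
    """Pair each source frame with an output path and shared settings."""
    jobs = []
    # Sorting keeps frames in filename order, which matters for video.
    for source in sorted(Path(input_dir).iterdir()):
        if source.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        jobs.append({
            "source": source,
            "target": Path(output_dir) / source.name,
            "denoising_strength": denoise,
            "seed": seed,  # identical for every frame to reduce flicker
        })
    return jobs
```

Keeping the output filename equal to the input filename makes it straightforward to reassemble processed frames in their original order.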

Img2img with ControlNet
The most powerful img2img technique is adding ControlNet. On its own, high denoise in img2img loses the original structure. With a ControlNet unit (depth, canny, or openpose) extracted from the input, you lock the structure independently of denoise, so you can push denoise high to fully restyle the image while the pose and composition stay very close to where they were. This decouples how much the image changes from how much structure it keeps, which plain img2img cannot do.
For example, converting a photo to a stylized NSFW render at denoise 0.75 would normally lose the pose; add a depth or openpose ControlNet from the photo and the pose holds while the style fully changes. This is the backbone of high-quality photo-to-AI and consistent character work. The control type choice, weights, and model matching (SD1.5 vs SDXL) are all in the ControlNet NSFW guide, and the broader node-based version of this workflow is in the ComfyUI for NSFW guide.
Upscaling via img2img
Img2img is a classic upscaling method. Send an image to img2img at a larger output resolution with a low denoise (0.2 to 0.4), and the model regenerates it at the higher resolution, adding detail rather than just interpolating pixels like a plain resize. The low denoise keeps the image faithful while the larger canvas gives the model room to render finer skin, hair, and fabric.
For large upscales, SD Upscale (a script in the img2img tab) or Ultimate SD Upscale tiles the image, runs img2img on each tile, and stitches them, which avoids the VRAM cost and incoherence of upscaling a huge image in one pass. Combine the tiled upscale with a Tile ControlNet to add detail without hallucinating extra limbs.
# img2img upscale (single pass):
Denoising strength: 0.3
Resize to: 1.5x to 2x source
Sampler: DPM++ 2M Karras
# For big upscales: use SD Upscale / Ultimate SD Upscale
# + Tile ControlNet at weight ~0.5
Keep upscale denoise low. Above about 0.4 the upscale starts changing the image rather than just enlarging it, which can reintroduce anatomy errors you already fixed. The discipline of low-denoise upscaling preserves all your earlier work.
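To see why tiled upscalers keep VRAM use flat, it helps to look at how an enlarged image can be split into overlapping tiles. The sketch below is a generic tiling routine, not the code used by SD Upscale or Ultimate SD Upscale; the 512 pixel tile, the 64 pixel overlap, and the `tile_boxes` name are assumptions chosen for illustration.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that together cover the image.

    Neighboring tiles share at least `overlap` pixels so seams can be
    blended when the processed tiles are stitched back together.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")

    def starts(length):
        if length <= tile:
            return [0]  # one tile spans the whole axis
        stride = tile - overlap
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in tile_boxes(1024, 1024):
        print(box)
```

Each box is then processed as its own small img2img job at low denoise, so peak memory depends on the tile size rather than the full output resolution.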
The right order also matters when upscaling is part of a larger edit. Fix anatomy and faces at base resolution first, then upscale: fixing at base resolution is faster and the corrections carry up cleanly, whereas fixing after a large upscale is slower, and a high-denoise upscale can undo fixes you already made. After the upscale, a final low-denoise img2img pass at around 0.25, or an ADetailer pass, sharpens any detail the enlargement softened. This generate, fix, upscale, polish sequence is the reliable backbone of a clean high-resolution NSFW image, and img2img is the engine behind both the conversion and the upscale stages.
A reliable img2img routine
Start by matching your output aspect ratio to the input and choosing Crop and resize. Pick a denoise based on intent using the table above, then nudge it in small steps until the change is right. For anything where structure matters (preserving a pose or likeness while restyling), add a ControlNet so you can raise denoise freely. Use low denoise for upscaling and for processing video frames, moderate denoise for photo-to-AI and refinement, and high denoise for sketch-to-image and big reworks. Keep your quality tags and negative prompt in place throughout, since img2img obeys them just like text-to-image. Two or three denoise adjustments, optionally with a ControlNet, handle almost every img2img task cleanly, and once a setting works for a given source type you can reuse it across similar images.

CFG, sampler, and steps in img2img
The sampling settings behave the same as in text-to-image but interact with denoise in ways worth knowing. Steps apply only to the denoising that actually runs. With default settings in A1111 and Forge, img2img denoises only part of the schedule (proportional to your denoise value), so a low-denoise pass runs fewer real steps. That is why upscaling at denoise 0.3 is fast even at 30 steps: most of the schedule is skipped. ComfyUI’s KSampler handles this differently and runs the full step count over the shortened noise range, so the speedup does not apply there. For moderate and high denoise, 25 to 30 steps is plenty.
CFG scale controls how hard the prompt is enforced; the same 5 to 8 range that works in text-to-image works here, and pushing CFG too high in img2img tends to fry colors faster because the source already constrains the result. Sampler choice carries over from your normal workflow, and keeping the same sampler you used to generate the source keeps the style consistent through an img2img pass.
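The step arithmetic in this section can be estimated with one line of math. The sketch below assumes the common convention of multiplying the step count by the denoise value and truncating; individual UIs can differ by a step or so, and some expose a setting that runs the full slider count regardless, so treat the result as an estimate. The function name is hypothetical.

```python
def steps_actually_run(steps, denoise):
    """Estimate how many sampler steps an img2img pass executes.

    Assumes the UI skips the part of the schedule above the chosen
    denoise level, so the count scales with denoise.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0 and 1")
    return min(int(steps * denoise), steps)


if __name__ == "__main__":
    for denoise in (0.25, 0.5, 0.75, 1.0):
        print(f"40 steps at denoise {denoise}: about {steps_actually_run(40, denoise)} run")
```

If low-denoise passes look under-refined, raising the step slider compensates, because the number of steps that actually run scales with both values.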
One img2img-specific gotcha is the VAE and model match. If you img2img an image through a different checkpoint or VAE than it was made with, you may see a subtle color and contrast shift, especially at low denoise where the original is mostly preserved. For faithful restyles, use the same VAE family, and accept that crossing model families (an SD1.5 image through an SDXL checkpoint) will change the look more than the denoise alone suggests.
When to use img2img vs text-to-image
Reach for text-to-image when you are exploring fresh ideas and want maximum variety, since starting from noise gives the widest range of outputs. Reach for img2img when you already have something worth keeping: a composition, a pose, a likeness, a rough sketch, or an image that needs upscaling or restyling. The mistake beginners make is using high-denoise img2img as a slow text-to-image, which throws away the source’s value and adds no benefit. If denoise is above 0.85 and you are not using a ControlNet to hold structure, you are usually better off in text-to-image with the same prompt. Conversely, if you keep rerolling text-to-image hoping to land a composition you already have in another image, switch to img2img and lock it. Picking the right tab for the job is half of working efficiently. If you do not have a source image yet, generate one with our free NSFW AI image generator and bring it into img2img for refinement, upscaling, or restyling once you have a base you like.
Frequently asked questions
What denoising strength should I use in img2img?
It depends on how much you want to change. Use 0.1 to 0.3 for subtle restyles that keep the original nearly intact, 0.35 to 0.55 for converting a photo to an AI render or refining a rough image while keeping the layout, 0.6 to 0.75 for turning a sketch into a finished image, and 0.8 to 0.95 for a near-fresh generation that only echoes the source’s broad composition. Start low and raise in small steps.
What is the difference between img2img and inpainting?
Img2img regenerates the whole image through your prompt, with denoising strength controlling how far it moves from the source. Inpainting is img2img restricted to a masked region, so it changes only the area you paint while leaving the rest untouched. Use img2img to restyle or upscale an entire image, and inpainting to fix or change one part. Both share the same denoise mechanic, just applied to the whole image or a mask.
How do I convert a real photo into an AI image?
Load the photo in img2img, write a prompt describing the subject and the style you want, and set denoise around 0.45 to 0.55. Too low keeps it photographic, too high loses the likeness and pose. For likeness-preserving conversions, add a depth or canny ControlNet from the photo so the structure is locked and you can push denoise higher to fully restyle without losing the original pose or composition.
Which resize mode should I use in img2img?
Crop and resize is the safe default: it scales the input to fill the output and crops the overflow, keeping proportions. Just resize stretches and distorts if aspect ratios differ. Resize and fill keeps the whole image and fills empty margins, useful when adding space. For most work, match your output aspect ratio to the input and use Crop and resize so nothing important gets stretched or chopped off.
How do I keep video frames consistent in batch img2img?
Use a low denoising strength of around 0.3 to 0.45, because higher denoise makes each frame diverge from the others and the result flickers. A fixed seed across the batch also improves consistency. For stronger temporal stability, add a ControlNet such as depth or canny per frame to lock structure. Even so, plain batch img2img flickers somewhat; dedicated video-consistency workflows go further, but low denoise plus a fixed seed is the simple baseline.
Can I upscale with img2img and how?
Yes. Send the image to img2img at a larger output resolution with a low denoise of 0.2 to 0.4, and the model regenerates it bigger, adding real detail rather than interpolating. Keep denoise low so it enlarges without changing content. For large upscales, use SD Upscale or Ultimate SD Upscale, which tile the image and run img2img per tile, ideally paired with a Tile ControlNet to add detail without hallucinating extra limbs.
How does ControlNet improve img2img?
ControlNet locks structure (pose, depth, edges) independently of denoising strength, so you can push denoise high to fully restyle an image while the composition and pose stay very close to where they were. Plain img2img cannot do this, since high denoise loses the original structure. Extract a depth, canny, or openpose control from your input and the source’s layout holds through aggressive restyling, which is the backbone of high-quality photo-to-AI and consistent character work.
Why does my img2img output lose the original subject?
Your denoising strength is too high. Above about 0.6, the prompt drives most of the content and the source becomes only a loose composition hint, so faces, identity, and detail get regenerated. Lower the denoise toward 0.4 to 0.5 to keep the subject while still restyling. If you need to keep the subject at high denoise, add a ControlNet from the source to lock structure so denoise only changes surface style, not layout.



