Combining Stable Diffusion with standard image processing (OpenCV or NumPy) is good for speed and memory efficiency, but using ControlNet alone may also improve success rates.
In any case, diffusion models tend to be creative when used raw and generally struggle with precise tasks, so it’s best to either constrain their behavior or scope them to narrow, well-defined uses.
Also, if you’re doing local smoothing as part of an algorithm, lightweight inpainting models like LaMa might be an option.
What I would do (highest success rate for many cards)
Your current failures are largely caused by giving the model a blank white ring + a hard rectangular boundary. That combination strongly encourages “mat/frame/border” artifacts, and prompt wording won’t reliably override it.
The most reliable solution is:
- Prefill the bleed area with plausible pixels (not white)
- Feather the mask edge (blur) so there is no hard seam to “draw a line” on
- Use negative_prompt for frame/border/text
- Force the center unchanged by overlay/paste-back (do not rely on the model)
Mask blurring is explicitly recommended in Diffusers for softening the transition at the mask boundary. (Hugging Face)
Diffusers also exposes apply_overlay specifically to overlay the inpainted result with the original image using the mask. (Hugging Face)
Solution 1: “Duplicate the borders then smooth” (best baseline for bleed)
This is exactly your idea, implemented as reflect/edge padding + inpaint.
Step A — Prefill the new canvas by reflecting the edges
Reflection (mirror padding) gives the model immediate context and removes the “white border” cue.
import numpy as np
from PIL import Image

def reflect_pad(img: Image.Image, pad_x: int, pad_y: int) -> Image.Image:
    # Mirror the image outward so the new bleed band starts from real pixels, not white.
    a = np.array(img.convert("RGB"))
    a = np.pad(a, ((pad_y, pad_y), (pad_x, pad_x), (0, 0)), mode="reflect")
    return Image.fromarray(a, "RGB")
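For example, to add roughly a 3 mm bleed at 300 DPI (card, bleed_px and the file name are hypothetical placeholders, not part of any fixed API):

card = Image.open("card.png").convert("RGB")   # hypothetical input card
bleed_px = 36                                  # ~3 mm at 300 DPI (assumed bleed width)
extended_img = reflect_pad(card, bleed_px, bleed_px)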
Step B — Mask: paint the outer band, but blur the mask edge
Diffusers documents blur_factor for smoothing the transition. (Hugging Face)
from PIL import Image

def make_ring_mask(orig_w, orig_h, pad_x, pad_y):
    new_w, new_h = orig_w + 2*pad_x, orig_h + 2*pad_y
    mask = Image.new("L", (new_w, new_h), 255)                       # white = paint
    mask.paste(Image.new("L", (orig_w, orig_h), 0), (pad_x, pad_y))  # black = keep
    return mask
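Continuing the example, build the ring mask and feather it. A plain PIL Gaussian blur is used here as a sketch; Diffusers’ mask_processor.blur(mask, blur_factor=...) (shown in the Step D sketch below) is the documented equivalent. The 24 px radius is an assumption; scale it with resolution.

from PIL import ImageFilter

mask = make_ring_mask(card.width, card.height, bleed_px, bleed_px)
mask = mask.filter(ImageFilter.GaussianBlur(24))  # feather the seam (assumed radius)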
Step C — Inpaint with a short prompt + strong negative prompt
prompt = "seamless continuation, matching colors lighting and style"
negative = "frame, border, outline, mat, vignette, stroke, text, watermark, logo, caption"
Step D — Force “center unchanged”
Use apply_overlay (or paste back your original center). apply_overlay is an explicit API on VaeImageProcessor. (Hugging Face)
Note: if you ever use cropping modes or mismatched shapes, apply_overlay can behave unexpectedly; keep image/mask shapes consistent. (GitHub)
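Putting Steps A–D together, a minimal sketch with an SD1.5 inpainting checkpoint. The model id stable-diffusion-v1-5/stable-diffusion-inpainting and the sampler values are assumptions (they sit inside the parameter presets near the end); extended_img and mask are the ones built above.

import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",  # assumed SD1.5 inpaint checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Keep the extended dimensions multiples of 8 so nothing gets resized on the way through the VAE.
out = pipe(
    prompt=prompt,
    negative_prompt=negative,
    image=extended_img,
    mask_image=mask,
    num_inference_steps=25,
    guidance_scale=5.0,
    strength=0.9,
).images[0]

# Step D: force the center to stay untouched.
result = pipe.image_processor.apply_overlay(mask, extended_img, out)
# Equivalent manual paste-back of the original center:
# result = out.copy(); result.paste(card, (bleed_px, bleed_px))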
Solution 2: Outpaint one side at a time (reduces frame artifacts)
Instead of inpainting a full ring in one go, extend right, then left, then top, then bottom (each pass has more context and less chance to “invent a border”).
This is commonly recommended in practice; one tutorial explicitly notes it’s best to outpaint one direction at a time. (Stable Diffusion Art)
Implementation pattern:
- Expand only one direction by bleed_px
- Mask only that strip (plus a small overlap into the original)
- Inpaint
- Repeat for the next side
This is slower than one pass, but tends to be much more consistent.
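A minimal sketch of that loop, reusing the SD1.5 inpainting pipe from the Solution 1 sketch; the helper name, the 32 px overlap and the sampler values are assumptions:

import numpy as np
from PIL import Image, ImageFilter

def extend_one_side(img: Image.Image, side: str, bleed_px: int, overlap: int = 32):
    # Reflect-pad one side only, and mask the new strip plus a small overlap into the original.
    a = np.array(img.convert("RGB"))
    pads = {"left":   ((0, 0), (bleed_px, 0), (0, 0)),
            "right":  ((0, 0), (0, bleed_px), (0, 0)),
            "top":    ((bleed_px, 0), (0, 0), (0, 0)),
            "bottom": ((0, bleed_px), (0, 0), (0, 0))}[side]
    padded = Image.fromarray(np.pad(a, pads, mode="reflect"), "RGB")

    w, h = padded.size
    band = bleed_px + overlap
    boxes = {"left": (0, 0, band, h), "right": (w - band, 0, w, h),
             "top": (0, 0, w, band), "bottom": (0, h - band, w, h)}
    mask = Image.new("L", (w, h), 0)
    mask.paste(255, boxes[side])
    return padded, mask.filter(ImageFilter.GaussianBlur(16))

img = card
for side in ("right", "left", "top", "bottom"):
    img, side_mask = extend_one_side(img, side, bleed_px)
    img = pipe(prompt=prompt, negative_prompt=negative, image=img, mask_image=side_mask,
               num_inference_steps=25, guidance_scale=5.0, strength=0.9).images[0]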
Solution 3: ControlNet Inpaint (more consistent than prompt-only inpainting)
If you want a “Stable Diffusion way” that improves reliability across many different borders, use ControlNet Inpaint.
Why it helps
ControlNet lets you condition generation on an additional control image (not just the prompt), providing stronger guidance and typically reducing random seam/border inventions. Diffusers’ ControlNet docs describe this conditioning mechanism. (Hugging Face)
Recommended weights (SD1.5)
- lllyasviel/control_v11p_sd15_inpaint (loaded in the snippet below)
Canonical “make_inpaint_condition” + pipeline usage
Hugging Face provides an example that constructs a control image by setting masked pixels to -1 and then runs StableDiffusionControlNetInpaintPipeline. (Hugging Face)
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline, DDIMScheduler

def make_inpaint_condition(image, image_mask):
    # Control image: original pixels in [0, 1], masked pixels set to -1.0.
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0
    image[image_mask > 0.5] = -1.0  # masked pixels
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    return torch.from_numpy(image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

control_image = make_inpaint_condition(extended_img, mask).to("cuda", dtype=torch.float16)

out = pipe(
    prompt=prompt,
    negative_prompt=negative,
    image=extended_img,
    mask_image=mask,
    control_image=control_image,
    num_inference_steps=20,
    guidance_scale=4.5,
    strength=0.9,
).images[0]
Important practical note
Even with ControlNet, mask feathering and non-white prefilling still help a lot.
Solution 4: Multi-ControlNet (Inpaint + Tile) for texture continuity
For bleed margins, the hardest part is often keeping local texture/gradient continuity.
A strong setup is:
- ControlNet Inpaint = “respect the masked/unmasked structure”
- ControlNet Tile = “keep local textures consistent”
Diffusers supports Multi-ControlNet with a list of conditioning scales. (Hugging Face)
There is also community discussion specifically about multi-controlnet + inpaint workflows. (Hugging Face Forums)
High-level idea:
- controlnet_conditioning_scale=[0.6, 0.2] (inpaint stronger, tile weaker)
- tile control image is usually the padded image itself (or a downscaled version)
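A sketch of that combination, assuming the ControlNet inpaint pipeline accepts a list of ControlNets as described in the Multi-ControlNet docs; lllyasviel/control_v11f1e_sd15_tile is the usual SD1.5 tile checkpoint, and everything else (make_inpaint_condition, extended_img, mask, prompts) mirrors Solution 3:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

controlnet_inpaint = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16, use_safetensors=True
)
controlnet_tile = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16, use_safetensors=True
)

pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=[controlnet_inpaint, controlnet_tile],  # Multi-ControlNet: list of models
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

inpaint_cond = make_inpaint_condition(extended_img, mask).to("cuda", dtype=torch.float16)
tile_cond = extended_img  # tile control image: the padded image itself

out = pipe(
    prompt=prompt,
    negative_prompt=negative,
    image=extended_img,
    mask_image=mask,
    control_image=[inpaint_cond, tile_cond],
    controlnet_conditioning_scale=[0.6, 0.2],  # inpaint stronger, tile weaker
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]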
Solution 5: Differential Diffusion (best seam quality on smooth backgrounds)
If your card edges often contain gradients / bokeh / fog / flat tone backgrounds, seams and “frames” are very noticeable. Differential diffusion is explicitly used to reduce outpainting seams by using blurred/graded masks and region-aware behavior. (Hugging Face)
If you can use SDXL, the OzzyGT guide shows a complete workflow (mask blur, expanding to square, etc.) and calls out that blurred masks are important for smooth transitions. (Hugging Face)
Prompting changes (useful, but not the main fix)
Do
- Keep positive prompt short: “seamless continuation / matching style”
- Put prohibitions in negative_prompt
- Lower CFG (guidance_scale) to reduce “graphic design additions” like borders
Don’t
- Don’t ask for “edges must be identical” via prompt; enforce that mechanically with overlay/paste-back (Diffusers provides this concept via apply_overlay). (Hugging Face)
Parameter presets I’d start with (SD1.5 inpaint bleed)
For a thin bleed band:
- num_inference_steps: 20–30
- guidance_scale: 4.0–6.0
- strength: 0.75–0.95
- Mask blur: 16–40 (scale with resolution) (Hugging Face)
If you want one “production” answer
For many images, I would implement this fallback chain:
- Reflect-pad only (fast, deterministic)
- Reflect-pad + SD1.5 inpaint (blur mask + negative prompt + overlay) (Hugging Face)
- If still problematic: per-side outpaint (Stable Diffusion Art)
- For stubborn styles: ControlNet Inpaint, optionally + Tile (Hugging Face)
- For smooth gradients/seams: Differential Diffusion (Hugging Face)
This replaces “try 10 seeds” with “change the conditioning so failures become rare.”