Technology Deep Dive
Understanding the technical foundations of consistent virtual character generation.
Semantic Prompt Engineering
Traditional prompt writing treats all words equally. Our semantic engine models context, the relationships between terms, and visual hierarchy, and assigns each term a dynamic weight based on:
- Shot composition (weight distribution for framing)
- Content type (SFW/NSFW semantic adjustments)
- Subject focus (facial features vs. environment balance)
- Style coherence (avoiding semantic conflicts)
Technical Details:
Example Weight Distribution:
Base prompt: "woman, red dress, studio lighting"
Semantic Analysis:
- "woman" [weight: 1.0, priority: 1, category: subject]
- "red dress" [weight: 0.9, priority: 2, category: wardrobe]
- "studio lighting" [weight: 0.7, priority: 3, category: environment]
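As a minimal sketch, the analyzed terms above could be serialized into a weighted prompt ordered by priority. This assumes the weights are emitted in the `(term:weight)` emphasis syntax that ComfyUI's text encoder accepts. The `Term` type and `render_weighted_prompt` function are hypothetical and are not the engine's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    weight: float
    priority: int
    category: str


def render_weighted_prompt(terms):
    """Order terms by priority and emit ComfyUI-style (term:weight) emphasis."""
    parts = []
    for term in sorted(terms, key=lambda t: t.priority):
        if term.weight == 1.0:
            # 1.0 is the default weight, so the term needs no emphasis markup
            parts.append(term.text)
        else:
            parts.append(f"({term.text}:{term.weight:.1f})")
    return ", ".join(parts)


terms = [
    Term("studio lighting", 0.7, 3, "environment"),
    Term("woman", 1.0, 1, "subject"),
    Term("red dress", 0.9, 2, "wardrobe"),
]
print(render_weighted_prompt(terms))
# woman, (red dress:0.9), (studio lighting:0.7)
```

In this syntax, weights below 1.0 de-emphasize a term relative to the unweighted subject.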
Conflict Detection:
"natural lighting" + "studio lighting" → Semantic conflict
Resolution: The term that matches the environment context takes precedence.
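One way to implement this rule is sketched below, under stated assumptions. Mutually exclusive terms are grouped into a hypothetical `CONFLICT_GROUPS` table. When a prompt contains more than one term from the same group, the term marked as preferred for the current environment context is kept. The group contents and the `ENVIRONMENT_PREFERENCE` mapping are illustrative only.

```python
# Hypothetical groups of mutually exclusive terms
CONFLICT_GROUPS = {
    "lighting": {"natural lighting", "studio lighting"},
}

# Hypothetical mapping: environment context -> preferred term per group
ENVIRONMENT_PREFERENCE = {
    "studio": {"lighting": "studio lighting"},
    "outdoor": {"lighting": "natural lighting"},
}


def resolve_conflicts(terms, environment):
    """Return terms with conflicts resolved in favor of the environment context."""
    resolved = list(terms)
    preferences = ENVIRONMENT_PREFERENCE.get(environment, {})
    for group, members in CONFLICT_GROUPS.items():
        present = [t for t in resolved if t in members]
        if len(present) < 2:
            continue  # no conflict within this group
        # Keep the environment's preferred term; otherwise keep the first one seen
        keep = preferences.get(group)
        if keep not in present:
            keep = present[0]
        resolved = [t for t in resolved if t not in members or t == keep]
    return resolved


print(resolve_conflicts(["woman", "natural lighting", "studio lighting"], "studio"))
# ['woman', 'studio lighting']
```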
IPAdapter FaceID Integration
Maintaining facial consistency across generations requires anchoring every image to the same reference identity. We use IPAdapter FaceID Plus v2 and adjust its parameters to the shot type:
Parameters:
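- weight: Identity preservation strength (0.5-0.8)
- start_at: Point in the sampling schedule, as a fraction from 0 to 1, where identity influence begins (0.1-0.3)
- end_at: Point in the sampling schedule where identity influence ends (0.6-0.7)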
Adaptive Logic:
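def ipadapter_params(shot_distance):
    # Returns (weight, start_at, end_at) for IPAdapter FaceID Plus v2
    if shot_distance == 'full-body':
        weight = 0.50    # Lower, so the face does not override body composition
        start_at = 0.30  # Start later so early steps can settle the composition
        end_at = 0.60
    elif shot_distance == 'portrait':
        weight = 0.80    # Higher for facial detail
        start_at = 0.10
        end_at = 0.70
    else:
        # Presets for other shot distances are not covered here
        raise ValueError(f"no preset for shot_distance={shot_distance!r}")
    return weight, start_at, end_at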
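Portraits get a stronger identity signal over a longer window to preserve facial detail. Full-body shots get a weaker signal that starts later and ends earlier, so the face stays consistent without constraining body composition.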
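Because `start_at` and `end_at` are fractions of the sampling schedule, the number of steps that actually receive the identity signal depends on the step count. This matters when the DMD2 LoRA cuts sampling to 4-7 steps. The sketch below approximates the window by mapping the fractions linearly onto step indices. `identity_step_window` is a hypothetical helper. ComfyUI's IPAdapter nodes may convert these fractions to noise levels (sigmas) rather than step indices, so treat the result as an approximation.

```python
import math


def identity_step_window(start_at, end_at, steps):
    """Approximate which 0-based sampling steps fall inside [start_at, end_at).

    Assumes step i sits at schedule fraction i / steps (linear mapping).
    """
    if not 0.0 <= start_at < end_at <= 1.0:
        raise ValueError("expected 0 <= start_at < end_at <= 1")
    first = math.ceil(start_at * steps)
    last = math.ceil(end_at * steps)  # exclusive upper bound
    return list(range(first, last))


# Full-body preset (0.30-0.60) vs. portrait preset (0.10-0.70) at 4 steps
print(identity_step_window(0.30, 0.60, 4))  # [2]
print(identity_step_window(0.10, 0.70, 4))  # [1, 2]
```

Under this approximation, the full-body preset keeps identity guidance active for only a single step at 4 steps. This is worth keeping in mind when these ranges are tuned alongside a few-step LoRA.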
LoRA Architecture
We stack two LoRA adapters, one for speed and one for detail enhancement:
dmd2_4step_lora:
- Reduces sampling steps from 40 to 4-7 (82.5-90% fewer steps)
- Maintains quality through distribution matching distillation (DMD2)
- 85% faster generation without quality loss
ultrares_xl25:
- Post-processing detail enhancement
- Micro-detail preservation
- Photorealistic texture refinement
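In ComfyUI, two LoRAs are typically stacked by chaining two `LoraLoader` nodes, each taking the MODEL and CLIP outputs of the previous node. The sketch below builds that fragment in ComfyUI's API (JSON) workflow format. The node IDs, the checkpoint and LoRA file names, and the strengths are placeholders, not the production values.

```python
import json


def lora_chain(checkpoint, loras):
    """Build an API-format fragment: checkpoint -> LoraLoader -> LoraLoader ..."""
    workflow = {
        "1": {
            "class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": checkpoint},
        }
    }
    # CheckpointLoaderSimple outputs MODEL (0), CLIP (1), VAE (2)
    model_ref, clip_ref = ["1", 0], ["1", 1]
    for i, (name, strength) in enumerate(loras, start=2):
        node_id = str(i)
        workflow[node_id] = {
            "class_type": "LoraLoader",
            "inputs": {
                "model": model_ref,
                "clip": clip_ref,
                "lora_name": name,
                "strength_model": strength,
                "strength_clip": strength,
            },
        }
        # LoraLoader outputs MODEL (0) and CLIP (1); the next LoRA stacks on top
        model_ref, clip_ref = [node_id, 0], [node_id, 1]
    return workflow


# Placeholder file names and strengths
wf = lora_chain("sdxl_base.safetensors",
                [("dmd2_4step_lora.safetensors", 1.0),
                 ("ultrares_xl25.safetensors", 0.6)])
print(json.dumps(wf, indent=2))
```

A downstream `KSampler` would then take `["3", 0]` as its `model` input, and the `CLIPTextEncode` nodes would take `["3", 1]` as their `clip` input.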
Negative Prompt Strategy
Effective negative prompts are content-aware and adaptive:
SFW Content:
bad quality, worst quality, low resolution, blurry,
distorted face, deformed hands, multiple people,
watermark, text, signature
NSFW Content:
Similar to SFW but excludes body-related negatives to avoid corrupting intended content.
Semantic Conflicts: Our engine detects and removes negative terms that would conflict with positive selections.
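Below is a minimal sketch of the content-aware assembly described above, assuming negatives are held as a list of terms. The source does not list the body-related terms that the NSFW variant omits, so the `body_related` parameter is a hypothetical caller-supplied placeholder. Conflict removal is simplified to exact, case-insensitive matches against the positive selections.

```python
# Documented SFW negative terms
BASE_NEGATIVES = [
    "bad quality", "worst quality", "low resolution", "blurry",
    "distorted face", "deformed hands", "multiple people",
    "watermark", "text", "signature",
]


def build_negative_prompt(content_type, positive_terms, body_related=frozenset()):
    """Assemble a negative prompt for 'sfw' or 'nsfw' content.

    body_related: hypothetical set of terms dropped for NSFW content.
    """
    terms = list(BASE_NEGATIVES)
    if content_type == "nsfw":
        # Drop body-related negatives so they do not fight the intended content
        terms = [t for t in terms if t not in body_related]
    # Remove negatives that directly contradict a positive selection
    positives = {p.strip().lower() for p in positive_terms}
    terms = [t for t in terms if t.lower() not in positives]
    return ", ".join(terms)


# A user who explicitly selects "multiple people" should not have it negated
print(build_negative_prompt("sfw", ["two women", "multiple people"]))
```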
Infrastructure & Workflow
Hardware:
- Current: Multiple Mac mini M4 workstations (in-house development and inference)
- Planned: NVIDIA cluster for video generation
- Future: Scalable to NVIDIA RTX A6000 (48 GB) for advanced video consistency research
- Storage: High-speed NVMe for model caching across workstations
Software Pipeline:
Frontend (React)
↓ [Job Queue]
Bridge Worker (Python)
↓ [Advanced Semantic AI Processing]
ComfyUI Engine
↓ [SDXL + IPAdapter + LoRAs]
Image Output → Hosting
↓ [HMAC-secured URLs]
User Gallery
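The HMAC step can be illustrated with Python's standard `hmac` module. The server signs the image path plus an expiry timestamp with a secret key, and it honors a request only if the signature matches and the link has not expired. The URL layout, the `expires`/`sig` query parameters, and the key handling are assumptions for illustration, not the production scheme.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from config in practice


def _signature(path, expires):
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base, path, ttl_seconds=3600, now=None):
    """Return base + path with an expiry and an HMAC-SHA256 signature appended."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base}{path}?{query}"


def verify_url(url, now=None):
    """Check the expiry and the signature (constant-time) of a signed URL."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError, IndexError):
        return False
    if now > expires:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires))


url = sign_url("https://images.example.com", "/gallery/user42/img001.png", now=1_700_000_000)
print(verify_url(url, now=1_700_000_100))  # True
print(verify_url(url.replace("img001", "img002"), now=1_700_000_100))  # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters of a forged signature were correct.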
Performance Metrics:
- Prompt generation: <500ms
- Image generation: 30-40s
- Queue processing: Real-time
- Uptime: 82.5% (monitored)
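The generation time and uptime figures above set a ceiling on what one worker can produce per day. The back-of-envelope calculation below assumes, hypothetically, that each worker processes one job at a time and that generation time dominates. It ignores queue wait and the prompt-generation step.

```python
def daily_capacity(seconds_per_image, uptime=0.825, hours=24):
    """Images per worker per day, assuming serial jobs and no idle time."""
    images_per_hour = 3600 / seconds_per_image
    return images_per_hour * hours * uptime


# 30-40 s per image at 82.5% uptime
for secs in (30, 40):
    print(secs, round(daily_capacity(secs)))
# 30 2376
# 40 1782
```

Under these assumptions, a single worker tops out at roughly 1,800 to 2,400 images per day.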
Research Methodology
Our open research model collects the following anonymized data:
- Selector combination frequencies
- Generated prompts
- Generation success rates
- User satisfaction ratings
- Edge case identification
We use this data to refine the semantic engine and to improve its weight-optimization algorithms.
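To illustrate how the first and third data points might be aggregated, the sketch below counts how often each selector combination occurs and computes its generation success rate. The event record shape (`selectors`, `succeeded`) is hypothetical, since the logging schema is not described here.

```python
from collections import Counter


def combination_stats(events):
    """Map each selector combination to (count, success_rate).

    events: iterable of dicts like {"selectors": [...], "succeeded": bool}
    (hypothetical schema). Selector order is ignored.
    """
    totals = Counter()
    successes = Counter()
    for event in events:
        combo = frozenset(event["selectors"])
        totals[combo] += 1
        if event["succeeded"]:
            successes[combo] += 1
    return {combo: (n, successes[combo] / n) for combo, n in totals.items()}


events = [
    {"selectors": ["portrait", "studio lighting"], "succeeded": True},
    {"selectors": ["studio lighting", "portrait"], "succeeded": False},
    {"selectors": ["full-body", "natural lighting"], "succeeded": True},
]
for combo, (count, rate) in combination_stats(events).items():
    print(sorted(combo), count, f"{rate:.0%}")
```

Using a `frozenset` as the key treats `["portrait", "studio lighting"]` and `["studio lighting", "portrait"]` as the same combination.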