Talki Academy

AI Video Generation: Flux, Runway & Production Deployment

Open-source Flux + Wan2.1 on your GPU · Runway Gen-3 Alpha API · Production deployment on AWS · Voice agent → avatar pipeline

Level: Intermediate · Duration: 2 days · 5 modules · 5 quizzes

Video AI Landscape 2026: ROI, Tools & Architecture Decisions

By the end of this module you'll know which video generation tool to use for each use case, how to estimate costs accurately, and why the open-source + API hybrid approach beats pure cloud solutions by 60–80% on cost.

In 2026, AI video generation crossed the commercial viability threshold. Product teams at Zalando, IKEA, and dozens of mid-market e-commerce companies now generate product demo videos automatically — no studio, no post-production crew. The numbers are clear: video content on product pages delivers a 12% average CTR uplift and a 6–8% conversion rate improvement over static images. For a catalog of 10,000 SKUs, that's an opportunity that wasn't economically feasible even 18 months ago.

The three use cases driving adoption

  • Product hero videos — 3–5 second looping clips replacing static product images. Cost target: <$0.10/clip. Quality bar: 720p, no artifacts, brand-consistent background.
  • Training & L&D content — Talking head videos from scripts. A 10-minute training module that used to cost €3,000 in studio time now costs €15 in compute and €80 in API fees.
  • Personalized marketing — Name/logo insertion in video templates at scale. 1,000 personalized 15-second clips for a campaign: feasible in 4 hours with a batch pipeline.

Tool comparison: Flux / CogVideoX / Wan2.1 vs. Runway vs. Pika vs. Kling

No single tool wins every scenario. The decision is always cost vs. quality vs. control. Here is an honest comparison based on production benchmarks:

  • Flux.1 (Black Forest Labs) — Image generation only. Best-in-class quality for generating reference keyframes. Runs on 10 GB+ VRAM. Free/open-source. Use as the image foundation for img2video pipelines.
  • Wan2.1-1.3B (Alibaba) — Text-to-video and image-to-video. Runs on 8 GB VRAM. 480p quality. ~4s/frame on RTX 3080. Best open-source choice for high-volume, cost-sensitive workloads.
  • CogVideoX-5B (THUDM) — Higher quality open-source video. Needs 24 GB VRAM (A10G). 720p output. Better motion coherence than Wan2.1 for complex scenes.
  • Runway Gen-3 Alpha (API) — Best quality commercially available. $0.05/5-sec 720p clip. Async job API. Ideal for hero videos where quality is the primary constraint.
  • Pika 2.0 (API) — Good for stylized/artistic content. More limited API access. Not suited for high-volume product video generation.
  • Kling 1.6 (Kuaishou) — Competitive quality at slightly lower cost than Runway. Growing API availability. Good motion realism.

Production architecture recommendation: use Wan2.1-1.3B for bulk generation (product thumbnails, drafts), Runway Gen-3 Alpha for hero content (homepage, paid ads), and Flux.1 as the keyframe generator for any img2video pipeline. This hybrid approach cuts costs by 65% vs. using Runway exclusively.

Cost modeling: what does 10,000 product videos actually cost?

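Here is a back-of-the-envelope model, a sketch rather than a quote: the bulk/hero split and the per-clip generation time on an A10G are assumptions to tune, and they drive where the savings land within the 60–80% range quoted above.

```python
# Cost model for a 10,000-SKU catalog using this module's per-clip figures.
# The bulk/hero split and local generation time are assumptions to tune.

CATALOG_SIZE = 10_000
HERO_SHARE = 0.05              # assumption: 5% of SKUs get Runway hero clips

A10G_SPOT_PER_HOUR = 0.60      # AWS A10G spot, from the next module
SECONDS_PER_CLIP_LOCAL = 120   # assumed Wan2.1 time for one 3-5 s 480p clip
local_per_clip = A10G_SPOT_PER_HOUR * SECONDS_PER_CLIP_LOCAL / 3600

RUNWAY_PER_CLIP = 0.05         # $0.05 per 5-second 720p clip

hero = int(CATALOG_SIZE * HERO_SHARE)
bulk = CATALOG_SIZE - hero

hybrid = bulk * local_per_clip + hero * RUNWAY_PER_CLIP
runway_only = CATALOG_SIZE * RUNWAY_PER_CLIP

print(f"Local cost/clip:   ${local_per_clip:.3f}")
print(f"Hybrid total:      ${hybrid:,.2f}")
print(f"Runway-only total: ${runway_only:,.2f}")
print(f"Savings:           {1 - hybrid / runway_only:.0%}")
```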

Quiz available

Finish reading this module, then test your knowledge with the quiz.


Open-Source Video Generation: Flux + Wan2.1 on Your GPU

You need Python 3.11+, PyTorch 2.3+ with CUDA/ROCm, and 8–12 GB VRAM. The full stack runs on a single RTX 3080, RTX 4070, or AMD RX 6700 XT (ROCm). Cloud alternative: an A10G spot instance on AWS costs ~$0.60/hour.

Architecture: Flux.1 → img2video pipeline

The production open-source pipeline has two stages. Stage 1: Flux.1 generates a photorealistic reference image from your product description — this is the anchor frame. Stage 2: Wan2.1 (or CogVideoX) animates that image with a motion prompt, producing a 3–5 second video clip. This two-stage approach gives you precise control over the visual appearance (Flux excels here) while delegating motion generation to a specialized video model.

Stage 1: Generating the reference keyframe with Flux.1

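A minimal sketch using the Hugging Face diffusers integration of Flux.1 [schnell]; the product prompt and resolution are illustrative.

```python
import torch
from diffusers import FluxPipeline

# Flux.1 [schnell] via diffusers; needs roughly 10 GB+ VRAM.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for VRAM headroom

prompt = (
    "studio product photo of a matte-black wireless headphone "
    "on a seamless light-grey background, soft shadows, 4k"
)
image = pipe(
    prompt,
    guidance_scale=0.0,        # schnell is distilled; no CFG needed
    num_inference_steps=4,     # schnell is a few-step model
    height=768, width=768,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("keyframe.png")     # the anchor frame for Stage 2
```

A fixed seed matters here: it makes the keyframe reproducible, so you can iterate on the motion prompt in Stage 2 without the product's appearance drifting.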

Stage 2: Animating with Wan2.1

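A sketch assuming diffusers' Wan integration; note that the image-to-video checkpoints are larger than the 1.3B text-to-video model listed above, so verify the checkpoint name and VRAM fit against the current model card.

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Assumption: diffusers' Wan integration and this checkpoint name.
# Verify both against the current model card before relying on them.
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()   # trade speed for VRAM headroom

keyframe = load_image("keyframe.png")   # Stage 1 output from Flux.1
frames = pipe(
    image=keyframe,
    prompt="slow 360-degree turntable rotation, static camera, studio lighting",
    num_frames=49,        # ~3 s at 16 fps
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "product_clip.mp4", fps=16)
```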
Practical exercise

Change the PRODUCT dictionary to a product you work with. Adjust the cost constants to your GPU tier (an A100 is ~5× faster but also pricier per hour; adjust both constants). Observe the cost delta between local GPU and Runway API — this is the core trade-off.

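A hypothetical starter for the exercise: PRODUCT and the cost constants below are the placeholders the instructions refer to.

```python
# Hypothetical starter for the exercise: PRODUCT and the cost constants
# are placeholders to replace with your own product and GPU tier.
PRODUCT = {
    "name": "wireless headphone",
    "style": "matte black, seamless light-grey background",
    "motion": "slow 360-degree turntable rotation",
}

GPU_COST_PER_HOUR = 0.60       # A10G spot; an A100 costs more per hour but is ~5x faster
SECONDS_PER_CLIP_LOCAL = 120   # assumed per-clip time on this GPU tier
RUNWAY_COST_PER_CLIP = 0.05    # $0.05 per 5 s 720p clip

local = GPU_COST_PER_HOUR * SECONDS_PER_CLIP_LOCAL / 3600
print(f"{PRODUCT['name']}: local ${local:.3f}/clip vs Runway ${RUNWAY_COST_PER_CLIP:.2f}/clip")
print(f"Delta over 1,000 clips: ${(RUNWAY_COST_PER_CLIP - local) * 1000:,.2f}")
```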

Quiz available

Finish reading this module, then test your knowledge with the quiz.


Runway Gen-3 Alpha API: Production Patterns

Runway's Gen-3 Alpha API is the production standard for quality video generation in 2026. It produces 720p output with strong motion coherence — the bar that local models haven't matched yet at reasonable inference times. The API is async: you submit a job, poll for completion, then download the result. A typical 5-second clip takes 60–90 seconds to generate.

Complete Runway API client with retry and cost tracking

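The sketch below shows the submit, poll, download pattern with retries and a running cost counter. The endpoint paths and payload field names are assumptions to verify against Runway's current API reference; the async pattern itself is what matters.

```python
import os
import time
import requests

# Assumed endpoint paths and payload field names; verify against
# Runway's current API reference. The submit/poll pattern is the point.
API_BASE = "https://api.runwayml.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNWAY_API_KEY']}"}
COST_PER_CLIP = 0.05   # $0.05 per 5-second 720p clip


class RunwayClient:
    def __init__(self) -> None:
        self.total_cost = 0.0   # running spend across the batch

    def submit(self, image_url: str, motion_prompt: str) -> str:
        """Submit an image-to-video job; returns the job id."""
        for attempt in range(3):                 # retry transient failures
            resp = requests.post(
                f"{API_BASE}/image_to_video",    # assumed path
                headers=HEADERS,
                json={"promptImage": image_url, "promptText": motion_prompt},
                timeout=30,
            )
            if resp.status_code in (429, 503):   # rate limited: back off
                time.sleep(2 ** attempt * 5)
                continue
            resp.raise_for_status()
            self.total_cost += COST_PER_CLIP     # billed on submission
            return resp.json()["id"]
        raise RuntimeError("submit failed after 3 attempts")

    def wait(self, job_id: str, poll_s: float = 10.0) -> str:
        """Poll until the job finishes; returns the output video URL."""
        while True:
            job = requests.get(
                f"{API_BASE}/tasks/{job_id}", headers=HEADERS, timeout=30
            ).json()
            if job["status"] == "SUCCEEDED":
                return job["output"][0]
            if job["status"] == "FAILED":
                raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
            time.sleep(poll_s)   # a 5 s clip typically takes 60-90 s
```

Submit in small batches and check total_cost between batches; that counter is the hook the cost-cap exercise below builds on.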
Practical exercise

Modify COST_CAP to see how the batch stops early. Add a retry queue for failed jobs. Try adjusting the simulated failure rate to see how it affects total cost.

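A self-contained simulation for the exercise, with no API calls; COST_CAP and FAILURE_RATE are the knobs to turn.

```python
import random

# Pure simulation -- no API calls. COST_CAP, FAILURE_RATE and the batch
# size are the knobs the exercise asks you to play with.
COST_PER_CLIP = 0.05
COST_CAP = 3.00        # hard spend limit for the batch
FAILURE_RATE = 0.10    # simulated share of jobs that fail
random.seed(7)

spent, done, retry_queue = 0.0, 0, []
for clip_id in range(100):
    if spent + COST_PER_CLIP > COST_CAP:
        print(f"Cost cap hit after {done + len(retry_queue)} submissions; stopping.")
        break
    spent += COST_PER_CLIP            # a submitted job is billed either way
    if random.random() < FAILURE_RATE:
        retry_queue.append(clip_id)   # exercise: drain this queue with retries
    else:
        done += 1

print(f"done={done} failed={len(retry_queue)} spent=${spent:.2f}")
```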

Quiz available

Finish reading this module, then test your knowledge with the quiz.


Production Deployment: Caching, Batching & Cost Optimization

A production video generation system has three components: a submission API (accepts generation requests), a processing backend (calls Runway/local GPU), and a delivery layer (serves generated videos via CDN). The biggest cost optimization opportunity is deduplication: many product catalogs have near-duplicate images (same product, slightly different angle). Perceptual hashing catches these before the expensive generation call.

Perceptual hash deduplication before generation

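A minimal sketch using the imagehash library; the distance threshold is an assumption to tune on your own catalog.

```python
import imagehash
from PIL import Image

# Near-duplicate detection with perceptual hashes
# (pip install imagehash pillow). THRESHOLD is an assumption to tune.
THRESHOLD = 6   # Hamming distance: 0 = identical, <= ~6 = near-duplicate

_cache: list[tuple[imagehash.ImageHash, str]] = []   # (hash, video path)

def cached_video(image_path: str) -> str | None:
    """Return an existing clip for a near-duplicate image, else None."""
    h = imagehash.phash(Image.open(image_path))
    for known_hash, video_path in _cache:
        if h - known_hash <= THRESHOLD:    # '-' computes Hamming distance
            return video_path              # reuse: skip the generation call
    _cache.append((h, f"videos/{h}.mp4"))  # register the clip we will generate
    return None
```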

SQS + Lambda async processing architecture

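A sketch of both ends of the queue using boto3; the queue URL and the generate_clip() hook are hypothetical placeholders.

```python
import json
import boto3

# Sketch of both ends of the queue. The queue URL and generate_clip()
# hook are hypothetical placeholders.
sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/video-jobs"

def enqueue(sku: str, image_url: str) -> None:
    """Submission API: push one generation request onto the queue."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"sku": sku, "image_url": image_url}),
    )

def generate_clip(image_url: str) -> str:
    """Hypothetical hook: call Runway (previous module) or the local GPU worker."""
    raise NotImplementedError

def handler(event, context):
    """Lambda entry point, invoked by the SQS trigger with batched records."""
    for record in event["Records"]:
        job = json.loads(record["body"])
        video_url = generate_clip(job["image_url"])
        # e.g. write the CDN URL back to the catalog DB here
        print(f"{job['sku']} -> {video_url}")
```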

Set the SQS visibility timeout to 6× the average generation time (e.g., 90s × 6 = 9 minutes) to prevent duplicate processing if Lambda is slow. Configure a Dead Letter Queue (DLQ) to capture jobs that fail 3+ times — these often have invalid images or rate-limit issues requiring manual review.
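Applied with boto3, with placeholder queue URL and DLQ ARN, that guidance looks like this:

```python
import json
import boto3

sqs = boto3.client("sqs")
sqs.set_queue_attributes(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/video-jobs",
    Attributes={
        "VisibilityTimeout": str(90 * 6),  # 6x the ~90 s average generation time
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn":
                "arn:aws:sqs:eu-west-1:123456789012:video-jobs-dlq",
            "maxReceiveCount": "3",        # 3+ failures land in the DLQ
        }),
    },
)
```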

Quiz available

Finish reading this module, then test your knowledge with the quiz.


Multimodal Integration: Voice Agents → Talking Avatar Videos

The most powerful video generation pattern in 2026 combines three capabilities you may already have: a Claude-based script generator, an ElevenLabs TTS pipeline, and a lip-sync model. The output is a talking head video that can replace expensive spokesperson content for training modules, customer onboarding, and multilingual product demos. A 5-minute training module that cost €2,500 in studio time now costs €22 in compute.

The voice-to-avatar pipeline

  • Step 1 — Script: Claude generates the spoken text from your content brief.
  • Step 2 — TTS: ElevenLabs converts the script to a WAV audio file using your chosen voice.
  • Step 3 — Lip-sync: SadTalker or Wav2Lip animates a portrait image to match the audio's phonemes.
  • Step 4 — Merge: FFmpeg combines the video frames with the audio track into the final MP4.
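An end-to-end sketch of the four steps, assuming the anthropic SDK, the ElevenLabs REST endpoint, and a local SadTalker checkout; the voice id, portrait image, and model id are placeholders to swap for your own.

```python
import os
import subprocess
import requests
from anthropic import Anthropic

# Step 1 -- script generation with Claude.
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
msg = client.messages.create(
    model="claude-sonnet-4-20250514",   # pick a current model id
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a 60-second spoken intro for our customer onboarding module.",
    }],
)
script = msg.content[0].text

# Step 2 -- TTS via the ElevenLabs REST API (returns audio bytes).
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{os.environ['VOICE_ID']}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": script, "model_id": "eleven_multilingual_v2"},
    timeout=120,
)
resp.raise_for_status()
with open("speech.mp3", "wb") as f:
    f.write(resp.content)

# Step 3 -- lip-sync: shell out to a local SadTalker checkout, which
# animates presenter.png to match the audio's phonemes.
subprocess.run(
    ["python", "SadTalker/inference.py",
     "--driven_audio", "speech.mp3",
     "--source_image", "presenter.png",
     "--result_dir", "out"],
    check=True,
)

# Step 4 -- SadTalker muxes the audio itself; if your lip-sync step
# outputs silent frames instead, merge with FFmpeg:
#   ffmpeg -i frames.mp4 -i speech.mp3 -c:v copy -c:a aac final.mp4
```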
Practical exercise

Adjust the costs to match your region and team rates. The 'modules_per_day' in STUDIO is often the most optimistic assumption — try 2 for a more realistic studio baseline.

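A hypothetical baseline for the exercise; the figures are chosen to match this module's €2,500 vs. €22 example and should be replaced with your own.

```python
# Hypothetical baseline for the exercise; every figure is illustrative.
STUDIO = {
    "day_rate_eur": 7500,    # crew + studio + presenter, per day
    "modules_per_day": 3,    # the optimistic assumption; try 2
}
AI_PIPELINE = {
    "compute_eur": 2,        # GPU time for TTS + lip-sync rendering
    "api_eur": 20,           # Claude + ElevenLabs fees per module
}

studio_per_module = STUDIO["day_rate_eur"] / STUDIO["modules_per_day"]
ai_per_module = AI_PIPELINE["compute_eur"] + AI_PIPELINE["api_eur"]

print(f"Studio: EUR {studio_per_module:,.0f} per module")
print(f"AI:     EUR {ai_per_module:,.0f} per module")
print(f"Saving: {1 - ai_per_module / studio_per_module:.0%}")
```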

Quiz available

Finish reading this module, then test your knowledge with the quiz.