AI Image & Video Generation Roundup: December 2025

December 2025 brought some of the biggest AI image and video generation announcements of the year. A $3.5 billion startup beat Google and OpenAI at their own game. China's Kling started generating video and synchronized audio in a single pass. And open-source models kept closing the gap with closed systems. Here's everything you need to know.

The Big Picture: December showed that smaller, focused teams can outpace tech giants. Runway's 100-person team beat trillion-dollar companies. DeepSeek built GPT-5 competitors under U.S. chip restrictions. The AI race is getting more competitive, not less.

Runway Gen-4.5 Takes #1 Spot

The biggest news of the month: Runway's Gen-4.5 is now the world's top-rated AI video model, dethroning Google Veo 3 and pushing OpenAI's Sora 2 Pro to seventh place on the Artificial Analysis Text to Video leaderboard.

What Makes Gen-4.5 Special:

1,247 Elo points — highest score on the Text to Video leaderboard
Realistic physics — objects carry weight and momentum naturally, unlike older models where things "float"
Visual coherence — details like hair, fabric, and surface reflections stay consistent across frames
Native audio — generates synchronized sound effects and ambient audio with the video
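To put the 1,247 figure in context: arena-style leaderboards are typically built from blind pairwise votes, and an Elo-style rating turns a points gap into an expected win rate. The minimal sketch below uses the textbook Elo formulas; the 20-point gap and the K-factor of 32 are hypothetical, and Artificial Analysis may fit its ratings differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """One standard Elo update; actual is 1.0 for a win, 0.0 for a loss, 0.5 for a tie.

    k=32 is a common textbook default, not a value any leaderboard is known to use.
    """
    return rating + k * (actual - expected)


if __name__ == "__main__":
    # Hypothetical runner-up sitting 20 points behind the 1,247-point leader.
    p = expected_score(1247, 1227)
    print(f"Leader expected to win {p:.1%} of head-to-head votes")
```

A 20-point lead works out to roughly 53% of votes, so small Elo gaps near the top of a leaderboard mean the leader wins only a modest majority of direct comparisons.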

Runway CEO Cristóbal Valenzuela codenamed the project "David", a nod to David vs. Goliath: his 100-person team outranked multi-trillion-dollar tech giants. The company is now valued at $3.55 billion after raising more than $300M in a round backed by investors including NVIDIA and SoftBank.

Known Limitations:

The model still has causal reasoning issues (effects sometimes precede causes), object permanence problems (things can vanish between frames), and success bias (improbable actions often succeed). Even so, it currently sits at the top of the leaderboard.

Read our full Runway Gen-4.5 breakdown →

Kling 2.6: Native Audio-Video Generation

Kuaishou's Kling AI released Kling Video 2.6, introducing "simultaneous audio-visual generation": the ability to create video, dialogue, sound effects, and ambient sounds in a single pass.

Key Improvements:

Single-pass generation — no more creating silent video first, then adding audio separately
Multiple audio types — speech, dialogue, singing, rap, ambient sounds, and mixed effects
30% lower costs and 15% better instruction following vs. previous versions
10-second 1080p output with synchronized voiceovers and soundscapes

Kling 2.6 Pro positions itself against Sora 2 and Veo 3.1 by offering a complete multimodal workflow. It's particularly strong for multi-character dialogue, music performances, and creative commercials.

See Kling 2.6 in action on BestPhoto →

DeepSeek V3.2: China Matches GPT-5

China's DeepSeek released two new AI models that rival OpenAI's latest. The release is significant because it came despite U.S. chip export restrictions, suggesting that hardware access isn't the only path to frontier AI.

DeepSeek-V3.2

• Production release, succeeding the experimental V3.2-Exp
• DeepSeek reports reasoning-benchmark performance on par with GPT-5
• Integrated tool-use (search, calculators, code)
• Open-source under MIT license

DeepSeek-V3.2-Speciale

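• Gold-medal-level performance at the 2025 International Mathematical Olympiad (35/42)
• Gold-medal-level performance at the 2025 International Olympiad in Informatics (10th place)
• Result equivalent to 2nd place at the 2025 ICPC World Finals
• Available through a temporary API endpoint until December 15, 2025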

DeepSeek introduced DeepSeek Sparse Attention (DSA), which processes long sequences with significantly less compute than standard dense attention. This architectural innovation helps explain how the company achieves frontier performance without top-tier NVIDIA chips.
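DeepSeek describes DSA as pairing a lightweight "lightning indexer", which scores how relevant each earlier token is to the current one, with a top-k selection step, so the expensive attention computation runs only over the selected tokens. The plain-Python sketch below illustrates that top-k idea under simplifying assumptions: the single ReLU dot-product indexer, the toy vectors, and every function name are hypothetical, not DeepSeek's code. Note that the real indexer still scores every earlier token, just far more cheaply than full attention.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    idx_queries: List[Vector],
    idx_keys: List[Vector],
    k: int,
) -> List[Vector]:
    """Causal attention in which each query attends only to its top-k past tokens.

    idx_queries / idx_keys are small vectors standing in for a cheap indexer;
    a single ReLU dot-product head is a simplifying assumption (hypothetical).
    """
    outputs = []
    for t, q in enumerate(queries):
        past = list(range(t + 1))  # causal: positions 0..t
        # 1) Cheap relevance score for every past token.
        score = {s: max(0.0, dot(idx_queries[t], idx_keys[s])) for s in past}
        # 2) Keep only the k highest-scoring positions.
        chosen = sorted(past, key=lambda s: score[s], reverse=True)[:k]
        # 3) Ordinary scaled dot-product attention over that subset only.
        logits = [dot(q, keys[s]) / math.sqrt(len(q)) for s in chosen]
        weights = softmax(logits)
        outputs.append(
            [sum(w * values[s][d] for w, s in zip(weights, chosen)) for d in range(len(values[0]))]
        )
    return outputs


def attended_pairs(seq_len: int, k: int = 0) -> int:
    """(query, key) pairs in the main attention step; k=0 means dense causal attention."""
    if k <= 0:
        return seq_len * (seq_len + 1) // 2
    return sum(min(t + 1, k) for t in range(seq_len))
```

For an 8-token sequence with k = 2, `attended_pairs` counts 15 query-key pairs in the main attention step instead of 36. The gap widens with length: dense work grows quadratically, while the sparse step grows by about k pairs per query.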

FLUX.2: The Open-Source Image Giant

Black Forest Labs released FLUX.2 in late November, and it's been dominating discussions all month. Its 32-billion-parameter open-weight variant, FLUX.2 [dev], is arguably the most powerful open-weight image generator available.

FLUX.2 Highlights:

Multi-reference conditioning — maintain character and style consistency across up to 10 reference images
4MP editing — edit images at up to 4 megapixels while preserving detail
Improved text rendering — finally, AI that can write legible text in images
Multiple variants — Pro (best quality), Flex (developer control), Dev (open weights), Klein (coming soon, Apache 2.0)

The catch: FLUX.2 is demanding. Loading the full model takes about 90GB of VRAM. But NVIDIA worked with Black Forest Labs to provide FP8 quantizations that reduce VRAM requirements by 40% and improve performance by 40%.
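The VRAM figures follow from simple arithmetic on parameter counts. The minimal sketch below assumes memory is dominated by weights and ignores activations: BF16 stores each parameter in 2 bytes and FP8 in 1, so FP8 halves weight memory. The roughly 90GB full-load figure presumably also covers components loaded alongside the 32-billion-parameter model (such as the text encoder) plus working memory, which may be why the quoted saving is 40% rather than 50%. The function names are illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def after_reduction_gb(total_gb: float, reduction: float) -> float:
    """Memory left after a fractional reduction (0.40 means 40% less)."""
    return total_gb * (1.0 - reduction)


if __name__ == "__main__":
    params = 32e9  # parameter count quoted for FLUX.2 above
    print(f"BF16 weights: {weight_memory_gb(params, 2):.0f} GB")  # 2 bytes per parameter
    print(f"FP8 weights:  {weight_memory_gb(params, 1):.0f} GB")  # 1 byte per parameter
    # The quoted 40% reduction applied to the ~90GB full-load figure.
    print(f"90GB less 40%: {after_reduction_gb(90, 0.40):.0f} GB")
```

This prints 64 GB, 32 GB, and 54 GB. Even the FP8 weights alone exceed a 24GB consumer card, which is why offloading or smaller variants matter for local use.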

Try FLUX.2 on BestPhoto →

Kandinsky 5.0: 10-Second Videos, Open Source

A quieter release that's gaining attention: Kandinsky Lab open-sourced its 5.0 family of image and video models under the Apache 2.0 license, which permits commercial use, modification, and redistribution with only light obligations such as keeping the license and copyright notices.

Model Lineup:

Image Lite (6B) — text-to-image generation competing with Stable Diffusion
Video Lite (2B) — 10-second videos at 24fps, runs on consumer GPUs with just 12GB VRAM
Video Pro (19B) — premium quality for text-to-video and image-to-video

The key innovation is NABLA (Neighborhood Adaptive Block-Level Attention), a sparse attention algorithm that enables longer video generation without proportionally more compute. On an H100, 5-second videos take about 30 seconds. On an RTX 4090, expect 2-3 minutes.
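As its authors describe it, NABLA works on blocks of tokens rather than individual tokens: it averages queries and keys within each block, computes a small block-to-block attention map, and for each query block keeps only the key blocks that together carry most of the attention mass. Full attention then runs only inside the kept blocks. The pure-Python sketch below shows that mask-selection step, assuming mean pooling, a single head, and an illustrative cumulative-mass threshold; the function names and parameter values are hypothetical, and a real implementation runs on GPU tensors with far larger blocks.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_pool(rows: Matrix, block: int) -> Matrix:
    """Average each run of `block` consecutive token vectors into one vector."""
    pooled = []
    for start in range(0, len(rows), block):
        group = rows[start:start + block]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_block_mask(
    queries: Matrix, keys: Matrix, block: int, threshold: float
) -> List[List[bool]]:
    """For each query block, keep the fewest key blocks whose block-level
    attention probabilities sum to at least `threshold` (hypothetical helper)."""
    q_blocks = mean_pool(queries, block)
    k_blocks = mean_pool(keys, block)
    scale = 1.0 / math.sqrt(len(queries[0]))
    mask = []
    for qb in q_blocks:
        probs = softmax([scale * sum(a * b for a, b in zip(qb, kb)) for kb in k_blocks])
        # Visit key blocks from most to least attended until the mass target is met.
        order = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
        keep, mass = set(), 0.0
        for j in order:
            keep.add(j)
            mass += probs[j]
            if mass >= threshold:
                break
        mask.append([j in keep for j in range(len(probs))])
    return mask
```

The adaptive part is that sharply focused query blocks keep only a block or two, while diffuse ones keep more. That is how this kind of sparse attention can cut compute on long videos without fixing a sparsity pattern in advance.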

Google Flow Gets Major Updates

Google, meanwhile, has been steadily improving Flow, its AI filmmaking tool. Over 275 million videos have been generated in Flow since launch.

New Flow Features:

Insert — add objects, characters, or visual elements into existing scenes while maintaining shadow and lighting
Remove — eliminate unwanted objects and fill backgrounds convincingly
Audio across all features — synchronized sound effects, dialogue, and ambient audio
Veo 3.1 integration — richer audio, enhanced realism, better narrative control

Veo 3.1 is available via the Gemini app, Gemini API, and Vertex AI. If you're already in the Google ecosystem, it's worth checking out the improvements.

YouTube's AI Push (and Controversy)

YouTube integrated Google DeepMind's Veo 3 Fast into Shorts, letting creators generate video backgrounds and clips with sound for free. New features include "Edit with AI" (auto-editing raw footage), AI video ideation, and a speech-to-song tool.

Privacy Concern:

YouTube's "likeness detection" tool — designed to help creators remove deepfakes of themselves — requires uploading government ID and biometric video. The sign-up form's language suggested this data could train Google's AI models. YouTube told CNBC they've "never used creators' biometric data to train AI" and are reviewing the language.

What the Community Is Talking About

Across Reddit's r/StableDiffusion, r/singularity, and AI art communities, a few themes dominated this month:

Open Source Gains

FLUX.2 and Kandinsky 5.0 releases sparked excitement about open-source catching up to closed models. The 12GB VRAM requirement for Kandinsky Video Lite means consumer GPUs can now generate 10-second videos locally.

Audio Integration Race

Multiple models (Runway 4.5, Kling 2.6, Veo 3.1) now generate synchronized audio. This eliminates the awkward workflow of creating silent video, then adding sound separately. Reddit users are comparing quality and sync accuracy.

China vs. U.S. Competition

DeepSeek matching GPT-5 benchmarks despite chip restrictions generated significant discussion. Its architectural innovations, such as DeepSeek Sparse Attention, suggest hardware isn't destiny: algorithmic efficiency matters too.

AI Slop Concerns

Anxiety is growing about AI-generated content flooding platforms. Disney CEO Bob Iger's announcement that Disney+ will allow user-generated AI content sparked backlash. The term "clanker" (borrowed from Star Wars) has become Gen Z slang for robots and AI, often aimed at AI that replaces human jobs.

What This Means for Creators

Good News

• Video quality is improving rapidly
• Native audio-video sync is becoming standard in leading models
• Open-source options are increasingly viable
• More competition = better tools, lower prices

Challenges

• Hardware requirements remain high for best results
• Platform policies are still evolving
• Copyright concerns persist
• Detecting AI-generated content remains difficult

What to Watch in January

Runway Gen-4.5 full rollout — currently in gradual release, should hit all users soon
More Kling 2.6 adoption — expect comparison videos and tutorials to proliferate
FLUX.2 Klein announcement — the Apache 2.0 variant is "coming soon" and could democratize access further
CES 2026 announcements — hardware and software reveals incoming