LTX 2.3 Now Available: 4K Open-Source Video with Native Audio
LTX 2.3 from Lightricks delivers 4K video at up to 50fps, native stereo audio, and 20-second clips. The top open-source video model, now available on BestPhoto.

LTX 2.3 is now live on BestPhoto. Lightricks' latest video model is the first open-source model to deliver true 4K generation at up to 50fps with native stereo audio — and it does it 18x faster than Wan 2.2. If you've been waiting for an open-weights video model that can compete with the closed-source leaders, this is it.
The Big Picture: LTX 2.3 is a 19-billion parameter model with a dual-stream architecture — 14B parameters for video and 5B for audio. It generates both modalities simultaneously using bidirectional cross-attention, meaning audio and video are synchronized at the architectural level. All training data comes from licensed sources (Getty Images and Shutterstock partnerships), and the model is fully open-source under Apache 2.0.
Try LTX 2.3
Generate 4K videos with native audio from the top open-source video model.
What's New in LTX 2.3
LTX 2.3 isn't a minor patch — it rebuilds core components from LTX-2. The VAE, text connector, and vocoder have all been upgraded, and new capabilities like native portrait mode and last-frame interpolation round out the release.
Rebuilt VAE & Sharper Details
- ✓ New latent space with updated VAE trained on higher-quality data
- ✓ Fine textures, hair, and edge detail preserved across the full frame
- ✓ Native 4K (2160p) at up to 50fps
- ✓ Native portrait mode (1080x1920) — not cropped
4x Larger Text Connector
- • Gemma3-12B multilingual text encoder
- • Learnable "thinking tokens" for better semantic grounding
- • Complex multi-subject prompts resolve more accurately
- • Spatial relationships and stylistic instructions handled better
Native Audio-Video Generation
Like Seedance, LTX 2.3 generates audio and video together in a single pass. The 5B-parameter audio stream uses 1D temporal RoPE for sequential processing, connected to the video stream via bidirectional cross-attention layers. The result is synchronized sound effects, ambient audio, and speech without post-processing.
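A 1D temporal rotary positional embedding (RoPE) like the one described above can be sketched in a few lines of NumPy. This is a simplified illustration of the general technique, not LTX 2.3's actual implementation: pairs of channels are rotated by a position-dependent angle, so relative time offsets between tokens show up directly in attention dot products.

```python
import numpy as np

def rope_1d(x, base=10000.0):
    """Apply 1D rotary positional embedding to a (seq_len, dim) array.

    Each channel pair is rotated by an angle proportional to its token's
    position. Simplified sketch -- not the actual LTX 2.3 code.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE pairs channels, so dim must be even"
    half = dim // 2
    freqs = 1.0 / base ** (np.arange(half) / half)   # per-pair frequencies
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

tokens = np.random.default_rng(0).standard_normal((8, 16))
rotated = rope_1d(tokens)

# Rotation preserves each token's norm, and position 0 is left unchanged.
assert np.allclose(np.linalg.norm(rotated, axis=-1),
                   np.linalg.norm(tokens, axis=-1))
assert np.allclose(rotated[0], tokens[0])
```

Because only rotations are applied, the embedding changes directions, not magnitudes, which is why RoPE composes cleanly with attention.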
Audio Improvements in 2.3
- • New HiFi-GAN vocoder modified for stereo synthesis at 24 kHz
- • Filtered training data for fewer artifacts and silence gaps
- • Cleaner dialogue and ambient sound separation
- • Audio-to-video workflow: generate video synchronized to existing audio
Six Generation Workflows
LTX 2.3 supports more generation modes than any other model on BestPhoto:
Text-to-Video
Generate from natural language prompts
Image-to-Video
Animate still images with motion
Video-to-Video
Edit and transform existing videos
Audio-to-Video
Generate video synced to audio
Extend Video
Continue clips beyond their end
Retake Video
Re-generate specific segments
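The six modes differ mainly in which inputs they require. The sketch below illustrates that difference with a small validator; the mode names, parameter names, and payload shape here are hypothetical, not BestPhoto's actual API schema:

```python
# Hypothetical payload schema illustrating how the six workflows differ in
# required inputs -- NOT BestPhoto's actual API.
MODES = {
    "text_to_video":  {"prompt"},
    "image_to_video": {"prompt", "image_url"},
    "video_to_video": {"prompt", "video_url"},
    "audio_to_video": {"prompt", "audio_url"},
    "extend_video":   {"video_url"},
    "retake_video":   {"video_url", "segment_start", "segment_end"},
}

def validate_request(payload):
    """Check that a payload carries the inputs its mode requires."""
    mode = payload.get("mode")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    missing = MODES[mode] - set(payload)
    if missing:
        raise ValueError(f"{mode} requires: {sorted(missing)}")
    return True

req = {"mode": "audio_to_video",
       "prompt": "rain on a tin roof, slow pan across a porch",
       "audio_url": "https://example.com/rain.wav"}
assert validate_request(req)
```

Separating the mode table from the validation logic means adding a seventh workflow is a one-line change.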
Try All Six Workflows
Text-to-video, image-to-video, audio-to-video, and more.
LTX 2.3 vs. The Competition
Here's how LTX 2.3 stacks up against the other video models on BestPhoto:
| Feature | LTX 2.3 | Kling 3.0 | Seedance 1.5 Pro |
|---|---|---|---|
| Max Resolution | 4K (2160p) | 4K (with upscaling) | 720p - 1080p |
| Max Duration | 20 seconds | 15 seconds | 12 seconds |
| Frame Rate | Up to 50fps | Up to 60fps | 24fps |
| Native Audio | Yes (stereo 24 kHz) | No | Yes (joint generation) |
| Parameters | 19B (14B video + 5B audio) | Undisclosed | 4.5B |
| Open Source | Yes (Apache 2.0) | No | No |
| LoRA Training | Yes | No | No |
| Portrait Mode | Native 9:16 | Yes | Yes |
| Best For | 4K cinematic, customizable workflows, cost-conscious teams | Action, physics, multi-shot storyboards | Dialogue, lip sync, multilingual content |
Where LTX 2.3 Wins
- • Resolution: Only model generating true native 4K
- • Duration: 20 seconds continuous — longest on BestPhoto
- • Cost: 75-80% cheaper than Runway's premium pricing
- • Customization: Full LoRA support for style/character training
- • Open source: Full weights, code, and training scripts
Where Others Still Lead
- • Perceptual quality: Kling 3.0 and Runway Gen-4.5 rank higher on Elo
- • Lip sync: Seedance 1.5 Pro has better dialogue precision
- • Multi-shot: Kling 3.0's storyboard system is unique
- • Speech synthesis: Quality varies across languages
Bottom Line: LTX 2.3 is the best open-source video model available — period. It's the only open-weights model that can generate 4K video with native audio. For teams that need customization (LoRA training), cost efficiency, or self-hosting, it's the clear choice. For raw perceptual quality in a managed service, Kling 3.0 and Runway Gen-4.5 still have an edge.
Prompting Tips for LTX 2.3
LTX 2.3 responds best to concrete, cinematographic prompts. Here's what works:
1. Establish the shot — use cinematography terms (dolly, orbit, jib, tracking, snorkel lens)
2. Describe lighting and color — specify conditions, palette, and atmosphere
3. Describe action as a sequence — from beginning to end, not as a static scene
4. Add lens specs — explicit focal length and aperture reduce edge shimmer by ~18%
5. Limit texture complexity — avoiding high-frequency patterns cuts moiré events by ~30%
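The five tips above amount to assembling a prompt from fixed ingredients, which can be captured in a small helper. This is purely illustrative string assembly, not an official LTX or BestPhoto utility:

```python
def build_prompt(shot, lighting, action, lens=""):
    """Assemble a cinematographic prompt from the ingredients recommended
    above: shot type, lighting/color, action sequence, and lens specs.
    Illustrative helper only -- not an official tool."""
    parts = [shot, lighting, action, lens]
    return ", ".join(p.strip().rstrip(",") for p in parts if p)

prompt = build_prompt(
    shot="Slow dolly-in on a ceramicist at her wheel",
    lighting="warm tungsten key light, cool window fill, muted earth palette",
    action="hands center the clay, pull a cylinder upward, then flare the rim",
    lens="50mm at f/2.8",
)
```

Keeping the shot description first mirrors how the example prompts below are structured: camera language up front, detail and lens specs at the end.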
See It In Action
These examples are generated directly from LTX 2.3. Pay attention to the detail preservation, camera movement, and audio synchronization:
Ethiopian Runner
Ground-level tracking shot with hyper-detailed slow motion and dust particles
"Snorkel lens ground-scraping tracking shot following a barefoot Ethiopian long-distance runner training on a dirt road at dawn, camera inches from the ground racing alongside, her feet kicking up red dust in slow motion at 240fps, Rift Valley landscape blurred in the background, the texture of earth and callused skin in hyper-detail, 40mm snorkel lens at ground height"
Soviet Telescope Drone Shot
FPV drone descent through a derelict radio telescope with vertigo spiral
"Drone descent through the open oculus of a derelict Soviet-era radio telescope dish, spiraling downward into the rusted parabolic bowl where a lone botanist catalogs wildflowers growing through cracked concrete, her red jacket the only color against oxidized metal and grey sky, 24mm on a caged FPV drone"
Indian Wedding
Through-the-veil shot with fabric overlay, bokeh marigolds, and intimate lighting
"Through-the-veil shot of a bride's face during an Indian wedding ceremony, camera positioned behind the sheer red dupatta fabric, the embroidered pattern creating a textured overlay on her face, her eyes lined with kohl looking down at henna-covered hands, marigold garlands in soft background bokeh, 85mm f/1.2"
Whispering Woman (Audio Demo)
Native audio generation with synchronized whispered speech
"A woman whispering into the microphone"
New VAE: Sharper Fine Details
Rebuilt latent space preserves textures, hair, and edge detail across the full frame
"Feature showcase: new VAE architecture for sharper output"
Native Audio Generation
Improved vocoder with filtered training data for cleaner, artifact-free audio
"Feature showcase: native synchronized audio with improved vocoder"
Flexible Workflows
Text-to-video, image-to-video, audio-to-video, extend, and retake in one model
"Feature showcase: multiple generation workflows"
Create Your Own Videos
Try LTX 2.3 with your own prompts and images.
Best Use Cases for LTX 2.3
Cinematic Content
True 4K output with complex camera movements. Film-grade quality for short films, trailers, and music videos.
Custom Characters
LoRA fine-tuning for brand mascots, recurring characters, or specific visual styles. The only video model with full LoRA support.
Product Videos
Cost-effective product demos and ads at 4K resolution. Multiple aspect ratios including portrait for social.
Audio-Synced Content
Generate videos from existing voiceovers, music tracks, or sound effects with automatic synchronization.
Video Editing
Extend, retake, and transform existing videos. Add motion to still images while preserving visual identity.
Budget-Conscious Teams
Starting at $0.04/second for fast variants. Self-hosting option for high-volume workflows.
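At flat per-second pricing, per-clip cost is simple arithmetic. Using the rates quoted in this article ($0.04/sec for fast variants, up to $0.16/sec at the top end — other figures here are just examples):

```python
def clip_cost(seconds, rate_per_second):
    """Cost of one generated clip at a flat per-second rate, in dollars."""
    return round(seconds * rate_per_second, 2)

# Rates from this article; a 20-second clip is the maximum duration.
fast = clip_cost(20, 0.04)   # fast variant
top = clip_cost(20, 0.16)    # top-end rate
assert fast == 0.8
assert top == 3.2
```

So even at the top rate, a maximum-length 4K clip stays in the low single digits of dollars, which is where the "75-80% cheaper than Runway" claim above comes from.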
Technical Details
Architecture
LTX 2.3 uses an Asymmetric Dual-Stream Diffusion Transformer (DiT) with 19 billion total parameters. The 14B video stream uses 3D Rotary Positional Embeddings (RoPE) for spatiotemporal dynamics, while the 5B audio stream uses 1D temporal RoPE for sequential processing. Both streams are connected by bidirectional cross-attention layers with temporal positional embeddings.
- • 19B parameters (14B video + 5B audio)
- • Gemma3-12B text encoder with multi-layer feature extraction
- • Up to 4K (2160p) at 24/25/48/50 FPS
- • 1.22 sec/step on H100 — 18x faster than Wan 2.2-14B
- • Apache 2.0 license — fully open-source with licensed training data
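The bidirectional cross-attention connecting the two streams can be sketched in NumPy. This toy single-head version with identity projections shows the information flow only; the real DiT blocks use learned Q/K/V projections, many heads, and the positional embeddings described above:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attend(q_tokens, kv_tokens):
    """Single-head cross-attention: q_tokens read from kv_tokens.
    Toy sketch with identity Q/K/V projections -- not the real DiT block."""
    d = q_tokens.shape[-1]
    attn = softmax(q_tokens @ kv_tokens.T / np.sqrt(d))
    return attn @ kv_tokens

rng = np.random.default_rng(1)
video = rng.standard_normal((12, 32))   # 12 video tokens (sizes arbitrary)
audio = rng.standard_normal((6, 32))    # 6 audio tokens

# Bidirectional: each stream conditions on the other, updated residually,
# which is what keeps audio and video synchronized at the architectural level.
video = video + cross_attend(video, audio)
audio = audio + cross_attend(audio, video)
```

The key point of the sketch: synchronization is not a post-processing step but a property of the joint forward pass, since each stream's tokens are updated using the other stream's state at every cross-attention layer.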
When to Use LTX 2.3 vs. Kling vs. Seedance
Use LTX 2.3 when:
- • You need true 4K resolution output
- • Custom characters/styles via LoRA training
- • Cost is a priority ($0.04-0.16/sec vs $0.20+)
- • You want 20-second continuous clips
- • Audio-to-video sync from existing audio tracks
Use Kling 3.0 when:
- • Physics realism matters most (action, sports)
- • You need multi-shot storyboard sequences
- • Creating fast-moving dynamic content
- • Character consistency across scenes (Elements system)
Use Seedance 1.5 Pro when:
- • Lip sync precision is critical (ads, dialogue)
- • Creating multilingual content
- • Advanced camera moves (Hitchcock dolly zoom)
- • Emotional talking-head content
4K Open-Source Video Is Here
LTX 2.3 is available now on BestPhoto. Generate 4K videos with native audio, LoRA customization, and six generation workflows — at a fraction of the cost of closed-source alternatives.
No credit card required
Frequently Asked Questions
What makes LTX 2.3 different from LTX-2?
LTX 2.3 rebuilds three core components: a new VAE for sharper detail, a 4x larger text connector for better prompt understanding, and an improved HiFi-GAN vocoder for cleaner audio. It also adds native portrait mode (9:16), last-frame interpolation, and 24/48 FPS options. Note that LTX-2 LoRAs need to be retrained for 2.3's new latent space.
Is LTX 2.3 really open source?
Yes. Full weights, code, and training scripts are released under Apache 2.0 on HuggingFace and GitHub. Companies under $10M annual revenue can use it freely. Larger companies embedding it into products need a commercial license from Lightricks.
How does the quality compare to Kling 3.0 or Runway Gen-4.5?
LTX 2.3 is the #1 open-source video model on the Artificial Analysis leaderboard, but closed-source models like Kling 3.0 (Elo 1,244) and Runway Gen-4.5 (Elo 1,225) rank higher in perceptual quality. Where LTX 2.3 wins is resolution (true 4K), duration (20s), cost, and customizability.
Can I train custom LoRAs on LTX 2.3?
Yes — LTX 2.3 is one of the only video models with full LoRA support. You can train custom styles, characters, or brand identities. However, LoRAs trained on LTX-2 need to be retrained due to the new latent space in 2.3.
What are the camera control options?
LTX 2.3 supports built-in camera LoRAs for dolly in/out, jib up/down, and static shots. You can also describe camera movements in your prompts using cinematography terms like tracking shots, orbital moves, and snorkel lens angles.
What are the known limitations?
Speech synthesis quality varies across underrepresented languages. Multi-speaker scenarios may inconsistently assign dialogue. Temporal coherence can degrade beyond ~20 seconds. The base model can be inconsistent — the distilled variant generally delivers better results.