
LTX 2.3 Now Available: 4K Open-Source Video with Native Audio

LTX 2.3 from Lightricks delivers 4K video at up to 50fps, native stereo audio, and 20-second clips. The top open-source video model is now available on BestPhoto.

BestPhoto Team
March 5, 2026
8 min read

LTX 2.3 is now live on BestPhoto. Lightricks' latest video model is the first open-source model to deliver true 4K generation at up to 50fps with native stereo audio, and on an H100 it runs 18x faster than Wan 2.2-14B. If you've been waiting for an open-weights video model that can compete with the closed-source leaders, this is it.

The Big Picture: LTX 2.3 is a 19-billion parameter model with a dual-stream architecture — 14B parameters for video and 5B for audio. It generates both modalities simultaneously using bidirectional cross-attention, meaning audio and video are synchronized at the architectural level. All training data comes from licensed sources (Getty Images and Shutterstock partnerships), and the model is fully open-source under Apache 2.0.

Try LTX 2.3

Generate 4K videos with native audio from the top open-source video model.

Try Video Generator

What's New in LTX 2.3

LTX 2.3 isn't a minor patch — it rebuilds core components from LTX-2. The VAE, text connector, and vocoder have all been upgraded, and new capabilities like native portrait mode and last-frame interpolation round out the release.

Rebuilt VAE & Sharper Details

  • New latent space with updated VAE trained on higher-quality data
  • Fine textures, hair, and edge detail preserved across full frame
  • Native 4K (2160p) at up to 50fps
  • Native portrait mode (1080x1920) — not cropped

4x Larger Text Connector

  • Gemma3-12B multilingual text encoder
  • Learnable "thinking tokens" for better semantic grounding
  • Complex multi-subject prompts resolve more accurately
  • Spatial relationships and stylistic instructions handled better

Native Audio-Video Generation

Like Seedance and Kling, LTX 2.3 generates audio and video together in a single pass. The 5B-parameter audio stream uses 1D temporal RoPE for sequential processing, connected to the video stream via bidirectional cross-attention layers. The result is synchronized sound effects, ambient audio, and speech without post-processing.
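Rotary positional embeddings (RoPE) encode position by rotating pairs of feature channels by an angle proportional to the token's position. The NumPy sketch below is a generic illustration, not Lightricks' code: rope_1d is the 1D temporal variant you would apply to a sequence of audio tokens, and rope_3d assumes the channels are split into three equal groups for time, height, and width. That split is one common way to build 3D RoPE; the post does not say how LTX 2.3 partitions its channels.

```python
import numpy as np


def rope_1d(x: np.ndarray, positions, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive channel pairs of x (n, d) by angle position * frequency."""
    n, d = x.shape
    assert d % 2 == 0, "channel count must be even"
    freqs = base ** (-np.arange(0, d, 2) / d)                        # (d/2,)
    angles = np.asarray(positions, dtype=float)[:, None] * freqs     # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    even, odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = even * cos - odd * sin
    out[:, 1::2] = even * sin + odd * cos
    return out


def rope_3d(x: np.ndarray, t, h, w, base: float = 10000.0) -> np.ndarray:
    """Assumed layout: equal channel groups rotated by time, height, and width."""
    n, d = x.shape
    assert d % 6 == 0, "need three equal, even-sized channel groups"
    g = d // 3
    return np.concatenate([
        rope_1d(x[:, :g], t, base),
        rope_1d(x[:, g:2 * g], h, base),
        rope_1d(x[:, 2 * g:], w, base),
    ], axis=1)


if __name__ == "__main__":
    audio_tokens = np.random.default_rng(0).normal(size=(6, 8))
    rotated = rope_1d(audio_tokens, np.arange(6))  # positions 0..5 along time
    print(rotated.shape)
```

The useful property is that the dot product between a rotated query at position m and a rotated key at position n depends only on n - m, so attention sees relative timing rather than absolute indices.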

Audio Improvements in 2.3

  • New HiFi-GAN vocoder modified for stereo synthesis at 24 kHz
  • Filtered training data for fewer artifacts and silence gaps
  • Cleaner dialogue and ambient sound separation
  • Audio-to-video workflow: generate video synchronized to existing audio
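The 24 kHz stereo figure translates into concrete sample counts, and dividing by the video frame rate shows how many audio samples line up with each frame. This short Python sketch uses only the numbers quoted in this post; the 16-bit PCM depth used for the size estimate is an assumption, not a stated spec.

```python
SAMPLE_RATE_HZ = 24_000  # vocoder output rate quoted in the post
CHANNELS = 2             # stereo


def stereo_samples(seconds: float) -> tuple[int, int]:
    """Return (samples per channel, total samples across both channels)."""
    per_channel = round(seconds * SAMPLE_RATE_HZ)
    return per_channel, per_channel * CHANNELS


def samples_per_frame(fps: int) -> float:
    """Audio samples (per channel) that fall within one video frame."""
    return SAMPLE_RATE_HZ / fps


def pcm_bytes(seconds: float, bits_per_sample: int = 16) -> int:
    """Raw PCM size in bytes; 16-bit depth is an assumption."""
    _, total = stereo_samples(seconds)
    return total * bits_per_sample // 8


if __name__ == "__main__":
    print(stereo_samples(20))      # (480000, 960000)
    print(samples_per_frame(50))   # 480.0
    print(pcm_bytes(20))           # 1920000
```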

Six Generation Workflows

LTX 2.3 supports more generation modes than any other model on BestPhoto:

  • Text-to-Video: generate from natural language prompts
  • Image-to-Video: animate still images with motion
  • Video-to-Video: edit and transform existing videos
  • Audio-to-Video: generate video synced to audio
  • Extend Video: continue clips beyond their end
  • Retake Video: re-generate specific segments
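Each workflow starts from a different kind of input. As an illustration only, the Python sketch below records which inputs each mode plausibly needs and reports what is missing before a job is submitted. The Workflow names and the REQUIRED_INPUTS table are hypothetical, inferred from the one-line descriptions above; they are not the LTX 2.3 or BestPhoto API.

```python
from enum import Enum


class Workflow(Enum):
    TEXT_TO_VIDEO = "text-to-video"
    IMAGE_TO_VIDEO = "image-to-video"
    VIDEO_TO_VIDEO = "video-to-video"
    AUDIO_TO_VIDEO = "audio-to-video"
    EXTEND_VIDEO = "extend-video"
    RETAKE_VIDEO = "retake-video"


# Hypothetical input requirements inferred from the workflow descriptions;
# not taken from any real LTX 2.3 or BestPhoto interface.
REQUIRED_INPUTS: dict[Workflow, frozenset[str]] = {
    Workflow.TEXT_TO_VIDEO: frozenset({"prompt"}),
    Workflow.IMAGE_TO_VIDEO: frozenset({"image"}),
    Workflow.VIDEO_TO_VIDEO: frozenset({"video", "prompt"}),
    Workflow.AUDIO_TO_VIDEO: frozenset({"audio"}),
    Workflow.EXTEND_VIDEO: frozenset({"video"}),
    Workflow.RETAKE_VIDEO: frozenset({"video", "segment"}),
}


def missing_inputs(workflow: Workflow, provided: set[str]) -> list[str]:
    """Return the required inputs that were not supplied, sorted by name."""
    return sorted(REQUIRED_INPUTS[workflow] - set(provided))


if __name__ == "__main__":
    print(missing_inputs(Workflow.RETAKE_VIDEO, {"video"}))  # ['segment']
```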


LTX 2.3 vs. The Competition

Here's how LTX 2.3 stacks up against the other video models on BestPhoto:

Feature | LTX 2.3 | Kling 3.0 | Seedance 1.5 Pro
Max Resolution | 4K (2160p) | 4K (with upscaling) | 720p to 1080p
Max Duration | 20 seconds | 15 seconds | 12 seconds
Frame Rate | Up to 50fps | Up to 60fps | 24fps
Native Audio | Yes (stereo, 24 kHz) | No | Yes (joint generation)
Parameters | 19B (14B video + 5B audio) | Undisclosed | 4.5B
Open Source | Yes (Apache 2.0) | No | No
LoRA Training | Yes | No | No
Portrait Mode | Native 9:16 | Yes | Yes
Best For | 4K cinematic, customizable workflows, cost-conscious teams | Action, physics, multi-shot storyboards | Dialogue, lip sync, multilingual content

Where LTX 2.3 Wins

  • Resolution: Only model generating true native 4K
  • Duration: 20 seconds continuous — longest on BestPhoto
  • Cost: 75-80% cheaper than Runway's premium pricing
  • Customization: Full LoRA support for style/character training
  • Open source: Full weights, code, and training scripts

Where Others Still Lead

  • Perceptual quality: Kling 3.0 and Runway Gen-4.5 rank higher on Elo-based quality leaderboards
  • Lip sync: Seedance 1.5 Pro has better dialogue precision
  • Multi-shot: Kling 3.0's storyboard system is unique
  • Speech synthesis: Quality varies across languages

Bottom Line: LTX 2.3 is the best open-source video model available — period. It's the only open-weights model that can generate 4K video with native audio. For teams that need customization (LoRA training), cost efficiency, or self-hosting, it's the clear choice. For raw perceptual quality in a managed service, Kling 3.0 and Runway Gen-4.5 still have an edge.

Prompting Tips for LTX 2.3

LTX 2.3 responds best to concrete, cinematographic prompts. Here's what works:

  1. Establish the shot: use cinematography terms (dolly, orbit, jib, tracking, snorkel lens)
  2. Describe lighting and color: specify conditions, palette, and atmosphere
  3. Describe action as a sequence: from beginning to end, not as a static scene
  4. Add lens specs: explicit focal length and aperture reduce edge shimmer by ~18%
  5. Limit texture complexity: avoiding high-frequency patterns cuts moire events by ~30%
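One way to apply these five tips consistently is to fill in a small template instead of writing each prompt freehand. The Python sketch below is a hypothetical helper, not a BestPhoto or Lightricks tool: ShotSpec orders a prompt as shot, subject, action beats, lighting, then lens, and texture_warnings flags a short, assumed list of high-frequency pattern words for tip 5.

```python
from dataclasses import dataclass

# Hypothetical list of high-frequency patterns to flag (tip 5); extend as needed.
HIGH_FREQUENCY_TERMS = ("houndstooth", "herringbone", "pinstripe", "checkerboard", "chain-link")


@dataclass
class ShotSpec:
    shot: str              # tip 1: camera move or shot type
    subject: str
    actions: list[str]     # tip 3: beats in order, beginning to end
    lighting: str          # tip 2: conditions, palette, atmosphere
    focal_length_mm: int   # tip 4: explicit focal length
    aperture: float        # tip 4: explicit aperture

    def to_prompt(self) -> str:
        """Join the parts into one comma-separated prompt string."""
        action = ", then ".join(self.actions)
        return (f"{self.shot} of {self.subject}, {action}, {self.lighting}, "
                f"{self.focal_length_mm}mm f/{self.aperture:g}")


def texture_warnings(prompt: str) -> list[str]:
    """Return any flagged high-frequency pattern terms found in the prompt."""
    lowered = prompt.lower()
    return [term for term in HIGH_FREQUENCY_TERMS if term in lowered]


if __name__ == "__main__":
    spec = ShotSpec(
        shot="Slow dolly-in shot",
        subject="a potter at her wheel",
        actions=["she centers the clay", "a bowl rises between her hands"],
        lighting="warm tungsten light, amber palette, drifting dust",
        focal_length_mm=50,
        aperture=1.8,
    )
    prompt = spec.to_prompt()
    print(prompt)
    print(texture_warnings(prompt))  # []
```

Keeping the action as an ordered list nudges you toward tip 3, since a single static description no longer fits the template.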

See It In Action

These examples are generated directly from LTX 2.3. Pay attention to the detail preservation, camera movement, and audio synchronization:

Cinematic

Ethiopian Runner

Ground-level tracking shot with hyper-detailed slow motion and dust particles

"Snorkel lens ground-scraping tracking shot following a barefoot Ethiopian long-distance runner training on a dirt road at dawn, camera inches from the ground racing alongside, her feet kicking up red dust in slow motion at 240fps, Rift Valley landscape blurred in the background, the texture of earth and callused skin in hyper-detail, 40mm snorkel lens at ground height"

Cinematic

Soviet Telescope Drone Shot

FPV drone descent through a derelict radio telescope with vertigo spiral

"Drone descent through the open oculus of a derelict Soviet-era radio telescope dish, spiraling downward into the rusted parabolic bowl where a lone botanist catalogs wildflowers growing through cracked concrete, her red jacket the only color against oxidized metal and grey sky, 24mm on a caged FPV drone"

Cinematic

Indian Wedding

Through-the-veil shot with fabric overlay, bokeh marigolds, and intimate lighting

"Through-the-veil shot of a bride's face during an Indian wedding ceremony, camera positioned behind the sheer red dupatta fabric, the embroidered pattern creating a textured overlay on her face, her eyes lined with kohl looking down at henna-covered hands, marigold garlands in soft background bokeh, 85mm f/1.2"

Audio

Whispering Woman (Audio Demo)

Native audio generation with synchronized whispered speech

"A woman whispering into the microphone"

Feature

New VAE: Sharper Fine Details

Rebuilt latent space preserves textures, hair, and edge detail across the full frame

"Feature showcase: new VAE architecture for sharper output"

Audio

Native Audio Generation

Improved vocoder with filtered training data for cleaner, artifact-free audio

"Feature showcase: native synchronized audio with improved vocoder"

Feature

Flexible Workflows

Text-to-video, image-to-video, audio-to-video, extend, and retake in one model

"Feature showcase: multiple generation workflows"


Best Use Cases for LTX 2.3

Cinematic Content

True 4K output with complex camera movements. Film-grade quality for short films, trailers, and music videos.

Custom Characters

LoRA fine-tuning for brand mascots, recurring characters, or specific visual styles. It is the only video model on BestPhoto with full LoRA support.

Product Videos

Cost-effective product demos and ads at 4K resolution. Multiple aspect ratios including portrait for social.

Audio-Synced Content

Generate videos from existing voiceovers, music tracks, or sound effects with automatic synchronization.

Video Editing

Extend, retake, and transform existing videos. Add motion to still images while preserving visual identity.

Budget-Conscious Teams

Starting at $0.04/second for fast variants. Self-hosting option for high-volume workflows.
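Per-second pricing is easiest to judge per clip. This Python sketch multiplies the rates quoted in this post ($0.04 to $0.16 per second for LTX 2.3, versus the "$0.20+" cited for alternatives) by clip length. Treating $0.20 as the reference rate is an assumption: it is the low end of that figure, so actual savings against a given service may be larger.

```python
# Per-second rates quoted in this post; $0.20 is the low end of the
# "$0.20+" alternatives figure, used here as an assumed reference.
LTX_RATE_RANGE = (0.04, 0.16)
REFERENCE_RATE = 0.20


def clip_cost(seconds: float, rate_per_second: float) -> float:
    """Cost of one clip in dollars, rounded to cents."""
    return round(seconds * rate_per_second, 2)


def savings_fraction(rate: float, reference_rate: float = REFERENCE_RATE) -> float:
    """Fraction saved relative to the reference per-second rate."""
    return 1 - rate / reference_rate


if __name__ == "__main__":
    for rate in LTX_RATE_RANGE:
        print(f"${rate:.2f}/s -> 20 s clip ${clip_cost(20, rate):.2f}, "
              f"{savings_fraction(rate):.0%} below ${REFERENCE_RATE:.2f}/s")
```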

Technical Details

Architecture

LTX 2.3 uses an Asymmetric Dual-Stream Diffusion Transformer (DiT) with 19 billion total parameters. The 14B video stream uses 3D Rotary Positional Embeddings (RoPE) for spatiotemporal dynamics, while the 5B audio stream uses 1D temporal RoPE for sequential processing. Both streams are connected by bidirectional cross-attention layers with temporal positional embeddings.
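Bidirectional cross-attention means the video tokens attend to the audio tokens and the audio tokens attend to the video tokens within the same block. The NumPy sketch below is a minimal, single-head illustration of that idea, not the LTX 2.3 implementation. The stream widths (64 and 32), key width, random initialization, and residual update are all assumptions; the widths differ only to mirror the "asymmetric" design, where the video stream is larger than the audio stream.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, context, wq, wk, wv):
    """Single-head attention in which `queries` read from `context`."""
    q = queries @ wq                              # (n_q, d_k)
    k = context @ wk                              # (n_c, d_k)
    v = context @ wv                              # (n_c, width of the query stream)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def init_params(d_video, d_audio, d_k, rng, scale=0.1):
    """Random projections mapping between the two stream widths (assumed sizes)."""
    def w(*shape):
        return rng.normal(0.0, scale, shape)
    return {
        "video_from_audio": (w(d_video, d_k), w(d_audio, d_k), w(d_audio, d_video)),
        "audio_from_video": (w(d_audio, d_k), w(d_video, d_k), w(d_video, d_audio)),
    }


def bidirectional_block(video, audio, params):
    """Both directions read the block's inputs, then update each stream residually."""
    video_out = video + cross_attend(video, audio, *params["video_from_audio"])
    audio_out = audio + cross_attend(audio, video, *params["audio_from_video"])
    return video_out, audio_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(d_video=64, d_audio=32, d_k=16, rng=rng)
    video = rng.normal(size=(48, 64))   # e.g. 48 spatiotemporal patch tokens
    audio = rng.normal(size=(20, 32))   # e.g. 20 audio frames
    v_out, a_out = bidirectional_block(video, audio, params)
    print(v_out.shape, a_out.shape)     # (48, 64) (20, 32)
```

Because each stream's update depends on the other stream's tokens, a change in the audio shifts the video representation and vice versa, which is the sense in which synchronization is built into the architecture.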

  • 19B parameters (14B video + 5B audio)
  • Gemma3-12B text encoder with multi-layer feature extraction
  • Up to 4K (2160p) at 24/25/48/50 FPS
  • 1.22 sec/step on H100 — 18x faster than Wan 2.2-14B
  • Apache 2.0 license — fully open-source with licensed training data
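The per-step figure becomes a wall-clock estimate once you pick a step count. This Python sketch multiplies 1.22 s/step by a hypothetical 30 steps (the post gives no step count) and derives an implied Wan 2.2-14B per-step time, assuming the 18x figure compares per-step time under matched settings. It covers only the denoising loop, not text encoding or VAE decoding.

```python
SEC_PER_STEP_H100 = 1.22   # LTX 2.3 figure from this post
SPEEDUP_VS_WAN = 18        # quoted speedup over Wan 2.2-14B


def denoise_seconds(steps: int, sec_per_step: float = SEC_PER_STEP_H100) -> float:
    """Wall-clock time for the denoising loop alone (no text encoding or VAE decode)."""
    return steps * sec_per_step


# Assumption: the 18x figure compares per-step time under matched settings.
implied_wan_sec_per_step = SEC_PER_STEP_H100 * SPEEDUP_VS_WAN

if __name__ == "__main__":
    steps = 30  # hypothetical step count; the post does not give one
    print(f"LTX 2.3:     {denoise_seconds(steps):.1f} s")
    print(f"Wan 2.2-14B: {denoise_seconds(steps, implied_wan_sec_per_step):.1f} s")
```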

When to Use LTX 2.3 vs. Kling vs. Seedance

Use LTX 2.3 when:

  • You need true 4K resolution output
  • You need custom characters or styles via LoRA training
  • Cost is a priority ($0.04-0.16/sec vs $0.20+)
  • You want 20-second continuous clips
  • You need audio-to-video sync from existing audio tracks

Use Kling 3.0 when:

  • Physics realism matters most (action, sports)
  • You need multi-shot storyboard sequences
  • You're creating fast-moving, dynamic content
  • You need character consistency across scenes (Elements system)

Use Seedance 1.5 Pro when:

  • Lip sync precision is critical (ads, dialogue)
  • You're creating multilingual content
  • You need advanced camera moves (Hitchcock dolly zoom)
  • You're producing emotional talking-head content
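The three lists above amount to a simple lookup: tag what a project needs and see which model covers the most of it. The Python sketch below encodes them that way. The tag names are hypothetical shorthand for the bullets, and a count of matching tags is a deliberately crude stand-in for real judgment about priorities.

```python
# Encodes the "Use X when" lists as tags; tag names are hypothetical shorthand.
GUIDE: dict[str, frozenset[str]] = {
    "LTX 2.3": frozenset({"4k", "lora", "low_cost", "20s_clips", "audio_to_video"}),
    "Kling 3.0": frozenset({"physics", "multi_shot", "fast_motion", "character_consistency"}),
    "Seedance 1.5 Pro": frozenset({"lip_sync", "multilingual", "advanced_camera", "talking_head"}),
}


def recommend(needs: set[str]) -> list[str]:
    """Return the model(s) matching the most needs; ties are returned sorted by name."""
    scores = {model: len(tags & needs) for model, tags in GUIDE.items()}
    best = max(scores.values())
    if best == 0:
        return []
    return sorted(model for model, score in scores.items() if score == best)


if __name__ == "__main__":
    print(recommend({"4k", "lora", "multi_shot"}))  # ['LTX 2.3']
```

A tie, such as needing both LoRA training and multi-shot storyboards, returns both models, which is a signal to weigh which requirement matters more.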

4K Open-Source Video Is Here

LTX 2.3 is available now on BestPhoto. Generate 4K videos with native audio, LoRA customization, and six generation workflows — at a fraction of the cost of closed-source alternatives.


Frequently Asked Questions

What makes LTX 2.3 different from LTX-2?

LTX 2.3 rebuilds three core components: a new VAE for sharper detail, a 4x larger text connector for better prompt understanding, and an improved HiFi-GAN vocoder for cleaner audio. It also adds native portrait mode (9:16), last-frame interpolation, and 24/48 FPS options. Note that LTX-2 LoRAs need to be retrained for 2.3's new latent space.

Is LTX 2.3 really open source?

Yes. Full weights, code, and training scripts are released under Apache 2.0 on HuggingFace and GitHub. Companies under $10M annual revenue can use it freely. Larger companies embedding it into products need a commercial license from Lightricks.

How does the quality compare to Kling 3.0 or Runway Gen-4.5?

LTX 2.3 is the #1 open-source video model on the Artificial Analysis leaderboard, but closed-source models like Kling 3.0 (Elo 1,244) and Runway Gen-4.5 (Elo 1,225) rank higher in perceptual quality. Where LTX 2.3 wins is resolution (true 4K), duration (20s), cost, and customizability.

Can I train custom LoRAs on LTX 2.3?

Yes. LTX 2.3 is one of the few video models with full LoRA support. You can train custom styles, characters, or brand identities. However, LoRAs trained on LTX-2 need to be retrained due to the new latent space in 2.3.
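A LoRA keeps the base model's weights frozen and trains a small low-rank update on top of them. The NumPy sketch below shows the generic LoRA formulation for one linear layer, not Lightricks' training scripts; the rank, alpha, and layer sizes are arbitrary. Because the update is learned against one particular set of base weights and the latent space they operate on, it is tied to that model, which is consistent with the note that LTX-2 LoRAs must be retrained for 2.3.

```python
import numpy as np


class LoRALinear:
    """Generic LoRA layer: y = x W^T + (alpha / r) * x A^T B^T, with W frozen."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, rng):
        d_out, d_in = weight.shape
        self.weight = weight                              # frozen base weight (d_out, d_in)
        self.a = rng.normal(0.0, 0.01, (rank, d_in))      # trainable, small random init
        self.b = np.zeros((d_out, rank))                  # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into the base weight for inference."""
        return self.weight + self.scale * self.b @ self.a

    def trainable_params(self) -> int:
        return self.a.size + self.b.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = LoRALinear(rng.normal(size=(64, 128)), rank=8, alpha=16, rng=rng)
    print(layer.trainable_params(), "trainable vs", 64 * 128, "frozen")  # 1536 vs 8192
```

Zero-initializing B means the adapted layer starts out identical to the base layer, so training begins from the original model's behavior.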

What are the camera control options?

LTX 2.3 supports built-in camera LoRAs for dolly in/out, jib up/down, and static shots. You can also describe camera movements in your prompts using cinematography terms like tracking shots, orbital moves, and snorkel lens angles.

What are the known limitations?

Speech synthesis quality varies across underrepresented languages. Multi-speaker scenarios may inconsistently assign dialogue. Temporal coherence can degrade beyond ~20 seconds. The base model can be inconsistent — the distilled variant generally delivers better results.
