Audio models

We integrate state-of-the-art audio models to transform your visual content with professional-grade sound.

Audio is often the final piece of your workflow, adding the emotional depth and immersion that turns good content into exceptional content.

Whether you're creating UGC videos, product explainers, social ads, or brand storytelling, the right audio transforms how your audience experiences your message.

We've curated audio models that balance quality, speed, and control for different marketing needs.

AI Speech

Generate natural, expressive voice content with precise control over emotion, delivery, and tone.

ElevenLabs v3

ElevenLabs' most expressive text-to-speech model to date. ElevenLabs v3 delivers human-like speech with a wide emotional range and strong contextual understanding across 70+ languages.

What makes it exceptional:

  • 70+ languages: Maintain consistent voice quality and personality across all supported languages

  • Audio tags: Control emotion, pacing, and delivery with inline tags like [excited], [whispers], [laughs], [dramatic]

  • Multi-speaker dialogue: Generate natural conversations between multiple characters with contextual awareness

  • Emotional depth: Full spectrum of human emotion from subtle nuance to dramatic performance
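To show how inline audio tags and multi-speaker turns fit together in one script, here is a minimal Python sketch. It assumes tags are written as bracketed words, as in the examples above. It also uses a hypothetical "Speaker: text" layout for drafting dialogue. The function names and the approved-tag list are our own illustration and are not part of ElevenLabs' API or our platform. The sketch flags any tag outside your brand's approved list so you can review it before generating.

```python
import re

# Hypothetical brand allow-list built from the tags named above.
# ElevenLabs v3 supports many more tags; this is NOT its full vocabulary.
APPROVED_TAGS = {"excited", "whispers", "laughs", "dramatic"}

# An audio tag is any bracketed text without nested brackets, e.g. [whispers]
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def build_dialogue(lines: list[tuple[str, str]]) -> str:
    """Join (speaker, text) pairs into a 'Speaker: text' draft, one turn per line.

    The 'Speaker:' layout is an illustrative drafting convention, not a required format.
    """
    return "\n".join(f"{speaker}: {text}" for speaker, text in lines)


def find_tags(script: str) -> list[str]:
    """Return every audio tag in the script, lower-cased, in order of appearance."""
    return [tag.strip().lower() for tag in TAG_PATTERN.findall(script)]


def unapproved_tags(script: str, approved: set[str] = APPROVED_TAGS) -> list[str]:
    """Return tags missing from the approved list so a human can review them."""
    return [tag for tag in find_tags(script) if tag not in approved]


if __name__ == "__main__":
    script = build_dialogue([
        ("Maya", "[excited] You have to see this new feature!"),
        ("Leo", "[whispers] Is it really that good? [laughs]"),
        ("Maya", "[sarcastic] Only the best thing we've shipped all year."),
    ])
    print(script)
    print(find_tags(script))        # ['excited', 'whispers', 'laughs', 'sarcastic']
    print(unapproved_tags(script))  # ['sarcastic']
```

Keeping the check as an allow-list, rather than a hard-coded list of "valid" tags, lets each brand decide which emotions fit its voice.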

Best for:

  • Explainer videos requiring emotional storytelling

  • UGC-style voiceovers with authentic human reactions

  • International campaigns requiring multilingual voice consistency

  • Podcast intros, outros, and ad reads

How to make the most of it:

  1. Pick your voice: Browse our curated voice library. We've shortlisted our favorite voices ("Pletor's picks") for faster selection and iteration.

  2. Test the voice: Generate a short sample with your brand's typical messaging to verify fit.

  3. Enhance with audio tags: Use our dedicated Creative Assistant to automatically structure your script with emotion and pacing tags (or read about them yourself).

  4. Fine-tune: Adjust the Stability parameter to control consistency (higher = more predictable, lower = more expressive variation).

  5. Pair it with the right video model for your use case (e.g., Veed Fabric for UGC videos).
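The voice workflow above can be captured as a small, hypothetical job description in Python. It holds a full script, a shortened copy for the "Test the voice" step, and a Stability value for fine-tuning. This is a sketch under stated assumptions. The `SpeechJob` fields, the preset values, and the 0.0 to 1.0 Stability scale are illustrative only. They are not the actual settings of our platform or ElevenLabs.

```python
import re
from dataclasses import dataclass

# Illustrative starting points only (our assumption, not recommended values):
# lower = more expressive variation, higher = more predictable delivery.
STABILITY_PRESETS = {"expressive": 0.3, "balanced": 0.5, "consistent": 0.8}


@dataclass
class SpeechJob:
    """Hypothetical job description; field names do not mirror a real API."""
    voice: str
    script: str
    stability: float = STABILITY_PRESETS["balanced"]

    def __post_init__(self) -> None:
        # Assumes Stability is expressed on a 0.0-1.0 scale.
        if not 0.0 <= self.stability <= 1.0:
            raise ValueError(f"stability must be between 0 and 1, got {self.stability}")
        if not self.script.strip():
            raise ValueError("script must not be empty")


def sample_text(script: str, max_sentences: int = 2) -> str:
    """Keep only the first few sentences, splitting after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    return " ".join(sentences[:max_sentences])


def make_sample_job(job: SpeechJob, max_sentences: int = 2) -> SpeechJob:
    """Copy a job with a shortened script for a quick voice test, keeping voice and stability."""
    return SpeechJob(job.voice, sample_text(job.script, max_sentences), job.stability)


if __name__ == "__main__":
    full = SpeechJob(
        voice="my-brand-voice",  # hypothetical voice identifier
        script="[excited] Meet our new app. It saves you hours. [whispers] And it's free.",
        stability=STABILITY_PRESETS["expressive"],
    )
    print(make_sample_job(full).script)
    # [excited] Meet our new app. It saves you hours.
```

Generating the short sample first keeps iteration cheap. Once the voice fits, you can run the full script with the same Stability value.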


AI Sound

Add professional sound design to your video content without manual audio editing.

Mirelo 1.5

Mirelo 1.5 analyzes your video content and automatically generates synchronized, professional-grade sound effects. No sound design expertise is required.

It is particularly valuable for AI-generated videos, which are often output without audio or with low-quality audio. Mirelo turns that silent content into an immersive experience.

What makes it exceptional:

  • Video-aware: Analyzes visual action to generate contextually appropriate sound effects, with or without any text prompt

  • Automatic synchronization: Sound effects match video timing and intensity

  • Long-form support: Process videos up to 10 minutes

  • Professional quality: Studio-grade SFX without sound designers or audio libraries
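Mirelo 1.5 accepts videos up to 10 minutes long, so a quick pre-flight check can prevent a failed run. The Python sketch below tests a duration against that limit. For longer videos, it plans consecutive segments of at most 10 minutes each. Splitting is our own workaround and not a documented Mirelo feature. The function names are hypothetical.

```python
MAX_SECONDS = 10 * 60  # Mirelo 1.5 long-form limit stated above: 10 minutes


def fits_limit(duration_s: float) -> bool:
    """True if the video can be processed in a single run."""
    return 0 < duration_s <= MAX_SECONDS


def plan_segments(duration_s: float, max_s: float = MAX_SECONDS) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows no longer than max_s seconds.

    Workaround sketch (our assumption): each window would be processed separately.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    duration_s = float(duration_s)
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    print(fits_limit(480))      # True: an 8-minute video fits in one run
    print(plan_segments(1500))  # [(0.0, 600.0), (600.0, 1200.0), (1200.0, 1500.0)]
```

Fixed-length cuts can fall in the middle of an action. In practice, you may want to move segment boundaries to scene changes so the generated sound stays continuous.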

Best for:

  • Videos that need professional sound design, especially AI-generated outputs (e.g., from Sora, Veo, or Kling)

  • Social media content requiring attention-grabbing audio

  • Video ads where sound effects drive emotional response

When to use Mirelo 1.5: whenever your video lacks sound effects or needs professional audio post-production.

Mirelo 1.5 output sample
