Immersive AI unifies image, video, audio, and 3D model generation under one API, offering developers and creators seamless access to advanced tools for diverse creative needs.
Your Complete AI Toolkit Just Got Bigger
Immersive AI brings image generation, video creation, audio production, 3D modeling, and text generation together under one unified API. Whether you're a developer integrating AI into your product or a creator building assets for your business, everything you need is in one place — one API key, consistent endpoints, and dozens of state-of-the-art models at your fingertips.
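To make "one API key, consistent endpoints" concrete, here is a minimal sketch of what a shared request shape could look like. Everything specific in it is an assumption for illustration: the base URL, the `/{category}/generate` path layout, the bearer-token header, and the model identifier are hypothetical and not taken from the Immersive AI documentation.

```python
import json
from dataclasses import dataclass

# Hypothetical base URL and path layout, used only for illustration.
BASE_URL = "https://api.example.com/v1"
CATEGORIES = {"image", "video", "audio", "3d", "text"}


@dataclass(frozen=True)
class ApiRequest:
    url: str
    headers: dict
    body: str


def build_request(api_key: str, category: str, model: str, **params) -> ApiRequest:
    """Build (but do not send) a request; every category shares one shape."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    payload = {"model": model, **params}
    return ApiRequest(
        url=f"{BASE_URL}/{category}/generate",
        # Assumed auth scheme: the same key is sent for every category.
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        body=json.dumps(payload, sort_keys=True),
    )
```

Under these assumptions, switching from images to video means changing only the category and model arguments, for example `build_request(key, "video", "veo-3.1", prompt="...")`, while the key and the request structure stay the same.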
Here's what's new.
Image Generation — Nano Banana 2
Our latest addition for image generation is Nano Banana 2 — a state-of-the-art model that supports up to 14 reference images for unmatched style consistency. Feed it your brand assets, character designs, or product photos, and generate new images that stay true to your visual identity.

Nano Banana 2 joins a growing lineup that includes Flux, Seedream, Z-Image Turbo, and many more — giving you the right model for every use case, from rapid prototyping to production-quality output.
Video Generation — P-Video & Veo 3.1
Create videos from text prompts or transform still images into motion. Veo 3.1 delivers high-quality video generation, while P-Video offers a fast, versatile alternative. Both support text-to-video and image-to-video workflows.
With additional models like WAN, Seedance, and Sora 2 also available, you can choose the right balance of speed, quality, and style for your project.
Audio — ElevenLabs Speech & Music
Full audio capabilities, powered by ElevenLabs. Generate natural-sounding speech with text-to-speech, transcribe audio with speech-to-text, transform voices with speech-to-speech conversion, or create original music tracks, all through the same API. The audio you generate can also be used as reference audio for your videos!

3D Model Generation — Hunyuan v2.1 & Meshy v6
Turn images or text descriptions into production-ready 3D models. Hunyuan v2.1 excels at detailed mesh generation, while Meshy v6 offers fast text-to-3D and image-to-3D with PBR textures. Export as GLB or USDZ, ready for games, AR, e-commerce, or web experiences.

Built for AI Agents — MCP Server & x402 Payments
This is where it gets interesting. Immersive AI now includes a full MCP (Model Context Protocol) server, allowing AI agents to discover and use every tool in the API programmatically. Your agent can generate images, create videos, produce audio, and build 3D models — all autonomously.
Even better: with x402 protocol support, agents can pay per request in USDC on the Base network, with no API key required. Payments are on-chain, permissionless, and pay-as-you-go. This opens up AI-to-AI commerce, where agents can access powerful creative tools without any human setup.
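The pay-per-request flow can be sketched roughly as follows: the agent sends an unpaid request, the server answers with HTTP 402 and a list of accepted payment options, and the agent retries with a signed payment attached. This is a sketch under stated assumptions, not the Immersive AI implementation. The `accepts` field and the base64-encoded `X-PAYMENT` header reflect early versions of the x402 spec and should be checked against current documentation; `send` and `sign_payment` are hypothetical callables supplied by the caller, and the actual wallet signing is left out entirely.

```python
import base64
import json
from typing import Callable

# send(url, body, headers) -> (status_code, response_text); hypothetical transport.
Send = Callable[[str, str, dict], tuple]
# sign_payment(requirements) -> payment payload dict; hypothetical wallet hook.
Sign = Callable[[dict], dict]


def pay_and_retry(send: Send, sign_payment: Sign, url: str, body: str) -> tuple:
    """Try a request; if the server asks for payment, pay once and retry."""
    status, text = send(url, body, {})
    if status != 402:
        return status, text  # No payment needed (or an unrelated error).

    # Assumed 402 body shape: {"accepts": [ {payment requirements}, ... ]}.
    requirements = json.loads(text)["accepts"][0]
    payment = sign_payment(requirements)

    # Assumed header format: base64-encoded JSON in X-PAYMENT.
    encoded = base64.b64encode(json.dumps(payment).encode()).decode()
    return send(url, body, {"X-PAYMENT": encoded})
```

Passing the transport and signer in as arguments keeps the sketch independent of any particular HTTP library or wallet, and makes the two-step flow easy to exercise with stubs before real funds are involved.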

What's Next
We're adding Grok Imagine for video generation and continuing to expand our model lineup across every category. The best AI tools, one API.
Ready to build?