WAN 2.7 Text To Video API

Alibaba WAN 2.7 integration

WAN 2.7 is Alibaba’s latest video generation model, delivering cinematic motion, high visual fidelity, audio-guided generation, and automatic prompt expansion across 5 aspect ratios.

WAN 2.7 Text-to-Video is an AI video generation API that creates MP4 videos from text descriptions. It produces smooth, high-fidelity video with cinematic motion at 720P (1280x720) or 1080P (1920x1080) resolution. The model supports durations from 2 to 15 seconds, 5 configurable aspect ratios, optional audio input for sound-guided generation, and automatic prompt expansion for richer output.

Key capabilities

Resolution options: 720P (1280x720) and 1080P (1920x1080) output
5 aspect ratios: 16:9 landscape, 9:16 portrait, 1:1 square, 4:3 standard, 3:4 standard portrait
Flexible durations: 2 to 15 seconds of video output
Audio-guided generation: Provide a WAV or MP3 audio file (2-30 seconds, max 15MB) to guide video creation
Prompt expansion: AI optimizer expands short prompts into detailed scripts for richer, more cinematic output
Negative prompts: Exclude unwanted elements like watermarks, blur, or distortion (max 500 characters)
Reproducible results: Fixed seed support (0 to 2147483647) for consistent generation
Async processing: Webhook notifications or polling for task completion

Use cases

Marketing videos: Create product showcases and brand content from text descriptions
Social media content: Generate short-form videos for TikTok, Instagram, and YouTube in portrait or landscape
Music visualization: Use audio-guided generation to create videos synchronized with a soundtrack
Concept visualization: Transform ideas and scripts into motion for rapid prototyping
Educational content: Illustrate concepts with AI-generated video explanations
Creative exploration: Experiment with text prompts and aspect ratios for unique visual content

API operations

Generate videos by submitting a text prompt to the API. The service returns a task ID for async polling or webhook notification.

POST /v1/ai/text-to-video/wan-2-7

Create a new text-to-video generation task

GET /v1/ai/text-to-video/wan-2-7

List all WAN 2.7 text-to-video tasks and their statuses

GET /v1/ai/text-to-video/wan-2-7/{task-id}

Get task status and results by ID
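
As a rough sketch of how these three operations map onto HTTP requests, the Python below builds (but does not send) each request using only the standard library. The host `https://api.example.com`, the `Authorization: Bearer` header, and the helper names are placeholders assumed for illustration. Only the paths and methods come from this page, so check your provider's documentation for the real host and authentication scheme.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host. Only the path below is taken from this page.
BASE_URL = "https://api.example.com"
ENDPOINT = "/v1/ai/text-to-video/wan-2-7"


def _headers(api_key: str) -> dict:
    # Assumption: bearer-token auth. The real header name may differ.
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}


def create_task_request(api_key: str, payload: dict) -> urllib.request.Request:
    """POST: create a new text-to-video generation task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT, data=body, headers=_headers(api_key), method="POST"
    )


def list_tasks_request(api_key: str) -> urllib.request.Request:
    """GET: list tasks with their status."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT, headers=_headers(api_key), method="GET"
    )


def get_task_request(api_key: str, task_id: str) -> urllib.request.Request:
    """GET: fetch one task by ID (the ID is URL-escaped before use)."""
    url = f"{BASE_URL}{ENDPOINT}/{urllib.parse.quote(task_id, safe='')}"
    return urllib.request.Request(url, headers=_headers(api_key), method="GET")
```

Passing any of these objects to `urllib.request.urlopen` would send the request. The shape of the response body is not described here, so parse it according to your provider's reference.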

Parameters

Parameter	Type	Required	Default	Description
`prompt`	`string`	Yes	-	Text description of the video to generate. Max 5000 characters
`negative_prompt`	`string`	No	-	Elements to avoid (e.g., “blurry, watermark”). Max 500 characters
`audio_url`	`string`	No	-	URL of audio file (WAV/MP3, 2-30s, max 15MB) to guide generation
`aspect_ratio`	`string`	No	`"16:9"`	Output ratio: `"16:9"`, `"9:16"`, `"1:1"`, `"4:3"`, `"3:4"`
`resolution`	`string`	No	`"1080P"`	Output resolution: `"720P"` or `"1080P"`
`duration`	`integer`	No	`5`	Video length in seconds: 2 to 15
`seed`	`integer`	No	Random	Seed for reproducibility (0 to 2147483647)
`additional_settings.prompt_extend`	`boolean`	No	`true`	Enable AI prompt expansion for richer output
`webhook_url`	`string`	No	-	URL for async status notifications
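
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, and nesting `prompt_extend` inside an `additional_settings` object is an assumption inferred from the dotted parameter name.

```python
from typing import Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720P", "1080P"}
MAX_SEED = 2147483647


def build_payload(
    prompt: str,
    negative_prompt: Optional[str] = None,
    audio_url: Optional[str] = None,
    aspect_ratio: str = "16:9",
    resolution: str = "1080P",
    duration: int = 5,
    seed: Optional[int] = None,
    prompt_extend: bool = True,
    webhook_url: Optional[str] = None,
) -> dict:
    """Validate arguments against the documented limits and return a request body."""
    if not prompt or len(prompt) > 5000:
        raise ValueError("prompt is required and limited to 5000 characters")
    if negative_prompt is not None and len(negative_prompt) > 500:
        raise ValueError("negative_prompt is limited to 500 characters")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be between 2 and 15 seconds")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")

    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        # Assumption: the dotted name maps to a nested JSON object.
        "additional_settings": {"prompt_extend": prompt_extend},
    }
    # Optional fields are left out entirely when they are not set.
    optional = {
        "negative_prompt": negative_prompt,
        "audio_url": audio_url,
        "seed": seed,
        "webhook_url": webhook_url,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_payload("A red fox running through snow", aspect_ratio="9:16", duration=8, seed=42)` returns a body for a portrait clip with a fixed seed, while `duration=16` raises `ValueError` before any request is made.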

Frequently Asked Questions

What is WAN 2.7 Text-to-Video and how does it work?

WAN 2.7 Text-to-Video is an AI video generation API developed by Alibaba. You submit a text prompt describing your desired video, receive a task ID immediately, then poll for results or receive a webhook notification when processing completes. The model generates MP4 video at 720P or 1080P resolution in durations from 2 to 15 seconds.

What aspect ratios does WAN 2.7 support?

WAN 2.7 supports 5 aspect ratios: 16:9 (landscape widescreen), 9:16 (portrait/mobile), 1:1 (square), 4:3 (standard landscape), and 3:4 (standard portrait). The default is 16:9.

How does audio-guided generation work?

Provide a WAV or MP3 audio file URL via the `audio_url` parameter. The audio must be 2-30 seconds long and under 15MB. WAN 2.7 uses the audio to guide the visual content and motion of the generated video. If no audio is provided, the model may auto-generate audio.
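
Checking an audio file against these limits before uploading it can save a failed request. The sketch below covers WAV files only, because Python's standard library can read WAV headers but not MP3. `check_wav` is a hypothetical helper, and reading "15MB" as 15 * 1024 * 1024 bytes is an assumption.

```python
import os
import wave

MIN_SECONDS, MAX_SECONDS = 2.0, 30.0
# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
MAX_BYTES = 15 * 1024 * 1024


def check_wav(path: str) -> list:
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    # Duration is the frame count divided by the sample rate.
    with wave.open(path, "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(
            f"duration is {seconds:.2f}s, allowed range is "
            f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return problems
```

The function reports every problem it finds rather than stopping at the first, so a file that is both too long and too large is flagged for both.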

What is prompt expansion and when should I use it?

Prompt expansion (`additional_settings.prompt_extend`) uses AI to transform short prompts into detailed video scripts before generation. It is enabled by default. Disable it when you need precise control over exactly what the model generates.

How long does video generation take?

Processing time depends on resolution, duration, and server load. Higher resolution (1080P) and longer durations take more time. For production workflows, use webhooks instead of polling for scalable integration.
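
Where polling is acceptable (for example during development), a bounded loop keeps it from running indefinitely. This is a sketch under stated assumptions: the status strings `COMPLETED` and `FAILED` and the `status` field name are hypothetical, since this page does not list the actual values, and `fetch_status` stands in for whatever function performs the GET request for a task.

```python
import time
from typing import Callable

# Assumption: hypothetical terminal status values; check the real API response.
DONE_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() every `interval` seconds until the task finishes."""
    deadline = clock() + timeout
    while True:
        task = fetch_status()
        if task.get("status") in DONE_STATUSES:
            return task
        # Stop before sleeping if the next check would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without waiting. In normal use they can be left at their defaults.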

What are the rate limits for WAN 2.7?

Rate limits depend on your subscription tier. See the Rate Limits page for current limits by plan.

How much does WAN 2.7 cost?

See the Pricing page for current rates and subscription options.

What is the difference between WAN 2.7 and WAN 2.6?

WAN 2.7 adds audio-guided generation, 5 aspect ratios (vs. limited options in 2.6), an extended duration range of 2-15 seconds, and a higher prompt limit (5000 characters). WAN 2.6 offers multi-shot sequences. Choose WAN 2.7 for the latest capabilities and audio input support.

Best practices

Prompt writing: Be specific about scenes, camera movements (zoom, pan, tilt), lighting, and atmosphere. Detailed prompts produce better results than vague descriptions.
Audio input: Use clean audio files with clear rhythm or speech for best audio-guided results. Ensure audio duration aligns with your target video duration.
Negative prompts: List the common artifacts you want excluded, for example “blurry, low quality, watermark, text, distortion, extra limbs”.
Duration selection: Start with shorter durations (2-5 seconds) for quick iterations, then increase for final outputs.
Prompt expansion: Leave enabled (default) for short prompts. Disable for precise control over generation.
Reproducibility: Save the seed value from successful generations to recreate similar results.
Production integration: Use webhooks for scalable applications instead of polling.
Error handling: Implement retry with exponential backoff for 503 errors during high-demand periods.
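
The retry advice above can be sketched as follows. This is one possible implementation, not a prescribed one: `call_with_retry` and `backoff_delays` are hypothetical helpers, `send` stands in for any function that performs the request with `urllib` and raises `urllib.error.HTTPError` on failure, and the retry count, base delay, cap, and jitter scheme are illustrative choices.

```python
import random
import time
import urllib.error
from typing import Callable, List


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays that double on each attempt, capped at `cap` seconds."""
    return [min(cap, base * 2**attempt) for attempt in range(retries)]


def call_with_retry(
    send: Callable[[], object],
    retries: int = 4,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
):
    """Call send(); on HTTP 503, wait with exponential backoff and try again."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise  # only 503 is treated as retryable here
        # Wait between 50% and 100% of the delay so clients do not retry in lockstep.
        sleep(delay * (0.5 + jitter() / 2))
    return send()  # final attempt; any error propagates to the caller
```

With the defaults this makes up to five attempts, waiting at most about 1, 2, 4, and 8 seconds between them.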

Related APIs

WAN 2.7 Image-to-Video: Animate images or extend existing videos with WAN 2.7
WAN 2.7 Reference-to-Video: Generate videos featuring characters from reference images or videos
WAN 2.6 Text-to-Video: Previous WAN generation with multi-shot sequences
WAN 2.5 Text-to-Video: WAN 2.5 with 480p, 720p, and 1080p options
