Alibaba WAN 2.7 integration
WAN 2.7 is Alibaba’s latest video generation model, delivering cinematic motion, high visual fidelity, audio-guided generation, and automatic prompt expansion across 5 aspect ratios.
Key capabilities
- Resolution options: 720P (1280x720) and 1080P (1920x1080) output
- 5 aspect ratios: 16:9 landscape, 9:16 portrait, 1:1 square, 4:3 standard, 3:4 standard portrait
- Flexible durations: 2 to 15 seconds of video output
- Audio-guided generation: Provide a WAV or MP3 audio file (2-30 seconds, max 15MB) to guide video creation
- Prompt expansion: AI optimizer expands short prompts into detailed scripts for richer, more cinematic output
- Negative prompts: Exclude unwanted elements like watermarks, blur, or distortion (max 500 characters)
- Reproducible results: Fixed seed support (0 to 2147483647) for consistent generation
- Async processing: Webhook notifications or polling for task completion
Use cases
- Marketing videos: Create product showcases and brand content from text descriptions
- Social media content: Generate short-form videos for TikTok, Instagram, and YouTube in portrait or landscape
- Music visualization: Use audio-guided generation to create videos synchronized with a soundtrack
- Concept visualization: Transform ideas and scripts into motion for rapid prototyping
- Educational content: Illustrate concepts with AI-generated video explanations
- Creative exploration: Experiment with text prompts and aspect ratios for unique visual content
API operations
Generate videos by submitting a text prompt to the API. The service returns a task ID for async polling or webhook notification.
- POST /v1/ai/text-to-video/wan-2-7: Create a new text-to-video generation task
- GET /v1/ai/text-to-video/wan-2-7: List all WAN 2.7 text-to-video tasks with status
- GET /v1/ai/text-to-video/wan-2-7/{task-id}: Get task status and results by ID
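As a rough illustration of the create and status operations, the sketch below builds the two HTTP requests without sending them. The host name and the `x-api-key` header are assumptions made for the example and do not come from this page; only the endpoint path is taken from the operations listed here.

```python
import json
import urllib.request

# Assumptions for illustration only: the host and the API-key header name.
BASE_URL = "https://api.example.com"        # placeholder host, replace with the real one
ENDPOINT = "/v1/ai/text-to-video/wan-2-7"   # path from the operations listed above


def build_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a generation task."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # hypothetical header name
        },
        method="POST",
    )


def build_status_request(api_key: str, task_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a single task by ID."""
    return urllib.request.Request(
        f"{BASE_URL}{ENDPOINT}/{task_id}",
        headers={"x-api-key": api_key},  # hypothetical header name
        method="GET",
    )
```

Passing either request to `urllib.request.urlopen` would send it; the response shape is not described on this page, so parsing is left out.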
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Text description of the video to generate. Max 5000 characters |
| negative_prompt | string | No | - | Elements to avoid (e.g., “blurry, watermark”). Max 500 characters |
| audio_url | string | No | - | URL of audio file (WAV/MP3, 2-30s, max 15MB) to guide generation |
| aspect_ratio | string | No | "16:9" | Output ratio: "16:9", "9:16", "1:1", "4:3", "3:4" |
| resolution | string | No | "1080P" | Output resolution: "720P" or "1080P" |
| duration | integer | No | 5 | Video length in seconds: 2 to 15 |
| seed | integer | No | Random | Seed for reproducibility (0 to 2147483647) |
| additional_settings.prompt_extend | boolean | No | true | Enable AI prompt expansion for richer output |
| webhook_url | string | No | - | URL for async status notifications |
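The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `validate_payload` is a hypothetical helper, and the nested shape used for `additional_settings.prompt_extend` in the example payload is an assumption based on the dotted parameter name.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720P", "1080P"}
MAX_SEED = 2147483647


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passes these checks."""
    errors = []
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 5000:
        errors.append("prompt exceeds 5000 characters")
    if len(payload.get("negative_prompt") or "") > 500:
        errors.append("negative_prompt exceeds 500 characters")
    # Defaults below mirror the Default column of the table.
    if payload.get("aspect_ratio", "16:9") not in ASPECT_RATIOS:
        errors.append("aspect_ratio is not supported")
    if payload.get("resolution", "1080P") not in RESOLUTIONS:
        errors.append("resolution must be 720P or 1080P")
    duration = payload.get("duration", 5)
    if not (isinstance(duration, int) and 2 <= duration <= 15):
        errors.append("duration must be an integer from 2 to 15")
    seed = payload.get("seed")
    if seed is not None and not (isinstance(seed, int) and 0 <= seed <= MAX_SEED):
        errors.append("seed must be an integer from 0 to 2147483647")
    return errors


# Example payload; the nesting of prompt_extend is an assumption.
example = {
    "prompt": "A slow pan across a foggy harbor at sunrise",
    "negative_prompt": "blurry, watermark",
    "aspect_ratio": "9:16",
    "resolution": "720P",
    "duration": 5,
    "seed": 42,
    "additional_settings": {"prompt_extend": False},
}
```

Calling `validate_payload(example)` returns an empty list, while a payload with `"duration": 20` yields a single duration error.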
Frequently Asked Questions
What is WAN 2.7 Text-to-Video and how does it work?
WAN 2.7 Text-to-Video is an AI video generation API developed by Alibaba. You submit a text prompt describing your desired video, receive a task ID immediately, then poll for results or receive a webhook notification when processing completes. The model generates MP4 video at 720P or 1080P resolution in durations from 2 to 15 seconds.
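The submit-then-poll flow described above might look like the sketch below. The status names `COMPLETED` and `FAILED` and the `status` field are assumptions, since this page does not document the response body; `fetch_status` is a caller-supplied function that performs the GET request for the task and returns the decoded JSON.

```python
import time
from typing import Callable

# Hypothetical terminal status names; confirm the real values in the API reference.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_task(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the task reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal status in time")
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting.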
What aspect ratios does WAN 2.7 support?
WAN 2.7 supports 5 aspect ratios: 16:9 (landscape widescreen), 9:16 (portrait/mobile), 1:1 (square), 4:3 (standard landscape), and 3:4 (standard portrait). The default is 16:9.
How does audio-guided generation work?
Provide a WAV or MP3 audio file URL via the audio_url parameter. The audio must be 2-30 seconds long and under 15MB. WAN 2.7 uses the audio to guide the visual content and motion of the generated video. If no audio is provided, the model may auto-generate audio.
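Before hosting a file and passing its URL, the duration and size limits can be checked locally. This sketch covers WAV only, using Python's standard `wave` module; MP3 would need a third-party decoder. `check_wav` and `make_silence` are hypothetical helpers, and treating "15MB" as 15 × 1024 × 1024 bytes is an assumption.

```python
import io
import wave

MAX_BYTES = 15 * 1024 * 1024  # assumption: "15MB" interpreted as 15 MiB


def check_wav(data: bytes) -> list[str]:
    """Return a list of problems with a WAV file given as raw bytes."""
    errors = []
    if len(data) > MAX_BYTES:
        errors.append("file is larger than 15MB")
    with wave.open(io.BytesIO(data), "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    if not 2 <= seconds <= 30:
        errors.append(f"duration {seconds:.1f}s is outside 2-30 seconds")
    return errors


def make_silence(seconds: float, rate: int = 8000) -> bytes:
    """Create a mono 16-bit WAV of silence, used here only to try out check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

For example, `check_wav(make_silence(3))` reports no problems, while a one-second clip is flagged as too short.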
What is prompt expansion and when should I use it?
Prompt expansion (additional_settings.prompt_extend) uses AI to transform short prompts into detailed video scripts before generation. It is enabled by default. Disable it when you need precise control over exactly what the model generates.
How long does video generation take?
Processing time depends on resolution, duration, and server load. Higher resolution (1080P) and longer durations take more time. For production workflows, use webhooks instead of polling for scalable integration.
What are the rate limits for WAN 2.7?
Rate limits depend on your subscription tier. See the Rate Limits page for current limits by plan.
How much does WAN 2.7 cost?
See the Pricing page for current rates and subscription options.
What is the difference between WAN 2.7 and WAN 2.6?
WAN 2.7 adds audio-guided generation, 5 aspect ratios (vs limited options in 2.6), extended duration range of 2-15 seconds, and higher prompt limits (5000 characters). WAN 2.6 offers multi-shot sequences. Choose WAN 2.7 for the latest capabilities and audio input support.
Best practices
- Prompt writing: Be specific about scenes, camera movements (zoom, pan, tilt), lighting, and atmosphere. Detailed prompts produce better results than vague descriptions.
- Audio input: Use clean audio files with clear rhythm or speech for best audio-guided results. Ensure audio duration aligns with your target video duration.
- Negative prompts: Always include common artifacts to avoid: “blurry, low quality, watermark, text, distortion, extra limbs”
- Duration selection: Start with shorter durations (2-5 seconds) for quick iterations, then increase for final outputs.
- Prompt expansion: Leave enabled (default) for short prompts. Disable for precise control over generation.
- Reproducibility: Save the seed value from successful generations to recreate similar results.
- Production integration: Use webhooks for scalable applications instead of polling.
- Error handling: Implement retry with exponential backoff for 503 errors during high-demand periods.
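The retry advice for 503 errors can be sketched as a small wrapper. This is one possible approach, not a prescribed implementation: `ServiceUnavailable` is a hypothetical exception that the caller's HTTP layer would raise on a 503, and the delay values and jitter are illustrative choices.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Hypothetical exception raised by the caller's HTTP layer on a 503 response."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on ServiceUnavailable with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ServiceUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Add up to 50% random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, the base delays before jitter are 1, 2, 4, and 8 seconds across the four possible retries.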
Related APIs
- WAN 2.7 Image-to-Video: Animate images or extend existing videos with WAN 2.7
- WAN 2.7 Reference-to-Video: Generate videos featuring characters from reference images or videos
- WAN 2.6 Text-to-Video: Previous WAN generation with multi-shot sequences
- WAN 2.5 Text-to-Video: WAN 2.5 with 480p, 720p, and 1080p options