
Alibaba WAN 2.7 integration

WAN 2.7 is Alibaba’s latest video generation model. It delivers cinematic motion and high visual fidelity, and it supports audio-guided generation, automatic prompt expansion, and 5 aspect ratios.
WAN 2.7 Text-to-Video is an AI video generation API that creates MP4 videos from text descriptions. It produces smooth, high-fidelity video with cinematic motion at 720P (1280x720) or 1080P (1920x1080) resolution. The model supports durations from 2 to 15 seconds, 5 configurable aspect ratios, optional audio input for sound-guided generation, and automatic prompt expansion for richer output.

Key capabilities

  • Resolution options: 720P (1280x720) and 1080P (1920x1080) output
  • 5 aspect ratios: 16:9 landscape, 9:16 portrait, 1:1 square, 4:3 standard landscape, 3:4 standard portrait
  • Flexible durations: 2 to 15 seconds of video output
  • Audio-guided generation: Provide a WAV or MP3 audio file (2 to 30 seconds long, max 15MB) to guide video creation
  • Prompt expansion: AI optimizer expands short prompts into detailed scripts for richer, more cinematic output
  • Negative prompts: Exclude unwanted elements like watermarks, blur, or distortion (max 500 characters)
  • Reproducible results: Fixed seed support (0 to 2147483647) for consistent generation
  • Async processing: Webhook notifications or polling for task completion

Use cases

  • Marketing videos: Create product showcases and brand content from text descriptions
  • Social media content: Generate short-form videos for TikTok, Instagram, and YouTube in portrait or landscape
  • Music visualization: Use audio-guided generation to create videos synchronized with a soundtrack
  • Concept visualization: Transform ideas and scripts into motion for rapid prototyping
  • Educational content: Illustrate concepts with AI-generated video explanations
  • Creative exploration: Experiment with text prompts and aspect ratios for unique visual content

API operations

Generate videos by submitting a text prompt to the API. The service returns a task ID for async polling or webhook notification.

POST /v1/ai/text-to-video/wan-2-7

Create a new text-to-video generation task

GET /v1/ai/text-to-video/wan-2-7

List all WAN 2.7 text-to-video tasks and their statuses

GET /v1/ai/text-to-video/wan-2-7/{task-id}

Get task status and results by ID
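You can wrap the three endpoints above in small request builders. The following minimal Python sketch uses only the standard library and builds the requests without sending them. The base URL `https://api.example.com` and the `x-api-key` header are hypothetical placeholders, because this page states neither the API host nor the authentication scheme. Substitute the values from your account documentation.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: this page does not state the API host or auth header.
BASE_URL = "https://api.example.com"
API_KEY_HEADER = "x-api-key"
ENDPOINT = "/v1/ai/text-to-video/wan-2-7"


def create_task_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a generation task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
    )


def list_tasks_request(api_key: str) -> urllib.request.Request:
    """Build the GET request that lists WAN 2.7 text-to-video tasks."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT, method="GET", headers={API_KEY_HEADER: api_key}
    )


def get_task_request(api_key: str, task_id: str) -> urllib.request.Request:
    """Build the GET request for a single task's status and results."""
    # Percent-encode the task ID so unexpected characters cannot alter the path.
    url = f"{BASE_URL}{ENDPOINT}/{urllib.parse.quote(task_id, safe='')}"
    return urllib.request.Request(url, method="GET", headers={API_KEY_HEADER: api_key})
```

To send a request, pass it to `urllib.request.urlopen(req)` and decode the JSON body. This page does not document the response fields, so inspect a real response before relying on a particular key such as the task ID.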

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | Text description of the video to generate. Max 5000 characters |
| negative_prompt | string | No | - | Elements to avoid (e.g., “blurry, watermark”). Max 500 characters |
| audio_url | string | No | - | URL of audio file (WAV/MP3, 2 to 30 seconds, max 15MB) to guide generation |
| aspect_ratio | string | No | "16:9" | Output ratio: "16:9", "9:16", "1:1", "4:3", "3:4" |
| resolution | string | No | "1080P" | Output resolution: "720P" or "1080P" |
| duration | integer | No | 5 | Video length in seconds: 2 to 15 |
| seed | integer | No | Random | Seed for reproducibility (0 to 2147483647) |
| additional_settings.prompt_extend | boolean | No | true | Enable AI prompt expansion for richer output |
| webhook_url | string | No | - | URL for async status notifications |
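Because the table lists hard limits for most parameters, a client can reject an invalid body before spending a request on it. This sketch checks a payload against the documented values and applies the documented defaults to omitted fields. The function name `validate_payload` is illustrative. The page does not say whether the server enforces exactly these checks, so treat this as a convenience rather than a contract.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720P", "1080P"}
MAX_SEED = 2147483647


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems with a WAN 2.7 T2V request body (empty if none)."""
    errors = []
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 5000:
        errors.append("prompt exceeds 5000 characters")
    negative = payload.get("negative_prompt")
    if negative is not None and len(negative) > 500:
        errors.append("negative_prompt exceeds 500 characters")
    # Omitted fields fall back to the defaults listed in the table.
    if payload.get("aspect_ratio", "16:9") not in ASPECT_RATIOS:
        errors.append("unsupported aspect_ratio")
    if payload.get("resolution", "1080P") not in RESOLUTIONS:
        errors.append("unsupported resolution")
    duration = payload.get("duration", 5)
    if not _is_int(duration) or not 2 <= duration <= 15:
        errors.append("duration must be an integer from 2 to 15")
    seed = payload.get("seed")
    if seed is not None and (not _is_int(seed) or not 0 <= seed <= MAX_SEED):
        errors.append("seed must be an integer from 0 to 2147483647")
    settings = payload.get("additional_settings", {})
    if "prompt_extend" in settings and not isinstance(settings["prompt_extend"], bool):
        errors.append("additional_settings.prompt_extend must be a boolean")
    return errors
```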

Frequently Asked Questions

WAN 2.7 Text-to-Video is an AI video generation API developed by Alibaba. You submit a text prompt describing your desired video, receive a task ID immediately, then poll for results or receive a webhook notification when processing completes. The model generates MP4 video at 720P or 1080P resolution in durations from 2 to 15 seconds.
WAN 2.7 supports 5 aspect ratios: 16:9 (landscape widescreen), 9:16 (portrait/mobile), 1:1 (square), 4:3 (standard landscape), and 3:4 (standard portrait). The default is 16:9.
Provide a WAV or MP3 audio file URL via the audio_url parameter. The audio must be between 2 and 30 seconds long and no larger than 15MB. WAN 2.7 uses the audio to guide the visual content and motion of the generated video. If no audio is provided, the model may auto-generate audio.
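The audio limits are strict, so it can help to check a file locally before uploading it and passing its URL as `audio_url`. This sketch checks the extension and size, and for WAV files it also checks the duration using Python's standard `wave` module. It reads 15MB as 15,000,000 bytes, the stricter interpretation, since the page does not say whether MB means 10^6 or 2^20 bytes. It skips the duration check for MP3 because the standard library cannot decode MP3. The function name `check_audio` is hypothetical.

```python
import io
import os
import wave

# Assumption: "15MB" read as 15,000,000 bytes (stricter than 15 * 2**20).
MAX_AUDIO_BYTES = 15_000_000


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio(filename: str, data: bytes) -> list[str]:
    """Pre-check an audio file against the documented audio_url limits."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in {".wav", ".mp3"}:
        problems.append("audio must be WAV or MP3")
    if len(data) > MAX_AUDIO_BYTES:
        problems.append("audio exceeds 15MB")
    # Duration is only checked for WAV; MP3 needs a third-party decoder.
    if ext == ".wav" and not 2 <= wav_duration_seconds(data) <= 30:
        problems.append("audio must be 2 to 30 seconds long")
    return problems
```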
Prompt expansion (additional_settings.prompt_extend) uses AI to transform short prompts into detailed video scripts before generation. It is enabled by default. Disable it when you need precise control over exactly what the model generates.
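The parameter name `additional_settings.prompt_extend` suggests a nested field. The page does not show a request body, so the following assumes the dotted name means a `prompt_extend` key inside an `additional_settings` JSON object. Under that assumption, a payload that disables expansion could be built like this. `build_payload` is an illustrative helper.

```python
import json


def build_payload(prompt: str, prompt_extend: bool = True, **options) -> dict:
    """Build a WAN 2.7 T2V request body.

    Assumes additional_settings.prompt_extend is a nested JSON field.
    Extra keyword arguments (duration, seed, ...) become top-level fields.
    """
    payload = {"prompt": prompt, **options}
    payload["additional_settings"] = {"prompt_extend": prompt_extend}
    return payload


# Disable expansion when the prompt already spells out the shot in detail.
body = build_payload(
    "Slow dolly-in on a ceramic mug on a wooden table, morning light, steam rising",
    prompt_extend=False,
    duration=5,
    seed=42,
)
print(json.dumps(body, indent=2))
```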
Processing time depends on resolution, duration, and server load. Higher resolution (1080P) and longer durations take more time. For production workflows, use webhooks instead of polling for scalable integration.
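When webhooks are not an option, for example during local development, polling the task endpoint with a fixed interval and an overall timeout keeps the client from waiting forever. This sketch takes a `fetch_status` callable, such as one that sends the `GET /v1/ai/text-to-video/wan-2-7/{task-id}` request and returns the parsed JSON, so the loop stays independent of the HTTP client. The `status` field name and the `COMPLETED` and `FAILED` values are assumptions, since this page does not list the task states.

```python
import time
from typing import Callable

# Hypothetical terminal states: this page does not document task status values.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until the task reaches a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Give up rather than sleep past the overall deadline.
        if clock() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

Injecting `sleep` and `clock` lets you test the loop without real waiting. In production, keep the defaults, or preferably rely on `webhook_url`.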
Rate limits depend on your subscription tier. See the Rate Limits page for current limits by plan.
See the Pricing page for current rates and subscription options.
WAN 2.7 adds audio-guided generation, 5 aspect ratios (WAN 2.6 offers fewer), an extended duration range of 2 to 15 seconds, and a higher prompt limit of 5000 characters. WAN 2.6 offers multi-shot sequences, which WAN 2.7 does not list. Choose WAN 2.7 for the latest capabilities and audio input support.

Best practices

  • Prompt writing: Be specific about scenes, camera movements (zoom, pan, tilt), lighting, and atmosphere. Detailed prompts produce better results than vague descriptions.
  • Audio input: Use clean audio files with clear rhythm or speech for best audio-guided results. Ensure audio duration aligns with your target video duration.
  • Negative prompts: Always list common artifacts to avoid, such as “blurry, low quality, watermark, text, distortion, extra limbs”.
  • Duration selection: Start with shorter durations (2-5 seconds) for quick iterations, then increase for final outputs.
  • Prompt expansion: Leave enabled (default) for short prompts. Disable for precise control over generation.
  • Reproducibility: Save the seed value from successful generations to recreate similar results.
  • Production integration: Use webhooks for scalable applications instead of polling.
  • Error handling: Implement retry with exponential backoff for 503 errors during high-demand periods.
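The retry advice above can be sketched as a wrapper that retries only on HTTP 503. It doubles the delay after each failed attempt and adds a little random jitter so that many clients do not retry in lockstep. The name `send_with_retry`, the five-attempt limit, and the 1-second base delay are illustrative choices, not values from this page. `send` is any callable that performs the request and raises `urllib.error.HTTPError` on an error status, as `urllib.request.urlopen` does.

```python
import random
import time
import urllib.error
from typing import Callable


def send_with_retry(
    send: Callable[[], bytes],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send(), retrying on HTTP 503 with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise non-503 errors, and the final 503 once attempts run out.
            if err.code != 503 or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt (base, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

For example, wrap a function that opens a `urllib.request.Request` with `urlopen` and returns the response body. Other status codes are re-raised immediately, because retrying a 400 with the same body would fail again.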