Kling 3.0 Prompt Guide
What's New in Kling 3.0
Kling 3.0 adds multi-shot storyboarding, subject reference for image-to-video, multi-character and multi-language dialogue, native text rendering, and clips of up to 15 seconds.
Smart Storyboard
AI director capabilities that automatically plan multi-shot videos. The system intelligently captures scene transitions from your prompt, orchestrating shot types and camera positions—from classic shot-reverse-shot dialogues to advanced cross-cutting and voice-overs. Generate mature cinematic narratives in one go, making complex visual storytelling accessible to every creator.
Image-to-Video + Subject Reference
World's first image-to-video with subject reference capability. Built on deep multimodal understanding, it lets you add multiple subject images or videos on top of your first frame to lock specific elements. Like a professional casting director, the model keeps character, prop, and scene features consistent no matter how the camera moves.
Multi-Character Reference
A major upgrade to audio-visual synchronization with precise text-to-character mapping. In multi-person scenes, you can direct exactly who speaks and when, which resolves confusion over which character a line belongs to. The model supports three or more characters at once, with accurate character-to-dialogue assignment for group scenes and ensemble casts.
Multi-Language Support
Support for multiple languages including Chinese, English, Japanese, Korean, and Spanish, plus authentic regional dialects and accents. Even mix languages within a single video. Whether it's bilingual workplace conversations or dialect-rich everyday dialogue, lip-sync and expressions remain natural and seamless.
Native Text Capability
Preserve or generate text content with precision. Whether it is maintaining signs, subtitles, and details from original images or creating new text, the model renders legible characters with well-formed structure. This improves physical realism and meets the high-fidelity text requirements of e-commerce ads and similar scenarios.
Extended Duration & Flexible Control
Generate up to 15 seconds of continuous video, with the duration adjustable from 3 to 15 seconds. The longer window is not just extra footage: it leaves room for complex action logic and environmental change. From nuanced long takes to multi-layered plot development, a complete story can unfold in a single generation.
Kling 2.6 vs Kling 3.0
See how Kling 3.0 expands capabilities with new features for advanced video creation.
| Feature | Kling 2.6 | Kling 3.0 |
|---|---|---|
| Text-to-Video | √ | √ |
| Image-to-Video | √ | √ |
| First/Last Frame to Video | √ | √ |
| Native Audio-Video Sync | √ | √ |
| Smart Storyboard | × | √ |
| First Frame + Subject Reference | × | √ |
| 3+ Character Reference | × | √ |
| Multi-Language Support | × | √ |
| Dialects and Accents | × | √ |
| Generate up to 15s | × | √ |
| Flexible Custom Duration | × | √ |
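The comparison table can also be held as a small data structure, which makes it easy to list what is new in 3.0. The sketch below is illustrative only: the `FEATURES` dictionary simply transcribes the table, and its name is an assumption rather than part of any Kling tooling.

```python
# Feature name -> (available in Kling 2.6, available in Kling 3.0),
# transcribed from the comparison table above.
FEATURES = {
    "Text-to-Video": (True, True),
    "Image-to-Video": (True, True),
    "First/Last Frame to Video": (True, True),
    "Native Audio-Video Sync": (True, True),
    "Smart Storyboard": (False, True),
    "First Frame + Subject Reference": (False, True),
    "3+ Character Reference": (False, True),
    "Multi-Language Support": (False, True),
    "Dialects and Accents": (False, True),
    "Generate up to 15s": (False, True),
    "Flexible Custom Duration": (False, True),
}

# Keep only the features marked as missing in 2.6 and present in 3.0.
new_in_3_0 = [
    name for name, (in_2_6, in_3_0) in FEATURES.items() if in_3_0 and not in_2_6
]

for name in new_in_3_0:
    print(name)
```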
Master Kling 3.0 with Real Examples
🎬 Smart Storyboard
With the Smart Storyboard feature enabled, Kling 3.0 automatically plans scene transitions, shot types, and camera positions based on your prompt. The model intelligently decides when to use multiple shots versus a single continuous take, creating cinematic narratives that feel professionally directed.
Prompt
Outdoor terrace scene in a European villa. At the dining table covered with blue and white plaid tablecloth, a young white woman wears a blue and white striped short-sleeved shirt, khaki shorts, and a brown belt, sitting barefoot. Opposite is a young white man wearing a white T-shirt. The camera advances and the woman shakes the juice in the glass, looking toward the distant woods and saying, "These trees will turn yellow in a month, won't they?". The camera closes up and the man lowers his head and says, "but they'll be green again next summer." Then the woman turned her head, looked at the man opposite with a smile, and said, "Are you always this optimistic? Or just about summer?" Then the man raised his head, looked at the girl and said, "Only about summers with you."
Picture

Kling 3.0 Generated Video
Prompt
A middle-aged man is ordering food at a Western restaurant, speaking in English with an Indian accent: "Excuse me, I would like to order a seafood pasta, and a filet mignon. Medium-rare", then looking up and continuing: "And, do you have any drink recommendations?"
Picture

Kling 3.0 Generated Video
🎥 Custom Storyboard
With Custom Storyboard enabled, you can precisely control each shot's content and duration. The model strictly follows your instructions to generate multi-shot videos that match your exact vision, giving you complete creative control over the narrative structure.
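If you keep your shot list in a script or spreadsheet, the "Shot N - ..." layout used in the example prompts can be assembled programmatically. The following is a minimal sketch, not an official tool: the `Shot` class, its `seconds` field, and the `build_storyboard` helper are hypothetical, and the check that the shots add up to 3-15 seconds assumes the overall duration limit stated in this guide applies to the sum of all shots.

```python
from dataclasses import dataclass
from typing import List

MIN_SECONDS, MAX_SECONDS = 3, 15  # overall duration range stated in this guide


@dataclass
class Shot:
    """One storyboard shot (hypothetical structure, for illustration only)."""
    description: str
    seconds: int  # assumed per-shot duration, in seconds


def build_storyboard(shots: List[Shot]) -> str:
    """Return the shots joined in the 'Shot N - ...' layout."""
    total = sum(shot.seconds for shot in shots)
    if not MIN_SECONDS <= total <= MAX_SECONDS:
        raise ValueError(
            f"total duration {total}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    parts = [
        f"Shot {number} - {shot.description.strip().rstrip('.')}."
        for number, shot in enumerate(shots, start=1)
    ]
    return " ".join(parts)


shots = [
    Shot("Low-angle follow shot from the rear, with the rider driving forward", 3),
    Shot("Low-angle close-up from the side of the motorcycle wheel", 2),
    Shot("First-person view of the rider, with the handlebar and dial in front", 3),
]
print(build_storyboard(shots))
```

Keeping the durations next to the descriptions makes it easy to see when a shot list has outgrown a single generation before you paste it into the storyboard.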
Prompt
Shot 1 - Low-angle follow-up shot from the rear, with the rider driving forward. Shot 2 - Low-angle close-up shot from the side, close-up shot of the motorcycle wheel. Shot 3 - First-person subjective perspective of the rider, with the motorcycle handlebar and dial in front. Shot 4 - Follow-up shot of the motorcycle facing the front, medium shot. The rider's helmet is facing the camera. Shot 5 - sideways flat shot with follow-up shot (slight follow-up shift). Shot 6 - high-altitude slightly zoomed-in perspective. The camera is zoomed up to shoot the snowmobile driving deep into the snowfield. The ruts draw winding lines on the pure white snow, and there are forests covered with snow on both sides.
Picture

Kling 3.0 Generated Video
Prompt
Shot 1, the woman looked into the distance and said: "I am here today!" Then the man continued to look ahead and said: "Let's see who can bully my good lord!" Shot 2, a close-up of Man 1 shyly and weakly leaning on the woman, and said very gently: "Thank you for having you." Shot 3, the man and woman are slightly blurred in the foreground of the shot, zooming in very fast to get a close-up to the surprised eyes of the old man watching.
Picture

Kling 3.0 Generated Video
🖼️ Image-to-Video + Subject Reference
This feature lets you bind specific subjects on top of an image-to-video generation. Locking the core visual elements keeps characters, props, and scenes consistent throughout the video, even with complex camera movements. It is well suited to maintaining brand identity and character continuity.
Prompt
The texture of the real workplace, long shots without switching from end to end, the mid-range shots of working women are steadily followed throughout the whole process, the camera moves synchronously with the characters: when the characters walk, the camera follows them simultaneously, and when the characters pause, the camera freezes immediately, the movements are natural and coherent, and the camera moves smoothly. The woman walked forward out of the elevator, and the elevator door closed slowly and naturally behind her. She walked into the office area, took off her sunglasses with her hands, put them into her commuting bag, and nodded naturally to the colleagues she passed along the way. She paused briefly, and the camera stopped simultaneously. She hung her commuting bag on a hanger in the office area, then took off her outer coat and hung it on the same hanger. After hanging her clothes, she continued to move forward, the camera follows simultaneously; a boy wearing a standard shirt walks forward and hands over a document and a signature pen. She stops and the camera remains still, and then signs the document; after signing, she continues to move forward, and the camera follows simultaneously; finally she walks to the desk, sits down by the chair, reaches out to pick up a cup of tea on the table, lowers her head and sips it, her movements are relaxed and natural.
Picture

Subject Reference



Kling 3.0 Generated Video
Prompt
The camera gradually panned to the front of the girl, and then the girl raised her head and smiled warmly at the camera, as if she was seeing her long-time friend.
Picture

Subject Reference



Kling 3.0 Generated Video
👥 Multi-Character Reference
Specify the dialogue for each character clearly in your prompt, and Kling 3.0 automatically parses which character speaks which line. This resolves confusion over who is speaking and lets you direct speech to a specific character in multi-person scenes. Kling 3.0 handles three or more characters in a single scene, which makes it a good fit for ensemble narratives.
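One way to keep speaker attribution unambiguous is to generate the dialogue section from structured data, using the "Speaker (delivery): line" pattern that the first example prompt below follows. This is a sketch under that assumption; `build_dialogue_prompt` is a hypothetical helper, and the model does not require this exact format.

```python
from typing import List, Tuple


def build_dialogue_prompt(scene: str, lines: List[Tuple[str, str, str]]) -> str:
    """Format (speaker, delivery, text) triples as 'Speaker (delivery): text'."""
    parts = [scene.strip()]
    for speaker, delivery, text in lines:
        if not speaker.strip():
            # Every line needs a named speaker, or the attribution is ambiguous.
            raise ValueError(f"missing speaker for line: {text!r}")
        parts.append(f"{speaker.strip()} ({delivery}): {text}")
    return " ".join(parts)


prompt = build_dialogue_prompt(
    "In a home environment, a family is watching TV in the living room.",
    [
        ("Mom", "softly sighing with surprise", "Wow, I didn't expect this plot at all."),
        ("Dad", "agreeing in a low voice", "Yeah, it's totally unexpected."),
        ("Boy", "with an excited tone", "It's the best twist ever!"),
    ],
)
print(prompt)
```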
Prompt
In a home environment, there is a slight wind sound from the living room air conditioner in the background, which is suitable for realistic daily life. Mom (softly sighing with surprise): Wow, I didn't expect this plot at all. Dad (agreeing in a low voice with a calm tone): Yeah, it's totally unexpected, never thought that would happen. Boy (with an excited tone): It's the best twist ever! Girl (nodding along with an excited tone): I can't believe they did that!
Picture

Kling 3.0 Generated Video
Prompt
A female teacher and three students are standing in the classroom, the camera is fixed, the female teacher says: "We don't say it, it's for sure, but you can say it", then the speaker switches to a female student wearing a school uniform, the female student raises one hand and clenches a fist and says: "We are determined to win", then the speaker switches to the first male student wearing a uniform, the male student smiles and says: "Foolproof", then the speaker switches to a male student standing on the far right, wearing a uniform, the male student raises his hands and says, "You are sure to win."
Picture

Kling 3.0 Generated Video
🌍 Multi-Language Support
Kling 3.0 supports dialogue output in Chinese, English, Japanese, Korean, and Spanish, and you can mix several languages within a single video. Write each line in the language you want spoken, and the model matches the pronunciation and switches between languages seamlessly. Dialogue in unsupported languages is automatically translated to English.
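The fallback rule is worth checking before you write a prompt, because a line tagged with an unsupported language will come out in English. The sketch below encodes only what this guide states; `SUPPORTED_LANGUAGES`, `spoken_language`, and `dialogue_line` are hypothetical names, and the "(tone, Language)" tag mirrors the Korean example prompt rather than a required syntax.

```python
# The five dialogue languages listed in this guide.
SUPPORTED_LANGUAGES = {"Chinese", "English", "Japanese", "Korean", "Spanish"}


def spoken_language(requested: str) -> str:
    """Return the language the dialogue is expected to be spoken in.

    Per the guide, dialogue in an unsupported language is translated to English.
    """
    return requested if requested in SUPPORTED_LANGUAGES else "English"


def dialogue_line(speaker: str, tone: str, language: str, text: str) -> str:
    """Format one line as 'Speaker (tone, Language): "text"'."""
    return f'{speaker} ({tone}, {language}): "{text}"'


print(spoken_language("Korean"))
print(spoken_language("German"))
print(dialogue_line("Female protagonist", "sigh", "Korean", "시험이 너무 무서워."))
```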
Prompt
In the scene on the rooftop of a Korean high school, there are distant city lights and slight wind sounds in the background, and stars twinkle in the night sky. The female protagonist was leaning against the railing in a daze. The male protagonist came over with two cans of Coke and handed them to the girl. The female protagonist took the Cokes and opened them. Male protagonist (relaxed tone, Korean): "숙제 다 했어? 왜 여기 있어?" Female protagonist (sigh, Korean): "시험이 너무 무서워." Male protagonist (gentle, Korean): "걱정 마, 넌 잘할 거야."
Picture

Kling 3.0 Generated Video
Prompt
The camera focuses on the interaction between the two. The lady's eyes are gentle and the maid lowers her head to listen. The noble lady raised her hand to flick her sleeves and spoke in a gentle tone: "오늘 후원에서 피어난 꽃을 보니, 시원한 바람이 분다. 너도 함께 걸어볼까?" The maid leaned forward slightly and responded respectfully: "네, 아씨님. 따라갈게요."
Picture

Kling 3.0 Generated Video
Prompt
The sun was shining all over the old streets of Madrid. In front of the bakery on the street, a female Chinese tourist and a boy wearing a gray hoodie walked towards the shop assistant, both of them smiling politely. Female tourist (speaks slowly, with a bad accent, in Spanish): Disculpe, ¿dónde está la plaza mayor? The white-haired Spanish clerk (pointing sideways, speaking briskly, in Spanish): Por allí, a dos calles. Muy cerca. The female tourist nodded in thanks, and the boy nodded in agreement (Spanish): Muchas gracias. The clerk smiled and nodded in response, and the two turned and walked in the direction indicated.
Picture

Kling 3.0 Generated Video
🗣️ Dialects and Accents
Specify the dialect or accent in your prompt, and the model reproduces the character's intonation and rhythm for an authentic performance. Kling 3.0 supports Chinese dialects (Northeast, Beijing, Taiwan, Cantonese, Sichuan, etc.) and English accents (American, British, Indian, etc.). Simply note the desired dialect or accent in the dialogue section.
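The example prompts note the variety right before the spoken line, as in "said in Sichuan dialect:" or "speaking in English with an Indian accent:". A small lookup can produce that phrasing consistently. This is a sketch, not a definitive list: `VARIETIES` holds only the names this guide mentions (its lists end in "etc."), and `accent_phrase` and `spoken_line` are hypothetical helpers.

```python
# Varieties named in this guide, mapped to their base language.
# The guide's lists end in "etc.", so this table is partial by design.
VARIETIES = {
    "Northeast": "Chinese",
    "Beijing": "Chinese",
    "Taiwan": "Chinese",
    "Cantonese": "Chinese",
    "Sichuan": "Chinese",
    "American": "English",
    "British": "English",
    "Indian": "English",
}


def accent_phrase(variety: str) -> str:
    """Return the phrase that tells the model which dialect or accent to use."""
    language = VARIETIES.get(variety)
    if language is None:
        raise ValueError(f"{variety!r} is not one of the varieties named in the guide")
    if language == "Chinese":
        return f"in {variety} dialect"
    article = "an" if variety[0] in "AEIOU" else "a"
    return f"in English with {article} {variety} accent"


def spoken_line(action: str, variety: str, text: str) -> str:
    """Combine the action, the accent phrase, and the quoted dialogue."""
    return f'{action} {accent_phrase(variety)}: "{text}"'


print(spoken_line("The barista looks at the camera and says", "Sichuan", "Hey, you're here."))
print(spoken_line("A middle-aged man orders food, speaking", "Indian", "Excuse me."))
```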
Prompt
Fixed shot of a young barista wiping his cup. The barista raised his head and looked at the camera and said in Sichuan dialect: "Hey, you're here. What do you want to drink today, iced Americano or latte? New beans just arrived today."
Picture

Kling 3.0 Generated Video
Prompt
In a high-end office building, the man leaned back and said in Cantonese with a kind of tired disgust: "Actually... I'm really not good at buying your logic. Aligning a proposal can't achieve our core value at all. Your flow is so messy, how can you convince a client? Why don't you go back and re-think the next angle? I'll see a final version in the morning."
Picture

Kling 3.0 Generated Video
📝 Native Text Capability
Kling 3.0's native text capability precisely preserves text details from original images, which suits e-commerce ads and creative short dramas. The model automatically recognizes text in uploaded images (signs, subtitles, logos, etc.) and keeps it consistent across frames, avoiding text drift and blur so the information stays fully legible.
Prompt
In the scene by the window of a Paris apartment, there is a soft French piano BGM in the background, and the gilded afternoon sunlight shines through the blinds on the perfume bottle, creating mottled light and shadow. The camera slowly advances from the scattered rose petals, and the focus moves to the cut surface of the Kling perfume bottle. The narration (lazy French female voice, British accent, speaking at a slow pace): Bathe in the golden hour. The camera circles the perfume bottle in slow motion, capturing the flow of light and shadow on the golden lettering and the bottle. The narration: Kling, a whisper of Parisian elegance. The camera zooms out to freeze the complete scene (the perfume bottle stands on a velvet pedestal, and Paris buildings are looming outside the window), with the narration: Wrap yourself in luxury with every breath.
Picture

Kling 3.0 Generated Video
Prompt
The camera always focuses on the "KLING" on the baseball bat, and the player hits the baseball
Picture

Kling 3.0 Generated Video
⏱️ 15s Long Shot Generation
Kling 3.0 supports long shots of up to 15 seconds, with the duration adjustable from 3 to 15 seconds. The longer window accommodates more complex action logic and plot changes, so a story arc can play out from beginning to end in one clip. This removes the need to stitch fragmented clips together and improves both creative efficiency and the coherence of the finished work.
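Long-shot prompts tend to be organized around time markers ("at the 4th second", "at the 8th second"), as the first example below shows. The sketch here builds such a timeline and checks it against the 3-15 second range given in this guide. The function names are hypothetical, and the exact wording of the markers is an assumption modeled on that example, not a required syntax.

```python
from typing import List, Tuple

MIN_SECONDS, MAX_SECONDS = 3, 15  # duration range stated in this guide


def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 4 -> '4th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def build_timeline_prompt(
    duration: int, opening: str, beats: List[Tuple[int, str]]
) -> str:
    """Join an opening description and timed beats into one long-shot prompt."""
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS}s, got {duration}")
    parts = [f"This is a {duration}-second long shot. {opening}"]
    previous = 0
    for second, action in beats:
        # Beats must be in ascending order and fall inside the clip.
        if not previous < second < duration:
            raise ValueError(f"beat at {second}s is out of order or out of range")
        parts.append(f"At the {ordinal(second)} second, {action}")
        previous = second
    return " ".join(parts)


print(
    build_timeline_prompt(
        15,
        "A young woman in a dark green dress runs across a moonlit garden.",
        [
            (4, "the camera accelerates as a crowd enters from both sides."),
            (8, "the camera pans to her front and she glances back."),
            (12, "she throws the white flower into the air."),
        ],
    )
)
```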
Prompt
The super wide-angle mid-range horizontal tracking shot opens with the stabilizer moving low to the ground. The cold blue night and the silvery starry sky form a high-contrast romantic movie tone, with strong poetic realism and classical epic temperament. The subject is a young woman wearing a dark green dress, running with all her strength on the garden grass illuminated by the moonlight, and the skirt is lifted by the wind to form a turbulent dynamic curve. She is holding a small white flower in her right hand, lifting the hem of her skirt with her left hand, breathing quickly but with a firm gaze; at the 4th second, the camera accelerates as she moves forward, and in the background, many men and women wearing old-era dresses enter the frame from the left and right sides, running side by side with her. Some people try to get closer, some turn around and call out, but no one actually touches her, suggesting pursuit and escape; At the 8th second, the camera gradually zooms in to the medium shot, pans to the front of the protagonist, and slightly raises the camera. She turns back briefly to look at a young male character behind her. Their eyes meet for a moment, and emotions burst out during the run. The woman and the man hold hands and run together; at the 12th second, the music and action reach a climax, and the camera closely follows her profile and flying hair, she let go and threw the white flowers into the air, and the flowers slowly fell and were passed by the crowd behind them; for the last three seconds, the camera kept moving forward, and the woman and the man rushed out of the crowd and ran towards the starry sky at the end of the garden. Their figures gradually occupied the center of the picture. The overall atmosphere was fiery, romantic and decisive, like an explosive narrative about fate, choice and freedom.
Picture

Kling 3.0 Generated Video
Prompt
This is a 15-second cinematic long shot. One shot to the end, no editing transitions. The scene is set inside a plaster tower with mottled light and shadow, surrounded by huge white plaster statues, creating a mysterious and depressing atmosphere. When the picture begins, the protagonist has just finished running violently and stops suddenly in the center of the scene, with his chest heaving, his expression dazed and helpless, and his eyes showing fear. The lens is centered on the protagonist and performs smooth 360-degree panning shooting. While the camera rotates, the protagonist looks around anxiously and shouts: "Alex! Alex where are you! Are you here?" Then a cute dinosaur cry sounds in the background, and then the camera moves over the shoulder to behind the protagonist, and a small and medium-sized cute little dinosaur comes out from behind a plaster pillar and makes a cute cry. The protagonist turned around suddenly when he heard the sound, burst into tears when he saw the dinosaur, and rushed forward desperately to hug the dinosaur tightly. The dinosaur snuggles in the protagonist's arms obediently. The protagonist is crying and caressing the dinosaur tenderly, and tremblingly says: "I find you! Thanks for god, I'm so scared!". The overall lighting and shadows have a cinematic quality, and the mood changes from despair to an extremely touching reunion.
Picture

