"Veo3" Research and Analysis


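Many people use Veo3 without knowing why its design feels so intuitive. Written from the user's point of view, this article uses research and analysis to show that when a tool becomes easy to use, quite a few "invisible decisions" sit behind it.

I. Product Overview

1.1 Product introduction

Veo3: Veo3 was released at the Google I/O developer conference on May 21, 2025 (Beijing time). Developed by Google, it is a powerful AI video generator that can create high-quality video with native audio, precise motion control, and reference-based generation.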

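II. Product Background and Goals

2.1 Background

Veo3 is Google's latest-generation video generation model, announced at the 2025 I/O developer conference. It targets a bottleneck in traditional video creation: picture and sound are produced separately, with no synchronized dialogue or ambient sound. Before Veo3, AI video products mostly produced silent clips or required complex dubbing in post-production. Veo3 was the first to generate audio and video natively and in sync, including background sound effects, character dialogue, and lip sync, which greatly improves realism and immersion.

2.2 Goals

Veo3 aims to be a high-quality, fully automated multimodal AI video generation platform. From a text or image prompt it produces, in one step, HD video with synchronized audio (ambient sound, character dialogue, and lip sync). This removes the separation of picture and sound and the tedious dubbing step of traditional workflows, and serves video creators at every level.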

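III. Veo3's Core Technology and Design Philosophy

3.1 Veo3's core product idea

Fundamentally change how video is created: generate high-quality audiovisual content from a text prompt in one step, and leave the era of "silent video" behind.

3.2 Veo3's core technical capabilities

  • The core breakthrough is synchronized generation of visuals, speech, and sound effects. From a text description, Veo3 generates high-quality footage together with matching dialogue, ambient sound, and background music, removing the dubbing and sound-design steps of traditional post-production.
  • For lip sync, Veo3 uses deep learning models to keep mouth shapes closely aligned with speech, and is regarded as one of the best lip-sync models currently on the market.
  • Veo3 can simulate physical effects such as fluids, water motion, changing light and shadow, and object movement, so footage follows real-world physics more closely and looks more realistic.
  • It has a strong grasp of film language and can carry out complex camera instructions (push, pull, pan, tracking, and so on), producing varied and expressive shots suited to professional film and television production.
  • The product design centers on user experience and performance. The interface is intuitive and the workflow well structured, with built-in help and prompt suggestions, so users without a specialist background can still create professional-grade video efficiently.
  • The technical architecture is described as supporting parallel, real-time processing across multiple techniques, balancing throughput, accuracy, and speed; paired with a 64-bit Linux operating system and high-speed storage, it suits large-scale, high-quality video generation.
  • Veo3 is tightly integrated with the Google Flow platform, forming an end-to-end path from text input to video output. This lowers the technical barrier for creators and fits film production, advertising, education, and other scenarios.
  • The design takes content safety seriously, with built-in digital watermarking and content safety filtering to limit misuse and the spread of false information.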

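IV. AI Video Market Analysis

4.1 AI video generation market size

The AI video generation market is set to keep growing. Fortune Business Insights estimates the global market at about USD 610 million in 2024 and projects USD 2.56 billion by 2032, a compound annual growth rate of roughly 19.5% over 2024-2032. The main drivers of this growth are:

  • Low cost of AI-generated video: AI-generated video costs far less than conventional production. According to QbitAI Insights (量子位智庫), top-tier animated films (Disney, Pixar, and similar studios) cost about USD 2 million per minute to produce, while AI-generated video costs about USD 300 per minute, a substantial saving;
  • Broad application scenarios: AI video is being adopted in film and television production, advertising and marketing, short video, e-commerce, animation, and other fields, improving output quality while lowering production cost;
  • Content is shifting to video: according to QuestMobile, monthly active users in the mobile video industry reached 1.136 billion as of September 2024, and video has become the core form of traffic. In addition, at the China Mobile Global Partner Conference in October 2024, Huawei chairman Liang Hua said that online video now accounts for 70% of network traffic, showing how heavily users rely on video content;
  • Technical innovation: breakthroughs in deep learning, neural networks, natural language processing, and other key technologies give AI video generation strong technical support, making generation and processing more efficient and accurate and the resulting video more lifelike;
  • Policy support: as the AI industry has grown rapidly, national and local governments have issued a series of policy documents providing strong support in funding, talent, and regulation, accelerating the integration of AI technology with industry.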

Chart 1: Global AI video generation market size, 2023-2032E (USD 100 million)

Source: Fortune Business Insights; compiled by RimeData (來覓數據)
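
The growth rate and cost gap quoted in section 4.1 can be sanity-checked with a few lines of arithmetic. The sketch below is only a reader-side check, not part of the cited reports: it takes the figures quoted in this article as inputs, and the function and variable names are illustrative. With these rounded inputs the implied growth rate lands within a few tenths of a percentage point of the quoted 19.5%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size figures quoted in section 4.1 (unit: USD 100 million).
market_2024 = 6.1
market_2032 = 25.6
growth = cagr(market_2024, market_2032, 2032 - 2024)
print(f"Implied CAGR 2024-2032: {growth:.1%}")

# Per-minute production cost figures quoted in section 4.1 (USD).
animation_cost_per_min = 2_000_000
ai_cost_per_min = 300
cost_ratio = animation_cost_per_min / ai_cost_per_min
print(f"Top-tier animation costs about {cost_ratio:,.0f}x more per minute")
```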

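4.2 AI video generation investment and financing

The technology keeps iterating: models can generate longer videos with more complex scenes, the range of applications is widening, and investor confidence has grown accordingly. In 2024, total financing in the global AI video generation sector exceeded RMB 60 billion, mostly in early-stage rounds, which indicates the industry is still developing rapidly. The table below lists 2024 deals of RMB 100 million or more in the AI video generation track; the full set of financing cases, funded projects, and in-depth analysis is available on the Rime PEVC platform.

Chart 2: AI video generation deals of RMB 100 million or more, 2024

Source: RimeData (來覓數據)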

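4.3 AI video generation market by application

The market is segmented into training and education, marketing and advertising, social media, and others. In 2024, marketing and advertising held the largest share. This reflects growing use of AI video generators to improve the quality of advertising and marketing content cost-effectively. AI video tools also help deliver high-quality video tailored to the specific marketing needs of a target audience and raise brand awareness. Over the forecast period, social media is expected to grow fastest, driven by the spread of multimedia technologies such as deepfake image processing and natural language processing, which are used to produce richer, more engaging video and to increase user engagement.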

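4.4 Main challenges

1) Technical and quality bottlenecks

  • Weak temporal consistency and character continuity: current models struggle to keep a character consistent across frames. Faces, expressions, clothing, and similar details are prone to jumps and distortion, which hurts overall visual quality.
  • Mostly short clips; long-form remains hard: output is usually limited to a few seconds or tens of seconds. In longer videos, story continuity, scene transitions, and camera language are still bottlenecks.
  • Limited semantic and counting control: responses to keywords, quantities, and spatial layout are unstable (a request such as "generate five people" often fails), and models can misread contextual intent and produce video that does not match the prompt.

2) Data bias and ethical issues

  • Bias from training data: if training data lacks diversity, gender, racial, and cultural biases show up in generated content and reinforce unfair representation.
  • Deepfake abuse risk: highly realistic video can be used to fabricate news, impersonate public figures, and spread false information, deepening the crisis of social trust.

3) Legal and regulatory challenges

  • Unclear copyright ownership: there is no unified standard for who owns AI-generated content. An AI cannot itself be a legal author, and many details remain to be defined.
  • Regulation not yet mature in practice: measures such as the EU AI Act, California's AB 3211, and Denmark's proposal to grant citizens copyright over their own likeness have been introduced or proposed, but standards differ across countries and technical compliance lags behind.
  • Enforcement is complex: AI content spreads across borders and is hard to trace to a responsible party; platform moderation is difficult, and legal applicability and chains of evidence are both challenging.

4) Resource and cost pressure

  • High compute consumption: high-quality video generation requires large amounts of GPU capacity, storage, and energy, a clear barrier for non-enterprise users and researchers.
  • Rising cost at scale: as content is generated in bulk, reducing time and financial cost without sacrificing quality becomes a hard problem.

5) User acceptance and industry integration barriers

  • Loose fit with brands and producers: mainstream brands remain cautious about AI-generated video, worried about unstable quality, damage to brand image, or lack of originality.
  • Immature integration with existing workflows: bringing AI video into traditional production pipelines is still at an early stage, and interfaces, plug-ins, training, and process compatibility all lack mature solutions.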

V. Main Competitor Analysis

VI. User Personas

6.1 Veo 3 main user types by share (estimated data)

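VII. Product Feature Structure

7.1 Feature highlights

  • Native audio generation: while generating video, Veo3 also generates ambient sound (rain, wind), physical interaction sounds (footsteps, knocking), atmospheric music, and multi-character dialogue, moving past the "silent era" of earlier AI video.
  • Accurate lip sync: using V2A (Video-to-Audio) technology and deep learning models, Veo3 matches lip movement to speech even with multiple speaking characters, making digital humans look more natural and realistic. It suits digital character creation, virtual presenters, education and training, and similar scenarios.
  • High-resolution video generation: supports up to 1080p HD. Individual clips are short (the example prompts in section VIII cap each clip at 8 seconds), so longer narrative sequences have to be assembled from several clips. Within a clip, the model handles natural motion, dynamic composition, and complex shots such as time-lapse, aerial, and long takes.
  • Multilingual support: in addition to English, it accepts input in Chinese, Japanese, Korean, and other languages, which makes cross-regional and multilingual content creation more flexible.
  • One-pass, end-to-end generation: from a text prompt, Veo3 directly generates footage, voice, sound effects, music, and lip sync, simplifying the tedious post-production workflow, raising creative efficiency, and lowering the technical barrier.
  • Improved physics and realism: lighting, reflection and refraction, and fluid and cloth simulation are rendered more convincingly, and human and animal motion is smooth and natural, increasing visual immersion.
  • Integration and availability: currently available on Google AI platforms such as Vertex AI and Flow, with real-time preview and adjustment. It is open to both commercial and creator users, and subscription pricing and other details have been published.

7.2 Product feature structure diagram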

VIII. Veo3 Hands-on Examples

1. prompt:

{ "character_name": "Nyx Cipher", "character_profile": { "age": 27, "height": "5'8\" / 173 cm", "build": "lean, athletic, swimmer's shoulders", "skin_tone": "deep bronze with a subtle sun-kissed glow", "hair": "jet-black, shoulder-length, slicked straight back and dripping", "eyes": "almond-shaped hazel with faint gold flecks", "distinguishing_marks": "tiny star tattoo tucked behind her right ear; gold stud in upper left helix", "demeanour": "playfully self-assured, almost dare-you smirk" }, "global_style": { "camera": "smooth gimbal 35 mm, medium close-ups with occasional waist-up pull-backs", "color_grade": "hyper-saturated neon-tropic (hot-pink, aqua, tangerine)", "lighting": "mid-day pool reflections, specular highlights on wet skin", "outfit": "metallic-coral bikini, mirrored sunglasses, gold hoop earrings", "max_clip_duration_sec": 8, "aspect_ratio": "16:9", "mouth_shape_intensity": 0.85, "eye_contact_ratio": 0.7, "audio_defaults": { "format": "wav", "sample_rate_hz": 48000, "channels": 2, "style": "trap-pop rap, 145 BPM, swung hats, sub-bass" } }, "clips": [ { "id": "S1_SplashCash", "shot": { "composition": "Medium close-up, 35 mm lens, deep focus, smooth gimbal", "camera_motion": "slow dolly-in 60 cm", "frame_rate": "24 fps", "film_grain": 0.05 }, "subject": { "description": "Nyx Cipher — 27-year-old, 173 cm, toned-athletic build; deep-bronze skin glistening with water; jet-black slicked-back hair; almond hazel eyes behind mirrored sunglasses; small star tattoo behind right ear; wearing metallic-coral bikini and gold hoop earrings", "wardrobe": "metallic-coral bikini, mirrored sunglasses, gold hoop earrings" }, "scene": { "location": "rooftop infinity pool overlooking a neon-tropic city skyline", "time_of_day": "mid-day", "environment": "sunlit pool water reflecting shifting patterns; floating dollar-sign inflatables" }, "visual_details": { "action": "Nyx leans on pool edge and, on beat four, fans her hand cheekily toward camera as droplets sparkle in the air", "props": "floating dollar-sign inflatables" }, "cinematography": { "lighting": "high-key mid-day sunlight with specular highlights on wet skin", "tone": "vibrant, playful, confident" }, "audio_track": { "lyrics": "Splash-cash, bling-blap—pool water pshh! Charts skrrt! like my wave, hot tropics whoosh!", "emotion": "confident, tongue-in-cheek", "flow": "double-time for first bar, brief half-time tag", "wave_download_url": null, "youtube_reference": null, "audio_base64": null }, "color_palette": "hyper-saturated neon-tropic (hot-pink, aqua, tangerine)", "dialogue": { "character": "Nyx Cipher", "line": "Splash-cash, bling-blap—pool water pshh! Charts skrrt! like my wave, hot tropics whoosh!", "subtitles": false }, "performance": { "mouth_shape_intensity": 0.85, "eye_contact_ratio": 0.7 }, "duration_sec": 8, "aspect_ratio": "16:9" } ] }

Result:

https://www.bilibili.com/video/BV1Hrg8zAEWt/?vd_source=54b47fd35fdcc4ac899eedbc59fdfa85
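
The prompt above is a single JSON object. When prompts like this pass through web pages or chat tools, straight quotes are often converted to typographic ones, and stray line breaks or trailing commas can creep in; any of these makes the text invalid JSON. The sketch below shows one way to clean up and validate a prompt locally before submitting it. The helper name `load_prompt` is hypothetical, and the clean-up rules are simple text substitutions, so a prompt whose string values intentionally contain typographic quotes or a `, }` sequence would need a more careful parser.

```python
import json
import re

# Typographic quotes that rich-text editors substitute for ASCII quotes.
_QUOTE_MAP = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})


def load_prompt(raw: str) -> dict:
    """Parse a JSON prompt after undoing common copy-paste damage."""
    text = raw.translate(_QUOTE_MAP)
    # Treat line breaks inside the pasted prompt as single spaces.
    text = re.sub(r"\s*\n\s*", " ", text)
    # Drop trailing commas that sit directly before a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)


# A fragment with the same kinds of damage: curly quotes and a trailing comma.
sample = "{ “duration_sec”: 8, “aspect_ratio”: “16:9”, }"
prompt = load_prompt(sample)
print(prompt["duration_sec"], prompt["aspect_ratio"])
```

If `json.loads` still raises `json.JSONDecodeError` after the clean-up, the error's position points at the first character that needs manual repair.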

2. prompt:

{ "shot": { "composition": "Selfie-style medium close-up of a young woman walking, camera at arm's length facing her", "camera_motion": "slight bounce with each step, occasionally panning to show the street around her", "frame_rate": "30fps (phone camera feel)", "film_grain": "sharp digital clarity, slight phone camera auto-stabilization" }, "subject": { "description": "A vibrant 21-year-old Israeli TikTok influencer with long dark curly hair under a white bucket hat and small gold Star-of-David huggies. Warm olive skin, freckles, and bright hazel eyes.", "wardrobe": "Light-wash denim cropped jacket over a sand-colored ribbed tank, high-waisted beige cargo pants, white chunky sneakers, and a small woven shoulder bag with colorful Tel-Aviv-market patterns." }, "scene": { "location": "a bustling Tel-Aviv sidewalk along Rothschild Boulevard", "time_of_day": "morning", "environment": "busy street lined with Bauhaus cafés, eucalyptus trees, cyclists, and dog-walkers; golden morning light reflecting off white façades" }, "visual_details": { "action": "She walks casually and greets a familiar barista with a free-hand wave while her right hand keeps the phone stable. No props in her left hand—both hands remain visible at all times." }, "cinematography": { "lighting": "warm golden-hour sunlight, even on her face", "tone": "upbeat, personal, candid", "notes": "vlog style; she speaks directly to camera in fluent Hebrew. Handheld feel, no filters, no on-screen text." }, "audio": { "ambient": "Tel-Aviv street sounds: distant scooters, bicycle bells, light Hebrew chatter, rustling eucalyptus leaves", "voice": { "tone": "cheerful, conversational", "style": "fluent Hebrew with native Tel-Aviv intonation and rhythm" } }, "dialogue": { "character": "Vlogger", "line": "??? ?????! ??? ???? ???? ???? ????? ????—????? ???? ???? ????. ???? ???? ???!", "subtitles": false }, "visual_rules": { "prohibited_elements": [ "subtitles", "captions", "text overlays", "user interface elements", "watermarks" ] } }

Result:

3. prompt:

{ "shot": { "composition": "starts with extreme close-up on dancers' feet then moves to full shot", "camera_motion": "low tracking along the wet floor following fast footwork, then a smooth arc upward into an overhead orbit around the dancers", "frame_rate": "24fps", "film_grain": "clean digital with slight motion blur for realism" }, "subject": { "description": "Competing street dancers locked in an energetic battle, bodies in sync and expressive", "wardrobe": "casual streetwear with bright accents and sneakers" }, "scene": { "location": "gritty warehouse set with graffiti-covered walls and puddles on the floor", "time_of_day": "night under neon lights", "environment_details": "water splashes with each movement, strobe lights pulse in the background" }, "visual_details": "Sweat glistens, water sprays up from the floor, neon reflections shimmer, dancers freeze mid-move", "cinematography": { "lighting": "neon lighting in pinks and blues reflecting off puddles, balanced fill lights to maintain detail at 720p", "tone": "high-energy and edgy", "style": "music video inspired dance battle" }, "audio": { "ambient_sounds": [ "crowd cheering and clapping", "shoes squeaking on wet concrete" ], "music": "upbeat hip-hop beat synced to choreography", "effects": "reverb that matches warehouse acoustics" }, "color_palette": "bold neon pinks, blues, and purples against dark greys", "dialogue": {}, "visual_rules": { "prohibited_elements": [ "text overlays", "captions", "subtitles" ] } }

Result:

4. prompt:

{ "shot": { "composition": "Medium close-up, 50mm lens, shot on ARRI Alexa Mini LF, slight push-in, shallow depth of field", "camera_motion": "slow push-in", "frame_rate": "24fps", "film_grain": "subtle Kodak Vision3 250D overlay" }, "subject": { "description": "A young woman with large icy-blue doll-like eyes, flawless porcelain skin, and long platinum blonde hair in high twin ponytails tied with black satin ribbons. She has straight-cut bangs above her eyes. Her makeup is delicate: light pink blush, glossy lips, a shimmer in her eye corners, and subtly winged eyeliner. She wears a deep violet satin off-shoulder corset dress trimmed with black lace, puffed satin sleeves, a wide black belt with a gold buckle, long black opera gloves, sheer thigh-high stockings, and a velvet choker tied in a small bow." }, "wardrobe": "Deep violet satin off-shoulder corset mini dress, black lace trim, puffed sleeves, gold-buckled black belt, black opera gloves, sheer thigh-highs, black velvet choker", "scene": { "location": "anime convention stage", "time_of_day": "interior with theatrical stage lighting", "environment": "behind her, a massive curved LED screen plays a dreamy galactic animation with drifting stars and glowing nebulae" }, "visual_details": { "action": "She raises her right hand in a friendly wave, then clasps it over her heart while smiling and speaking to the audience", "props": "LED cosmic backdrop, side-fill spotlights, soft light haze on stage" }, "cinematography": { "lighting": "cool front beauty lighting with soft fill; galaxy screen adds ambient blue-violet glow; rear hair light creates rim effect", "tone": "idol-like, dreamy, playful" }, "audio": { "ambient": "soft hum from the stage screen, faint ethereal chime tones in the background", "voice": "Ani (playful, high-pitched Japanese anime girl tone with melodic cadence): 'Hiii minna-san~! Ani da yo~! Yoroshiku ne! My DLC is coming soon… tanoshimi ni shite neee~!'", "subtitles": false }, "color_palette": "cosmic blues and purples with deep violet accents, subtle shimmer on fabrics and skin highlights", "dialogue": { "character": "Ani", "line": "Hiii minna-san~! Ani da yo~! Yoroshiku ne! My DLC is coming soon… tanoshimi ni shite neee~!", "subtitles": false } }

Result:

5. prompt:

{ "runtime_sec": 8, "captions": { "burn_in": false, "generate": false, "force_no_captions": true }, "postprocess": { "strip_text_layers": true, "remove_layers": ["Text"] }, "shot": { "composition": "Selfie-vlog, neon shop signs behind", "camera_motion": "handheld sidestep, gentle roll", "frame_rate": "30fps", "camera_model": "Galaxy S24 Ultra, HDR10+", "lens": "23 mm equiv f/1.8", "white_balance": "3800K", "film_grain": "mobile sensor noise 8 %" }, "subject": { "name": "??", "age": 21, "ethnicity": "Korean", "appearance": "short ash-brown bob, silver hoop earrings", "wardrobe": "oversized lilac hoodie, black pleated mini, platform sneakers", "emotion": "slightly anxious but upbeat", "movement": "pivots to show mural, returns to lens" }, "scene": { "location": "Hongdae side-street", "time_of_day": "21:30", "environment": "busker bass line, cafe chatter, colored LEDs" }, "audio": { "ambient": "street music low, cafe cups clink", "mix_level_db": -14, "voice_over": { "language": "ko-KR", "voice_profile": { "id": "KoreanFemale_NaturalV1", "tier": "studio", "accent": "KR-Seoul", "emotion": "gentle_encourage", "speech_speed": "fast_105" }, "script": [ { "timestamp": 0.5, "text": "?? ??? ?? ?? ? ???. ??? ?? ??? ? ??? ? ??? ?? ???." }, { "timestamp": 5.0, "text": "?? ?? ???, ???" } ] }, "audio_master": { "target_lufs": -14, "true_peak_db": -2 } }, "color_palette": "magenta highlights, cyan shadows, natural skin" }

Result:
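
Prompt 5 pairs a `runtime_sec` of 8 with voice-over lines that carry their own `timestamp` values, so a cue placed past the end of the clip would simply never be heard. The sketch below is a local sanity check on that structure, under the assumption that the field names are exactly those used in prompt 5; the article does not document how Veo3 interprets these fields, and the helper name `check_voice_over` is hypothetical.

```python
def check_voice_over(prompt: dict) -> list[str]:
    """Return a list of timing problems in the voice-over script (empty if none)."""
    runtime = prompt["runtime_sec"]
    cues = prompt["audio"]["voice_over"]["script"]
    problems = []
    previous = -1.0
    for cue in cues:
        t = cue["timestamp"]
        # Every cue has to start inside the clip.
        if not 0 <= t < runtime:
            problems.append(f"cue at {t}s falls outside the {runtime}s clip")
        # Cues are expected in strictly increasing order.
        if t <= previous:
            problems.append(f"cue at {t}s is not after the previous cue")
        previous = t
    return problems


# Same timing layout as prompt 5, with placeholder text.
example = {
    "runtime_sec": 8,
    "audio": {"voice_over": {"script": [
        {"timestamp": 0.5, "text": "first line"},
        {"timestamp": 5.0, "text": "second line"},
    ]}},
}
print(check_voice_over(example))
```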

6. prompt:

{ "shot": { "composition": "Medium handheld shot, 35mm lens, shot on ARRI Alexa Mini, shallow depth of field, natural handheld sway", "camera_motion": "swaying slightly with her movements as she leans against the wall", "frame_rate": "24fps", "film_grain": "Kodak 5219 500T film grain" }, "subject": { "description": "Young woman with long tousled dark brown hair and soft fringe, natural rosy blush and lips, wearing a deep red ribbed long-sleeve V-neck top", "wardrobe": "deep red ribbed V-neck top, casual urban look" }, "scene": { "location": "narrow, dimly lit urban alley", "time_of_day": "night", "environment": "gritty brick walls, garbage bins, scattered wet debris, faint neon glow spilling from behind" }, "visual_details": { "action": "the woman hides behind a wall, breathing heavily, chest rising and falling, eyes scanning in panic; condensation escapes her mouth in the cold night air", "props": "wet pavement, flickering neon sign, old metal fire escape" }, "cinematography": { "lighting": "low-key lighting with cold bluish fill from above, and red rim light bleed from distant neon signage", "tone": "intense, survival-driven, claustrophobic" }, "audio": { "ambient": "urban night ambiance with distant sirens, wind between buildings, her heavy breathing close to mic", "sfx": "subtle heartbeat pulsing with her breath, faint rustling" }, "color_palette": "cool teal and muted reds with high contrast shadows", "dialogue": { "character": null, "line": null, "subtitles": false } }

Result:
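
Most of the prompts in this section share the same skeleton: `shot`, `subject`, `scene`, `audio`, and `dialogue`, often followed by a `visual_rules` block that bans on-screen text. A small helper can assemble that skeleton so each new prompt only supplies what differs. This is a sketch based on the key names seen in the examples, not an official schema; `build_prompt` and `NO_TEXT_RULES` are hypothetical names introduced here.

```python
import json
from typing import Optional

# Elements the example prompts ask the model to keep out of frame.
NO_TEXT_RULES = ["text overlays", "captions", "subtitles"]


def build_prompt(shot: dict, subject: dict, scene: dict, audio: dict,
                 line: Optional[str] = None, character: Optional[str] = None,
                 ban_text: bool = True) -> str:
    """Assemble the sections shared by the example prompts into JSON text."""
    prompt = {
        "shot": shot,
        "subject": subject,
        "scene": scene,
        "audio": audio,
        "dialogue": {"character": character, "line": line, "subtitles": False},
    }
    if ban_text:
        prompt["visual_rules"] = {"prohibited_elements": list(NO_TEXT_RULES)}
    # ensure_ascii=False keeps non-English dialogue readable in the output.
    return json.dumps(prompt, ensure_ascii=False, indent=2)


text = build_prompt(
    shot={"composition": "Medium close-up", "camera_motion": "slow push-in",
          "frame_rate": "24fps"},
    subject={"description": "street dancer", "wardrobe": "casual streetwear"},
    scene={"location": "warehouse", "time_of_day": "night"},
    audio={"ambient": "crowd cheering"},
)
print(text)
```

Building the prompt from a dictionary and serializing it with `json.dumps` also avoids the quoting and trailing-comma mistakes that hand-edited JSON tends to pick up.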

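IX. Summary

To sum up, Veo3 left a strong impression on me. Its native audio generation and accurate lip sync make it stand out among current AI video generation products and go a long way toward filling a gap that this category has had. Its high-quality video generation also performs very well: camera movement is natural, and subjects and their dynamic interactions are rendered clearly, which is enough to meet complex needs across many scenarios. What I find most powerful, though, is the one-pass, end-to-end generation: from a text prompt it directly produces footage, voice, sound effects, music, and lip sync, simplifying the tedious post-production workflow and greatly improving creative efficiency.

The product still has problems. Subjects and their movements are occasionally incoherent, and details of a prompt are sometimes misinterpreted, so at certain moments the output is recognizably "AI-made". These are pain points shared by every AI video generation product on the market today, and I am very much looking forward to the next major release.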

This article was originally published by @莊懶懶 on 人人都是產品經理. Reproduction without the author's permission is prohibited.

Header image: screenshot from the Veo3 official website

