Nov. 9 — Researchers tested Google’s latest video-generation AI model, Veo-3, on real surgical footage and found that, while the model can produce highly realistic visuals, it lacks any substantive understanding of medical procedures.
In the study, the researchers provided the model with a single surgical image as input and asked Veo-3 to predict the next eight seconds of the procedure. To evaluate performance systematically, the international research team built a dedicated benchmark called SurgVeo, covering 50 real laparoscopic and neurosurgical video clips. Four experienced surgeons independently rated each AI-generated video on four dimensions, each scored out of 5: visual realism, plausibility of instrument use, realism of tissue response, and overall surgical logic.
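To picture the evaluation concretely, the sketch below shows one plausible way such panel ratings could be aggregated into per-dimension averages. The data structure, field names, and function are illustrative assumptions for this article, not code or a schema from the SurgVeo study.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical record of one surgeon's score for one generated clip at one
# prediction horizon; the field names are illustrative, not SurgVeo's schema.
@dataclass
class Rating:
    clip_id: str
    rater: str
    horizon_s: int    # prediction horizon in seconds (1 to 8)
    dimension: str    # "visual", "instrument", "tissue", or "logic"
    score: int        # 1-5 rating

def mean_scores(ratings: list[Rating]) -> dict[tuple[str, int], float]:
    """Average the surgeons' scores for each (dimension, horizon) pair."""
    buckets: dict[tuple[str, int], list[int]] = defaultdict(list)
    for r in ratings:
        buckets[(r.dimension, r.horizon_s)].append(r.score)
    return {key: mean(values) for key, values in buckets.items()}

# With the laparoscopic ratings loaded, a figure like the reported 1-second
# visual-plausibility average of 3.72 would be read off as, e.g.,
# mean_scores(ratings)[("visual", 1)].
```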

Veo-3’s generated videos were highly convincing at first glance—some surgeons even described the image quality as “shockingly clear.” Closer analysis, however, showed that the content’s logic quickly fell apart. In the laparoscopic tests, the visual-plausibility score at the 1-second mark was still 3.72 out of 5, but the scores tied to medical accuracy dropped sharply: instrument use scored only 1.78, tissue response 1.64, and the most critical dimension, surgical logic, was lowest at 1.61. In short, the model can render convincingly realistic imagery, but it cannot reproduce the procedural workflows and causal relationships of a real operating room.
In the neurosurgical scenarios, which demand even greater precision, Veo-3 performed worse still. From the first second it struggled to capture the exacting maneuvers neurosurgery requires: the instrument-use score fell to 2.77 (versus 3.36 for the laparoscopic cases), and the surgical-logic score dropped as low as 1.13 by the 8-second mark.
The team further categorized the error types and found that over 93% of errors stemmed from failures of medical logic—for example, inventing non-existent surgical instruments, fabricating tissue responses that violate physiology, or performing clinically meaningless actions—while only a small fraction related to image quality (6.2% in the laparoscopic cases, 2.8% in the neurosurgical cases).
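Purely as an illustration of how an error taxonomy of this kind turns into the percentages above, a tally might look like the following; the category labels are invented for the sketch and are not the study’s actual annotation scheme.

```python
from collections import Counter

# Hypothetical per-error annotations; label names are invented for
# illustration and do not reflect SurgVeo's real annotation scheme.
MEDICAL_LOGIC_LABELS = {
    "nonexistent_instrument",
    "implausible_tissue_response",
    "clinically_meaningless_action",
}
VISUAL_QUALITY_LABELS = {"blur", "artifact", "temporal_flicker"}

def error_breakdown(labels: list[str]) -> dict[str, float]:
    """Percentage of errors attributable to medical logic vs. image quality."""
    counts = Counter(
        "medical_logic" if label in MEDICAL_LOGIC_LABELS else "visual_quality"
        for label in labels
    )
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}
```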
The researchers also tried giving the model additional contextual cues (such as the type of surgery and the specific procedural stage), but this did not yield significant or consistent improvements. The team concluded that the core problem is not missing information but the model’s fundamental lack of medical knowledge and reasoning ability.

The SurgVeo study makes clear that current video-generation AIs remain far from genuine medical understanding. Although such systems might one day assist with physician training, preoperative planning, or intraoperative guidance, current models are nowhere near safe or reliable enough for those applications—they can generate images that “look” real but lack the knowledge foundation needed to support correct clinical decisions.
The research team plans to open-source the SurgVeo benchmark dataset on GitHub to encourage the field to improve models’ medical understanding.
The study also warns of serious risks in using AI-generated video for medical training. Unlike NVIDIA’s use of AI-generated video to train general-purpose robots, in medicine such “hallucinations” can have grave consequences: if systems like Veo-3 generate videos that look plausible but violate medical standards, they could mislead surgical robots or medical trainees into learning incorrect techniques.
The results also indicate that viewing current video models as “world models” is premature. Present systems can imitate surface motion and shape changes but cannot reliably grasp anatomy, biomechanics, or the causal logic of surgical procedures. Their outputs may be superficially convincing yet fail to capture the true physiological mechanisms and operative reasoning behind a surgery.