In the current era of intensifying competition in artificial intelligence, the debut of Gemini 3 undoubtedly puts Google at the center of the spotlight once again.
After its launch, this generation of models quickly sparked discussion in the tech community, showing significant improvements in reasoning, multimodal processing, and tool-call stability. Many regard it as Google's most robust and mature upgrade in recent years.
While its popularity continues to rise, Google DeepMind CEO Demis Hassabis gave an interview in which he systematically discussed the development process behind Gemini 3, the capabilities the team is continuing to build, the directions that are still at the internal prototype stage, and Google's vision for the next generation of intelligent devices.
As the interview unfolded, the conversation extended from model capability improvements to more forward-looking themes, including memory and personalization systems, the application of tool-using agents, the positioning of Antigravity in the development ecosystem, the status of high-cost experimental models used internally, and the potential value of multimodality in medicine and research. The details reveal how Google itself judges the evolution of its models.
AI Technology Review has compiled the interview content below without changing its original meaning:
After the debut of Gemini 3, Hassabis plans to 'transform' Google's entire product line
Podcast address: https://podcasts.apple.com/us/po ... cheung/id1689006106
Core Progress of Gemini 3
Host: Demis, thank you very much for taking the time out of your busy schedule to talk to us. Today we want to focus on Gemini 3, which is currently Google's most advanced flagship model. If you had to sum it up in one sentence, where do you think the significance of this release lies?
Hassabis: If I could only say one sentence, I would say its importance lies in the fact that it continues and further strengthens the technical direction Gemini has pursued over the past few years. We are very excited about the overall performance improvement of this generation of models.
I believe users will also be quite surprised in actual use, because from various benchmark tests to different categories of tasks, we have seen comprehensive improvements in almost every aspect, including reasoning ability, stability and reliability of tool use, accuracy and creativity of language expression, etc. The strengthening of each dimension is very significant.
Host: If we go back to the moment when Gemini 2.5 was released and compare it to Gemini 3 now, what key breakthroughs have occurred during this period that have enabled the model to reach its current level in benchmark testing?
Hassabis: We put in tremendous effort on many levels. 2.5, the previous generation, was already very mature and performed well in both the developer ecosystem and the various Gemini applications. But we were not satisfied with that and wanted to keep pushing forward on many core capabilities.
For example, the accuracy of tool calls and their consistency and robustness in use are things users rely on heavily. For programmers and technical users, these abilities directly determine the quality of the experience, and they also strongly affect general reasoning tasks and everyday use.
In addition, we also spent a lot of time optimizing the style, expression, and personality traits of the model. We hope that its dialogue style can be more direct, clear, and focused on the true needs of users, while also making people feel natural and trustworthy.
In extensive internal testing, conversations with the new model felt more relaxed and enjoyable than before, and many testers said they were willing to interact with it for longer because its new way of expressing itself is closer to real human communication.
Host: The improvement in programming and reasoning is indeed outstanding. But for ordinary users who are not developers and have grown used to the old Gemini, what will they suddenly discover they can do with the new version that they cannot do today?
Hassabis: It depends on the specific usage of each user, but in our testing covering different fields, almost all types of experiences have shown a qualitative leap. For example, if you use it for brainstorming, it will provide more diverse, precise, and contextually relevant suggestions.
When writing code, it can quickly grasp your intentions and handle complex logic more reliably, thereby reducing back and forth communication. In common tasks such as creative writing, text polishing, material summarization, and daily assistance, the new model not only significantly improves accuracy, but also greatly enhances the naturalness and fluency of language, giving an overall feeling of a significant increase in intelligence level.
In terms of communication style, the new model is more natural, with a response rhythm closer to real conversation between people. In terms of tool use, you will notice it handling more steps, in finer detail, in the background, and making better use of search and other tools. With the significant improvement in the stability and reliability of tool calls, the content finally presented to users is also more accurate and trustworthy.
Overall, if you are an ordinary Gemini user, you will intuitively feel that it has become stronger, smarter, and better to use in every respect, and you will be more willing to stay in conversation with it, because the overall experience it brings is smoother and more trustworthy.
Host: I noticed that the information released this time did not specifically mention the memory function, which made me particularly curious. Google's advantage across its product ecosystem is very obvious: from Gmail to YouTube to Maps and other services, you have massive user data and enormous potential for integration.
To be honest, if I had to name the most compelling reason for me to keep using ChatGPT, it would be the small memory feature it recently added, which has significantly improved my personal experience. How do you think about bringing similar capabilities into Gemini's long-term plan?
Hassabis: We are currently investing very deeply in personalization, memory ability, and long-term contextual understanding. I think this will be one of the core themes after we enter the Gemini 3 era, in other words, we are focusing on strengthening these capabilities and will gradually demonstrate more practical progress in the future. As the Gemini 3 series continues to improve, you will see more discussions and revelations in these areas.
Of course, what is currently appearing is only a part of the model family, and it will continue to expand in the future. We have pre-set many capabilities and potential structures within the model, which will gradually be opened up to users and developers in the future, allowing them to truly use these enhanced features in actual products and development interfaces.
The advancement in these directions will include deeper personalized experiences, enabling models to gradually understand users' long-term preferences and habits. At the same time, it will also be more closely connected to Google's various services, such as Gmail, Calendar, etc. In fact, you can already see some preliminary integration effects now, but that is only a very small part of the overall plan. The blueprint for the future is much richer than what is presented at the current stage.
The capability foundation of Gemini 3 is already sufficient to support this series of large-scale pushes, and the model's stability and reliability in tool calling and tool use will be the basic precondition for connecting it securely to external services.
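To make the tool-calling idea concrete, here is a minimal sketch of function calling against the public Gemini API using the google-genai Python SDK. The model id and the calendar helper are illustrative assumptions for this sketch, not details confirmed in the interview.

```python
# Minimal sketch of Gemini function calling via the google-genai Python SDK.
# The model id and create_calendar_event helper are illustrative assumptions.
from google import genai
from google.genai import types

def create_calendar_event(title: str, start_time: str) -> dict:
    """Hypothetical local tool the model can choose to call."""
    return {"status": "created", "title": title, "start_time": start_time}

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; substitute the model you have access to
    contents="Schedule a project review next Monday at 10am.",
    config=types.GenerateContentConfig(
        # Passing a Python function enables automatic function calling:
        # the SDK declares the tool, runs it when the model requests it,
        # and feeds the result back to the model.
        tools=[create_calendar_event],
    ),
)
print(response.text)
```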
Host: From benchmark testing and overall performance, it is evident that it has very strong capabilities. I just feel like it came a bit late. I am a heavy user of ChatGPT myself, and Gemini is leading in many benchmarks, supported by Google's vast ecosystem. I understand that you cannot provide an exact timeline, but can you give a rough range of when true memory abilities will be introduced in the 3.0 series?
Hassabis: We are currently continuously testing various designs and solutions internally, and iterating in different directions. When these abilities have been thoroughly polished and we feel confident in their stability and reliability, we will announce them to the public as soon as possible. We are very aware of users' expectations and how important memory ability is to the user experience.
At the same time, we are also advancing more efficient model versions, including versions that are lighter in size but still maintain high levels of performance. This is the only way to provide large-scale services at lower costs globally and benefit more users. The various prototype experiments we are currently conducting are very exciting, and you will soon see these efforts gradually translate into practical results.
Another point I must emphasize is that the new model's multimodal performance has left a deep impression on me. Gemini has always been at the forefront of multimodality, maintaining top-tier standards in cross-modal reasoning, cross-modal understanding, and joint generation of images and text. The previous generation already performed exceptionally well on tasks such as image analysis, video understanding, and complex structure recognition, and this time we have raised our overall capabilities in these areas to a whole new level.
I believe that ordinary users will clearly feel the direct improvement brought by these multimodal capabilities in daily use. As time progresses, we will also integrate these capabilities more deeply into more products and scenarios, such as YouTube, AI Studio, and other types of applications. In the future, you will see them gradually land and truly come into play, and these new multimodal capabilities will allow users to experience many interaction methods that were previously impossible to achieve. I am full of expectations for this.
The role of Antigravity
Host: I am also looking forward to fully testing it and seeing what kind of results developers and users around the world will create with these models. Meanwhile, in addition to the new model of the 3.0 generation, you have also launched Antigravity, a brand new intelligent agent development platform.
From the introduction, its positioning is almost like giving every developer an exclusive AI colleague who can assist in completing tasks in the editor, terminal, and browser environments simultaneously. But in your opinion, what is the biggest difference and value of Antigravity compared to the already mature intelligent coding tools on the market?
Hassabis: I think Antigravity will continue to evolve rapidly in the future, but our core philosophy has always been very clear, which is to reimagine the entire development experience from the perspective of intelligent agents.
We keep asking ourselves a fundamental question: what form should a truly ideal IDE take if intelligent agents become the central actor in development? We have a very clear roadmap for the long-term development of Gemini, and Antigravity is an indispensable piece of it.
It should also be emphasized that within Antigravity you can use different models; it does not rely on a single choice. What we really want to achieve is to rebuild, from the ground up, a development environment that operates around agent capabilities, so that all functions and interactions naturally revolve around intelligent agents.
The team responsible for this direction includes many people who have built complex editor tools before, including members of the original Windsurf team. Their deep experience and expertise in this area give us a strong foundation for redesigning and building development tools.
We are really excited about this direction, and currently there are many teams within Google actually using Antigravity, which is the most important first step for us to drive any development tool. Internal engineers have generally provided feedback that the experience of using it is very smooth and the efficiency improvement is significant, which makes us more confident that we are moving in the right direction.
However, I believe what we are seeing now is still only the beginning of the journey. As model capabilities continue to grow and become more reliable, we must also rethink what a complete development experience for professional developers truly looks like. This is no longer just about lightweight tools for enthusiasts, but about a deep development ecosystem aimed at professional engineers.
What kind of collaborative support, automated processes, code insights, and problem diagnosis do professional developers truly need in their environment? Antigravity is our first serious attempt to answer these questions and build a complete roadmap based on them.
At the same time, we also have AI Studio, which may be a more suitable entry point for individual developers, interest creators, and general users. In the future, we will provide product interfaces and tool combinations in different directions based on users' professional backgrounds, team sizes, usage scenarios, and collaboration complexity. I believe that Antigravity will be a crucial part of it and will truly excite professional developers.
Host: So overall, Antigravity is positioned closer to professional developers, rather than the lighter, more exploratory kind of coding?
Hassabis: Currently, that's true; our main target audience is professional developers. But we also hope that in the future developers at every level can benefit from this system, whether they are beginners, hobbyist developers, or experienced senior engineers, and that each of them can find their own way of working within it.
Internal Model and Research Layout
Host: Speaking of the large-scale use of AI tools within your organization, I have a long-standing question. I heard that Google has already relied on AI to generate code in a large number of scenarios internally.
So I'm curious if you have any models or tools that are not available to the outside world and are only open to the inside, so that you can benefit in advance before the official release. How do you usually test these tools internally before launching new features? Will there be some features that are temporarily only used internally to maintain a leading edge?
Hassabis: We have indeed been running many additional experimental models and tools internally, but there are also some that cannot be immediately opened to the public due to technical difficulties or cost issues.
A typical example is Genie, a capability that cannot yet be rolled out publicly at scale. We certainly hope to make it available to all users smoothly, but at this stage its inference and serving costs are still very high, so it is not suited to running globally all at once. We are developing more efficient versions, hoping to gradually bring costs down to a level where it can be opened to a wider range of users.
There are also some deep reasoning models that can only be used in high-end levels such as Ultra, due to their extremely expensive resource consumption. We are constantly optimizing their execution efficiency, with the goal of reducing their costs to a level that can provide services to more users.
So overall, this is not a deliberate retention of certain capabilities, but rather limited by computing power, hardware, and physical resources. As long as we can deploy a certain feature at a reasonable cost, we usually open it up to all users as soon as possible. What limits us is not strategy, but reality.
Of course, at the research level, we have been conducting extensive exploration internally. This is the daily routine of a top-notch cutting-edge research laboratory. Our research scope is both broad and profound, and can be said to be very leading on a global scale.
We are constantly searching for the next major breakthrough, such as a fundamental technological leap like AlphaGo or Transformers. The world model is one of the important directions for the future, and we continue to conduct extensive experiments in this area. When they are mature enough and have stable and reliable performance, we will bring these abilities to users. Prior to this, they will continuously iterate and improve in the form of internal prototypes.
In addition, we are also actively exploring the interaction between hardware and software, such as future products like eyeglass assistants. These types of products will undergo a long period of testing and polishing internally. Only when we feel that they are truly prepared, will they be officially presented to global users.
Gemini's productization and vision
Host: I noticed that your release pace seems to be getting faster and faster. As soon as 3.0 was launched, it entered the search directly, which had never happened before. I'm curious, how do you view the issue of release speed now?
Hassabis: Your observation is very accurate, and this is indeed a core goal we are vigorously pushing. I think 2.5 was a particularly critical node, as it was the first time we integrated the model quickly and deeply into Google's core product system.
The demos you saw at the developer conference surprised many people at the time with how fast that integration had come. And with Gemini 3, we have raised the pace again, launching directly in Search and AI Mode from the start. This is the direction we have been intensely focused on optimizing over the past few months.
If you see Google DeepMind as Google's technology engine, then our responsibility is to ensure that all major products can be accelerated, enhanced, and reshaped by these models. Google has a vast and deeply integrated product ecosystem that touches billions of users every day, from Maps to YouTube to Search and Workspace.
Our goal is to continuously inject Gemini and its various capabilities into these products, allowing users to directly experience the upgrades brought by the model in their daily lives and work. Now this positive cycle has begun to emerge. I think we have probably reached the middle of this journey, and there is still a lot of exciting development space ahead, and we are fully confident in continuing to improve our integration speed.
Search is a typical demonstration that showcases our ideal way of integrating technology. And next, we need to continuously push the entire product system in this direction.
Host: Speaking of truly impactful products, the Gemini app recently reached 650 million monthly active users. Congratulations on that result.
Hassabis: Thank you, we are truly proud of this number, which represents that more and more people are truly using and relying on these abilities in their daily lives.
Host: With such a large user base, I am curious. Apart from the coding scenarios that everyone is already familiar with, have you observed any particularly prominent usage methods that have been widely adopted among ordinary users?
Hassabis: Actually, we have seen a lot of interesting trends in the data and feedback. I personally believe multimodal capability is one of the most core and differentiated advantages of the Gemini app. For example, after the launch of the Nano Banana feature, it clearly drove a wave of user growth.
Users can do a wide range of things with it, from planning a surprise birthday party for their family, to designing small sculptures with local characteristics for certain countries or regions, to creating comic stories with continuous storyboards. Various creative ideas are constantly emerging.
These rely on multimodal capabilities to combine images, text, and even videos, opening up many application spaces that were previously unimaginable. Gemini's performance in cross modal tasks such as visual understanding, image generation, and video analysis is outstanding, and these features also make it present more and more novel gameplay in practical use.
We have also noticed that users show very high enthusiasm and usage frequency around health- and education-related needs. So we are investing heavily in these directions, hoping to truly reach a first-class standard in the industry. I believe Gemini 3 will become a very important foundational platform in these fields.
As for my personal daily habits, I really enjoy using Gemini for brainstorming. Whether it's naming a new project or asking it to help check if an idea holds up, it can provide valuable feedback with high efficiency. Gemini applications perform exceptionally well in this type of creative and thinking assistance.
Host: One point you just mentioned particularly interests me: your belief that Gemini has the potential to become a foundational platform in the health field. Can you say more about that? After all, you have a great deal of background in healthcare and the life sciences.
Hassabis: Of course. In fact, we have many specific projects underway in this direction, such as Co-Scientist, a tool that helps with scientific research and experimental workflows. We also have a medical diagnostic system called AMIE, developed by a more research-oriented team. Our goal is to gradually integrate these dispersed capabilities into a complete Gemini architecture in the future.
I hope scientists and researchers can use Gemini as a true thinking partner in the future, helping them inspire new ideas, organize research processes, and analyze complex problems. In my opinion, Gemini 3 has provided a solid enough foundation to support this type of serious application scenario.
Next, you will see that these capabilities will gradually be released in different versions of Gemini 3, including systems that are more oriented towards deep research and deep reasoning, all of which continue to extend from the overall structure of Gemini 3.
Because Gemini 3 is significantly more reliable in reasoning and tool invocation, its performance in citing sources, understanding academic papers, and organizing professional knowledge structures will also improve accordingly. Multimodal capability happens to be a crucial element in medicine and education. For example, a user can upload a diagnosis-related image and ask what it might represent, or give it an academic paper and ask it to explain how the figures correspond to the text and the logical structure that connects them.
In educational settings, students may need to design a poster for a course. They can first output the text content and then have the model generate appropriate visual elements and layout suggestions based on the theme. This type of task fully embodies the value of multimodality.
I am looking forward to people making more unprecedented attempts with Gemini 3 in these scenarios. Throughout the entire process, Gemini applications will naturally become the primary and most intuitive entry point.
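As a rough illustration of the image-plus-text scenario described above, the sketch below sends a local figure and a question to the model through the same google-genai Python SDK. The file path and model id are placeholders for illustration, not anything stated in the interview.

```python
# Minimal sketch of a multimodal (image + text) request with the google-genai SDK.
# The file path and model id are placeholders for illustration only.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("figure_from_paper.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model id
    contents=[
        # The image is passed as an inline part alongside the text prompt.
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Explain what this figure shows and how it relates to the surrounding text.",
    ],
)
print(response.text)
```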
Host: I am also very excited about these directions, especially in the fields of healthcare and education. Looking further into the future, would you consider allowing AI to play a role in proactive preventive healthcare?
Hassabis: Within our scientific and health teams, this is indeed the direction we are researching, which is to build a truly medical grade system. Such systems typically require strict regulatory scrutiny and very high security standards, and must ensure extremely high reliability before they can be put into practical use.
Obviously, the Gemini app itself is not a medical-grade tool; it is better suited as everyday assistance, and when users run into health problems they should still consult a professional doctor. But it does have the potential to play a huge role in many resource-scarce areas, especially regions lacking basic healthcare or education services. Google's global reach and the Android ecosystem already play a critical role in the digital infrastructure of those regions, and I believe Gemini can provide a basic level of knowledge and assistance there and offer tangible help to local people.
At the same time, we will continue to explore higher-level application scenarios such as medical assistants or research assistants. However, these applications require the model to reach a higher reliability threshold. Gemini 3 has taken a solid step towards this, but there is still a lot of work to be done to meet high-risk scenarios such as healthcare.
Medical and scientific research are my personal areas of great concern, and we hope that Gemini can become the core foundation of these abilities, thereby driving the continuous expansion of the entire system. I am satisfied with the progress of Gemini 3, but this is just the beginning of the entire journey. If we want the model to truly take on a medical grade role, we must further build multi-level security and reliability, and we are investing heavily in research to make all of this possible.
Host: Understood. This will indeed affect the lives of billions of people, and I am very much looking forward to it. Next, let's take a different perspective and talk about the actual usage scenarios of Gemini in the real world, which is what users are currently able to achieve.
One capability that caught my attention in this release is the newly added agent system in the Gemini app. It lets you connect to services such as Gmail. Gemini used to be able to access Gmail, but the experience is completely different now: it can not only list the steps for you, but also perform tasks directly, such as sending an email from within Gemini.
As we move toward a more complete era of artificial intelligence, Gemini looks increasingly like a true life assistant, almost embedded in users' digital lives. I am curious what the ultimate form of this digital colleague is in your imagination. Do you want Gemini to become a standalone platform like Slack that people open every day and keep by their side, or do you prefer it to be one of many tools?
Hassabis: Of course, I hope it can grow into that kind of existence. We have been brainstorming a universal assistant internally, which can also be seen as the future form of Gemini, capable of playing a role in every stage of users' daily lives.
It is not only the best assistant for you to handle complex tasks in your work, but also can accompany you during leisure, entertainment, or exploration of interests, providing you with advice, inspiration, and engaging in natural, relaxed, and inspiring communication with you.
Meanwhile, it should not be limited to a single device, but should accompany you in multiple forms. You can use it on your computer or call it up in your browser; you can rely on it at work and interact with it easily at home. It will appear on your phone and will likely also exist in next-generation smart devices such as smart glasses. I am very confident this will be one of the directions for the future.
The most important foundation for achieving such a goal is a truly powerful multimodal model. The significance of Gemini lies in its ability to understand the real world and the real-time context the user is in, and in its ability to invoke external tools. In the initial stage we will focus on Google's own applications, such as Maps, Workspace, and Gmail, but ultimately it must be able to connect to any tool to become a truly universal intelligent agent.
When these abilities mature, we will enter a brand new era. At that time, users would have a digital companion like the best personal assistant in reality. Our vision is to make this kind of assistance accessible to all, so that everyone has such intelligent support, not just a few people can enjoy it.
This will profoundly improve the way people manage affairs, allowing us to regain time and attention, and focus more energy on truly important and valuable things, rather than time-consuming and laborious repetitive processes. This is a goal that I particularly value, and I believe Gemini is laying a solid foundation for this future.