Wild technological visions are turning from dreams into reality.
On November 5, the 7th Xiaopeng Technology Day was held as scheduled at Xiaopeng’s newly relocated headquarters campus. The company has gone from an early startup team crammed into an urban village in Guangzhou to owning a self-built technology park. Behind the move lies Xiaopeng’s ambition, now past its 11th anniversary, to go from standing firm to running toward the world.
Xiaopeng Motors is becoming more like an AI company.
It focuses its business on four directions: smart cars, robots, Robotaxi, and flying cars. These four sectors are not simply parallel; they are different evolutionary paths centered on intelligent driving. Cars are the starting point for commercialization, Robotaxi extends the algorithms into urban mobility, and robots and flying cars are spillovers of embodied intelligence and spatial intelligence.
Most notably, Xiaopeng Motors launched the first mass-produced physical world model, the second-generation VLA, which is also a key technological breakthrough in Xiaopeng’s exploration of physical AI. In He Xiaopeng’s view, as AI continues to evolve, it will no longer be limited to responding and generating; it will directly participate in, guide, and even reshape how the physical world operates. AI’s capabilities need to extend from the digital world into the physical world.
Compared with the conventional industry-standard VLA (vision-language-action) design, Xiaopeng’s second-generation VLA removes the intermediate step of translating into language. Without the latency of that translation step, visual signals map directly to action commands. The model is also trained on a large amount of long-tail video data, moving the large model from imitation learning toward genuine understanding.
The second-generation VLA gives machines the ability to gradually understand, interact with, and change the world. This is also the source of Xiaopeng’s confidence in connecting business lines such as cars, robots, and flying cars: they all operate in the same physical world, and the decisions they must make rest on physical-world information.
According to He Xiaopeng, co-founder, executive director, chairman and CEO of Xiaopeng Motors, the breakthrough in the second-generation VLA came only after a highly challenging R&D period.
In 2024, Xiaopeng developed two generations of VLA internally without achieving a breakthrough. The R&D team was under enormous pressure, and the process was full of uncertainty. But through the team’s persistent effort, the second-generation VLA suddenly showed surprising comprehension. Internally, Xiaopeng attributes this change to accumulated technology naturally erupting once it reached a critical point.
He Xiaopeng stated that the physical AI capabilities demonstrated by the second-generation VLA are just the beginning. The real challenge is not getting capabilities to emerge in a model, but achieving stable mass production faster. That means not only lengthening the longest stave of the barrel, but also raising the middle and shortest staves at the same time. Mass production is the threshold Xiaopeng must cross next.
In his view, physical AI will be the new battleground for technology companies. In the era of physical AI, data is the new oil: whoever first obtains large amounts of high-quality data and closes the loop of “data → experience → more data” will gain the advantage. The starting point and key links of this loop depend on a company’s deep investment in hardware and engineering capability, ultimately forming a closed loop in which software and hardware evolve together.
Following this logic, Xiaopeng is extending its hardware business from cars to sectors such as robotics, flying cars, and Robotaxi, which may bring in more data and a stronger software-hardware flywheel.
Xiaopeng’s launch of a Robotaxi business stems mainly from its thinking about future mobility. He Xiaopeng firmly believes that future four-wheeled transportation will combine “shared” and “private” use.
At the same time, the maturity of the second-generation VLA large model gives Xiaopeng an advantage in entering this market. The second-generation VLA removes the dependence on high-precision maps and can directly support factory-built (pre-installed) solutions rather than aftermarket retrofits. In addition, by sharing its R&D system with mass-production models, Xiaopeng can spread the cost burden while maintaining performance, keeping per-vehicle cost within a more economical range.
With both the trend and the conditions in its favor, it makes good sense for Xiaopeng to build out Robotaxi on top of its existing automotive business.
He Xiaopeng also made it clear that Xiaopeng Motors will reposition itself as a “global embodied-intelligence company” rather than just a Chinese car company.
Beyond expanding into larger markets, Xiaopeng is also opening up some of its core technologies in an open-source format. Its second-generation VLA model is being opened to global business partners, with Volkswagen its first customer for this technology. Going forward, the second-generation Xiaopeng VLA and the Turing chip will be brought to Volkswagen-brand models.
On commercialization, He Xiaopeng said several technologies announced at Technology Day will enter a period of explosive rollout in 2026: second-generation VLA end-to-end assisted driving, small-road NGP, navigation-free roaming with Super LCC, the second-generation humanoid robot IRON, and factory-built, mass-produced Robotaxi.
In the tech world, Xiaopeng Motors is known as the company most like Tesla. Xiaopeng’s market value has recovered to about $22 billion, while Tesla’s is roughly $1.5 trillion.
The gap between the two remains huge. Seen another way, however, by opening SDKs in robotics and autonomous driving and tapping the “data oil” of physical AI, Xiaopeng is positioning itself to seize the high ground in physical AI.
On the path of physical AI, Xiaopeng has plenty of room to grow.
Lei Fengwang and other media held several conversations with He Xiaopeng, Chairman of Xiaopeng Motors; Gu Hongdi, Vice Chairman and Co-President of Xiaopeng Motors; Liu Xianming, head of the Xiaopeng Motors Autonomous Driving Center; and Mi Liangchuan, Vice President of Xiaopeng Motors Robotics and head of the AI Technology Committee, on topics including physical AI, robotics, and business development.
The following are edited excerpts:
Q: Why does Xiaopeng insist on making its robots so humanlike? Given the very high investment that requires, how do you weigh the trade-offs?
He Xiaopeng: In the future, high-level robots will come in various forms, some resembling humans, and some not resembling humans.
A more humanlike robot has three major benefits. First, for robots to be intelligent today, they cannot rely on rules; they must be driven by AI, and the largest pool of data they can learn from comes from the human world.
Second, most of our homes and factories are designed, built, and operated for human convenience, so the more humanlike a robot is, the more easily it adapts to this world.
Third, from a business perspective, a humanlike form makes people feel more affinity and more inclined to buy. Selling more brings economies of scale, and lower costs lead to more sales, forming a positive cycle.
Q: Why did Xiaopeng Robotaxi choose Gaode as its first global ecosystem partner? Second, what are the differences among the three autonomous taxis to be released next year?
He Xiaopeng: Gaode is a very large mobility ecosystem platform in China. Unlike many Robotaxi companies, Xiaopeng does not aim to run the full operation itself, whether in China or globally. I hope Xiaopeng can make Robotaxi a “toolbox” of cars, software, and SDK interfaces, open to partners, so that they can operate Robotaxi locally with that “toolbox”.
Gaode is a mobility operator, so it handles operations, and we provide the ‘toolbox’; this fits the strategic positioning of both companies. The three Robotaxis come in different price ranges, with 5, 6, and 7 seats to meet the needs of different users.
Q: Many Chinese companies are compared to Tesla, but their valuations are far lower than Tesla’s. Going forward, how can Xiaopeng Motors get the capital markets to value it more highly?
Gu Hongdi: What we are pursuing is really about technology and products, and some of it is indeed similar to Tesla. Both companies focus on physical AI, as well as intelligent cars, autonomous driving, humanoid robots, and more. In some areas, such as flying cars and humanoid robots, we started even earlier than Tesla. To some extent, we are focused on using technology and AI to create more physical-world scenarios.
Regarding the valuation of the capital market, the current situation of Xiaopeng and Tesla is very different, with many variables involved.
Firstly, China and the United States are different in terms of markets, technology companies, and capital markets. Secondly, Tesla has some advantages, especially as they started earlier in the electric vehicle field and have a high level of media exposure.
Tesla has a range of businesses: electric cars are just one part, alongside its FSD AI models and many ecosystem businesses, and together these make up its valuation. Xiaopeng is continually launching new products and technologies, and we hope Xiaopeng can earn the same international reputation in the future.
Q: What share of components do Xiaopeng’s new-generation robots and cars have in common?
He Xiaopeng: I don’t have a precise figure, but much is shared: perception and domain controllers, for example, are mostly the same, and 70% of the AI software is the same. The robots’ joints and skin, however, are not automotive parts.
Q: What share of revenue does Xiaopeng hope its physical AI business will reach relative to its automotive business?
He Xiaopeng: I may be more optimistic about robots. The global automobile market is worth 10 trillion US dollars, with annual production of 90 million vehicles, while robots are a $20 trillion market. Of course, that will not materialize quickly; it may take 10 to 20 years, by which point there may be 200 million or more humanoid robots.
I haven’t worked out how many robots could be sold per year ten years from now, but on a 10-year horizon the number will certainly surpass cars, at over 1 million units. In the short term, mass production of robots still has many, many hurdles to clear.
Q: Many Robotaxi companies have not yet achieved profitability. How do you ensure profitability when promoting the Robotaxi plan?
He Xiaopeng: Xiaopeng may be a different kind of Robotaxi company because we build factory-fitted, mass-production cars. Xiaopeng Robotaxi also has a Robo version (private L4) that can be sold to consumers, which lets us spread BOM and R&D expenses significantly.
In addition, because it is based on the second-generation VLA, Xiaopeng Robotaxi needs no high-precision maps, street-by-street surveying, or LiDAR. It thinks more like a person in the physical world, so it generalizes more broadly and needs no site-specific deployment costs. Compared with other companies in the Robotaxi field, Xiaopeng Motors has a natural advantage in R&D expenses and BOM, ranging from tens of percent to several times.
Q: Did Xiaopeng really eliminate the ‘L’ in VLA? If so, why is it still called VLA rather than VA?
He Xiaopeng: When we talk about V+L, the translation no longer produces a human language or format; it produces a new language of the physical world. It is not a language humans can see or recognize, and it is more efficient and richer.
Q: The press conference did not mention L3 and went straight to L4. Is L3 progress also stuck on laws and regulations?
He Xiaopeng: I think in the future there will be L2 and L4, with no L3; the industry will skip L3.
Q: Xiaopeng installed solid-state batteries in humanoid robots. Is this solid-state battery from an external supplier?
He Xiaopeng: We don’t develop battery cells, we use those from our partners. Our solid-state battery cells come from two sources, one from overseas and the other from China.
Q: Why did Xiaopeng start Robotaxi at this time? What are the current strategies?
He Xiaopeng: Thanks to advances in many AI technologies and improved computing power, we are now able to create opportunities for Robotaxi; the situation is completely different from six months or two years ago.
In addition, we also see that L4 intelligent driving is becoming increasingly mature. In the past 6 months to a year, many companies and industry collaborations have focused on L4 level autonomous driving. When we turn to L4 and Robotaxi, the current situation is completely different from six months or even a year ago.
For our part, we also want to provide more economical solutions that help customers enjoy the convenience of L4. In the Robotaxi field, we are currently collaborating with many ecosystem partners.
Q: Xiaopeng exhibited a female humanoid robot, why is it female? What are the considerations behind this?
He Xiaopeng: It doesn’t matter whether the humanoid robot is male or female, just like you can buy a black car or a white car. In the first generation of robots, I hoped to create a male and a female robot, which I believe are both necessary.
Q: Tesla and Xiaopeng are the two companies with the highest degree of overlap in business globally. If we summarize them in one word, what is the biggest difference between Xiaopeng and Tesla?
Gu Hongdi: Two points stand out when comparing Tesla and Xiaopeng. The first is what we have in common: we both focus on scale. When we do things, we don’t want to target only a small niche market; we aim for mass production and scale. The second is where we differ: Xiaopeng is a very open ecosystem.
Both more open ecosystems and more closed systems have their own advantages and disadvantages. For example, closed systems may be easier to obtain economic benefits, while open systems may be easier to cooperate with partners.
At present, Xiaopeng is more open because we are a young company with our own limitations and do not have the resources to do many things. We open SDKs for robots and autonomous vehicles, which allows us to collaborate with more people to implement many technologies and better help our products and technologies mature.
Q: The press conference showed some robot scenarios, such as tightening screws and doing household chores, which may not be achievable right away. In your opinion, how many years might it take to achieve them?
He Xiaopeng: Different companies will choose different commercialization paths for humanoid robots. At our company, internal rules prohibit relying too heavily on manual operation of robots, and we want to develop robot intelligence step by step.
At present, we are still at an early stage of commercialization. The scenarios you just mentioned may be achievable within 3 to 5 years, along with more and different tasks. But having robots care for the elderly and children at home may take longer, even 5 to 10 years. Some say that within 5 to 10 years robots may replace humans in many scenarios, but in my opinion that is impossible.
Q: How does Xiaopeng view the international market potential of its business lines, including robots, low-altitude aircraft, and even Robotaxi?
He Xiaopeng: Xiaopeng’s goal is to sell half of its products outside China within the next decade. We will consider how to globalize all product lines and most of our products. In fact, some products, such as flying cars, may find more and faster use cases outside China than in China.
Q: In terms of intelligent driving, does Xiaopeng have any plans for XNGP abroad?
He Xiaopeng: Overseas, Xiaopeng is also actively pushing the rollout of XNGP. The company has conducted preliminary studies in multiple countries and regions, and regulations in some markets already allow higher-level intelligent driving functions, such as highway NGP.
The relevant functions are expected to launch first in Europe next year. Meanwhile, the company continues to follow policy developments in the Hong Kong and Southeast Asian markets. It should be stressed that rolling out XNGP overseas depends not only on technical maturity but also on local laws and regulations. Xiaopeng is keeping in contact with local governments and regulators to promote compliant deployment of the technology.
Q: How is production going at the Magna factory in Austria, and what is the expected or planned annual output for next year? Will Xiaopeng set up factories in more overseas locations in the future?
Gu Hongdi: The Austrian factory officially started production in August this year, with output of several thousand vehicles this year, and I expect tens of thousands of vehicles next year.
I think building localized plants in other places is absolutely necessary. A company that aims for global leadership cannot get there through exports alone. It needs a local presence: production, R&D, sales and service, and brand building are all necessary for us. So I believe that in the future we will be able to localize production and operations in our major sales regions.
Q: What are the challenges of physical AI?
Liu Xianming: The difficulty lies in how the model is built. Such a model discretizes its inputs into language tokens (character units), passes them through an architecture, and finally produces an output.
Q: What is the core principle behind this technology?
Liu Xianming: It comes down to stacking big models, big compute, and big data together. The model logic is very simple, and so is the principle behind it; there is no complicated story. But doing it well is very hard: huge amounts of data must be ingested in one go, and training must stay stable at thousand-GPU and ten-thousand-GPU scale without collapsing.
Q: What specific business advantages come from not needing data annotation?
Liu Xianming: For example, if I want to enter an overseas market such as Europe and launch Robotaxi in a new location, no data annotation is needed. As long as Xiaopeng cars in that location can collect data, I can handle it. There is no need to collect large amounts of targeted data or hire many people to annotate it, which significantly reduces costs.
Q: Why does Xiaopeng’s data need no annotation at all? How did you collect large amounts of long-tail data through your infrastructure (Infra)?
Liu Xianming: Our biggest advantage lies in the data. Collecting long-tail data involves two parts. The first is the vehicle-side Infra, where we have done something very important: identifying which data is needed and which is not. On average, a car drives 1.7 hours a day and can encounter many valuable and extreme driving scenarios, as long as there is a way to identify them. The second is a relatively large data loop in the cloud, where data quality and distribution are optimized to avoid too much duplicate data from the same scenario.
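The two-stage idea Liu Xianming describes, filtering on the car and rebalancing in the cloud, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Xiaopeng’s infrastructure: the `Clip` fields, the trigger thresholds (hard braking or fast steering), the coarse `scene_tag` used for deduplication, and the per-tag cap are all hypothetical choices.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Clip:
    clip_id: str
    max_decel: float       # peak deceleration in the clip, m/s^2 (positive = braking)
    max_steer_rate: float  # peak steering-wheel rate, rad/s
    scene_tag: str         # hypothetical coarse scene label, e.g. "cut_in"

def on_vehicle_trigger(clip: Clip, decel_thresh: float = 4.0,
                       steer_thresh: float = 2.0) -> bool:
    """Vehicle side: keep only clips that look unusual (hypothetical thresholds)."""
    return clip.max_decel >= decel_thresh or clip.max_steer_rate >= steer_thresh

def rebalance(clips: Iterable[Clip], per_tag_cap: int = 2) -> List[Clip]:
    """Cloud side: cap clips per scene tag so one scenario does not dominate."""
    seen: Counter = Counter()
    kept = []
    for clip in clips:
        if seen[clip.scene_tag] < per_tag_cap:
            seen[clip.scene_tag] += 1
            kept.append(clip)
    return kept

def select_long_tail(clips: Iterable[Clip], per_tag_cap: int = 2) -> List[Clip]:
    uploaded = [c for c in clips if on_vehicle_trigger(c)]
    return rebalance(uploaded, per_tag_cap)

fleet = [
    Clip("a", 1.0, 0.5, "highway_cruise"),    # ordinary driving, filtered on the car
    Clip("b", 5.2, 0.3, "cut_in"),
    Clip("c", 6.0, 0.1, "cut_in"),
    Clip("d", 4.5, 0.2, "cut_in"),            # third cut-in, dropped by the cap
    Clip("e", 0.8, 2.5, "construction_zone"),
]
print([c.clip_id for c in select_long_tail(fleet)])  # ['b', 'c', 'e']
```

Splitting the work this way keeps upload bandwidth low (the car discards routine driving) while the cloud stage controls the training distribution; a production system would use learned triggers and richer scene signatures rather than fixed thresholds.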
Q: How do you rethink the essence of autonomous driving?
Liu Xianming: Looking back, autonomous driving is essentially a physical AI problem. You need to understand the world, build a 3D model of it, infer what will happen next, make predictions, and then make the safest and most sensible choice based on all of this. That is the essence of physical AI.
Autonomous driving itself is the simplest problem in physical AI, with only two degrees of freedom: longitudinal acceleration and steering-wheel angle. Compared with general-purpose robots, it has far fewer degrees of freedom, a smaller data space, and easier data acquisition. That is why autonomous driving is the first problem physical AI tackles.
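To make the “two degrees of freedom” point concrete, the sketch below uses the textbook kinematic bicycle model, a standard simplification and not Xiaopeng’s vehicle model, in which a car’s motion is driven by exactly two inputs: longitudinal acceleration and front-wheel steering angle. It assumes the steering-wheel angle has already been converted to a road-wheel angle, and the 2.9 m wheelbase is an assumed, typical passenger-car value.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    x: float = 0.0    # position, m
    y: float = 0.0    # position, m
    yaw: float = 0.0  # heading, rad
    v: float = 0.0    # speed, m/s

def step(s: State, accel: float, steer: float,
         dt: float = 0.1, wheelbase: float = 2.9) -> State:
    """One Euler step of the kinematic bicycle model (rear-axle reference point).

    accel: longitudinal acceleration, m/s^2; steer: front-wheel angle, rad.
    These two numbers are the entire action space of this model.
    """
    return State(
        x=s.x + s.v * math.cos(s.yaw) * dt,
        y=s.y + s.v * math.sin(s.yaw) * dt,
        yaw=s.yaw + s.v / wheelbase * math.tan(steer) * dt,
        v=max(0.0, s.v + accel * dt),  # no reversing in this toy model
    )

s = State(v=10.0)
for _ in range(10):  # one second of straight driving at 10 m/s
    s = step(s, accel=0.0, steer=0.0)
print(round(s.x, 6), round(s.y, 6))  # 10.0 0.0
```

The whole control output per step is a 2-vector; a humanoid robot, by contrast, may need dozens of joint commands per control step, which is the contrast the answer draws.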
Q: Why throw away language?
Liu Xianming: The biggest driver of AI progress in recent years has been scaling, especially data scaling, meaning continually training on ever larger amounts of data. We have seen very good results from this in language models.
Physical models face the same issue: to use data at a larger scale, you must remove all separations (module boundaries) and turn training into a self-supervised setup that needs no manual annotation. Wherever language is involved, manual filtering or annotation is required, so I removed it and made the model an extremely data-driven one.
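One common way to get supervision without annotators, sketched here under assumptions, is to let the driving log supply its own labels: each training input is a window of recent frames, and the target is the control the human driver actually applied, read straight from the vehicle log. This illustrates the general self-supervised (learning-from-logs) idea, not Xiaopeng’s training recipe; the string placeholder for frames and the window length are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = str                    # hypothetical placeholder for a camera frame
Control = Tuple[float, float]  # (acceleration m/s^2, steering angle rad)

def make_training_pairs(
    frames: Sequence[Frame],
    controls: Sequence[Control],
    history: int = 3,
) -> List[Tuple[Tuple[Frame, ...], Control]]:
    """Turn one time-aligned drive log into (input, target) pairs.

    Input: the `history` most recent frames up to time t.
    Target: the control recorded at time t, so no human labeling step is needed.
    """
    if len(frames) != len(controls):
        raise ValueError("frames and controls must be time-aligned")
    pairs = []
    for t in range(history - 1, len(frames)):
        window = tuple(frames[t - history + 1 : t + 1])
        pairs.append((window, controls[t]))
    return pairs

log_frames = ["f0", "f1", "f2", "f3"]
log_controls = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.05), (0.0, 0.1)]
for inputs, target in make_training_pairs(log_frames, log_controls):
    print(inputs, "->", target)
# ('f0', 'f1', 'f2') -> (0.5, 0.05)
# ('f1', 'f2', 'f3') -> (0.0, 0.1)
```

Because every mile driven yields new pairs automatically, the amount of training data grows with fleet mileage rather than with annotation budget, which is the scaling argument made in the answer.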
Q: What are the problems with the existing VLA architecture?
Liu Xianming: Many VLA architectures basically take images as input and, through a large language model, output a meta action (a high-level abstract action instruction). Meta actions are usually text, which is processed further before the final output. The biggest advantage of this approach is that many open-source models are available, and you can use open-source NLP models directly for inference.
But the problem is that it introduces a discretized language output in the middle, which becomes a bottleneck and limits how much data can be used. A system can only scale when there is no intermediate bottleneck.
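The bottleneck argument can be illustrated with a toy comparison. The sketch assumes a hypothetical five-word meta-action vocabulary and maps continuous steering commands onto it; many distinct continuous actions collapse onto the same word, and whatever sits downstream can recover only one representative value per word. It illustrates quantization loss in general and does not model any specific VLA.

```python
# Hypothetical meta-action vocabulary: word -> representative steering angle (rad).
META_ACTIONS = {
    "hard left": -0.30,
    "left": -0.10,
    "straight": 0.0,
    "right": 0.10,
    "hard right": 0.30,
}

def to_meta_action(steer: float) -> str:
    """Discretize: pick the word whose representative angle is closest."""
    return min(META_ACTIONS, key=lambda w: abs(META_ACTIONS[w] - steer))

def through_language(steer: float) -> float:
    """Continuous -> word -> continuous, as in a text-mediated pipeline."""
    return META_ACTIONS[to_meta_action(steer)]

commands = [-0.22, -0.12, -0.04, 0.0, 0.03, 0.07, 0.16, 0.26]
recovered = [through_language(c) for c in commands]
worst = max(abs(c - r) for c, r in zip(commands, recovered))

print(len(set(commands)), "distinct inputs ->", len(set(recovered)), "distinct outputs")
# 8 distinct inputs -> 5 distinct outputs
print("worst-case error (rad):", round(worst, 3))
# worst-case error (rad): 0.08
```

However much data is added, the output of such a pipeline can never be finer than its vocabulary; a direct vision-to-action model has no such fixed ceiling, which is the sense in which the discrete layer limits scaling.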
Q: What is the essence of interaction in the physical world?
Liu Xianming: The essence of interaction in the physical world is actually the direct output of control signals. Why does this large model based on end-to-end video input and action output work? Because when humans perform any action, they need to go through several processes: first, they need to understand how the 3D of the scene is constructed, then make judgments about the future based on past historical information, and finally make the final action according to their own instructions.
If the final output signal is the action itself, it implicitly covers the entire process: reconstruction, understanding, generation, and finally execution. As long as larger-scale data and larger models are applied to the problem, it can in theory be solved.
Q: What are the challenges faced in deployment from model to mass production?
Liu Xianming: What we showed is just a demo, a model. Going from a model to mass production involves deployment issues. Beyond traditional pruning and quantization, what matters more is running it on the vehicle side and on the chip, which requires a low-latency, high-frame-rate, on-device deployment scheme. We did joint optimization and co-design across the model, software, compiler, and hardware.
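The “traditional pruning and quantization” step can be sketched on a single weight vector. The example uses two generic techniques, unstructured magnitude pruning and symmetric per-tensor int8 quantization, in plain Python to stay self-contained; real toolchains operate on tensors, and nothing here reflects Xiaopeng’s actual compiler or chip. The 50% sparsity level is an arbitrary assumption.

```python
from typing import List, Tuple

def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: List[int], scale: float) -> List[float]:
    return [scale * v for v in q]

w = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6]
pruned = magnitude_prune(w, sparsity=0.5)  # drops the 3 smallest: 0.01, -0.05, 0.3
q, scale = quantize_int8(pruned)
print(pruned)            # [0.8, 0.0, 0.0, -1.27, 0.0, 0.6]
print(q, round(scale, 4))  # [80, 0, 0, -127, 0, 60] 0.01
```

Each dequantized weight is within half a quantization step (`scale / 2`) of the pruned value; the point of the answer is that such model-level compression is only the starting point, with latency and frame rate then depending on co-design down to the compiler and hardware.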
Q: Did you choose the world model or the VLA path?
Liu Xianming: People keep asking us which path we chose. In essence there isn’t much difference between the two; both are extreme end-to-end systems. We still need to go back to fundamentals and solve the foundational problems themselves.
Q: How does the concept of “emergence” manifest in the field of robotics?
Mi Liangchuan: As the old saying goes, ‘quantitative change leads to qualitative change.’ After last year’s launch, we adopted the hardest approach, a generative one, to build the controller. We kept iterating and optimizing from October last year until March this year. Throughout that period the whole team, myself included, kept optimizing the data and everything else, but there was never a qualitative change. On the evening of March 26, while the team was testing backward walking, they noticed in the monitoring video that the backward walk looked interesting. That day was the turning point for our controller.
It’s unclear which optimization brought about the change, but if you persist, a sudden leap will come.
Q: What technology is behind the dancing and catwalk the robots demonstrated?
Mi Liangchuan: The catwalk steps you saw use our third-generation controller; the Tai Chi everyone just saw actually uses the fourth generation.
Q: Can you introduce the intergenerational evolution of controllers?
Mi Liangchuan: The first generation was model-based and was expected to be used around 2023, but in practice we had already abandoned it in 2024. We also supported MPC (model predictive control), a technique widely used in the industry. Our third generation took a relatively difficult path: human motion imitation. Whether you see catwalk steps or natural walking, the gait and style are embedded in the control model itself. The stride is not trajectory tracking or pose tracking but generated; for example, a catwalk gait stays a catwalk gait, even through left and right turns.
Q: Why do you choose scenarios such as tour guide, shopping guide, and reception as entry points?
Mi Liangchuan: We judge that, at its current level of capability, the robot can create real value in these scenarios. At the same time, new problems will certainly surface in real-world use, and only through this practice of “unifying knowledge and action” can capabilities truly improve. Once capabilities improve to a certain level, new applications will naturally emerge.
Q: You mentioned that the current movement is completely generative. Can it be understood as robots autonomously climbing up and down, without a remote control behind them, and with an activated large model already working?
Mi Liangchuan: At present, the robot’s control stack, including the controller, works as an integrated whole. To operate it, you basically only need to give it a direction and a speed; those two inputs are all it needs.
Direction and speed can come from higher-level models; for example, our navigation model feeds direction in directly. With a remote control, a person supplies them with a joystick; with a pre-choreographed trajectory, they are supplied in the same way.
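The interface described here, where the controller needs only a direction and a speed and does not care what produced them, can be sketched as follows. The class names, the heading and speed units, the speed limits, and the three stand-in sources (navigation model, joystick, choreography) are hypothetical illustrations, not Xiaopeng’s software.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Protocol, Tuple

@dataclass(frozen=True)
class LocomotionCommand:
    heading: float  # desired walking direction, rad (0 = straight ahead)
    speed: float    # desired speed, m/s

class CommandSource(Protocol):
    def next_command(self) -> LocomotionCommand: ...

class NavigationModel:
    """Stand-in for a higher-level model that outputs a direction toward a goal."""
    def __init__(self, goal_xy: Tuple[float, float], cruise_speed: float = 0.8):
        self.goal_xy, self.cruise_speed = goal_xy, cruise_speed

    def next_command(self) -> LocomotionCommand:
        gx, gy = self.goal_xy  # robot assumed at the origin, facing +x
        return LocomotionCommand(math.atan2(gy, gx), self.cruise_speed)

class Joystick:
    """Stand-in for a human operator: stick deflection maps to direction and speed."""
    def __init__(self, stick_x: float, stick_y: float, max_speed: float = 1.2):
        self.stick_x, self.stick_y, self.max_speed = stick_x, stick_y, max_speed

    def next_command(self) -> LocomotionCommand:
        mag = min(1.0, math.hypot(self.stick_x, self.stick_y))
        return LocomotionCommand(math.atan2(self.stick_y, self.stick_x),
                                 mag * self.max_speed)

class Choreography:
    """Stand-in for a pre-arranged trajectory replayed as a command sequence."""
    def __init__(self, script: List[LocomotionCommand]):
        self._it: Iterator[LocomotionCommand] = iter(script)

    def next_command(self) -> LocomotionCommand:
        # Stop in place once the script is exhausted.
        return next(self._it, LocomotionCommand(0.0, 0.0))

def controller_tick(source: CommandSource, max_speed: float = 1.5) -> LocomotionCommand:
    """The controller sees only (heading, speed), whatever produced it; speed is clamped."""
    cmd = source.next_command()
    return LocomotionCommand(cmd.heading, max(0.0, min(max_speed, cmd.speed)))

sources = (NavigationModel((1.0, 1.0)), Joystick(0.0, 1.0),
           Choreography([LocomotionCommand(0.0, 2.0)]))
for src in sources:
    cmd = controller_tick(src)
    print(type(src).__name__, round(cmd.heading, 3), cmd.speed)
# NavigationModel 0.785 0.8
# Joystick 1.571 1.2
# Choreography 0.0 1.5
```

Keeping the interface this narrow lets navigation, teleoperation, and choreography be swapped without touching the gait controller, which matches the answer’s point that every mode of operation reduces to the same two inputs.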
Q: Cost must be considered when mass-producing robots. How has cost changed compared with the previous generation?
Mi Liangchuan: The cost issue is divided into two parts, one is what we can do, and the other is what needs to be relied upon by the entire industry.
All the screws in our robots are basically developed in-house. This gives us faster iteration and also an opportunity to reduce technical costs. For the other part, the most effective cost reduction will come once the industry is relatively mature and the supply chain can share costs and stabilize.
Q: Does Xiaopeng Robotics have any partnerships with other robot companies?
Mi Liangchuan: We are working hard to cooperate with more peers, but at this stage we still focus mainly on in-house R&D. Our strategic partners collaborate with us mainly on hardware, including some local technologies.