Dublin, April 09, 2025 (GLOBE NEWSWIRE) -- The "Research Report on AI Foundation Models and Their Applications in Automotive Field, 2024-2025" report has been added to ResearchAndMarkets.com's offering.



Research on AI foundation models and automotive applications: reasoning, cost reduction, and explainability



Reasoning capabilities drive up the performance of foundation models.



Since the second half of 2024, foundation model companies inside and outside China have launched reasoning models, using reasoning frameworks such as Chain-of-Thought (CoT) to improve the ability of foundation models to handle complex tasks and make decisions independently.



In 2024, the reasoning technologies of mainstream foundation models introduced in vehicles primarily revolved around CoT and its variants (e.g., Tree-of-Thought (ToT), Graph-of-Thought (GoT), and Forest-of-Thought (FoT)), combined in different scenarios with generative models (e.g., diffusion models), knowledge graphs, causal reasoning models, cumulative reasoning, and multimodal reasoning chains.
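To illustrate how a tree-style variant differs from a single chain of thought, the following minimal sketch keeps several partial reasoning paths alive at each step and expands only the most promising ones. It is a toy under stated assumptions, not any vendor's implementation: the names `tree_of_thought_search`, `propose`, `score`, and `is_solution` are hypothetical, and the number-picking task stands in for the model calls that would generate and rate thoughts in a real system.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A "thought path" is modeled as a tuple of chosen numbers (toy assumption).
Thought = Tuple[int, ...]


def tree_of_thought_search(
    propose: Callable[[Thought], Sequence[Thought]],
    score: Callable[[Thought], float],
    is_solution: Callable[[Thought], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Optional[Thought]:
    """Expand several candidate paths per step and keep the best `beam_width`."""
    frontier: List[Thought] = [()]
    for _ in range(max_depth):
        candidates: List[Thought] = []
        for path in frontier:
            for extended in propose(path):
                if is_solution(extended):
                    return extended
                candidates.append(extended)
        if not candidates:
            return None
        # Higher score means more promising; keep only the top paths.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None


# Toy task (hypothetical): pick numbers from STEPS that add up to TARGET.
TARGET = 15
STEPS = (1, 3, 5, 7)


def propose(path: Thought) -> List[Thought]:
    # In a real system this would be a model proposing next thoughts.
    return [path + (step,) for step in STEPS]


def score(path: Thought) -> float:
    # In a real system this would be a model or heuristic rating a path.
    return -abs(TARGET - sum(path))


def is_solution(path: Thought) -> bool:
    return sum(path) == TARGET


result = tree_of_thought_search(propose, score, is_solution)
print(result)
```

A plain chain of thought corresponds to `beam_width=1`: one path is followed with no alternatives kept in reserve.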



In 2025, the focus of reasoning technology will shift to multimodal reasoning. Common training technologies include instruction fine-tuning, multimodal in-context learning, and multimodal CoT (M-CoT); they are often enabled by combining multimodal fusion alignment with LLM reasoning technologies.



Explainability bridges trust between AI and users



Before users experience the 'usefulness' of AI, they need to trust it. In 2025, the explainability of AI systems therefore becomes a key factor in growing the user base of automotive AI. One way to address this challenge is to show users the model's long CoT. Explainability can be achieved at three levels: data explainability, model explainability, and post-hoc explainability.



In Li Auto's case, its L3 autonomous driving uses 'AI reasoning visualization technology' to intuitively present the thinking process of its end-to-end + VLM models. The visualization covers the entire process, from perception input from the physical world to the driving decisions output by the foundation model, and enhances users' trust in intelligent driving systems.



In Li Auto's 'AI reasoning visualization technology'

The dialogue interfaces of various reasoning models also use a long CoT to break down the reasoning process. One example is DeepSeek R1, which, during conversations with users, first presents the decision at each node through a CoT and then provides explanations in natural language.



Additionally, most reasoning models, including Zhipu's GLM-Zero-Preview, Alibaba's QwQ-32B-Preview, and Skywork 4.0 o1, support demonstration of the long CoT reasoning process.



DeepSeek lowers the barrier to introduction of foundation models in vehicles, enabling both performance improvement and cost reduction.



Does the improvement in reasoning capabilities and overall performance mean higher costs? Not necessarily, as DeepSeek's popularity shows. In early 2025, OEMs started connecting to DeepSeek, primarily to enhance the overall capabilities of their vehicle foundation models in specific applications.



In fact, before DeepSeek models were launched, OEMs had already been developing and iterating their automotive AI foundation models. For cockpit assistants, some OEMs had completed the initial construction of their solutions and had either connected to cloud foundation model suppliers for trial operation or provisionally selected suppliers, including cloud service providers like Alibaba Cloud, Tencent Cloud, and Zhipu. They connected to DeepSeek in early 2025, valuing the following:



Strong reasoning performance: for example, the R1 reasoning model is comparable to OpenAI o1, and even excels in mathematical logic.



Lower costs: performance is maintained while training and inference costs stay at low levels for the industry.



By connecting to DeepSeek, OEMs can reduce the costs of hardware procurement, model training, and maintenance while maintaining performance when deploying intelligent driving and cockpit assistants:



Low computing overhead technologies facilitate high-level autonomous driving and technological equality: high-performance models can be deployed on low-compute automotive chips (e.g., edge computing units), reducing reliance on expensive GPUs. Combined with the DualPipe algorithm and FP8 mixed precision training, these technologies optimize computing power utilization, allowing mid- and low-end vehicles to deploy high-level cockpit and autonomous driving features and accelerating the popularization of intelligent cockpits.



Enhance real-time performance. In driving environments, autonomous driving systems need to process large amounts of sensor data in real time, and cockpit assistants need to respond quickly to user commands, while vehicle computing resources are limited. With lower computing overhead, DeepSeek enables faster processing of sensor data, more efficient use of the computing power of intelligent driving chips (DeepSeek achieves 90% utilization of NVIDIA A100 chips during server-side training), and lower latency (e.g., on the Qualcomm 8650 platform, with computing power of 100 TOPS, DeepSeek reduces the inference response time from 20 milliseconds to 9-10 milliseconds).
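As a quick check on what the quoted latency figures imply, the short calculation below converts the 20 millisecond baseline and the 9-10 millisecond result into a relative reduction and a speedup factor. Only those three numbers come from the text above; the helper names `latency_reduction` and `speedup` are illustrative.

```python
def latency_reduction(before_ms: float, after_ms: float) -> float:
    """Fraction by which latency drops when going from before_ms to after_ms."""
    if before_ms <= 0:
        raise ValueError("before_ms must be positive")
    return (before_ms - after_ms) / before_ms


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the response is after the change."""
    if after_ms <= 0:
        raise ValueError("after_ms must be positive")
    return before_ms / after_ms


BEFORE_MS = 20.0                 # baseline inference response time from the text
AFTER_RANGE_MS = (9.0, 10.0)     # reported range after the change

for after_ms in AFTER_RANGE_MS:
    reduction = latency_reduction(BEFORE_MS, after_ms)
    factor = speedup(BEFORE_MS, after_ms)
    print(f"{BEFORE_MS:.0f} ms -> {after_ms:.0f} ms: "
          f"{reduction:.0%} lower latency, {factor:.2f}x faster")
```

On these figures, the response time falls by roughly 50-55%, which is about a 2x to 2.2x speedup.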



Key Topics Covered:



Overview of AI Foundation Models

Introduction to AI Models

Definition and Features of AI Models

Classification of AI Models

Application Process of AI Models

Introduction to Foundation Models

Classification of Foundation Models

Current Development of Foundation Models in Automotive Industry

Application Scenarios of Foundation Models in Automotive Industry

Application of LLM in Autonomous Driving

Application of VFM in Autonomous Driving

Application of MFM in Autonomous Driving

Analysis of AI Foundation Models of Differing Types

Large Language Models (LLM)

Development History of LLM

Key Capabilities of LLM

Cases of Integration with Other Models

Multimodal Large Language Models (MLLM)

Development and Overview of Large Multimodal Models

Large Multimodal Models VS. Large Single-modal Models

Technology Panorama of Large Multimodal Models

Multimodal Information Representation

Multimodal Large Language Models (MLLM)

Architecture and Core Components of MLLM

Status Quo of MLLM

Dataset Evaluation by Different MLLM Representatives

Reasoning Capabilities of MLLM

Synergy between MLLM and Agent

MLLM in VQA

MLLM in Autonomous Driving

Vision-Language Models (VLM) and Vision-Language-Action (VLA) Models

Development History of VLM

Application of VLM

Architecture of VLM

Evolution of VLM in Intelligent Driving

End-to-end Autonomous Driving

Combination with Gaussian Framework

VLM2VLA

VLA Models

Principles of VLA

Classification of VLA Models

Application Cases of VLA

Core Functions of End-to-End Multimodal Model for Autonomous Driving (EMMA)

World Model Construction

Improve Vision-Language Navigation Capabilities

VLA Generalization Enhancement

Computing Overhead of VLA

World Models

Key Definitions of World Models and Application Development

Basic Architecture of World Models

Framework Setup and Implementation Challenges of World Models

Video Generation Methods Based on Transformer and Diffusion Models

Technical Principle and Path of WorldDreamer

World Models and End-to-end Intelligent Driving

Tesla World Model

NVIDIA

InfinityDrive

World Labs Spatial Intelligence

NIO

1X's 'World Model'

Common Technologies in AI Foundation Models

Common Foundation Model Algorithms and Architectures

Comparison of Features and Application Scenarios between Foundation Model Algorithms

Foundation Model Architectures and Related Algorithms

Transformer

KAN

MAMBA

Applicability of CNN in the Era of Foundation Models

Applicability of RNN Variants in the Era of Foundation Models

Visual Processing Algorithms

Common Vision Algorithms

ViT

CLIP Scenarios and Features

CLIP Workflow

LLaVA Model

Training and Fine-Tuning Technologies

Foundation Model Training Process

Training Case: Geely's CPT Enhancement Solution

Instruction Fine-tuning

Training Case: Geely's Fine-tuning Framework for Multi-round Dialogues

Reinforcement Learning

Introduction to Reinforcement Learning

Reinforcement Learning Process

Comparison between Some Reinforcement Learning Technology Routes

Knowledge Graphs

Optimization Directions for Retrieval-Augmented Generation (RAG)

Evolution Directions of RAG: KAG, CAG, GraphRAG

RAG Application Case: Li Auto

RAG Application Case: Geely

Comparison between RAG Routes

Function Call

Reasoning Technologies

Reasoning Process of Transformer Models

Evaluation of Reasoning Capabilities

Three Optimization Directions for Foundation Model Reasoning

Reasoning Task Types

Common Reasoning Algorithm

Comparison between Common Reasoning Algorithms

Reasoning Case 1: Geely

Reasoning Case 2: NVIDIA

Sparsification

Characteristics of MoE Architecture

Principles of MoE Architecture

MoE Training Strategies

Advantages and Challenges of MoE

MoE Models from Different Foundation Model Companies

Evolution Direction of MoE

Generation Technologies

Introduction to Generative Models

Comparison between Generation Technologies

Case 1: Li Auto

Case 2: XPeng

Case 3: SAIC

AI Foundation Model Companies

OpenAI

SORA

Google

Meta

Anthropic

Mistral AI

Amazon

Stability AI

xAI

SenseTime

Alibaba Cloud

Baidu AI Cloud

Tencent Cloud

Huawei

Zhipu AI

iFlytek

DeepSeek

Application Cases of AI Foundation Models in Automotive

Cockpit Cases

Lenovo's AI Vehicle Computing Framework Used in Cockpits

In-cabin Functions of Thundersoft's Rubik Foundation Model

LLM Empowers Smart Eye's DMS/OMS Assistance System

Application of DIT in Voice Processing Scenarios

Application of Unisound's Shanhai Model in Cockpits

Phoenix Auto Intelligence's Cockpit Smart Brain

Intelligent Driving Cases

Li Auto

Geely

Wayve: Generative World Model GAIA-1

Tesla

Giga's World Model

Application Trends of AI Foundation Models

Algorithm

Computing Power

Engineering

