1. Introduction
Traffic prediction is a cornerstone of Intelligent Transportation Systems (ITS). Accurate prediction directly impacts operational efficiency, safety, and urban planning. The core challenge lies in the heterogeneity of traffic conditions across locations, which produces highly diverse data distributions and makes cross-scenario generalization difficult for traditional models. While Large Language Models (LLMs) have shown potential for few-shot learning in such dynamic scenarios, existing LLM-based solutions often rely on prompt tuning and struggle to fully capture the complex graph relationships and spatio-temporal dependencies inherent in traffic networks. This limitation hinders the model's adaptability and interpretability in practical applications.
Strada-LLM is proposed to bridge these gaps. It is a novel multivariate probabilistic forecasting large language model capable of explicitly modeling temporal and spatial traffic patterns. By incorporating neighboring traffic information as covariates and employing a lightweight domain-adaptation strategy, Strada-LLM aims to surpass both prompt-based LLMs and traditional Graph Neural Network (GNN) models, especially in data-sparse or previously unseen network scenarios.
2. Methodology
2.1. Model Architecture
The architecture of Strada-LLM is designed to integrate the sequence modeling capabilities of LLMs with the structural inductive biases of GNNs. Its core idea is to treat the traffic network as a graph $G = (V, E)$, where nodes $V$ represent sensors or road segments, and edges $E$ represent spatial connectivity. Historical traffic data (e.g., speed, flow) forms a multivariate time series $X \in \mathbb{R}^{N \times T \times C}$ with $C$ channels for $N$ nodes over $T$ time steps.
The model processes this data through a dual-path encoder: (1) a Temporal Encoder (based on LLM backbones such as GPT or LLaMA) captures long-term dependencies and periodic patterns within each node's time series; (2) a Spatial Encoder (a lightweight GNN) operates on the graph structure, aggregating information from neighboring nodes to capture the transfer and feedback effects mentioned in the introduction. The outputs of the two encoders are fused into a single spatio-temporally rich representation.
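The dual-path shape logic can be sketched as follows. This is a minimal illustration of the tensor flow, not the paper's implementation: the temporal path stands in for the LLM backbone with a random linear projection, and the spatial path is a single mean-aggregation GNN layer; all dimensions and names are assumptions.

```python
import numpy as np

def dual_path_encode(x, adj, d_temporal=8):
    """Illustrative dual-path encoder (shapes and fusion only).

    x   : (N, T, C) history for N nodes over T steps with C channels.
    adj : (N, N) binary adjacency of the road graph G = (V, E).
    The temporal path is a stand-in for the LLM backbone; the spatial
    path is one mean-aggregation GNN layer over graph neighbors.
    """
    N, T, C = x.shape
    rng = np.random.default_rng(0)

    # Temporal path: flatten each node's series and project (LLM stand-in).
    W_t = rng.standard_normal((T * C, d_temporal))
    h_temporal = x.reshape(N, T * C) @ W_t                  # (N, d_temporal)

    # Spatial path: aggregate neighboring node representations.
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    h_spatial = (adj @ h_temporal) / deg                    # (N, d_temporal)

    # Fusion: concatenate into a spatio-temporal representation.
    return np.concatenate([h_temporal, h_spatial], axis=1)  # (N, 2*d_temporal)

z = dual_path_encode(np.ones((4, 12, 2)), np.eye(4)[::-1])
```

The fusion here is plain concatenation; the paper's actual fusion mechanism may be learned.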
2.2. Incorporating Neighboring Covariates
A key innovation is the use of proximate traffic information as covariates. Strada-LLM does not rely solely on the historical data of the target node; it also conditions its predictions on the recent states of topologically adjacent nodes. Formally, for a target node $i$ at time $t$, the input includes $X_i^{(t-H:t)}$ and $\{X_j^{(t-H:t)} \mid j \in \mathcal{N}(i)\}$, where $\mathcal{N}(i)$ is the set of neighbors and $H$ is the historical window. This provides crucial contextual signals about emerging congestion or traffic patterns before they fully manifest at the target location.
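Assembling this conditioning input is straightforward to sketch. The helper below is illustrative (its name and return layout are assumptions, not the paper's API): it slices the target node's window $X_i^{(t-H:t)}$ and gathers the same window for every neighbor in $\mathcal{N}(i)$.

```python
import numpy as np

def build_covariate_input(X, adj, i, t, H):
    """Gather the target node's history and its neighbors' histories.

    X   : (N, T, C) traffic tensor; adj : (N, N) adjacency matrix.
    Returns X_i^(t-H:t) of shape (H, C) and the stacked neighbor
    windows {X_j^(t-H:t) | j in N(i)} of shape (|N(i)|, H, C).
    """
    target = X[i, t - H:t]                  # (H, C) own history
    neighbors = np.flatnonzero(adj[i])      # indices j in N(i)
    covariates = X[neighbors, t - H:t]      # (|N(i)|, H, C)
    return target, covariates

X = np.arange(3 * 10 * 1, dtype=float).reshape(3, 10, 1)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
tgt, cov = build_covariate_input(X, adj, i=0, t=8, H=4)
```

In practice the neighbor windows would be fed to the model as extra input channels alongside the target's own history.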
2.3. Distribution-Based Domain Adaptation
To address distribution shift (e.g., a model trained on City A applied to City B), Strada-LLM proposes a parameter-efficient domain-adaptation strategy. Instead of fine-tuning all model parameters, it analyzes the statistical distribution of the new target data (e.g., mean, variance, autocorrelation) and updates only a small, statistics-derived subset of parameters. This enables the model to adapt quickly under few-shot constraints, making it well suited for deployment across diverse urban networks.
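The spirit of this strategy can be shown with a minimal sketch. Here the "statistics-derived subset" is assumed to be an input-normalization shift and scale refit on the new city's data while the backbone weights stay frozen; the parameter names are hypothetical and the paper's actual subset may be larger.

```python
import numpy as np

def adapt_statistics(params, target_data):
    """Refit only distribution-derived parameters on a new domain.

    params      : dict of model parameters; backbone weights frozen.
    target_data : few-shot sample of the new city's traffic values.
    Only the normalization statistics are updated (illustrative
    choice of the 'small subset'); everything else is untouched.
    """
    adapted = dict(params)                         # shallow copy; weights shared
    adapted["norm_mean"] = target_data.mean()      # refit from new-city stats
    adapted["norm_std"] = target_data.std() + 1e-8
    return adapted

params = {"backbone_W": np.ones((4, 4)), "norm_mean": 0.0, "norm_std": 1.0}
new_city = np.array([50.0, 60.0, 70.0])            # e.g. a few speed readings
adapted = adapt_statistics(params, new_city)
```

Because only scalar statistics change, adaptation costs almost nothing compared with backpropagating through the LLM backbone.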
3. Technical Details and Mathematical Formulas
The prediction objective is to model the conditional probability of future traffic states given the historical window and the graph structure:

$$p\left(X^{(t+1:t+F)} \mid X^{(t-H:t)}, G\right),$$

where $F$ is the forecast horizon, $H$ is the historical window, and $G$ is the road graph defined in Section 2.1.
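As a concrete training objective for such a conditional density, one common choice is the negative log-likelihood under a Gaussian predictive head. The paper's exact output distribution is not specified here, so this is an illustrative sketch rather than the authors' loss.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of observations y under per-step
    Gaussian predictions N(mu, sigma^2) -- one standard way to fit a
    conditional density p(X^(t+1:t+F) | X^(t-H:t), G).
    """
    return 0.5 * np.mean(np.log(2 * np.pi * sigma**2) + ((y - mu) / sigma) ** 2)

# Perfect mean predictions with unit variance: loss reduces to 0.5*log(2*pi).
loss = gaussian_nll(np.array([1.0, 2.0]), np.array([1.0, 2.0]), np.array([1.0, 1.0]))
```

Minimizing this loss over the forecast horizon trains the model to output calibrated distributions rather than point estimates.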
4. Experimental Results and Analysis
4.1. Datasets and Baseline Models
Evaluation is conducted on standard spatio-temporal traffic datasets such as PeMS and METR-LA, which contain traffic speed/flow data from highway sensor networks. The baseline models include:
- Classical time-series models: ARIMA, VAR.
- Deep-learning models: TCN, LSTM.
- GNN-based SOTA models: DCRNN, STGCN, GraphWaveNet.
- Models based on LLM: Prompt-tuned versions of GPT-3 and LLaMA.
4.2. Performance Metrics
The main metrics for point prediction include Root Mean Square Error (RMSE) and Mean Absolute Error (MAE); probabilistic forecasts are evaluated with the Continuous Ranked Probability Score (CRPS).
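These three metrics are simple to compute; the CRPS below uses the standard sample-based (empirical ensemble) estimator, which is one common way to evaluate probabilistic forecasts when the model outputs samples.

```python
import numpy as np

def rmse(y, yhat):
    """Root Mean Square Error between truth y and prediction yhat."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean Absolute Error between truth y and prediction yhat."""
    return float(np.mean(np.abs(y - yhat)))

def crps_samples(y, samples):
    """Empirical CRPS for a scalar observation y given forecast samples:
    CRPS ~= E|s - y| - 0.5 * E|s - s'| over the sample ensemble.
    """
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y))
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
    return float(term1 - term2)

y = np.array([1.0, 2.0, 3.0])
yhat = np.array([1.0, 2.0, 5.0])
point_rmse, point_mae = rmse(y, yhat), mae(y, yhat)
prob_crps = crps_samples(2.0, [1.0, 2.0, 3.0])
```

Lower is better for all three; CRPS reduces to MAE when the forecast collapses to a point mass.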
- Performance improvement (17%): in long-horizon forecasting, RMSE is reduced relative to the SOTA LLM-driven model.
- Efficiency gain (16%): compared with full fine-tuning of the LLM backbone, the adaptation strategy achieves higher parameter efficiency.
- Robustness (minimal degradation): performance drops only marginally when the LLM backbone is switched (e.g., from GPT to LLaMA).
4.3. Key Findings
Exceptional predictive accuracy: Strada-LLM consistently outperforms all baseline models, particularly in long-term forecasting (e.g., 60-90 minutes ahead). Compared to prompt-based LLMs, it achieves a 17% improvement in RMSE, highlighting the importance of explicitly modeling graph structures.
Effective Few-Shot Adaptation: The distribution-based adaptation strategy enables Strada-LLM to achieve over 90% of its peak performance after observing only a few days of sample data from a new city, demonstrating exceptional data efficiency.
Interpretability: By analyzing the attention weights in the LLM temporal encoder and the edge weights learned by the GNN, the model can reveal which historical time points and which neighboring nodes are most influential for a given prediction.
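This kind of probe amounts to ranking weights. The sketch below assumes the temporal attention and GNN edge weights are available as matrices (the variable names and shapes are illustrative, not the paper's interface) and returns the top-k most influential history steps and neighbors for a node.

```python
import numpy as np

def top_influences(attn, edge_w, i, k=2):
    """Rank the history steps and neighbors most influential for node i.

    attn   : (N, T) temporal attention weights per node (illustrative).
    edge_w : (N, N) learned GNN edge weights into each node (illustrative).
    Returns the k most-attended time steps and k strongest neighbors.
    """
    top_times = np.argsort(attn[i])[::-1][:k]        # most-attended steps
    top_neighbors = np.argsort(edge_w[i])[::-1][:k]  # strongest incoming edges
    return top_times.tolist(), top_neighbors.tolist()

attn = np.array([[0.1, 0.7, 0.2]])    # node 0 attends over 3 past steps
edge_w = np.array([[0.0, 0.9, 0.4]])  # edge weights from nodes 0..2 into node 0
times, nbrs = top_influences(attn, edge_w, i=0)
```

An operator could surface such rankings next to each forecast, e.g. "this prediction is driven mainly by step t-2 and by upstream sensor 1".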
5. Critical Analysis: Insights and Commentary
Core Insights
Strada-LLM is not merely another traffic AI model; it is a principled fusion strategy. The authors rightly argue that, for structured, relational data such as road networks, adapting an LLM through prompting alone is a dead end. Their key insight is that the LLM should serve as a temporal reasoning engine while the GNN acts as a spatial structure aggregator. This is a more principled design than trying to handle everything through text prompts, analogous to how vision-language models use separate encoders for images and text.
Line of Reasoning
Its logic is highly persuasive: 1) Traffic has an inherent graph structure → use GNN. 2) Traffic time series have complex long-term dependencies → use LLM. 3) Simply combining them crudely leads to massive parameters and potential modality misalignment → design a focused fusion mechanism incorporating proximity covariates. 4) Practical deployment faces distribution shifts → invent a lightweight, statistics-based adapter. This is a textbook example of problem decomposition in machine learning system design.
Strengths and Weaknesses
Strengths: Parameter-efficient domain adaptation is the paper's killer feature for real-world feasibility. It directly addresses the "cold-start" problem in city-scale ITS deployment. The focus on probabilistic forecasting is also commendable, moving beyond point estimates towards uncertainty quantification, which is crucial for risk-aware decision-making in transportation.
Weaknesses and Open Questions: The elephant in the room is computational cost. Although more efficient than full fine-tuning, running an LLM backbone (even a 7-billion-parameter model) in real time for hundreds of sensors is no small task, and the paper lacks a rigorous analysis of online prediction latency. Furthermore, it assumes the graph (the road network) is static, ignoring dynamic graphs that can represent temporary events such as accidents or road closures, a frontier explored in work on dynamic graph neural networks (Pareja et al., AAAI 2020). The evaluation on standard benchmarks is solid, but a true stress test would involve more diverse urban morphologies (e.g., European grid layouts vs. American sprawl).
Actionable Insights
For Practitioners: First pilot this architecture at the corridor level rather than city-wide to manage computational cost. The domain-adaptation module can be extracted and potentially reused with other spatio-temporal models. For Researchers: The biggest opportunity lies in replacing the general-purpose LLM backbone with foundation models specialized for time series (such as Google's TimesFM), which could substantially improve efficiency. Another direction is to incorporate external data (weather, events) not as simple covariates but through multimodal fusion, building toward a true "urban digital twin" model.
6. Outlook and Future Directions
Short term (1-3 years): In traffic control centers, for congestion prediction and mitigation. Strada-LLM could support adaptive traffic-signal control systems that adjust signal timing according to predicted flows. Its few-shot adaptation capability makes it well suited to special-event management (sports events, concerts), settings with little historical data where traffic patterns emerge quickly.
Mid-term (3-5 years): Integration with autonomous vehicle (AV) path-planning systems. Autonomous fleets can leverage Strada-LLM's probabilistic predictions to assess the risk of different routes, optimizing not only current travel time but also the stability and reliability of forecasts. It can also enhance freight and logistics planning.
Long-term and Research Frontiers:
- Generative Urban Planning: Using Strada-LLM as a simulator to evaluate the impact of proposed infrastructure changes (new roads, zoning regulations) on traffic.
- Multimodal Integration: Beyond vehicle traffic, modeling integrated mobility—including pedestrian flow, shared bike demand, and public transit occupancy—requires heterogeneous graph representation.
- Causal Inference: Shifting from correlation to causality. Can the model answer "what-if" questions, such as the precise impact of closing a specific lane? This aligns with the growing field of causal representation learning.
- Mobility Foundation Model: The architecture of Strada-LLM can be scaled and pre-trained on global traffic data to create a foundation model for all spatio-temporal prediction tasks in urban environments.
7. References
- Moghadas, S. M., Cornelis, B., Alahi, A., & Munteanu, A. (2025). Strada-LLM: Graph LLM for traffic prediction. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '25).
- Vaswani, A., et al. (2017). Attention is All You Need. Advances in Neural Information Processing Systems 30 (NeurIPS 2017).
- Kipf, T. N., & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. International Conference on Learning Representations (ICLR).
- Li, Y., et al. (2018). Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. International Conference on Learning Representations (ICLR).
- Pareja, A., et al. (2020). EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. Proceedings of the AAAI Conference on Artificial Intelligence.
- Wu, N., et al. (2023). TimesFM: A Foundation Model for Time Series Forecasting. Google Research. [Preprint].
- OpenStreetMap contributors. (2024). Planet dump. Retrieved from https://www.openstreetmap.org.
- California Department of Transportation (Caltrans). (2024). Performance Measurement System (PeMS). Retrieved from http://pems.dot.ca.gov.