
Daily Papers

by AK and the research community

Nov 12

Facing Off World Model Backbones: RNNs, Transformers, and S4

World models are a fundamental component in model-based reinforcement learning (MBRL). To perform temporally extended and consistent simulations of the future in partially observable environments, world models need to possess long-term memory. However, state-of-the-art MBRL agents, such as Dreamer, predominantly employ recurrent neural networks (RNNs) as their world model backbone, which have limited memory capacity. In this paper, we seek to explore alternative world model backbones for improving long-term memory. In particular, we investigate the effectiveness of Transformers and Structured State Space Sequence (S4) models, motivated by their remarkable ability to capture long-range dependencies in low-dimensional sequences and their complementary strengths. We propose S4WM, the first world model compatible with parallelizable SSMs including S4 and its variants. By incorporating latent variable modeling, S4WM can efficiently generate high-dimensional image sequences through latent imagination. Furthermore, we extensively compare RNN-, Transformer-, and S4-based world models across four sets of environments, which we have tailored to assess crucial memory capabilities of world models, including long-term imagination, context-dependent recall, reward prediction, and memory-based reasoning. Our findings demonstrate that S4WM outperforms Transformer-based world models in terms of long-term memory, while exhibiting greater efficiency during training and imagination. These results pave the way for the development of stronger MBRL agents.
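
The parallelizability that S4WM relies on comes from a basic property of discretized linear state space models: the same model can be run step by step like an RNN, or all at once as a convolution with the kernel K_j = C·Ā^j·B̄. The sketch below is plain Python. It assumes a single-input, single-output SSM whose matrices are already discretized, and all names and numeric values (`a_bar`, `b_bar`, `c`, `u`) are hypothetical. It does not reproduce S4's structured parameterization or S4WM's latent-variable world model. Its only purpose is to show that the two evaluation modes give the same outputs.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def ssm_recurrent(a_bar: Matrix, b_bar: Vector, c: Vector, u: Vector) -> Vector:
    """RNN-style mode: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k, with x_{-1} = 0.

    Each step depends on the previous state, so this loop is inherently sequential.
    """
    x = [0.0] * len(b_bar)
    ys = []
    for u_k in u:
        x = [ax_i + b_i * u_k for ax_i, b_i in zip(matvec(a_bar, x), b_bar)]
        ys.append(dot(c, x))
    return ys


def ssm_kernel(a_bar: Matrix, b_bar: Vector, c: Vector, length: int) -> Vector:
    """Convolution kernel K_j = C A_bar^j B_bar for j = 0 .. length-1."""
    kernel = []
    v = list(b_bar)  # holds A_bar^j B_bar
    for _ in range(length):
        kernel.append(dot(c, v))
        v = matvec(a_bar, v)
    return kernel


def ssm_convolutional(a_bar: Matrix, b_bar: Vector, c: Vector, u: Vector) -> Vector:
    """Convolution mode: y_k = sum_{j=0}^{k} K_j u_{k-j}.

    Once the kernel is known, every output y_k can be computed independently.
    S4 applies this convolution with an FFT; this naive version is O(L^2) for clarity.
    """
    kernel = ssm_kernel(a_bar, b_bar, c, len(u))
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


# Hypothetical toy parameters (not taken from S4WM).
a_bar = [[0.9, 0.1], [0.0, 0.8]]
b_bar = [1.0, 0.5]
c = [0.3, -0.2]
u = [1.0, 0.0, 2.0, -1.0, 0.5]

y_rec = ssm_recurrent(a_bar, b_bar, c, u)
y_conv = ssm_convolutional(a_bar, b_bar, c, u)
print([round(y, 6) for y in y_rec])
print([round(y, 6) for y in y_conv])  # matches the recurrent mode up to rounding
```

During training, the convolutional mode allows a whole sequence to be processed in parallel. During imagination, the recurrent mode generates one step at a time with a fixed-size state.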

3 authors · Jul 5, 2023

A Survey on Structured State Space Sequence (S4) Models

Recent advancements in sequence modeling have led to the emergence of Structured State Space Models (SSMs) as an efficient alternative to Recurrent Neural Networks (RNNs) and Transformers, addressing challenges in long-range dependency modeling and computational efficiency. While RNNs suffer from vanishing gradients and sequential inefficiencies, and Transformers face quadratic complexity, SSMs leverage structured recurrence and state-space representations to achieve superior long-sequence processing with linear or near-linear complexity. This survey provides a comprehensive review of SSMs, tracing their evolution from the foundational S4 model to its successors like Mamba, Simplified Structured State Space Sequence Model (S5), and Jamba, highlighting their improvements in computational efficiency, memory optimization, and inference speed. By comparing SSMs with traditional sequence models across domains such as natural language processing (NLP), speech recognition, vision, and time-series forecasting, we demonstrate their advantages in handling long-range dependencies while reducing computational overhead. Despite their potential, challenges remain in areas such as training optimization, hybrid modeling, and interpretability. This survey serves as a structured guide for researchers and practitioners, detailing the advancements, trade-offs, and future directions of SSM-based architectures in AI and deep learning.
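
The linear or near-linear complexity follows from the recurrence being linear. With a diagonal state matrix (the simplification used, for example, by S5), each state channel becomes a scalar affine recurrence h_k = ā·h_{k-1} + b̄·u_k. Composing affine steps is associative, so the whole sequence can be computed with a parallel prefix scan rather than a strictly sequential loop. The sketch below is plain Python. It assumes real, negative diagonal eigenvalues, zero-order-hold discretization, and hypothetical parameter values. It is not the reference code of S4, S5, or Mamba; Mamba's selective scan, for instance, also makes the discretized parameters input-dependent.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # affine map h -> a*h + b


def discretize_zoh(lam: float, b: float, delta: float) -> Pair:
    """Zero-order-hold discretization of h' = lam*h + b*u (assumes lam != 0)."""
    a_bar = math.exp(lam * delta)
    b_bar = (a_bar - 1.0) / lam * b
    return a_bar, b_bar


def combine(earlier: Pair, later: Pair) -> Pair:
    """Compose two affine steps, applying `earlier` first. This operator is associative."""
    a1, b1 = earlier
    a2, b2 = later
    return a1 * a2, a2 * b1 + b2


def scan_sequential(a_bar: float, driven: List[float]) -> List[float]:
    """Reference loop: h_k = a_bar * h_{k-1} + driven_k, starting from h = 0."""
    h, out = 0.0, []
    for d in driven:
        h = a_bar * h + d
        out.append(h)
    return out


def scan_parallel(a_bar: float, driven: List[float]) -> List[float]:
    """Hillis-Steele inclusive scan: ceil(log2 L) rounds, and the updates within a
    round are independent. Total work is O(L log L); a work-efficient
    (Blelloch-style) scan reduces this to O(L)."""
    elems = [(a_bar, d) for d in driven]
    offset = 1
    while offset < len(elems):
        elems = [
            combine(elems[i - offset], e) if i >= offset else e
            for i, e in enumerate(elems)
        ]
        offset *= 2
    # With a zero initial state, the offset term of each prefix map is h_k.
    return [b for _, b in elems]


def diagonal_ssm(lams: List[float], bs: List[float], cs: List[float],
                 delta: float, u: List[float], parallel: bool = True) -> List[float]:
    """y_k = sum_n c_n h_k^n, with each diagonal state channel scanned independently."""
    scan = scan_parallel if parallel else scan_sequential
    y = [0.0] * len(u)
    for lam, b, c in zip(lams, bs, cs):
        a_bar, b_bar = discretize_zoh(lam, b, delta)
        for k, h in enumerate(scan(a_bar, [b_bar * u_k for u_k in u])):
            y[k] += c * h
    return y


# Hypothetical toy parameters.
lams = [-0.5, -1.0, -2.0]
bs = [1.0, 1.0, 1.0]
cs = [0.5, -0.3, 0.2]
delta = 0.1
u = [1.0, 0.5, -0.25, 2.0, 0.0, -1.0, 0.75]

y_seq = diagonal_ssm(lams, bs, cs, delta, u, parallel=False)
y_par = diagonal_ssm(lams, bs, cs, delta, u, parallel=True)
print([round(v, 6) for v in y_par])  # matches y_seq up to rounding
```

In plain Python both paths run serially. The point is the dependency structure: every round of `scan_parallel` could be executed on parallel hardware, whereas `scan_sequential` requires L dependent steps.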

6 authors · Mar 21

IXPE Observation of the Low-Synchrotron Peaked Blazar S4 0954+65 During An Optical-X-ray Flare

The X-ray polarization observations made possible with the Imaging X-ray Polarimetry Explorer (IXPE) offer new ways of probing high-energy emission processes in astrophysical jets from blazars. Here we report on the first X-ray polarization observation of the blazar S4 0954+65 in a high optical and X-ray state. During our multi-wavelength campaign on the source, we detected an optical flare whose peak coincided with the peak of an X-ray flare. This optical-X-ray flare most likely took place in a feature moving along the parsec-scale jet, imaged at 43 GHz by the Very Long Baseline Array. The 43 GHz polarization angle of the moving component underwent a rotation near the time of the flare. In the optical band, prior to the IXPE observation, we measured the polarization angle to be aligned with the jet axis. In contrast, during the optical flare the optical polarization angle was perpendicular to the jet axis; after the flare, it reverted to being parallel to the jet axis. Given the smooth behavior of the optical polarization angle during the flare, we favor shocks as the main acceleration mechanism. We also infer that the ambient magnetic field lines in the jet were parallel to the jet position angle. The average degree of optical polarization during the IXPE observation was (14.3 ± 4.1)%. Despite the flare, we detected only an upper limit of 14% (at the 3σ level) on the X-ray polarization degree; a reasonable assumption about the X-ray polarization angle lowers this upper limit to 8.8% (3σ). We model the spectral energy distribution (SED) and spectral polarization distribution (SPD) of S4 0954+65 with leptonic (synchrotron self-Compton) and hadronic (proton and pair synchrotron) models. The constraints we obtain from our combined multi-wavelength polarization observations and SED modeling tentatively disfavor hadronic models for the X-ray emission in S4 0954+65.
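
The polarization degree and angle discussed above are conventionally derived from the linear Stokes parameters using PD = sqrt(Q² + U²) / I and PA = ½·arctan2(U, Q). Because a polarization angle describes an axis, "aligned with" or "perpendicular to" the jet must be judged modulo 180°. The sketch below is plain Python with hypothetical Stokes values and a hypothetical jet position angle. It ignores the bias correction and the 3σ upper-limit statistics that an actual IXPE analysis requires, and it leaves the angle in whatever reference frame Q and U are defined in.

```python
import math
from typing import Tuple


def polarization(i: float, q: float, u: float) -> Tuple[float, float]:
    """Return (polarization degree as a fraction, polarization angle in degrees in [0, 180))."""
    pd = math.hypot(q, u) / i
    pa = (0.5 * math.degrees(math.atan2(u, q))) % 180.0
    return pd, pa


def offset_from_jet(pa_deg: float, jet_pa_deg: float) -> float:
    """Smallest angle in [0, 90] degrees between the polarization angle and the jet axis.

    Both are axes (180-degree ambiguous), so 0 means parallel and 90 means perpendicular.
    """
    d = abs(pa_deg - jet_pa_deg) % 180.0
    return min(d, 180.0 - d)


# Hypothetical Stokes parameters (normalized to I = 1) and jet position angle.
jet_pa = 45.0
pd, pa = polarization(1.0, 0.0, 0.143)
print(f"PD = {pd:.1%}, PA = {pa:.1f} deg, offset from jet = {offset_from_jet(pa, jet_pa):.1f} deg")
```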

137 authors · Nov 25, 2024