International Journal of Computing and Digital Systems
2024, VOL. 17, NO. 1, 1–10
http://dx.doi.org/10.12785/ijcds/XXXXXX
Evaluating Mamba Model for Future Applications: A Systematic Literature Review
Duan Shiming1, Mustafa Muwafak Alobaedy1, Mohammad Kazem Chamran1 and Hafizul Othman2
1Faculty of Information Technology, City University Malaysia, Cyberjaya Selangor, Malaysia
2Faculty of Information Science and Engineering, Management Science University, Shah Alam Selangor, Malaysia
Abstract: The Mamba model is a recent Deep Learning (DL) architecture that originated in large language modeling and has since been applied in Computer Vision, Prediction, the Internet of Things, and Compute Compression. Its potential is still being explored by the academic community. With optimized algorithms and architecture, Mamba models demonstrate significant advantages in processing and analyzing images with high efficiency, accuracy, and robustness, and they are prominent in real-time processing, large-scale data analysis, and complex tasks. Through a literature review, this research compares Mamba with the Transformer, Recurrent Neural Network (RNN), and State Space Models (SSMs). It focuses on the applications of the Mamba model in various domains, analyzes potential application areas, explores current trends, and identifies the shortcomings of existing Mamba applications. The review fills a gap in overviews of the Mamba model and provides researchers with a structured knowledge framework. Scholars working on deep learning can use this article as a reference when selecting suitable models for their research fields. The authors will continue to monitor the development of the Mamba model and assist the academic community in exploring the limits of its potential.
Keywords: Mamba, Transformer, SSMs, Deep Learning
1. INTRODUCTION
Deep Learning (DL) has revolutionized the field of artificial intelligence, enabling remarkable progress in tasks such as image recognition, speech processing, and Natural Language Processing (NLP). However, as the demand for more efficient and scalable solutions grows, researchers continue to explore innovative approaches to optimize DL models for training and inference. The Mamba foundation model aims to improve the efficiency of training and inference, setting a new standard for performance in NLP applications [1].
Mamba is a kind of temporal sequence model. The temporal sequence model is a mathematical approach that involves curve fitting and parameter estimation based on observed time series data from a system. It is an analytical tool for predicting and analyzing time series data, enabling the description of dynamic characteristics over time. The primary objective of employing temporal sequence models is to comprehend data dynamics and forecast future values by capturing underlying trends.
Mamba can efficiently complete computationally intensive tasks, significantly reducing computation time while enhancing processing capacity. Mamba provides robust support for scientific research, machine learning, and data analysis applications. A key innovation of Mamba lies in its ability to address the complex challenges inherent in Transformer models [2] by compressing information and selectively excluding historical noise, thereby improving efficiency and scalability in existing tasks.
To systematically evaluate the impact and potential of Mamba, this review employs a comprehensive analytical approach, examining 113 articles published between the release of Mamba in 2023 and June 2024. The research aims to trace the developmental trajectory of Mamba, comparing its performance and architectural innovations with established models such as Transformer, Recurrent Neural Networks (RNNs) [3], and State Space Models (SSMs) [4]. Additionally, the review explores Mamba's diverse application areas and prospects and highlights its versatility across domains.
The review will employ a multi-stage analytical framework. By systematically analyzing the existing body of literature, this research will provide valuable insights for researchers in selecting appropriate domains for Mamba implementation, ultimately contributing to more effective and targeted utilization of this promising architectural paradigm. The findings will also help in identifying key research directions that could maximize Mamba's potential while addressing current limitations in the field.
E-mail address: duan_shiming@qq.com, new.technology@hotmail.com, kazem.chamran@city.edu.my, hafizul_othman@msu.edu.my
The paper is organized as follows: Section 2 systematically elaborates the research methodology for the literature review, encompassing the research core, strategy selection, and framework design. Section 3 presents the research findings in detail across multiple dimensions, specifically temporal distribution analysis, analysis of domain-specific characteristics, portrait analysis, a comparative study of research features, and an evaluation of relevant technologies. Section 4 provides a comprehensive discussion of the research conclusions and their academic significance, with particular emphasis on the theoretical implications, practical impacts, and research limitations of the findings, while proposing potential directions for future investigations. The paper concludes with a synthesizing discussion that summarizes the findings of the research.
2. Literature Review Methods
The search strategy for this systematic literature review on Mamba is inspired by the PRISMA method [5] and involves a comprehensive analysis of the model's potential and application trends through a structured approach.
A. Research Core Dimensions
To thoroughly assess the potential of the Mamba model and its application trends, this review will focus on two core dimensions:
1) Comparative Model Analysis: By systematically comparing the Mamba model with similar models (such as Transformer, RNN, and SSM) in terms of architectural design and performance, this review aims to provide detailed insight into the Mamba model. This analysis will highlight its unique advantages and limitations, offering a nuanced understanding of its position within the landscape of existing models.
2) Exploration of Application Trends: Based on existing literature, this review will analyze the distribution of the Mamba model across various application domains. By examining keywords and related models, the review will construct the mapping between the design features of the Mamba model and its application areas. This section summarizes the current state of applications and predicts potential future applications, identifying underdeveloped areas, and providing directional guidance for subsequent research and practice.
B. Literature Search Strategy
The Mamba model was introduced in 2023 and, as an emerging DL model, its related research and applications are still in a rapidly evolving exploratory phase. To comprehensively capture the research progress and application trends of the Mamba model, this review employs a systematic literature retrieval approach during the data collection phase. Specifically, Google Scholar serves as the main source of data, with multiple rounds of keyword optimization and expanded search strategies to ensure coverage of diverse literature sources, including academic journals, conference papers, technical reports, and white papers. By adopting a broad search scope, this review retrieves research directly related to the Mamba model and incorporates literature from related technical fields to minimize potential biases due to overlooked studies. Additionally, this review employs citation backtracking to identify related literature that may have been missed, thereby constructing a comprehensive and reliable dataset. This dataset supports the subsequent model analysis and application trend research, ensuring a scientifically robust basis for the review.
This review constructs the dataset of the literature through a systematic feature framework, which includes the following core components:
1) Research Attributes
This section consists of the criteria used to comprehensively describe the core content of the literature:
- Title: The title of the literature
- Authors: Author information
- Domain: Research field, including primary and subordinate fields, used to analyze the application areas and current status of the literature, and to support predictions of future application trends
- Publication Type: Type of literature (e.g., journal article, conference paper, etc.)
- Source: Source of the literature (e.g., journal name, conference name, etc.)
- Date: Publication date
- Problem Statement: Statement of the research problem
- Objective: Research objective
- Methods/Algorithm: Research methods or algorithms
- Dataset: Datasets used
- Findings: Research findings
- Future Work: Directions for future research
- Comments: Additional comments or notes
- Referenced Count: Number of citations
2) Project Detail
This section records detailed information about the project to which the literature belongs, including:
- Release Date: Project release date
- Number of Followers: Indicator of the project's popularity
- Project URL: Links or resources related to the project
These details are used to assess the practical application value of the methods proposed in the literature and to determine whether they have been validated or adopted by others.
3) Related Models
This field contains six subfields that describe and compare the models discussed in the literature. Through this field, the innovative aspects of each article and its connections to existing models can be quickly identified.
4) Application Area
This section analyzes the potential application areas of the Mamba model reviewed in the literature. By identifying similar problems across different domains, the applicability of the model to other areas can be inferred, thereby expanding its scope of application.
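The feature framework above can be pictured as a record type. The following is a minimal sketch only: the class names, field names, and the example values are assumptions made for illustration, and the six subfields of the Related Models field are not enumerated in the text, so they are represented here by a plain list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProjectDetail:
    """Project information attached to an article (all fields optional)."""
    release_date: Optional[str] = None   # e.g. "2024-03"
    followers: Optional[int] = None      # popularity indicator
    url: Optional[str] = None


@dataclass
class LiteratureRecord:
    """One row of the literature dataset (hypothetical field names)."""
    title: str
    authors: List[str]
    domain: str
    sub_domain: Optional[str] = None
    publication_type: Optional[str] = None
    source: Optional[str] = None
    date: Optional[str] = None           # None when no detailed date is known
    problem_statement: str = ""
    objective: str = ""
    methods: str = ""
    dataset: str = ""
    findings: str = ""
    future_work: str = ""
    comments: str = ""
    referenced_count: int = 0
    project: ProjectDetail = field(default_factory=ProjectDetail)
    related_models: List[str] = field(default_factory=list)
    application_areas: List[str] = field(default_factory=list)


# Illustrative placeholder values, not data taken from the reviewed articles.
record = LiteratureRecord(
    title="Example Mamba segmentation paper",
    authors=["Author A", "Author B"],
    domain="CV",
    sub_domain="Medical",
    related_models=["UNet"],
)
```

Keeping the date and project fields optional mirrors the fact that some reviewed articles lack this information.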
C. Systematic Literature Review Framework
This systematic review investigates the content of the literature and provides data for evaluating the application value of the Mamba model, predicting upcoming application trends, and exploring cross-domain applications. The systematic review flow is divided into five main phases: (i) Data Search Phase, (ii) Determination of Index Keywords, (iii) Determination of Search Results, (iv) Literature Analysis, and (v) Literature Categorization and Presentation, as shown in Figure 1.
Each phase is explained in detail below:
1) Data Search Phase
This review conducted literature retrieval through Google Scholar. Given that the Mamba model was in its initial development stage between December 2023 and June 2024, it is a challenge to cover all domains. Therefore, in this stage, the review selected all the indexed literature that could be searched as the initial data source to ensure its comprehensiveness.
2) Determination of Index Keywords
Based on preliminary research, this review filtered out literature unrelated to the Mamba model. The output of this phase is 113 articles that explicitly contain the word "Mamba" in their title or keywords.
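The inclusion rule of this phase can be sketched as a simple filter. This is an illustrative assumption about how such a rule might be coded, not the authors' actual script; the function name and the example records are hypothetical.

```python
def mentions_mamba(record: dict) -> bool:
    """Return True if 'Mamba' appears in the title or in any keyword."""
    texts = [record.get("title", ""), *record.get("keywords", [])]
    return any("mamba" in text.lower() for text in texts)


# Hypothetical search results used only to exercise the filter.
search_results = [
    {"title": "A Mamba-based segmentation network", "keywords": ["SSM"]},
    {"title": "Attention-only baselines for long sequences", "keywords": ["Transformer"]},
    {"title": "Selective state spaces for audio", "keywords": ["Mamba", "audio"]},
]

selected = [r for r in search_results if mentions_mamba(r)]
```

The match is case-insensitive and also catches derived names such as "VMamba", which matters because many reviewed methods embed the word in a longer model name.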
3) Determination of Search Results
Through further analysis of the filtered literature, this review constructed detailed information for each article. The literature dataset includes the following core components:
Research Detail: title, authors, domain, and methods.
Project Detail: release date, number of followers, and project URL.
Keywords: used to summarize the core content of the literature.
Related Model: describing models and solutions related to or reviewed in literature.
Figure 1: Literature Review Flow Chart
4) Literature Analysis
Using the collected literature dataset, this review conducted a systematic analysis of each article to extract its core contributions, technical features, and application potential.
5) Literature Categorization and Presentation
To evaluate the applications and potential trends of the Mamba model, this review adopts a systematic review approach, utilizing multi-dimensional data analysis to construct a comprehensive evaluation framework. In the presentation of results, the review employs visual analysis methods to display data collected through bibliometric approaches in various chart formats and to build a detailed literature knowledge graph.
The review begins with bibliometric analysis, extracting relevant literature data from the literature index. Subsequently, based on these data, the review conducts thematic clustering analysis of research related to the Mamba model to identify major research directions and hotspots. Then, through feature mapping methods, the review systematically compares the literature analysis results with the intrinsic features of the Mamba model, focusing on evaluating its innovative applications in areas such as sequence modeling, long-range dependency handling, and computational efficiency.
The review builds a literature keyword extraction and analysis pipeline using Python. The pipeline begins with NLP techniques to preprocess the collected literature data, including text cleaning steps. Subsequently, keywords are extracted to represent the core research content and academic domain of each article.
Based on this, the review applies co-word analysis to statistically analyze keyword frequencies. This approach identifies hot topics in Mamba model research and tracks the evolution of research trends, thereby revealing the research trajectory and application trends of the Mamba model. The specific research methodology is illustrated in Figure 2.
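As a rough illustration of co-word analysis, the sketch below counts how often each keyword occurs and how often each pair of keywords appears in the same article. The function name and the toy keyword lists are assumptions for illustration, not the review's actual data or code.

```python
from collections import Counter
from itertools import combinations


def keyword_statistics(documents):
    """Count keyword frequency and pairwise co-occurrence per document."""
    frequency = Counter()
    co_occurrence = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))              # one vote per article
        frequency.update(unique)
        co_occurrence.update(combinations(unique, 2))
    return frequency, co_occurrence


# Toy keyword lists, one per article (illustrative only).
documents = [
    ["mamba", "ssm", "segmentation", "medical"],
    ["mamba", "vision", "segmentation"],
    ["mamba", "ssm", "language"],
]

frequency, co_occurrence = keyword_statistics(documents)
top_keywords = frequency.most_common(3)
```

Sorting the keywords before pairing them means each pair has a single canonical key, so ("mamba", "ssm") and ("ssm", "mamba") are never counted separately.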
In the research process illustrated in Figure 2, the data processing stage primarily focuses on two key steps: feature extraction and text preprocessing.
1) Feature Extraction
At this stage, a structured approach is used to systematically extract research attributes from the literature, covering the dimensions listed under Research Attributes above.
2) Text Preprocessing
A multi-level text processing framework is constructed, consisting of the following steps:
Standardization: This includes case normalization, removal of special characters, and other preprocessing operations.
Terminology Normalization: A domain-specific lexicon is used to standardize terms.
Stopword Filtering: A stopword list is applied to remove irrelevant words.
3) Application-Oriented Perspective in Analysis
Because different researchers may adopt diverse expressions and perspectives for the same research topic, this review places particular emphasis on an application-driven analysis of the literature.
4) Keyword Segmentation Strategy
A combination of multiple independent keywords is employed instead of fixed phrases, which avoids the semantic bias that may arise from predefined word groups.
5) Empirical Validation
Empirically, the keyword segmentation strategy demonstrates significant advantages in both the accuracy of literature representation and the reliability of subsequent analysis.
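The preprocessing and segmentation steps described above might look like the following minimal sketch. The lexicon entries, the stopword list, and the function name are illustrative assumptions; the review does not publish its actual lexicon or stopword list.

```python
import re

# Hypothetical domain lexicon: variant spelling -> canonical term.
TERM_LEXICON = {
    "state space models": "ssm",
    "state space model": "ssm",
    "ssms": "ssm",
}

# Hypothetical stopword list.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "with", "in", "on", "to"}


def extract_keywords(text: str) -> list:
    """Standardize, normalize terminology, drop stopwords, split into keywords."""
    text = text.lower()                              # case normalization
    text = re.sub(r"[^a-z0-9\s-]", " ", text)        # remove special characters
    # Replace longer variants first so they are not shadowed by shorter ones.
    for variant in sorted(TERM_LEXICON, key=len, reverse=True):
        pattern = rf"\b{re.escape(variant)}\b"
        text = re.sub(pattern, TERM_LEXICON[variant], text)
    # Segmentation: independent single-word keywords rather than fixed phrases.
    return [token for token in text.split() if token not in STOPWORDS]


keywords = extract_keywords(
    "Efficient State Space Models for Medical Image Segmentation!"
)
```

Emitting independent keywords means "medical" and "segmentation" can each be matched against other articles, even when the exact phrase "medical image segmentation" never recurs.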
3. Results of Analysis
This section systematically presents the implementation process and analytical results of the proposed review methods. It covers five core dimensions. First, time analysis reveals the development trends of the Mamba model. Second, domain analysis explores the distribution of the Mamba model across different domains. Third, portrait analysis is employed to construct a feature profile of the Mamba model. Fourth, based on the analytical results, a systematic comparison is conducted on the architectural design and features of the Mamba model. Finally, through relational model analysis, the distinctions and connections between the Mamba model and other similar models are examined.
A. Time Analysis
As shown in Figure 3, the systematic collection and analysis of literature related to the Mamba model reveals a notable phased distribution in the timeline of publications. The literature reviewed was concentrated between February and June 2024. Of the 113 articles retrieved, 86 were included in this timeline and 27 were excluded because they lacked detailed publication dates. Notably, since its initial release in December 2023, the Mamba model has garnered significant attention in the academic community, resulting in 86 related research articles within just five months. Particularly, between April and May 2024, the number of publications showed an almost exponential increase, underscoring the strong academic response to the model.
Given the typical academic publishing cycle, it is reasonable to infer that many studies published in April-May 2024 were submitted shortly after the Mamba model's release, reflecting the research community's rapid and enthusiastic response. Early investigations predominantly focused on the model's theoretical underpinnings and application potential, particularly in conjunction with SSMs. Researchers quickly grasped its core capabilities and explored its use in addressing domain-specific challenges. This swift engagement accelerated the model's theoretical development, introduced new methodological perspectives, and affirmed Mamba's emerging significance in deep learning, suggesting strong potential for broader future impact.
Figure 2: Literature Keyword Extraction and Analysis
Figure 3: Distribution of the Documents in each month
B. Domain Analysis
While the Mamba model has shown promising applications across various domains, including Computer Vision (CV), prediction, and natural language processing, it is not universally applicable to all DL tasks. Preliminary studies have validated its potential in specific areas; however, its adaptability and performance remain inconsistent across different scenarios. A more systematic investigation is still needed to identify its optimal application domains and fully understand its capabilities in emerging fields.
The Mamba model has demonstrated exceptional capabilities in three key computational areas: data compression, processing acceleration, and efficient information handling [1].
Through rigorous analysis, this section aims to establish a comprehensive understanding of the transformative effects these capabilities introduce to existing methodologies. The complete dataset of analyzed publications is presented in Table I.
The bibliometric analysis of the articles cataloged in Table I reveals two significant research trends: (i) Most models develop hybrid architectures by integrating the Mamba model with efficient models from their respective domains. (ii) The Mamba model continues to advance in CV, injecting new vitality through its unique state-space mechanism. Particularly in the medical field, the Mamba model demonstrates a trend of interdisciplinary integration. These publications collectively demonstrate Mamba's substantial potential for driving innovation across multiple disciplines, particularly in domains requiring efficient long-range dependency modeling.
A particularly noteworthy finding emerges from [45] and [48], which provide detailed insights into the Mamba model's behavior regarding sequence order sensitivity. The research indicates that while sequence order does have measurable effects on model performance, these effects remain statistically insignificant (p > 0.05 in all test conditions). This characteristic contributes to the model's robustness and operational flexibility, making the Mamba model particularly suitable for applications where sequence-specific dependencies are critical.
Collectively, the reviewed studies point to three key advantages of the Mamba model: (i) exceptional adaptability to diverse data structures and patterns; (ii) a scalable architecture suitable for both small-scale and large-scale applications; and (iii) consistent performance across varying sequence conditions.
Figure 4 illustrates the distribution of research areas stemming from the Mamba model. The analysis begins with the Mamba model as the root and maps its application across multiple fields, including CV, language processing, healthcare, point cloud analysis, Internet of Things (IoT), and decision-making domains. The disproportionate focus on CV suggests an alignment between the model's computational efficiency and the practical needs of researchers and industries. In language modeling, there are 9 studies related to the Mamba language model, accounting for about 8.0% of the 113 reviewed articles in this research. The original Mamba models were developed as language models and are known for being more efficient and less complex [1]. However, derivative research has concentrated heavily on the CV domain. This trend can be attributed to the lower computational requirements of CV tasks compared with language models. Furthermore, the demand for advancements in CV applications is notably higher, driven by a larger and more active research community within this field.
Specifically, Mamba's linear complexity enables efficient processing of large-scale, high-resolution visual data. From the perspective of data features, the data in visual tasks are typically high-dimensional and dense, which is where Mamba stands out. Mamba's selective information filtering mechanism can more accurately retain and process critical information, which is necessary for object recognition, feature extraction, and noise filtering in visual tasks. In contrast, context-dependent language tasks may require more complex semantic modeling capabilities. Visual tasks also often require real-time or near-real-time responses, and Mamba's efficient computation matches these scenarios. For these reasons, much of the research on Mamba focuses on solving visual problems and optimizes the algorithm mainly for visual tasks.
TABLE I. Mamba Related Literature
Domain | Sub-domain | Method | Count
CV | Medical | MENTALITY [6]; VM-UNet [7]; Swin-UMamba [8]; Mamba-UNet [9]; FD-Vision [10]; P-Mamba [11]; Weak-Mamba-UNet [12]; Semi-Mamba-UNet [13]; MEDMAMBA [14]; MamMIL [15]; VM-UNET-V2 [16]; MD-Dose [17]; ProMamba [18]; Mambaformer [19]; UltraLight VM-UNet [20]; LightM-UNet [21]; T-Mamba [22]; VMambaMorph [23]; Vim4Path [24]; MambaAhnet [25]; AC-MAMBASEG [26]; HC-MAMBA [27]; VM-DDPM [28]; I2I-Mamba [29]; MambaMIR-GAN [30]; Bi-Mamba+ [31]; UU-Mamba [32]; MUCM-Net [33]; P-BTS [34] | 30
CV | Remote Sensing | RS-Mamba [35]; RSDehamba [36]; CM-UNet [37]; Mamba-in-Mamba [38]; Samba [39]; FMSR [40]; Imagery Vision Mamba Scanning Strategies [41] | 7
CV | Model Explanation | [V]-Mamba [42]; Mamba [43]; vision mamba [44]; Vision-Mamba [45]; MambaOut [46]; MLLA [47] | 6
CV | Multi-modal Image Fusion | FusionMamba Efficient Image [48]; MambaDFuse [49]; Fusion-Mamba block (FMB) [50]; FusionMamba Dynamic Feature [51] | 4
CV | Video Understanding | Video Mamba Suite [52]; Simba [53]; Mamba-FETrack [54] | 3
CV | Biologic | U-Mamba [55]; SegMamba [56]; ViM-UNet [57] | 3
CV | Hyperspectral Image | SSMamba [58]; SpectralMamba [59]; 3DSS-Mamba [60] | 3
CV | Traffic Flow | NetMamba [61]; ST-MambaSync [62] | 2
CV | Video | Vivim [63]; SpikeMba [64] | 2
CV | 3D Modeling | Motion Mamba [65] | 1
CV | Others | VMRNN [66]; PTM-Mamba [67]; ZigMa [68]; MambaR [69]; Vim-F [70]; FreqMamba [71]; Retinexmamba [72]; Pan-Mamba [73]; Diffusion Mamba [74]; CU-Mamba [75]; DVMSR [76]; IRSRMamba [77]; ReMamber [78]; OverlapMamba [79]; Sigma [80]; MIM-ISTD [81]; Simple-Mamba [82]; PlainMamba [83]; Matten [84]; GMSR [85]; DiM [86] | 21
Language Model | | VL-Mamba [87]; Cobra [88]; Meteor [89]; Coupled Mamba [90]; SEMamba [91]; BiMamba [92]; Mambaformer [93]; RankMamba [94]; Bi-Mamba+ [95] | 9
Model Explanation | | HiddenMambaAttn [96]; Mamba [97]; SSMs [98] | 3
Audio | | DUAL-PATH MAMBA [99]; AUDIO MAMBA [100]; SSAMBA [101] | 3
Point Cloud | | Point Cloud Mamba [102]; POINT MAMBA [103]; PoinTramba [104] | 3
Decision | | decision-making [105]; hierarchical decision [106] | 2
Health Care | | NeuroNet [107]; MSSC-BiMamba [108] | 2
IoT | | TRAMBA [109]; HARMamba [110] | 2
Prediction | | MambaFormer [111]; DTMamba [112] | 2
Modeling | | Mamba [1] | 1
Relational Analysis | | Graph-Mamba [113] | 1
Trajectory Optimization | | DeMa [114] | 1
Smart Education | | Mamba4KT [115] | 1
Spatial-Temporal Graph Learning | | STG-Mamba [116] | 1
Figure 4: Mamba Application Distribution
Many researchers have addressed the limitations of the Transformer by creating new structures based on the Mamba model. These structures have attracted widespread attention in the medical field. From the current point of view, autonomous driving, agriculture, robotics, and other areas also have potential for development. The high potential of Mamba models in autonomous driving, agriculture, and robotics is mainly attributed to the close alignment of their architectural features with the requirements of these fields, as detailed in Table II.
Scholars have focused on remote sensing [26], [35], [28], [41], [36], [37], and these studies indicate that Mamba can process remote sensing data with high efficiency and accuracy. For spectral data, some researchers use Mamba to assist in processing [60], an approach with practical research significance. In the linguistic domain [67], [88], [89], Mamba models achieve better results than mainstream Transformer-derived models and can accomplish similar tasks with less computation. In the audio and video domains, Mamba models also show significant advantages over Transformer models; examples include [63], [34], [52], [53], [64], [84], [74], [109], [100], [101], which report state-of-the-art results in tasks such as audio enhancement and segmentation.
C. Portrait Analysis
Figure 5 visually presents the distribution of high-frequency keywords in Mamba-related literature. These keywords reflect current research hotspots and indicate future development trends. This section conducts a detailed analysis of keyword distribution and potential research trends from multiple perspectives and explores the underlying causes of these phenomena.
Overall, the high-frequency keyword distribution in Figure 5 clearly reflects the main research focuses and trends in Mamba-related studies. These keywords highlight the following aspects:
Theoretical & Modeling Foundations: SSM and Mamba serve as the core components, supporting sequence modeling and dynamic system modeling. SSM is a core mathematical modeling method in Mamba research and provides a robust theoretical foundation for modeling dynamic systems and sequential data. It is also a cornerstone for constructing new models, particularly in multi-model integration research.
Technological Optimization & Efficiency Enhancement: Keywords such as linear, selective, and efficient indicate that reducing computational complexity and optimizing global and local feature extraction are key research priorities. The Mamba architecture employs SSMs to implement a selective mechanism, replacing traditional attention and significantly reducing the computational complexity of sequence modeling. By integrating a hardware-aware parallel scan algorithm that optimizes memory access patterns, the architecture achieves roughly 5x higher inference throughput than Transformer-based models [1].
Wide Applications: The presence of keywords such as Vision, Segmentation, Medical, Classification, and Language underscores the Mamba model's broad applicability across diverse domains. This reflects its strong potential for practical deployment, while the sustained academic interest indicates a continued interest in investigating and extending its capabilities.
Integration with Advanced Methods: The connection between Transformer, UNet, RNN, and Structured State Space Model (S4) [117] with Mamba shows that researchers are exploring the integration and optimization of Mamba with existing mainstream models. Mamba maintains backward compatibility with S4's application domains while introducing architectural innovations. By incorporating input-dependent gating mechanisms inspired by RNN architectures, the model achieves computational accuracy comparable to traditional recurrent approaches in temporal modeling tasks. Furthermore, through differentiable model composition frameworks, Mamba demonstrates synergistic integration capabilities with domain-specific models, achieving throughput improvements without compromising original model functionality.
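For readers unfamiliar with the SSM formulation that these keywords refer to, the standard equations can be summarized as follows. This is a sketch in the notation commonly used for S4 and Mamba [1], [117]; the symbols (state $h$, input $x$, output $y$, step size $\Delta$, state size $N$) are introduced here for illustration and do not appear elsewhere in this review.

```latex
% Continuous-time linear state space model
h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t)

% Zero-order-hold discretization with step size \Delta
\bar{A} = \exp(\Delta A), \qquad
\bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B

% Discrete recurrence used for sequence modeling
h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t

% Selective variant: parameters become functions of the current input
B_t = f_B(x_t), \qquad C_t = f_C(x_t), \qquad \Delta_t = f_\Delta(x_t)
```

In S4 the parameters are fixed across time steps, whereas the selective variant lets the input decide how strongly each step writes to and reads from the state.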
D. Model Features Explanation
The most distinctive mechanisms of the Mamba model are its linear-complexity mechanism, its selective information processing mechanism, and its hardware optimization mechanism.
- Enhanced Context Modeling.
The Mamba model employs improved context modeling methods that allow the model to capture long-range dependencies better. The traditional self-attention mechanism is computationally expensive when processing long sequences. Rather than refining self-attention, Mamba replaces it with a selective state-space mechanism, either on its own or inside a hybrid model architecture, which improves computational efficiency and reduces memory usage.
- Efficient Training and Inference.
Mamba models employ more efficient techniques during training and inference to speed up training and reduce the consumption of computational resources. This feature enables the Mamba model to be trained on large-scale datasets while maintaining low latency during inference.
Figure 5: Distribution of High-frequency Keywords in Literature
TABLE II. Comparison of Requirements and Advantages of Mamba in Autonomous Driving, Agriculture and Robotics
| Autonomous driving | Agriculture | Robotics
Requirements | Autonomous driving requires processing real-time multimodal data. The system needs accurate perception and dynamic planning of the surrounding environment. | Agricultural scenarios involve large-scale data, and the application tasks are varied and complex. | Robots need to sense the environment in real time and make corresponding decisions. The system needs low latency and high robustness.
Advantages of Mamba | Efficient long-sequence processing: Mamba's linearly scaling architecture enables efficient processing of complex sensor data. Selective memory mechanism: filters irrelevant information, focuses on key environmental features, and improves responsiveness to dynamic scenes. | Low computational resource requirements: agricultural monitoring usually has limited resources, and Mamba's hardware optimization features make it suitable for deployment on edge devices. Powerful feature extraction capability: potential for processing high-dimensional image data and identifying crop details. | Real-time processing capability: Mamba performs well under low-latency conditions and is well suited for real-time decision-making tasks in dynamic environments. Adaptation to complex multimodal data: strong ability to process multimodal data such as vision and speech.
- Task Adaptability.
The Mamba model is highly task-adaptable and can perform well in a wide range of NLP tasks. Through different pre-training strategies and task-specific fine-tuning techniques, Mamba can be applied to various application scenarios.
Mamba is based on sequence modeling that compresses the context into a smaller state. From this data-compression perspective, Mamba allows the model to select relevant information and filter out irrelevant information. To understand the selection mechanism of the Mamba model, it is first necessary to analyze the attention mechanism of the widely used Transformer model. Measured by data compression, the attention mechanism in Transformers behaves paradoxically: it is effective precisely because it does not compress the context at all, but this is also what causes the Transformer's slow linear-time inference and quadratic-time training [1].
For sequence models, an efficient model must have a small, compressed state, while an effective model must have a state that contains all the necessary information from the context. Sequence models therefore have to balance efficiency against effectiveness through the degree to which they compress features. Mamba accordingly proposes selectivity as the basic principle for constructing sequence models: the context-aware ability to attend to or filter out inputs to the sequence state. In particular, the selection mechanism controls how information propagates or interacts along the sequence dimension.
The Mamba model evolved iteratively from the S4 model. Mamba adds a selection mechanism to the recurrent state update; because the resulting model is no longer time-invariant, it cannot be computed with the convolutional algorithm and is computed recurrently instead [1]. After weighing these trade-offs, Mamba adopts a hardware-aware algorithm to address the resulting parallel computation issues.
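The selection mechanism and recurrent computation described above can be illustrated with a minimal sketch. It assumes a single scalar input channel, a diagonal state matrix, and scalar projection weights; the function and parameter names (`selective_scan`, `w_b`, `w_c`, `w_delta`, `bias_delta`) are hypothetical, and the simplified discretization (`exp(delta * a)` for the state, `delta * b` for the input) is one common choice rather than a statement of the reference implementation in [1].

```python
import math


def softplus(z):
    """Numerically stable softplus, used to keep the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a_diag, w_b, w_c, w_delta, bias_delta):
    """Recurrent scan in which the step size, B, and C depend on the input.

    xs      : list of scalar inputs (one channel)
    a_diag  : diagonal entries of the state matrix (negative for stability)
    w_b, w_c: hypothetical scalar projection weights, one per state element
    """
    h = [0.0] * len(a_diag)
    ys = []
    for x in xs:
        delta = softplus(w_delta * x + bias_delta)  # input-dependent step size
        b_t = [w * x for w in w_b]                  # input-dependent B
        c_t = [w * x for w in w_c]                  # input-dependent C
        # Small delta keeps the old state; large delta overwrites it.
        h = [math.exp(delta * a) * hi + delta * bi * x
             for a, hi, bi in zip(a_diag, h, b_t)]
        ys.append(sum(ci * hi for ci, hi in zip(c_t, h)))
    return ys
```

In this sketch a step size near zero leaves the state untouched and ignores the current input, while a large step size resets the state toward the current input, which is the behaviour that lets the model filter or retain context.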
Hardware-aware algorithms mainly exploit the GPU memory hierarchy, materializing the state only at its more efficient levels. In particular, most operations, including the scan, are bounded by memory bandwidth, so Mamba uses kernel fusion to reduce the number of memory inputs and outputs (IOs), which leads to significant speedups compared to standard implementations [1].
Based on the previous analysis, some of the strengths of the Mamba model have not yet been fully utilized, as summarized in Table III:
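To make the scan operation concrete, the following sketch shows why a recurrence of the form h_t = a_t * h_(t-1) + b_t can be parallelized at all: pairs (a, b) combine under an associative operator, so prefix results can be built in a logarithmic number of rounds. This is only an illustration in plain Python with hypothetical function names; it does not reproduce the fused GPU kernel of [1], and the round-based scheme shown here is a simple textbook variant rather than the scheme used there.

```python
def combine(left, right):
    """Compose two affine updates h -> a*h + b, with `left` applied first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(pairs):
    """Reference: run the recurrence one step at a time from h = 0."""
    out, h = [], 0.0
    for a, b in pairs:
        h = a * h + b
        out.append(h)
    return out


def parallel_style_scan(pairs):
    """Inclusive scan in log2(n) rounds using the associative `combine`.

    Within one round every position is updated independently, which is
    what a GPU would execute concurrently.
    """
    elems = list(pairs)
    n = len(elems)
    step = 1
    while step < n:
        elems = [elems[i] if i < step else combine(elems[i - step], elems[i])
                 for i in range(n)]
        step *= 2
    # With an initial state of zero, h_t is the offset of the composed update.
    return [b for _, b in elems]
```

Both functions return the same sequence of states; the second only reorders the arithmetic so that independent updates can run side by side.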
The underutilized strengths of the Mamba model include multi-modal fusion, real-time applications, high-dimensional data processing, generative tasks, reinforcement learning, interpretability and trustworthy AI, and few-shot learning. These areas have broad application prospects, and future research could further explore Mamba's potential in these directions.
TABLE III. Comparison table of Domain, Strength, Underutilized Scenarios
Domain | Strength | Underutilized Scenarios
Multi-modal Fusion | The Mamba model, designed based on SSM, possesses powerful sequence modeling capabilities. | Cross-modal Tasks; Joint Multi-modal Modeling; Multi-modal Pre-training Models
Real-time Applications | The Mamba model's high computational efficiency makes it suitable for processing long-sequence data with low memory usage. | Real-time Video Processing; Real-time Speech Translation or Voice Assistants; Real-time Financial Data Prediction and Trading Decisions
High-dimensional Data Processing | The SSM design of the Mamba model is well-suited for handling high-dimensional data. | 3D Point Cloud Segmentation and Classification; Hyperspectral Image Analysis
Generative Tasks | The Mamba model's long-sequence modeling capabilities and efficient computation make it suitable for generative tasks. | Text Generation; Video Generation
Reinforcement Learning | The Mamba model's long-sequence modeling capabilities are well-suited for handling state sequences in reinforcement learning. | Decision Optimization in Complex Environments; Multi-step Long-sequence Prediction and Planning; Model-based Reinforcement Learning
Interpretability and Trustworthy AI | The Mamba model's SSM-based design offers stronger interpretability. | Medical Diagnosis; Financial Risk Control; Legal and Ethical AI
Few-shot Learning | The Mamba model's efficient computation and long-sequence modeling capabilities make it suitable for few-shot learning. | Few-shot Classification; Few-shot Generation; Few-shot Reinforcement Learning
E. Models Comparison
This section briefly reviews SSM, RNN, and Transformer models. Comparing the similarities and differences between the Mamba model and the Transformer and RNN models first requires a brief understanding of these models. The Transformer is a DL model architecture originally proposed by Vaswani et al. in 2017 [2]. It is mainly used for processing sequential data and has achieved significant success in NLP.
An SSM is a model that describes a set of state representations and predicts the next state from the given inputs. However, rather than operating on discrete sequences, an SSM takes a continuous sequence as input and predicts the output sequence. The basic equations shared by SSMs and Mamba mathematically describe how the state evolves over time [118].
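As a hedged illustration of these state equations: in the standard formulation used by S4 and Mamba [1], the continuous system h'(t) = A h(t) + B x(t), y(t) = C h(t) is discretized with a step size and then evaluated either as a recurrence or as a convolution. The sketch below assumes a diagonal state matrix with nonzero entries, a single scalar input channel, and zero-order-hold discretization; all function names are hypothetical.

```python
import math


def discretize(a, b, delta):
    """Zero-order-hold discretization of one diagonal entry (a must be nonzero)."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_recurrence(xs, a_diag, b, c, delta):
    """Evaluate h_t = a_bar * h_(t-1) + b_bar * x_t and y_t = sum(c * h_t)."""
    params = [discretize(a, bi, delta) for a, bi in zip(a_diag, b)]
    h = [0.0] * len(a_diag)
    ys = []
    for x in xs:
        h = [a_bar * hi + b_bar * x for (a_bar, b_bar), hi in zip(params, h)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_convolution(xs, a_diag, b, c, delta):
    """Same outputs via the kernel K_k = sum_n c_n * a_bar_n**k * b_bar_n."""
    params = [discretize(a, bi, delta) for a, bi in zip(a_diag, b)]
    kernel = [sum(ci * (a_bar ** k) * b_bar
                  for ci, (a_bar, b_bar) in zip(c, params))
              for k in range(len(xs))]
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1))
            for t in range(len(xs))]
```

The two functions agree only because the parameters are the same at every time step; once the parameters depend on the input, the fixed kernel no longer exists and only the recurrent form remains.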
The RNN [119] is a neural network model for processing sequence data. Unlike traditional feed-forward neural networks, RNNs can capture temporal dependencies in input sequences through their recurrent structure. RNNs are used in NLP and in time series prediction. Long Short-Term Memory (LSTM) networks are the main variant of RNNs. LSTM mitigates the vanishing and exploding gradient problems encountered by standard RNNs during training. It controls the flow of information by introducing a gating mechanism to capture long-term dependencies.
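The gating mechanism mentioned above can be sketched with the standard LSTM cell equations, reduced to the scalar case for readability. The parameter layout (a dict mapping each gate name to a `(w_x, w_h, bias)` tuple) and the function names are illustrative assumptions, not taken from [119].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("input"))        # how much new content to write
    f = sigmoid(pre("forget"))       # how much old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

When the forget gate saturates near one and the input gate near zero, the cell state is carried forward almost unchanged, which is how the gates preserve information over many steps.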
With the Transformer, RNN, and SSM models briefly introduced, the comparison can proceed. Mamba's solution to the quadratic complexity of the Transformer is consistent with that of SSMs, so SSMs are used to show the difference between Mamba's core modules and the Transformer. Because of this quadratic complexity, the Transformer requires substantial computational resources to handle long sequences [1].
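A rough back-of-the-envelope comparison makes the scaling difference concrete. The counts below are deliberately simplified assumptions: one attention head is charged one unit per query-key pair, and one SSM channel is charged one unit per state element per time step. The state size of 16 is only an illustrative value, and the function names are hypothetical.

```python
def attention_pair_count(seq_len):
    """Query-key score entries computed by one self-attention head."""
    return seq_len * seq_len


def ssm_state_updates(seq_len, state_size):
    """State-element updates performed by a recurrent scan on one channel."""
    return seq_len * state_size


for seq_len in (1_000, 2_000, 4_000):
    print(seq_len,
          attention_pair_count(seq_len),
          ssm_state_updates(seq_len, state_size=16))
```

Doubling the sequence length quadruples the first count and only doubles the second, which is the gap that grows with longer contexts.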
In comparing the Transformer model and the Mamba model, [47] finds that the Mamba model processes high-precision data faster than the Transformer model because its attention-like mechanism selects a summary of the historical data instead of compressing it. Additionally, in image processing this mechanism does not significantly affect the computational accuracy of the Mamba model. At the same time, because image processing places high demands on the attention mechanism, its impact remains very important. This is reflected in the high-frequency keywords "Classification", "Segmentation", and "Vision": most of these fields require high data accuracy and high image quality, so the advantages of selecting data are more prominent.
RNNs have two main problems. The first is that RNNs struggle with long-range dependencies: even though each hidden state aggregates all previous hidden states, RNNs tend to forget certain information over time. The second is that RNNs cannot be trained in parallel, which makes training slow [119].
The Mamba model is the latest iteration of SSMs. It addresses the Transformer's weakness in handling very long sequences: when a task requires extensive context lengths, the attention mechanism becomes the bottleneck, and the Mamba model is better suited to this kind of task. The tradeoff between the efficiency and effectiveness of sequence models lies in how much they compress the state. Efficient models have a small state (for example, RNN or S4), while effective models have a state that contains all the necessary information from the context. Mamba selectively focuses on important information and filters out ignorable information to balance efficiency and effectiveness. Therefore, the Mamba model is the most competitive among these models.
4. Results and discussion
This section discusses the value and future development trajectory of the Mamba model. Based on current research findings, potential directions for theoretical expansion and technical optimization of the Mamba model are proposed. The core advantages and performance boundaries of the Mamba model are summarized, highlighting its unique competitive strengths. The practical implications of the Mamba model for advancing DL are elucidated. Additionally, the limitations of current research are objectively evaluated, providing clear targets for improvement in subsequent studies.
A. Significance of Mamba Model
As energy efficiency becomes an increasingly critical concern, the Mamba model—characterized by its streamlined architecture and optimized GPU utilization—holds significant promise for deployment in embedded systems and edge devices. Demonstrating strong performance in computer vision, natural language processing, and time-series analysis, Mamba is poised to extend its capabilities to additional modalities such as audio and tactile sensing, thereby enabling real-world applications in complex, multi-modal environments. Its robust state modeling capacity also positions it as a valuable tool in reinforcement learning scenarios, particularly for real-time, dynamic decision-making tasks. In the field of medical image analysis and disease prediction, Mamba is expected to play an increasingly important role, especially in resource-constrained clinical settings where efficient and accurate decision support is essential.
Furthermore, Mamba's effectiveness in modeling long-sequence data makes it a strong candidate for tasks requiring memory and temporal reasoning. Through integration with multi-task and transfer learning frameworks, it has the potential to contribute to the development of general artificial intelligence by facilitating cross-domain knowledge transfer.
B. Result Review
The Mamba model builds on SSMs, which generalize traditional RNNs. As an efficient SSM variant, Mamba retains RNN-like sequential modeling capabilities while overcoming their limitations in capturing long-range dependencies and parallel computation. Unlike Transformers, which suffer from quadratic time complexity due to self-attention, Mamba achieves linear computational efficiency, making it more scalable and resource-efficient for sequence processing.
Mamba excels particularly in vision-related tasks. This is largely due to the inefficiencies of attention-based models when handling long image sequences. As data volume increases, so do sequence length and computational overhead, making Mamba's linear structure more suitable for large-scale visual data processing.
While Mamba shows promise across domains such as remote sensing, audio, and video analysis, its adoption remains limited compared to Transformers. Moreover, many derivative models have yet to fully exploit Mamba's capabilities. As such, Mamba represents a high-potential but underutilized framework, offering rich opportunities for future research and cross-domain innovation.
C. Research Influence
This research compares the Mamba model with other state-of-the-art models, clarifying for the academic community how the Mamba model relates to the Transformer, RNN, and SSM models. In terms of Mamba applications, this research provides an overview of relevant articles from the release of the Mamba model to June 2024. From a global perspective, it shows subsequent researchers the current state of Mamba model applications, including its existing domains. Without such a global overview, it is challenging for researchers to fully understand the strengths and application characteristics of the Mamba model. This research aims to bridge this gap and to offer a more comprehensive understanding of Mamba modeling.
This research can inspire cross-disciplinary collaboration and encourage researchers to develop more practical Mamba applications. The importance of the Mamba model literature review lies in its ability to provide a comprehensive summary of the technical features and application limitations of the model, providing researchers with a complete knowledge framework and inspiring the development of derivative Mamba models and the creation of cross-domain Mamba applications.
D. Research Limitations
As research on the Mamba model continues to evolve, its potential in computer vision, time series prediction, and multimodal learning is progressively being uncovered. These emerging application scenarios may involve entirely new data characteristics and task requirements, extending beyond the scope of the current research framework. Consequently, the methods and conclusions presented in this study may prove insufficient to comprehensively guide future developments. In particular, upcoming research is expected to propose more advanced optimization techniques to enhance the model's computational efficiency, generalization ability, and interpretability, potentially rendering some existing findings obsolete or in need of reassessment. Furthermore, the theoretical understanding of the Mamba model remains in its infancy, and future work leveraging mathematical tools may provide deeper insights into its underlying mechanisms, paving the way for more efficient or flexible variants. The integration of Mamba with other cutting-edge technologies may also catalyze novel interdisciplinary research directions, introducing new challenges and opportunities that current studies may not yet anticipate. Therefore, while this study offers valuable insights into the early-stage exploration of the Mamba model, it is limited in its ability to fully capture the diversity and complexity of future research trajectories.
5. Conclusion
This research compares the existing state-of-the-art models: the Transformer, Mamba, RNN, and SSM models. By analyzing these models, this research shows the relationships among them, their current situation, and their open issues.
This research clarifies the relationship between Mamba and other advanced technologies and, through an overview, analyzes the whole Mamba model ecosystem from a global perspective, including its applications and application potential. It draws a blueprint of Mamba modeling for subsequent scholars, who can supplement the corresponding areas based on this research. In doing so, this research fills the gap in global knowledge of Mamba. In future work, this study will deepen the analysis of research related to the Mamba model, systematically assisting researchers in uncovering its application potential. Furthermore, it will explore optimization strategies for the deployment of the Mamba model on practical hardware platforms. Finally, this study contends that the Mamba model still presents numerous unexplored research directions, which hold the potential to significantly expand its application scope and enhance its performance limits.
References
[1] A. Gu and T. Dao, "Mamba: Linear-time sequence modeling with selective state spaces," arXiv preprint arXiv:2312.00752, 2023.
[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in neural information processing systems, vol. 30, 2017.
[3] W. Zaremba, I. Sutskever, and O. Vinyals, "Recurrent neural network regularization," arXiv preprint arXiv:1409.2329, 2014.
[4] A. Gu, K. Goel, A. Gupta, and C. Ré, "On the parameterization and initialization of diagonal state space models," Advances in Neural Information Processing Systems, vol. 35, pp. 35 971–35 983, 2022.
[5] M. J. Page, J. E. McKenzie, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer, J. M. Tetzlaff, E. A. Akl, S. E. Brennan et al., "The prisma 2020 statement: an updated guideline for reporting systematic reviews," bmj, vol. 372, 2021.
[6] S. Panchavati, C. Arnold, and W. Speier, "Mentality: A mamba-based approach towards foundation models for eeg."
[7] J. Ruan, J. Li, and S. Xiang, "Vm-unet: Vision mamba unet for medical image segmentation," arXiv preprint arXiv:2402.02491, 2024.
[8] J. Liu, H. Yang, H.-Y. Zhou, Y. Xi, L. Yu, C. Li, Y. Liang, G. Shi, Y. Yu, S. Zhang et al., "Swin-umamba: Mamba-based unet with imagenet-based pretraining," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 615–625.
[9] Z. Wang, J. Zheng, Y. Zhang, G. Cui, and L. Li, "Mamba-unet: Unet-like pure visual mamba for medical image segmentation," arXiv preprint arXiv:2402.05079, 2024.
[10] Z. Zheng and J. Zhang, "Fd-vision mamba for endoscopic exposure correction," arXiv preprint arXiv:2402.06378, 2024.
[11] Z. Ye, T. Chen, F. Wang, H. Zhang, and L. Zhang, "P-mamba: Marrying perona malik diffusion with mamba for efficient pediatric echocardiographic left ventricular segmentation," arXiv preprint arXiv:2402.08506, 2024.
[12] Z. Wang and C. Ma, "Weak-mamba-unet: Visual mamba makes cnn and vit work better for scribble-based medical image segmentation," arXiv preprint arXiv:2402.10887, 2024.
[13] C. Ma and Z. Wang, "Semi-mamba-unet: Pixel-level contrastive and pixel-level cross-supervised visual mamba-based unet for semi-supervised medical image segmentation," arXiv preprint arXiv:2402.07245, 2024.
[14] Y. Yue and Z. Li, "Medmamba: Vision mamba for medical image classification," arXiv preprint arXiv:2403.03849, 2024.
[15] Z. Fang, Y. Wang, Y. Zhang, Z. Wang, J. Zhang, X. Ji, and Y. Zhang, "Mammil: Multiple instance learning for whole slide images with state space models," in 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2024, pp. 3200–3205.
[16] M. Zhang, Y. Yu, S. Jin, L. Gu, T. Ling, and X. Tao, "Vm-unet-v2: rethinking vision mamba unet for medical image segmentation," in International Symposium on Bioinformatics Research and Applications. Springer, 2024, pp. 335–346.
[17] L. Fu, X. Li, X. Cai, X. Wang, Y. Shen, and Y. Yao, "Md-dose: A diffusion model based on the mamba for radiation dose prediction," in 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2024, pp. 911–918.
[18] J. Xie, R. Liao, Z. Zhang, S. Yi, Y. Zhu, and G. Luo, "Promamba: Prompt-mamba for polyp segmentation," arXiv preprint arXiv:2403.13660, 2024.
[19] K. S. Sanjid, M. T. Hossain, M. S. S. Junayed, and M. M. Uddin, "Integrating mamba sequence model and hierarchical upsampling network for accurate semantic segmentation of multiple sclerosis legion," arXiv preprint arXiv:2403.17432, 2024.
[20] R. Wu, Y. Liu, P. Liang, and Q. Chang, "Ultralight vm-unet: Parallel vision mamba significantly reduces parameters for skin lesion segmentation," arXiv preprint arXiv:2403.20035, 2024.
[21] W. Liao, Y. Zhu, X. Wang, C. Pan, Y. Wang, and L. Ma, "Lightm-unet: Mamba assists in lightweight unet for medical image segmentation," arXiv preprint arXiv:2403.05246, 2024.
[22] J. Hao, Y. Zhu, L. He, M. Liu, J. K. H. Tsoi, and K. F. Hung, "T-mamba: A unified framework with long-range dependency in dual-domain for 2d & 3d tooth segmentation," arXiv preprint arXiv:2404.01065, 2024.
[23] Z. Wang, J.-Q. Zheng, C. Ma, and T. Guo, "Vmambamorph: a multi-modality deformable image registration framework based on visual state space model with cross-scan module," arXiv preprint arXiv:2404.05105, 2024.
[24] A. Nasiri-Sarvi, V. Q.-H. Trinh, H. Rivaz, and M. S. Hosseini, "Vim4path: Self-supervised vision mamba for histopathology images," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 6894–6903.
[25] K. S. Sanjid, M. T. Hossain, M. S. S. Junayed, and M. M. Uddin, "Optimizing universal lesion segmentation: State space model-guided hierarchical networks with feature importance adjustment," arXiv preprint arXiv:2404.17235, 2024.
[26] V.-T. Nguyen, V.-T. Pham, and T.-T. Tran, "Ac-mambaseg: An adaptive convolution and mamba-based architecture for enhanced skin lesion segmentation," in International Conference on Green Technology and Sustainable Development. Springer, 2024, pp. 13–26.
[27] J. Xu, "Hc-mamba: Vision mamba with hybrid convolutional techniques for medical image segmentation," arXiv preprint arXiv:2405.05007, 2024.
[28] Z. Ju and W. Zhou, "Vm-ddpm: Vision mamba diffusion for medical image synthesis," arXiv preprint arXiv:2405.05667, 2024.
[29] O. F. Atli, B. Kabas, F. Arslan, A. C. Demirtas, M. Yurt, O. Dalmaz, and T. Cukur, "I2i-mamba: Multi-modal medical image synthesis via selective state space modeling," arXiv preprint arXiv:2405.14022, 2024.
[30] J. Huang, L. Yang, F. Wang, Y. Wu, Y. Nan, W. Wu, C. Wang, K. Shi, A. I. Aviles-Rivero, C.-B. Schönlieb et al., "Enhancing global sensitivity and uncertainty quantification in medical image reconstruction with monte carlo arbitrary-masked mamba," Medical Image Analysis, vol. 99, p. 103334, 2025.
[31] Z. Yang, J. Zhang, G. Wang, M. K. Kalra, and P. Yan, "Cardiovascular disease detection from multi-view chest x-rays with bi-mamba," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 134–144.
[32] T. Y. Tsai, L. Lin, S. Hu, M.-C. Chang, H. Zhu, and X. Wang, "Uu-mamba: uncertainty-aware u-mamba for cardiac image segmentation," in 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 2024, pp. 267–273.
[33] C. Yuan, D. Zhao, and S. S. Agaian, "Mucm-net: a mamba powered ucm-net for skin lesion segmentation," arXiv preprint arXiv:2405.15925, 2024.
[34] H. Zhang, J. Wang, Y. Zhao, L. Wang, W. Zhang, Y.-C. Chen, and N. Xiong, "Pocket convolution mamba for brain tumor segmentation," The Journal of Supercomputing, vol. 81, no. 1, pp. 1–23, 2025.
[35] S. Zhao, H. Chen, X. Zhang, P. Xiao, L. Bai, and W. Ouyang, "Rs-mamba for large remote sensing image dense prediction," IEEE Transactions on Geoscience and Remote Sensing, 2024.
[36] H. Zhou, X. Wu, H. Chen, X. Chen, and X. He, "Rsdehamba: lightweight vision mamba for remote sensing satellite image dehazing," arXiv preprint arXiv:2405.10030, 2024.
[37] M. Liu, J. Dan, Z. Lu, Y. Yu, Y. Li, and X. Li, "Cm-unet: Hybrid cnn-mamba unet for remote sensing image semantic segmentation," arXiv preprint arXiv:2405.10530, 2024.
[38] W. Zhou, S.-i. Kamata, H. Wang, M. S. Wong, and H. C. Hou, "Mamba-in-mamba: Centralized mamba-cross-scan in tokenized mamba model for hyperspectral image classification," Neurocomputing, vol. 613, p. 128751, 2025.
[39] Q. Zhu, Y. Cai, Y. Fang, Y. Yang, C. Chen, L. Fan, and A. Nguyen, "Samba: Semantic segmentation of remotely sensed images with state space model," Heliyon, vol. 10, no. 19, 2024.
[40] Y. Xiao, Q. Yuan, K. Jiang, Y. Chen, Q. Zhang, and C.-W. Lin, "Frequency-assisted mamba for remote sensing image super-resolution," IEEE Transactions on Multimedia, 2024.
[41] Q. Zhu, Y. Fang, Y. Cai, C. Chen, and L. Fan, "Rethinking scanning strategies with vision mamba in semantic segmentation of remote sensing imagery: an experimental study," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024.
[42] D. Misra, J. Gala, and A. Orvieto, "On the low-shot transferability of [v]-mamba," arXiv preprint arXiv:2403.10696, 2024.
[43] A. S. Sharma, D. Atkinson, and D. Bau, "Locating and editing factual associations in mamba," arXiv preprint arXiv:2404.03646, 2024.
[44] R. Xu, S. Yang, Y. Wang, Y. Cai, B. Du, and H. Chen, "Visual mamba: A survey and new outlooks," arXiv preprint arXiv:2404.18861, 2024.
[45] X. Liu, C. Zhang, and L. Zhang, "Vision mamba: A comprehensive survey and taxonomy," arXiv preprint arXiv:2405.04404, 2024.
[46] W. Yu and X. Wang, "Mambaout: Do we really need mamba for vision?" arXiv preprint arXiv:2405.07992, 2024.
[47] D. Han, Z. Wang, Z. Xia, Y. Han, Y. Pu, C. Ge, J. Song, S. Song, B. Zheng, and G. Huang, "Demystify mamba in vision: A linear attention perspective," in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
[48] S. Peng, X. Zhu, H. Deng, L.-J. Deng, and Z. Lei, "Fusionmamba: Efficient remote sensing image fusion with state space model," IEEE Transactions on Geoscience and Remote Sensing, 2024.
[49] Z. Li, H. Pan, K. Zhang, Y. Wang, and F. Yu, "Mambadfuse: A mamba-based dual-phase model for multi-modality image fusion," arXiv preprint arXiv:2404.08406, 2024.
[50] W. Dong, H. Zhu, S. Lin, X. Luo, Y. Shen, X. Liu, J. Zhang, G. Guo, and B. Zhang, "Fusion-mamba for cross-modality object detection," arXiv preprint arXiv:2404.09146, 2024.
[51] X. Xie, Y. Cui, T. Tan, X. Zheng, and Z. Yu, "Fusionmamba: Dynamic feature enhancement for multimodal image fusion with mamba," Visual Intelligence, vol. 2, no. 1, p. 37, 2024.
[52] G. Chen, Y. Huang, J. Xu, B. Pei, Z. Chen, Z. Li, J. Wang, K. Li, T. Lu, and L. Wang, "Video mamba suite: State space model as a versatile alternative for video understanding," arXiv preprint arXiv:2403.09626, 2024.
[53] S. Chaudhuri and S. Bhattacharya, "Simba: Mamba augmented u-shiftgcn for skeletal action recognition in videos," arXiv preprint arXiv:2404.07645, 2024.
[54] J. Huang, S. Wang, S. Wang, Z. Wu, X. Wang, and B. Jiang, "Mamba-fetrack: Frame-event tracking via state space model," in Chinese Conference on Pattern Recognition and Computer Vision (PRCV). Springer, 2024, pp. 3–18.
[55] J. Ma, F. Li, and B. Wang, "U-mamba: Enhancing long-range dependency for biomedical image segmentation," arXiv preprint arXiv:2401.04722, 2024.
[56] Z. Xing, T. Ye, Y. Yang, G. Liu, and L. Zhu, "Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 578–588.
[57] A. Archit and C. Pape, "Vim-unet: Vision mamba for biomedical segmentation," arXiv preprint arXiv:2404.07705, 2024.
[58] L. Huang, Y. Chen, and X. He, "Spectral-spatial mamba for hyperspectral image classification," Remote Sensing, vol. 16, no. 13, p. 2449, 2024.
[59] J. Yao, D. Hong, C. Li, and J. Chanussot, "Spectralmamba: Efficient mamba for hyperspectral image classification," arXiv preprint arXiv:2404.08489, 2024.
[60] Y. He, B. Tu, B. Liu, J. Li, and A. Plaza, "3dss-mamba: 3d-spectral-spatial mamba for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, 2024.
[61] T. Wang, X. Xie, W. Wang, C. Wang, Y. Zhao, and Y. Cui, "Netmamba: Efficient network traffic classification via pre-training unidirectional mamba," in 2024 IEEE 32nd International Conference on Network Protocols (ICNP). IEEE, 2024, pp. 1–11.
[62] Z. Shao, X. Yao, Z. Wang, and J. Gao, "St-mambasync: The complement of mamba and transformers for spatial-temporal in traffic flow prediction," arXiv preprint arXiv:2404.15899, 2024.
[63] Y. Yang, Z. Xing, L. Yu, C. Huang, H. Fu, and L. Zhu, "Vivim: A video vision mamba for medical video segmentation," arXiv preprint arXiv:2401.14168, 2024.
[64] W. Li, X. Hong, R. Xiong, and X. Fan, "Spikemba: Multi-modal spiking saliency mamba for temporal video grounding," arXiv preprint arXiv:2404.01174, 2024.
[65] Z. Zhang, A. Liu, I. Reid, R. Hartley, B. Zhuang, and H. Tang, "Motion mamba: Efficient and long sequence motion generation," in European Conference on Computer Vision. Springer, 2024, pp. 265–282.
[66] Y. Tang, P. Dong, Z. Tang, X. Chu, and J. Liang, "Vmrnn: Integrating vision mamba and lstm for efficient and accurate spatiotemporal forecasting," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 5663–5673.
[67] Z. Peng, "Ptm-mamba: A ptm-aware protein language model with bidirectional gated mamba blocks," in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024, pp. 5475–5478.
[68] V. T. Hu, S. A. Baumann, M. Gui, O. Grebenkova, P. Ma, J. Fischer, and B. Ommer, "Zigma: A dit-style zigzag mamba diffusion model," in European Conference on Computer Vision. Springer, 2024, pp. 148–166.
[69] F. Wang, J. Wang, S. Ren, G. Wei, J. Mei, W. Shao, Y. Zhou, A. Yuille, and C. Xie, "Mamba-r: Vision mamba also needs registers," arXiv preprint arXiv:2405.14858, 2024.
[70] J. Zhang, S. Liu, K. Bian, Y. Zhou, P. Zhang, W. An, J. Zhou, and K. Shao, "Vim-f: Visual state space model benefiting from learning in the frequency domain," arXiv preprint arXiv:2405.18679, 2024.
[71] Z. Zhen, Y. Hu, and Z. Feng, "Freqmamba: Viewing mamba from a frequency perspective for image deraining," arXiv preprint arXiv:2404.09476, 2024.
[72] J. Bai, Y. Yin, Q. He, Y. Li, and X. Zhang, "Retinexmamba: Retinex-based mamba for low-light image enhancement," arXiv preprint arXiv:2405.03349, 2024.
[73] X. He, K. Cao, J. Zhang, K. Yan, Y. Wang, R. Li, C. Xie, D. Hong, and M. Zhou, "Pan-mamba: Effective pan-sharpening with state space model," Information Fusion, vol. 115, p. 102779, 2025.
[74] S. Mo and Y. Tian, "Scaling diffusion mamba with bidirectional ssms for efficient image and video generation," arXiv preprint arXiv:2405.15881, 2024.
[75] R. Deng and T. Gu, "Cu-mamba: Selective state space models with channel learning for image restoration," in 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE, 2024, pp. 328–334.
[76] X. Lei, W. Zhang, and W. Cao, "Dvmsr: Distillated vision mamba for efficient super-resolution," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 6536–6546.
[77] Y. Huang, T. Miyazaki, X. Liu, and S. Omachi, "Irsrmamba: Infrared image super-resolution via mamba-based wavelet transform feature modulation model," arXiv preprint arXiv:2405.09873, 2024.
[78] Y. Yang, C. Ma, J. Yao, Z. Zhong, Y. Zhang, and Y. Wang, "Remamber: Referring image segmentation with mamba twister," in European Conference on Computer Vision. Springer, 2024, pp. 108–126.
[79] Q. Xiang, J. Cheng, J. Luo, J. Wu, R. Fan, X. Chen, and X. Tang, "Overlapmamba: Novel shift state space model for lidar-based place recognition," arXiv preprint arXiv:2405.07966, 2024.
[80] Z. Wan, P. Zhang, Y. Wang, S. Yong, S. Stepputtis, K. Sycara, and Y. Xie, "Sigma: Siamese mamba network for multi-modal semantic segmentation," arXiv preprint arXiv:2404.04256, 2024.
[81] T. Chen, Z. Ye, Z. Tan, T. Gong, Y. Wu, Q. Chu, B. Liu, N. Yu, and J. Ye, "Mim-istd: Mamba-in-mamba for efficient infrared small target detection," IEEE Transactions on Geoscience and Remote Sensing, 2024.
[82] Z. Wang, F. Kong, S. Feng, M. Wang, X. Yang, H. Zhao, D. Wang, and Y. Zhang, "Is mamba effective for time series forecasting?" Neurocomputing, vol. 619, p. 129178, 2025.
[83] C. Yang, Z. Chen, M. Espinosa, L. Ericsson, Z. Wang, J. Liu, and E. J. Crowley, "Plainmamba: Improving non-hierarchical mamba in visual recognition," arXiv preprint arXiv:2403.17695, 2024.
[84] Y. Gao, J. Huang, X. Sun, Z. Jie, Y. Zhong, and L. Ma, "Matten: Video generation with mamba-attention," arXiv preprint arXiv:2405.03025, 2024.
[85] X. Wang, Z. Huang, S. Zhang, J. Zhu, P. Gamba, and L. Feng, "Gmsr: gradient-guided mamba for spectral reconstruction from rgb images," arXiv preprint arXiv:2405.07777, 2024.
[86] Y. Teng, Y. Wu, H. Shi, X. Ning, G. Dai, Y. Wang, Z. Li, and X. Liu, "Dim: Diffusion mamba for efficient high-resolution image synthesis," arXiv preprint arXiv:2405.14224, 2024.
[87] Y. Qiao, Z. Yu, L. Guo, S. Chen, Z. Zhao, M. Sun, Q. Wu, and J. Liu, "Vl-mamba: Exploring state space models for multimodal learning," arXiv preprint arXiv:2403.13600, 2024.
[88] H. Zhao, M. Zhang, W. Zhao, P. Ding, S. Huang, and D. Wang, "Cobra: Extending mamba to multi-modal large language model for efficient inference," arXiv preprint arXiv:2403.14520, 2024.
[89] B.-K. Lee, C. W. Kim, B. Park, and Y. M. Ro, "Meteor: Mamba-based traversal of rationale for large language and vision models," arXiv preprint arXiv:2405.15574, 2024.
[90] W. Li, H. Zhou, J. Yu, Z. Song, and W. Yang, "Coupled mamba: Enhanced multi-modal fusion with coupled state space model," arXiv preprint arXiv:2405.18014, 2024.
[91] R. Chao, W.-H. Cheng, M. La Quatra, S. M. Siniscalchi, C.-H. H. Yang, S.-W. Fu, and Y. Tsao, "An investigation of incorporating mamba for speech enhancement," in 2024 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2024, pp. 302–308.
[92] X. Zhang, Q. Zhang, H. Liu, T. Xiao, X. Qian, B. Ahmed, E. Ambikairajah, H. Li, and J. Epps, "Mamba in speech: Towards an alternative to self-attention," arXiv preprint arXiv:2405.12609, 2024.
[93] X. Xu, C. Chen, Y. Liang, B. Huang, G. Bai, L. Zhao, and K. Shu, "Sst: Multi-scale hybrid mamba-transformer experts for long-short range time series forecasting," arXiv preprint arXiv:2404.14757, 2024.
[94] Z. Xu, "Rankmamba: Benchmarking mamba's document ranking performance in the era of transformers," arXiv preprint arXiv:2403.18276, 2024.
[95] A. Liang, X. Jiang, Y. Sun, X. Shi, and K. Li, "Bi-mamba+: Bidirectional mamba for time series forecasting," arXiv preprint arXiv:2404.15772, 2024.
[96] A. Ali, I. Zimerman, and L. Wolf, "The hidden attention of mamba models," arXiv preprint arXiv:2403.01590, 2024.
[97] H. Zhang, Y. Zhu, D. Wang, L. Zhang, T. Chen, Z. Wang, and Z. Ye, "A survey on visual mamba," Applied Sciences, vol. 14, no. 13, p. 5683, 2024.
[98] N. Zubić, F. Soldá, A. Sulser, and D. Scaramuzza, "Limits of deep learning: Sequence modeling through the lens of complexity theory," arXiv preprint arXiv:2405.16674, 2024.
[99] X. Jiang, C. Han, and N. Mesgarani, "Dual-path mamba: Short and long-term bidirectional selective structured state space models for speech separation," in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5.
[100] J. Lin and H. Hu, "Audio mamba: Pretrained audio state space model for audio tagging," arXiv preprint arXiv:2405.13636, 2024.
[101] S. Shams, S. S. Dindar, X. Jiang, and N. Mesgarani, "Ssamba: Self-supervised audio representation learning with mamba state space model," in 2024 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2024, pp. 1053–1059.
[102] T. Zhang, H. Yuan, L. Qi, J. Zhang, Q. Zhou, S. Ji, S. Yan, and X. Li, "Point cloud mamba: Point cloud learning via state space model," arXiv preprint arXiv:2403.00762, 2024.
[103] J. Liu, R. Yu, Y. Wang, Y. Zheng, T. Deng, W. Ye, and H. Wang, "Point mamba: A novel point cloud backbone based on state space model with octree-based ordering strategy," arXiv preprint arXiv:2403.06467, 2024.
[104] Z. Wang, Z. Chen, Y. Wu, Z. Zhao, L. Zhou, and D. Xu, "Pointramba: A hybrid transformer-mamba framework for point cloud analysis," arXiv preprint arXiv:2405.15463, 2024.
[105] T. Ota, "Decision mamba: Reinforcement learning via sequence modeling with selective state spaces," arXiv preprint arXiv:2403.19925, 2024.
[106] A. Correia and L. A. Alexandre, "Decision mamba architectures," arXiv preprint arXiv:2405.07943, 2024.
[107] C.-H. Lee, H. Kim, H.-j. Han, M.-K. Jung, B. C. Yoon, and D.-J. Kim, "Neuronet: A novel hybrid self-supervised learning framework for sleep stage classification using single-channel eeg," arXiv preprint arXiv:2404.17585, 2024.
[108] C. Zhang, W. Cui, and J. Guo, "Mssc-bimamba: Multimodal sleep stage classification and early diagnosis of sleep disorders with bidirectional mamba," arXiv preprint arXiv:2405.20142, 2024.
[109] Y. Sui, M. Zhao, J. Xia, X. Jiang, and S. Xia, "Tramba: A hybrid transformer and mamba architecture for practical audio and bone conduction speech super resolution and enhancement on mobile and wearable platforms," Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 8, no. 4, pp. 1–29, 2024.
[110] S. Li, T. Zhu, F. Duan, L. Chen, H. Ning, C. Nugent, and Y. Wan, "Harmamba: Efficient and lightweight wearable sensor human activity recognition based on bidirectional mamba," IEEE Internet of Things Journal, 2024.
[111] J. Park, J. Park, Z. Xiong, N. Lee, J. Cho, S. Oymak, K. Lee, and D. Papailiopoulos, "Can mamba learn how to learn? a comparative study on in-context learning tasks," in Proceedings of the 41st International Conference on Machine Learning, 2024, pp. 39793–39812.
[112] Z. Wu, Y. Gong, and A. Zhang, "Dtmamba: Dual twin mamba for time series forecasting," arXiv preprint arXiv:2405.07022, 2024.
[113] C. Wang, O. Tsepa, J. Ma, and B. Wang, "Graph-mamba: Towards long-range graph sequence modeling with selective state spaces," arXiv preprint arXiv:2402.00789, 2024.
[114] Y. Dai, O. Ma, L. Zhang, X. Liang, S. Hu, M. Wang, S. Ji, J. Huang, and L. Shen, "Is mamba compatible with trajectory optimization in offline reinforcement learning?" Advances in Neural Information Processing Systems, vol. 37, pp. 51474–51502, 2024.
[115] Y. Cao and W. Zhang, "Mamba4kt: An efficient and effective mamba-based knowledge tracing model," arXiv preprint arXiv:2405.16542, 2024.
[116] L. Li, H. Wang, W. Zhang, and A. Coster, "Stg-mamba: Spatial-temporal graph learning via selective state space model," arXiv preprint arXiv:2403.12418, 2024.
[117] A. Gu, K. Goel, and C. Ré, "Efficiently modeling long sequences with structured state spaces," arXiv preprint arXiv:2111.00396, 2021.
[118] A. Gu, I. Johnson, K. Goel, K. Saab, T. Dao, A. Rudra, and C. Ré, "Combining recurrent, convolutional, and continuous-time models with linear state space layers," Advances in neural information processing systems, vol. 34, pp. 572–585, 2021.
[119] Z. C. Lipton, J. Berkowitz, and C. Elkan, "A critical review of recurrent neural networks for sequence learning," arXiv preprint arXiv:1506.00019, 2015.