Emerging Trends in Image Reconstruction from fMRI Data: A Review of Methodological Approaches
Recent advances in reconstructing images from brain data, particularly functional Magnetic Resonance Imaging (fMRI) recordings, have led to a surge of scholarly publications in this domain. This article presents a concise overview of the predominant methodologies for image reconstruction and their relevance to our investigative approach. Although a comprehensive review is beyond the scope of this discussion, we categorize the prevailing methods into four main areas: direct decoding models, encoder-decoder models, invertible encoding models, and encoder input optimization.
Direct Decoding Models
Direct decoders use deep neural networks to map neuronal activity directly to the input images or videos (Shen et al., 2019a; Zhang et al., 2020; Li et al., 2023). Training can rely on pretrained decoders (Ren et al., 2021) or on additional loss terms that push the output toward learned image statistics (Shen et al., 2019a; Kupershmidt et al., 2022). These models have proven effective in video reconstruction tasks, including in murine studies (Chen et al., 2024). However, a key limitation surfaces when testing generalization beyond the training dataset; this test is what distinguishes genuine sensory reconstruction from mere stimulus identification.
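As a minimal sketch of the direct-decoding idea, the toy example below fits a single linear map from simulated neural activity straight to pixel values. The simulated data and the linear model are illustrative stand-ins for the deep networks used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_pixels, n_neurons = 200, 16, 30

# Simulated paired data: presented images and noisy evoked activity.
images = rng.normal(size=(n_trials, n_pixels))
activity = images @ rng.normal(size=(n_pixels, n_neurons))
activity += 0.1 * rng.normal(size=activity.shape)

# Direct decoder: one map fit from activity straight to pixels.
# A least-squares linear map keeps the sketch minimal where the cited
# work would train a deep network with the same input/output contract.
D, *_ = np.linalg.lstsq(activity, images, rcond=None)
decoded = activity @ D  # reconstructions of the presented images
```

The same activity-in, image-out contract holds for the deep versions; only the capacity of the decoder changes.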
Encoder-Decoder Models
Encoder-decoder frameworks combine two independently trained components: a brain encoder, which maps brain activity into a latent representation, and a decoder, which translates that latent space back into images or video. This approach has gained traction through its integration with state-of-the-art (SOTA) generative image models such as Stable Diffusion (Rombach et al., 2021; Takagi and Nishimoto, 2023; Scotti et al., 2023; Chen et al., 2023; Benchetrit et al., 2023). The encoder is first trained to map brain signals into a latent space, which pretrained generative networks then use to synthesize the output. The semantic conditioning within these latent spaces allows low-level visual features and high-level semantic content to be processed separately (Scotti et al., 2023).
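The two-stage pipeline can be sketched as follows, with linear stand-ins for both the brain encoder and the pretrained generative decoder; all names and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels, n_latent, n_pixels = 40, 8, 64

# Assumed pretrained generative decoder (a fixed linear stand-in for,
# e.g., a diffusion model's decoder): latent code -> image.
G = rng.normal(size=(n_pixels, n_latent))

# Simulated training pairs: latent codes of seen images and the voxel
# responses they evoke (A is a hypothetical linear brain response).
A = rng.normal(size=(n_latent, n_voxels))
Z_train = rng.normal(size=(100, n_latent))
X_train = Z_train @ A

# Stage 1: train the brain encoder (activity -> latent) by least squares.
B, *_ = np.linalg.lstsq(X_train, Z_train, rcond=None)

# Stage 2: at test time, map held-out activity to a latent code, then
# hand it to the frozen generative decoder to produce the image.
z_test = rng.normal(size=n_latent)
x_test = z_test @ A
z_hat = x_test @ B
image = G @ z_hat
```

The key property is the division of labor: only the brain encoder sees neural data, while image quality is delegated entirely to the frozen generative decoder.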
Invertible Encoding Models
Invertible encoding models are trained to predict neuronal activity and can then be inverted to infer the sensory input from recorded brain data. This category also includes models that first estimate the receptive fields, or preferred stimuli, of individual neurons and reconstruct the input as a combination of these fields weighted by the corresponding neuronal activity (Stanley et al., 1999; Thirion et al., 2006; Garasto et al., 2019; Brackbill et al., 2020; Yoshida and Ohki, 2020; Nishimoto et al., 2011). While elegant, the invertibility constraint typically limits how well these models capture the coding properties of neurons, leaving them behind more expressive deep learning architectures in predictive performance (Willeke et al., 2023).
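The receptive-field variant admits a compact illustration: the stimulus is approximated as a sum of receptive fields weighted by the recorded responses, with a pseudoinverse refinement that corrects for overlap between receptive fields. The random receptive fields and noiseless linear responses below are illustrative assumptions, not a model from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_pixels = 100, 64

# Hypothetical linear receptive fields: one row per neuron.
rfs = rng.normal(size=(n_neurons, n_pixels))

stimulus = rng.normal(size=n_pixels)
responses = rfs @ stimulus  # idealized noiseless linear responses

# Reconstruction as a response-weighted sum of receptive fields...
naive = rfs.T @ responses

# ...and a refinement that accounts for correlations among receptive
# fields via the pseudoinverse of the receptive-field matrix.
refined = np.linalg.pinv(rfs) @ responses
```

With noiseless linear responses and more neurons than pixels, the pseudoinverse recovers the stimulus exactly, while the naive weighted sum is only correlated with it; real recordings are noisy and nonlinear, which is where this approach loses ground to deep encoders.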
Encoder Input Optimization
This method begins by training an encoder to predict neuronal activity from sensory inputs. After training, the encoder remains fixed while the input is optimized through backpropagation so that the predicted activity matches the empirical observations (Pierzchlewicz et al., 2023). Unlike invertible models, this approach can incorporate any contemporary neuronal encoding model. However, it shares a constraint with invertible designs: because the networks are not trained for image reconstruction itself, they may not fully recover the information actually encoded in the brain. Research indicates that static image reconstructions optimized to match predicted neural activity elicit responses closer to the actual neural responses than methods optimized solely for image similarity (Cobos et al., 2022).
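Under the simplifying assumption of a linear encoder, the frozen-encoder input optimization loop reduces to plain gradient descent on the input; the encoder, simulated data, and learning rate below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_neurons = 16, 32

# Hypothetical frozen encoder: a fixed linear map from image to activity.
W = rng.normal(size=(n_neurons, n_pixels)) / np.sqrt(n_pixels)

true_image = rng.normal(size=n_pixels)
observed = W @ true_image  # "recorded" activity for the true stimulus

# Optimize the input image by gradient descent on the squared error
# between predicted and observed activity; the encoder W stays fixed.
image = np.zeros(n_pixels)
lr = 0.1
for _ in range(5000):
    residual = W @ image - observed  # prediction error in neural space
    grad = W.T @ residual            # backprop through the linear encoder
    image -= lr * grad
```

In practice W is a deep video encoder and the gradient is obtained by automatic differentiation, but the structure of the loop (frozen encoder, trainable input) is the same.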
While these methodologies have been delineated as distinct categories, there is substantial potential for their integration. For example, encoder input optimization can effectively interface with image diffusion techniques (Pierzchlewicz et al., 2023), and theoretically, invertible models may be similarly adapted.
Conclusion
Our research adopts a pure encoder input optimization strategy focused on single-cell activity in the mouse visual cortex, for two main reasons. First, advances in neuronal encoding models tailored to dynamic visual stimuli (Sinz et al., 2018; Wang et al., 2025; Turishcheva et al., 2024) offer significant headroom for performance. Second, incorporating a generative decoder trained for high-quality image production risks reconstructing images from general image statistics rather than from the brain's actual representations. When the brain does not encode a coherent image, the reconstruction should fail visibly rather than produce a plausible but misleading semantic image.