
CATVis: Context-Aware Thought Visualization

Tariq Mehmood*, Hamza Ahmad*, Muhammad Haroon Shakeel, Murtaza Taj (* contributed equally)

Abstract:

EEG-based brain-computer interfaces (BCIs) have shown promise in applications such as motor imagery and cognitive state monitoring. However, decoding visual representations from EEG signals remains a significant challenge due to their complex and noisy nature. We therefore propose a novel five-stage framework for decoding visual representations from EEG signals: (1) an EEG encoder for concept classification, (2) cross-modal alignment of EEG and text embeddings in CLIP feature space, (3) caption refinement via re-ranking, (4) weighted interpolation of concept and caption embeddings for richer semantics, and (5) image generation using a pre-trained Stable Diffusion model. Cross-modal alignment and re-ranking together enable context-aware EEG-to-image generation. Experimental results demonstrate that our method generates high-quality images aligned with the visual stimuli, outperforming state-of-the-art approaches by 27.08% in classification accuracy and 15.21% in generation accuracy while reducing Fréchet Inception Distance by 36.61%, indicating superior semantic alignment and image quality.

Methodology:

Our framework, CATVis, reconstructs visual representations from EEG via a five-stage pipeline that balances concept-level predictions with richer contextual descriptions.

  1. EEG Encoder. A Conformer-based encoder captures spatio-temporal EEG patterns for concept classification.
  2. Cross-modal Alignment. EEG embeddings are projected into CLIP’s joint feature space and aligned with caption embeddings using contrastive learning (a minimal sketch of this step follows the list).
  3. Caption Refinement. Top candidate captions are retrieved and re-ranked using class-guided similarity to better match the subject’s perception.
  4. Semantic Interpolation. The predicted class embedding and the re-ranked caption embedding are interpolated to form a semantically rich conditioning vector (see the second sketch below).
  5. Image Generation. A pre-trained Stable Diffusion model generates images conditioned on the fused embeddings, producing photorealistic reconstructions that reflect both object class and contextual details.
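
A minimal sketch of stage 2, the cross-modal alignment step, is given below. It assumes a PyTorch setup in which the Conformer-based EEG encoder and a projection head (the names eeg_encoder and proj are hypothetical) map each EEG trial into CLIP’s feature space, where it is aligned with a frozen CLIP caption embedding via a symmetric contrastive (InfoNCE-style) loss. The temperature value and the commented training step are illustrative assumptions, not the paper’s exact configuration.

import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(eeg_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss between paired EEG and caption embeddings."""
    eeg_feats = F.normalize(eeg_feats, dim=-1)         # (B, D) EEG embeddings in CLIP space
    text_feats = F.normalize(text_feats, dim=-1)       # (B, D) CLIP caption embeddings
    logits = eeg_feats @ text_feats.t() / temperature  # (B, B) cosine-similarity matrix
    targets = torch.arange(eeg_feats.size(0), device=eeg_feats.device)
    # Each EEG trial should match its own caption (rows) and vice versa (columns).
    loss_e2t = F.cross_entropy(logits, targets)
    loss_t2e = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_e2t + loss_t2e)

# Hypothetical training step (shapes only):
# eeg_feats  = proj(eeg_encoder(eeg_batch))      # project EEG features into CLIP space
# text_feats = clip_text_encoder(caption_batch)  # frozen CLIP caption embeddings
# loss = clip_style_contrastive_loss(eeg_feats, text_feats)

Computed over a batch of (EEG, caption) pairs, this loss pulls each trial toward its own caption and pushes it away from the other captions in the batch, which is what places EEG embeddings in a region of CLIP space where caption retrieval becomes meaningful.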

Figure: overview of CATVis cross-modal alignment.

This design enables context-aware EEG-to-image generation and improves semantic alignment between neural activity and generated visuals.
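
The retrieval, re-ranking, interpolation, and generation stages (3–5) can be sketched in the same spirit. The snippet below retrieves the top-k candidate captions by cosine similarity to the EEG embedding, re-scores them with the predicted class embedding, interpolates the winning caption embedding with the class embedding, and finally conditions Stable Diffusion on the fused embedding. The weights alpha and lam, the top-k value, the checkpoint name, and the use of the diffusers prompt_embeds argument are assumptions for illustration rather than the authors’ exact settings.

import torch
import torch.nn.functional as F

def rerank_captions(eeg_emb, caption_embs, class_emb, top_k=5, alpha=0.5):
    """Stage 3: pick the caption that best matches both the EEG embedding and the predicted class."""
    eeg_emb = F.normalize(eeg_emb, dim=-1)            # (D,)
    class_emb = F.normalize(class_emb, dim=-1)        # (D,)
    caption_embs = F.normalize(caption_embs, dim=-1)  # (N, D) candidate caption embeddings
    eeg_sim = caption_embs @ eeg_emb                  # (N,) similarity to the EEG embedding
    top_sims, top_idx = eeg_sim.topk(top_k)           # coarse retrieval of top-k candidates
    class_sim = caption_embs[top_idx] @ class_emb     # class-guided similarity of the candidates
    scores = alpha * top_sims + (1 - alpha) * class_sim  # re-ranked score
    return top_idx[scores.argmax()]                   # index of the selected caption

def interpolate_embeddings(class_emb, caption_emb, lam=0.6):
    """Stage 4: weighted interpolation of concept and caption embeddings."""
    return lam * caption_emb + (1 - lam) * class_emb

# Stage 5 (hypothetical usage): diffusers' StableDiffusionPipeline accepts
# pre-computed conditioning via its prompt_embeds argument, which expects
# token-level CLIP embeddings (roughly shape (1, 77, 768) for SD v1.5), so in
# practice the interpolation above is applied at that granularity.
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained("<stable-diffusion-checkpoint>")
# image = pipe(prompt_embeds=fused_embeds).images[0]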

Results:

Our framework achieves substantial improvements in EEG-based visual reconstruction. Compared to state-of-the-art methods, CATVis boosts classification accuracy by 27.08%, improves generation accuracy by 15.21%, and reduces Fréchet Inception Distance by 36.61%, indicating stronger semantic alignment and higher image quality. The qualitative samples visually confirm these quantitative gains: the generated images not only resemble the visual stimuli in object identity but also capture surrounding context and attributes such as shape, color, and spatial composition. This level of detail, rarely observed in prior EEG-to-image work, validates the core premise of our framework: that context-aware thought visualization is achievable through careful alignment of EEG semantics and generative priors.

Figure: qualitative results generated by CATVis.

Resources:

Text Reference:

T. Mehmood, H. Ahmad, M. H. Shakeel, and M. Taj, "CATVis: Context-Aware Thought Visualization," in Proc. of the Int. Conf. on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2025.

Bibtex Reference:

@inproceedings{CATVisMICCAI2025,
  author={Mehmood, Tariq and Ahmad, Hamza and Shakeel, Muhammad Haroon and Taj, Murtaza},
  title={CATVis: Context-Aware Thought Visualization},
  booktitle={Proc. of the Int. Conf. on Medical Image Computing and Computer Assisted Intervention (MICCAI)},
  year={2025},
}
