Shweta Mahajan

Hello! I am currently a postdoctoral researcher in the Vision Group at the University of British Columbia, advised by Prof. Leonid Sigal and Prof. Kwang Moo Yi. My research at UBC is focussed on diffusion models for high-level as well as low-level computer vision. I obtained my Ph.D. under the supervision of Prof. Stefan Roth, Ph.D. in the Visual Inference Group, Technische Universität Darmstadt. During my Ph.D., I conducted research on deep generative algorithms for multimodal representation learning and the efficiency of exact inference deep generative models. I received my M.Sc. from Saarland University, where I was a part of the Machine Learning Group and the Max Planck Institute for Informatics.

Email  /  CV  /  Google Scholar  /  LinkedIn


  • 11/2023: Recognized as a Top Reviewer at NeurIPS 2023!
  • 09/2023: Talk on Multimodal Representation Learning with Deep Generative Models at NTU Singapore.
  • 09/2023: Unsupervised Semantic Correspondence Using Stable Diffusion accepted at NeurIPS 2023.
  • 08/2023: Nominated for the GI-Dissertationspreis!
  • 07/2023: Awarded a research grant by the Vector Institute for Artificial Intelligence!
  • 06/2023: Invited to the doctoral consortium at CVPR 2023!
  • 06/2023: Nominated for the Bertha Benz Best Thesis Award in Germany!
  • 02/2023: Make-A-Story: Visual Memory Conditioned Consistent Story Generation accepted at CVPR 2023.
  • 10/2022: I have joined UBC as a postdoctoral researcher.
  • 06/2022: I have successfully defended my Ph.D. dissertation (summa cum laude)!


I am interested in computer vision and machine learning, specifically in deep generative models for multimodal representation learning.

Unsupervised Semantic Correspondence Using Stable Diffusion

Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi
NeurIPS, 2023
paper / arxiv / code

The semantic knowledge within diffusion models can be leveraged to find semantic correspondences through prompt optimization.

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan and Leonid Sigal
CVPR, 2023
paper / arxiv

Sentence-conditioned soft attention over the memories enables effective reference resolution and helps the model maintain scene and actor consistency when needed.

Multimodal Representation Learning for Diverse Synthesis with Deep Generative Models

Shweta Mahajan
Ph.D. Thesis, 2022

Diverse Image Captioning with Grounded Style

Franz Klein, Shweta Mahajan and Stefan Roth
GCPR, 2021
paper / arxiv / code

A sequential variational framework encoding the style information grounded in images for stylized image captioning.

PixelPyramids: Exact Inference Models from Lossless Image Pyramids

Shweta Mahajan and Stefan Roth
ICCV, 2021
paper / supp / arxiv / video / code

A block-autoregressive exact inference model employing a lossless pyramid decomposition with scale-specific representations to encode the joint distribution of image pixels.

Diverse Image Captioning with Context-Object Split Latent Spaces

Shweta Mahajan and Stefan Roth
NeurIPS, 2020
paper / supp / arxiv / video / code

We introduce a novel factorization of the latent space to model diversity in contextual descriptions across images and texts within the dataset.

Normalizing Flows with Multi-Scale Autoregressive Priors

Shweta Mahajan*, Apratim Bhattacharyya*, Mario Fritz, Bernt Schiele and Stefan Roth
CVPR, 2020
paper / supp / arxiv / video / code

We improve the representational power of flow-based models by introducing channel-wise dependencies in their latent space through multi-scale autoregressive priors.

Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings

Shweta Mahajan, Iryna Gurevych and Stefan Roth
ICLR, 2020 (Best Paper Award, Fraunhofer IGD)
paper / arxiv / video / code

Our model integrates normalizing flow-based priors for the domain-specific information, which allows us to learn diverse many-to-many mappings between the image and text domains.

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

Shweta Mahajan, Teresa Botschen, Iryna Gurevych and Stefan Roth
ICCV Workshops, 2019 (Oral presentation)
paper / arxiv

We propose joint Gaussian regularization of the latent representations to ensure coherent cross-modal semantics that generalize across datasets.