News
- 11/2023: Recognized as a Top Reviewer at NeurIPS 2023!
- 09/2023: Talk on Multimodal Representation Learning with Deep Generative Models at NTU Singapore.
- 09/2023: Unsupervised Semantic Correspondence Using Stable Diffusion accepted at NeurIPS 2023.
- 08/2023: Nominated for the GI-Dissertationspreis!
- 07/2023: Awarded a research grant by the Vector Institute for Artificial Intelligence!
- 06/2023: Invited to the doctoral consortium at CVPR 2023!
- 06/2023: Nominated for the Bertha Benz Best Thesis Award in Germany!
- 02/2023: Make-A-Story: Visual Memory Conditioned Consistent Story Generation accepted at CVPR 2023.
- 10/2022: I have joined UBC as a postdoctoral researcher.
- 06/2022: I have successfully defended my Ph.D. dissertation (summa cum laude)!
|
Research
I am interested in computer vision and machine learning, specifically in deep generative models for multimodal representation learning.
|
|
Unsupervised Semantic Correspondence Using Stable Diffusion
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi
NeurIPS, 2023
paper /
arxiv /
code
The semantic knowledge within diffusion models can be leveraged to find semantic correspondences through prompt optimization.
|
|
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan and Leonid Sigal
CVPR, 2023
paper /
arxiv
Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed.
|
|
Multimodal Representation Learning for Diverse Synthesis with Deep Generative Models
Shweta Mahajan
Ph.D. Thesis, 2022
Thesis
|
|
Diverse Image Captioning with Grounded Style
Franz Klein, Shweta Mahajan and Stefan Roth
GCPR, 2021
paper /
arxiv /
code
A sequential variational framework encoding the style information grounded in images for stylized image captioning.
|
|
PixelPyramids: Exact Inference Models from Lossless Image Pyramids
Shweta Mahajan and Stefan Roth
ICCV, 2021
paper /
supp /
arxiv /
video /
code
A block-autoregressive exact inference model employing a lossless pyramid decomposition with scale-specific representations to encode the joint distribution of image pixels.
|
|
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan and Stefan Roth
NeurIPS, 2020
paper /
supp /
arxiv /
video /
code
We introduce a novel factorization of the latent space to model diversity in contextual descriptions across images and texts within the dataset.
|
|
Normalizing Flows with Multi-Scale Autoregressive Priors
Shweta Mahajan*, Apratim Bhattacharyya*, Mario Fritz, Bernt Schiele and Stefan Roth
CVPR, 2020
paper /
supp /
arxiv /
video /
code
We improve the representational power of flow-based models by introducing channel-wise dependencies in their latent space through multi-scale autoregressive priors.
|
|
Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings
Shweta Mahajan, Iryna Gurevych and Stefan Roth
ICLR, 2020 (Best Paper Award, Fraunhofer IGD)
paper /
arxiv /
video /
code
Our model integrates normalizing flow-based priors for the domain-specific information, which allows us to learn diverse many-to-many mappings between the image and text domains.
|
|
Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings
Shweta Mahajan, Teresa Botschen, Iryna Gurevych and Stefan Roth
ICCV Workshops, 2019 (Oral presentation)
paper /
arxiv
We propose joint Gaussian regularization of the latent representations to ensure coherent cross-modal semantics that generalize across datasets.