News
- 02/2024: Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models accepted at CVPR 2024.
- 02/2024: ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models accepted at CVPR 2024.
- 02/2024: Unsupervised Keypoints from Pretrained Diffusion Models accepted at CVPR 2024.
- 11/2023: Recognized as a Top Reviewer at NeurIPS 2023!
- 09/2023: Talk on Multimodal Representation Learning with Deep Generative Models at NTU Singapore.
- 09/2023: Unsupervised Semantic Correspondence Using Stable Diffusion accepted at NeurIPS 2023.
- 08/2023: Nominated for the GI-Dissertationspreis!
- 07/2023: Awarded a research grant by the Vector Institute for Artificial Intelligence!
- 06/2023: Invited to the doctoral consortium at CVPR 2023!
- 06/2023: Nominated for the Bertha Benz Best Thesis Award in Germany!
- 02/2023: Make-A-Story: Visual Memory Conditioned Consistent Story Generation accepted at CVPR 2023.
- 10/2022: I have joined UBC as a postdoctoral researcher.
- 06/2022: I have successfully defended my Ph.D. dissertation (summa cum laude)!
Research
I am interested in computer vision and machine learning, specifically in deep generative models (diffusion models, normalizing flows, variational methods, GANs) for multimodal representation learning.
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
CVPR, 2024
paper /
arxiv
We invert the diffusion model to obtain interpretable language prompts directly, based on the finding that different timesteps of the diffusion process cater to different levels of detail in an image.
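As a rough illustration of the mechanism (not the paper's code: the tiny denoiser, the vocabulary, and all dimensions below are toy stand-ins), one can optimize soft token embeddings against a denoising loss at sampled timesteps and then project them onto the nearest vocabulary tokens:

```python
# Toy sketch of prompt inversion: optimize soft token embeddings so that a
# stand-in text-conditioned denoiser predicts the noise added to a target
# image, then project each soft token onto its nearest vocabulary embedding.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, num_tokens, image_dim = 100, 16, 4, 32

vocab = nn.Embedding(vocab_size, embed_dim)          # frozen "tokenizer" embeddings
denoiser = nn.Sequential(                            # toy stand-in for a diffusion U-Net
    nn.Linear(image_dim + num_tokens * embed_dim, 64), nn.ReLU(),
    nn.Linear(64, image_dim),
)
for p in list(vocab.parameters()) + list(denoiser.parameters()):
    p.requires_grad_(False)

x0 = torch.randn(1, image_dim)                       # target "image"
soft_prompt = torch.randn(1, num_tokens, embed_dim, requires_grad=True)
opt = torch.optim.Adam([soft_prompt], lr=1e-2)

for step in range(200):
    t = torch.randint(1, 10, (1,)).float() / 10.0    # sample a diffusion timestep
    noise = torch.randn_like(x0)
    xt = (1 - t) * x0 + t * noise                    # toy forward-noising
    pred = denoiser(torch.cat([xt, soft_prompt.flatten(1)], dim=1))
    loss = ((pred - noise) ** 2).mean()              # denoising loss w.r.t. the prompt
    opt.zero_grad(); loss.backward(); opt.step()

# Project each optimized soft token to its nearest vocabulary token (hard prompt).
dists = torch.cdist(soft_prompt.squeeze(0), vocab.weight)
print("inverted token ids:", dists.argmin(dim=1).tolist())
```

In the actual method the denoiser is a pretrained text-to-image diffusion model, and the timesteps are chosen to target the level of detail one wants the prompt to capture.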
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi
CVPR, 2024
paper /
arxiv /
code
We utilize a pre-trained video diffusion model to enforce consistency across generated views in zero-shot novel view synthesis.
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Xingzhe He, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi
CVPR, 2024
paper /
arxiv /
code
One can leverage the semantic knowledge within pretrained diffusion models to find keypoints across images of the same category.
Unsupervised Semantic Correspondence Using Stable Diffusion
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi
NeurIPS, 2023
paper /
arxiv /
code
One can leverage the semantic knowledge within diffusion models to find semantic correspondences across images via prompt optimization.
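A minimal sketch of the idea, with random features standing in for diffusion features (names, shapes, and the optimization objective below are illustrative assumptions, not the released code): an embedding is optimized so its attention peaks at a query location in the source image, and the same embedding then localizes the match in the target image:

```python
# Toy sketch of correspondence via embedding ("prompt") optimization.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
H = W = 8; C = 32
src_feats = torch.randn(H * W, C)            # stand-in diffusion features (source image)
perm = torch.randperm(H * W)
tgt_feats = src_feats[perm]                  # target = permuted source, so ground truth is known

query_idx = 13                               # query location in the source image
emb = torch.randn(C, requires_grad=True)     # embedding to optimize
opt = torch.optim.Adam([emb], lr=1e-1)

for step in range(300):
    logits = src_feats @ emb                 # attention of the embedding over source locations
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([query_idx]))
    opt.zero_grad(); loss.backward(); opt.step()

match = (tgt_feats @ emb).argmax().item()    # attention peak in the target image
print("found:", match, "ground truth:", perm.tolist().index(query_idx))
```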
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan and Leonid Sigal
CVPR, 2023
paper /
arxiv
Sentence-conditioned soft attention over the memories enables effective reference resolution and learns to maintain scene and actor consistency when needed.
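As a toy sketch of the conditioning step (shapes and names are placeholders, not the paper's architecture), the sentence encoding attends over a memory bank and the readout conditions generation of the next frame:

```python
# Toy sketch of sentence-conditioned soft attention over a visual memory bank.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
memory = torch.randn(5, 16)            # memory: features of earlier frames/characters
sentence = torch.randn(16)             # encoding of the current story sentence

attn = F.softmax(memory @ sentence / 16 ** 0.5, dim=0)  # relevance of each memory slot
context = attn @ memory                # memory readout used to condition generation
print(attn, context.shape)
```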
Multimodal Representation Learning for Diverse Synthesis with Deep Generative Models
Shweta Mahajan
Ph.D. Thesis, 2022
Thesis
Diverse Image Captioning with Grounded Style
Franz Klein, Shweta Mahajan and Stefan Roth
GCPR, 2021
paper /
arxiv /
code
A sequential variational framework that encodes style information grounded in images for stylized image captioning.
PixelPyramids: Exact Inference Models from Lossless Image Pyramids
Shweta Mahajan and Stefan Roth
ICCV, 2021
paper /
supp /
arxiv /
video /
code
A block-autoregressive exact inference model employing a lossless pyramid decomposition with scale-specific representations to encode the joint distribution of image pixels.
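A minimal sketch of a lossless pyramid round trip, assuming a simple 2x2 split into one coarse sample plus three integer residuals (the paper's actual decomposition differs; this only illustrates the exactness property):

```python
# Toy lossless image pyramid: each 2x2 block becomes one coarse sample and
# three integer differences, so reconstruction is bit-exact at every scale.
import numpy as np

def decompose(img):
    """One pyramid level: coarse subsample + residual details (all integer)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    details = np.stack([b - a, c - a, d - a])   # exact integer residuals
    return a, details

def reconstruct(coarse, details):
    h, w = coarse.shape
    img = np.zeros((2 * h, 2 * w), dtype=coarse.dtype)
    img[0::2, 0::2] = coarse
    img[0::2, 1::2] = details[0] + coarse
    img[1::2, 0::2] = details[1] + coarse
    img[1::2, 1::2] = details[2] + coarse
    return img

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8), dtype=np.int64)

# Recursively decompose, then rebuild: the round trip is exact.
c1, d1 = decompose(img)
c2, d2 = decompose(c1)
assert np.array_equal(reconstruct(reconstruct(c2, d2), d1), img)
print("lossless round trip OK; scale-specific detail shapes:", d1.shape, d2.shape)
```

Because the decomposition is invertible, a density model over the coarse level and the scale-specific details gives an exact likelihood over pixels.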
Diverse Image Captioning with Context-Object Split Latent Spaces
Shweta Mahajan and Stefan Roth
NeurIPS, 2020
paper /
supp /
arxiv /
video /
code
We introduce a novel factorization of the latent space to model diversity in contextual descriptions across images and texts within the dataset.
Normalizing Flows with Multi-Scale Autoregressive Priors
Shweta Mahajan*, Apratim Bhattacharyya*, Mario Fritz, Bernt Schiele and Stefan Roth
CVPR, 2020
paper /
supp /
arxiv /
video /
code
We improve the representational power of flow-based models by introducing channel-wise dependencies in their latent space through multi-scale autoregressive priors.
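For illustration, a toy channel-wise autoregressive prior (module names and sizes are hypothetical, not the paper's architecture): each latent channel is modeled as a Gaussian whose parameters depend on the preceding channels:

```python
# Toy channel-wise autoregressive prior: log p(z) = sum_c log p(z_c | z_<c).
import torch
import torch.nn as nn

class ChannelARPrior(nn.Module):
    def __init__(self, channels, dim):
        super().__init__()
        # One small conditioner per channel c, taking channels < c as input.
        self.cond = nn.ModuleList(
            nn.Linear(max(c, 1) * dim, 2 * dim) for c in range(channels)
        )
        self.channels, self.dim = channels, dim

    def log_prob(self, z):                           # z: (batch, channels, dim)
        total = 0.0
        for c in range(self.channels):
            ctx = z[:, :c].flatten(1) if c > 0 else torch.zeros(z.size(0), self.dim)
            mean, log_scale = self.cond[c](ctx).chunk(2, dim=1)
            dist = torch.distributions.Normal(mean, log_scale.exp())
            total = total + dist.log_prob(z[:, c]).sum(dim=1)
        return total

prior = ChannelARPrior(channels=4, dim=8)
z = torch.randn(2, 4, 8)
print("log p(z):", prior.log_prob(z))
```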
Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings
Shweta Mahajan, Iryna Gurevych and Stefan Roth
ICLR, 2020 (Best Paper Award, Fraunhofer IGD)
paper /
arxiv /
video /
code
Our model integrates normalizing flow-based priors for the domain-specific information, which allows us to learn diverse many-to-many mappings between the image and text domains.
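A minimal sketch of a flow-based prior, assuming a single affine coupling layer (a simplification of the paper's flows): the exact log-likelihood of a latent comes from the change-of-variables formula:

```python
# Toy flow-based prior: one affine coupling layer maps a latent z to a base
# Gaussian, giving an exact log-likelihood via change of variables.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 32), nn.ReLU(),
                                 nn.Linear(32, dim))  # predicts shift and log-scale

    def forward(self, z):
        z1, z2 = z.chunk(2, dim=1)
        shift, log_scale = self.net(z1).chunk(2, dim=1)
        u2 = (z2 - shift) * torch.exp(-log_scale)     # invertible affine transform
        log_det = -log_scale.sum(dim=1)               # change-of-variables term
        return torch.cat([z1, u2], dim=1), log_det

flow = AffineCoupling(dim=8)
z = torch.randn(4, 8)
u, log_det = flow(z)
base = torch.distributions.Normal(0.0, 1.0)
log_prob = base.log_prob(u).sum(dim=1) + log_det      # exact log p(z) under the flow prior
print(log_prob)
```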
Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings
Shweta Mahajan, Teresa Botschen, Iryna Gurevych and Stefan Roth
ICCV Workshops, 2019 (Oral presentation)
paper /
arxiv
We propose joint Gaussian regularization of the latent representations to ensure coherent cross-modal semantics that generalize across datasets.
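As an illustrative sketch (the RBF kernel, bandwidth, and shapes are assumptions, not the paper's exact objective), latents from both modalities can be regularized toward one shared Gaussian with an MMD-style penalty so that their latent spaces stay aligned:

```python
# Toy joint Gaussian regularization: push image and text latents toward a
# shared Gaussian prior with an RBF-kernel MMD penalty.
import torch

def mmd(x, y, bandwidth=1.0):
    """RBF-kernel maximum mean discrepancy between two samples."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

torch.manual_seed(0)
image_latents = torch.randn(64, 16) * 1.5 + 0.3   # stand-in image-encoder outputs
text_latents = torch.randn(64, 16) * 0.7 - 0.2    # stand-in text-encoder outputs
prior = torch.randn(64, 16)                       # samples from the shared Gaussian prior

# Regularizer added to the autoencoder losses: both modalities match the prior.
reg = mmd(image_latents, prior) + mmd(text_latents, prior)
print(reg)
```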