Dhruvesh Patel
dhruveshpate@umass.edu
I am currently a fourth-year Computer Science PhD student at UMass Amherst, working with Prof. Andrew McCallum alongside some amazing colleagues at the Information Extraction and Synthesis Laboratory. I completed my undergraduate degree at IIT Madras, where I worked on robotics research mentored by Prof. Sandipan Bandyopadhyay.
Outside of my academic pursuits, I’ve been fortunate to have worked with some amazing collaborators from industry. I have worked as a research scientist intern at Meta Reality Labs and at Abridge AI. Before beginning my master’s program at UMass, I worked for two years as a software engineer at MathWorks. I also spent a year collaborating with Prof. Partha Talukdar on various industrial NLP problems.
CV available at the bottom of this page.
research
Autoregressive models dominate the scene for generative modeling of non-ordinal discrete data, like text, largely because of the scalability of their pre-training. As generative models, however, they have several limitations: limited conditioning and control at inference time, inefficient use of inference-time computation because compute is tied to sequence length, and no support for non-sequential forms of interaction such as edits or deletions. I’m interested in scaling non-autoregressive models like discrete diffusion and flows for text generation, either by adapting pre-trained AR models through continued training or by making non-AR pre-training more efficient.
Prior to this, I worked on non-Euclidean representation learning, energy-based models for discrete data, and compositional generalization in in-context learning.
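To make the autoregressive vs. non-autoregressive contrast above concrete, here is a minimal, purely illustrative Python sketch (not code from any of my papers; `toy_model` is a hypothetical uniform-random stand-in for a learned network): autoregressive decoding spends one forward pass per generated token, while masked-diffusion-style decoding fixes the number of passes through the step count and commits several positions per pass.

```python
# Illustrative sketch only: contrasts left-to-right autoregressive decoding,
# where the number of model calls grows with the sequence length, with
# masked-diffusion-style decoding, where a fixed number of denoising steps
# each commit a block of positions. `toy_model` is a hypothetical
# uniform-random stand-in for a learned transformer.
import random

VOCAB = ["the", "cat", "sat", "on", "the", "mat", "."]
MASK = "<mask>"

def toy_model(tokens):
    """One 'forward pass': propose a token for every position (here, randomly)."""
    return [random.choice(VOCAB) for _ in tokens]

def autoregressive_decode(length):
    # One forward pass per generated token.
    tokens = []
    for _ in range(length):
        tokens.append(toy_model(tokens + [MASK])[-1])  # predict the next position
    return tokens

def masked_diffusion_decode(length, steps=3):
    # Start fully masked; each pass proposes tokens everywhere and commits a
    # block of masked positions, so the number of passes is set by `steps`.
    tokens = [MASK] * length
    masked = list(range(length))
    per_step = max(1, -(-length // steps))  # ceiling division
    while masked:
        proposals = toy_model(tokens)
        commit, masked = masked[:per_step], masked[per_step:]
        for i in commit:
            tokens[i] = proposals[i]
    return tokens

if __name__ == "__main__":
    print("autoregressive:  ", autoregressive_decode(7))
    print("masked diffusion:", masked_diffusion_decode(7, steps=3))
```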
affiliations and internships
news
| Date | News |
|---|---|
| Oct 1, 2025 | I will be presenting Improved Sampling from Masked Diffusion Models with Position Contrastive Guidance at the Structured Probabilistic Inference and Generative Models workshop at NeurIPS 2025. |
| Jun 1, 2025 | Work on Insertion Language Models (ILMs) is out on arXiv! It will be presented at the Structured Probabilistic Inference and Generative Models workshop at NeurIPS 2025. |
| Oct 1, 2024 | Learning Representations for Hierarchies with Minimal Support was accepted at NeurIPS 2024! |
| Apr 1, 2024 | Language Guided Exploration for RL Agents in Text Environments was accepted at NAACL (Findings) 2024. |
| Aug 1, 2023 | My work on Pre-trained Language Models for Visual Planning for Human Assistance, done as a research intern at Meta Reality Labs, has been accepted at ICCV 2023. |
mentors and collaborators
I have been fortunate to have worked with many amazing people over the years. Here is a list of my current and previous collaborators.
Michael Boratko [Google] (2019 - 2025)
Tim Rudner [NYU] (2024 - 2025)
Ramon Astudillo [IBM Research] (2025 - 2025)
Tahira Naseem [IBM] (2023 - 2023)
Akash Srivastava [MIT-IBM Research] (2023 - 2023)
Keerthiram Murugesan [IBM] (2022 - 2023)
Kenneth Clarkson [IBM] (2023 - 2023)
Kartik Talamadupula [IBM] (2019 - 2019)
Pavan Kapanipathi [IBM] (2019 - 2019)
Jay-Yoon Lee [Seoul National University] (2020 - 2022)
Partha Talukdar [IISc Bangalore/Google Research] (2018 - 2018)
Sandipan Bandyopadhyay [IIT Madras] (2016 - 2016)
Ruta Desai [Meta AI] (2022 - 2023)
Unnat Jain [Meta AI] (2023 - 2023)