Sumanta Kashyapi

Data Scientist @ Dell

IR, Representation Learning, DL, LLM

Email: sumantakashyapi [at] gmail [dot] com


Bio

Sumanta Kashyapi is a senior data scientist at Dell Technologies, where he is part of the predictive analytics team. Before joining Dell, he received his PhD in Computer Science from the University of New Hampshire, advised by Prof. Laura Dietz, and his master's degree from the National Institute of Technology Hamirpur, advised by Prof. Madhu Kumari. His main research interests lie at the intersection of Representation Learning and Information Retrieval (IR). During his PhD, Sumanta focused on representation learning for clustering short text snippets and investigated how it can be leveraged for various IR tasks. His recent work revolves around efficiently incorporating complex structures found in data into learned representations with a specific downstream task in mind.

Projects
Query-Specific Siamese Similarity Metric (QS3M)

JCDL 2022 - Best student paper nominee

Can we improve search result clustering (SRC) by directly incorporating the query context into the trained similarity metric used for clustering? To investigate this, we propose the Query-Specific Siamese Similarity Metric (QS3M) for query-specific clustering of text documents.

Clustering Optimization as Blackbox (COB)

RepL4NLP 2021

Can we generalize contrastive learning for clustering tasks by directly optimizing a clustering quality metric such as the Rand index? We present Clustering Optimization as Blackbox (COB), which employs a recent optimization technique suitable for discrete metrics, and show that it leads to better representations for clustering.

Overview Retriever with Clustering Augmentation (ORCA)

FIRE 2022

Dense Passage Retrieval (DPR) relies on the underlying embedding space to find relevant documents in response to a query. In this work, we explore whether an embedding model trained with an auxiliary clustering objective improves the retrieval quality of a DPR system.

My Journey So Far...

Recent Publications

Complete list of publications on Google Scholar
  1. Topic-Mono-BERT: A Joint Retrieval-Clustering System for Retrieving Overview Passages
    S Kashyapi, L Dietz
    FIRE 2022

  2. Query-specific subtopic clustering
    S Kashyapi, L Dietz
    JCDL 2022

  3. Learn The Big Picture: Representation Learning for Clustering
    S Kashyapi, L Dietz
    RepL4NLP 2021

  4. Wikimarks: Harvesting Relevance Benchmarks from Wikipedia
    L Dietz, S Chatterjee, C Lennox, S Kashyapi, P Oza, B Gamari
    ACM SIGIR 2022

  5. Retrieve-Cluster-Summarize: An Alternative to End-to-End Training for Query-specific Article Generation
    C Lennox, S Kashyapi, L Dietz
    arXiv preprint 2023

Recent Patents


  1. Transformer-based automatic labeler for misaligned anomalous event with time series data.
    U.S. Patent DC-132501.01, filed June 05, 2023. Patent pending.
    S Kashyapi et al.

  2. Time series anomaly detection with rare event failure prediction.
    U.S. Patent DC-134123.01, filed Sep 13, 2023. Patent pending.
    S Kashyapi et al.

Acknowledgment


This website uses the template and design by themewagon. The My Journey So Far... section is inspired by the resume section of Tao Yu.