"Truth is One. Sages call It by various names." - RgVeda, 1.164.46.
"Then That Goal should be sought for, whither having gone none returns again. I seek refuge in that Primeval Purusha Whence streamed forth the ancient activity or energy." - Srimad Bhagavad Gita, 15.4.
Kamakura Buddha । Kenchoji Temple
Ann Arbor Vedanta Symposium । Mormonism । Favorites । Jai Madhab Nilanjan । Bharat Sevashram Sangha । Baba Swami
Khepa Baba । Bijoy Krishna Goswami । Trailanga Swami
Shibulipi | Selected Paintings | Vida | CV
I work at Gigashibu. Previously, I worked at the Department of Learning Health Sciences at the University of Michigan. I completed my PhD (2018) in Computer Science and Engineering at the University of Michigan in the Language and Information Technologies (LIT) group, working with Dr Rada Mihalcea.
My broad research interest is in Natural Language Processing (NLP), Computational Linguistics, and Data Mining.
My specific interest is in spirituality and the sociology of consumerism (link; see point 10), and in how language interacts with and interpenetrates them.
In the past, I have worked on summarization, keyword extraction, stylistics (readability and formality analysis, culturomics, native language identification), and stylometry (authorship attribution, author profiling).
Before coming to Michigan, I spent two amazing years as a PhD student in Computer Science and Engineering at the University of North Texas (UNT), from Fall 2012 to Summer 2014, under Dr Rada Mihalcea.
I was affiliated with the Language and Information Technologies (LIT) group at UNT.
I completed my Master of Engineering in Computer Science and Engineering at Penn State in Summer 2012.
My advisor was Dr Prasenjit Mitra, and I collaborated with Dr Xiaofei Lu.
My Master's project was on detecting and measuring informality at the sentence level.
A document-level informality measure such as the F-score helps with style-based classification, ranking, and diversification of search results.
Besides quantifying formality, one of the most important register dimensions, the F-score correlates moderately with readability tests.
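For readers unfamiliar with the F-score, here is a minimal sketch, assuming the Heylighen and Dewaele definition based on part-of-speech frequencies. That definition is not spelled out on this page, so treat it as an assumption. The function name `f_score` and the coarse tag names are hypothetical, and the input is assumed to be already POS-tagged and mapped to these categories.

```python
from collections import Counter

# Coarse POS categories (hypothetical tag names chosen for this sketch).
# Categories assumed to raise formality:
FORMAL = ("noun", "adjective", "preposition", "article")
# Categories assumed to lower formality:
DEICTIC = ("pronoun", "verb", "adverb", "interjection")


def f_score(pos_tags):
    """Return the F-score (0 to 100) of a text, given one coarse POS tag per word.

    Assumed formula: F = (formal% - deictic% + 100) / 2, where each
    percentage is taken over all words in the text.
    """
    total = len(pos_tags)
    if total == 0:
        raise ValueError("need at least one tagged word")
    counts = Counter(pos_tags)

    def pct(tag):
        # Frequency of a category as a percentage of all words.
        return 100.0 * counts[tag] / total

    formal = sum(pct(tag) for tag in FORMAL)
    deictic = sum(pct(tag) for tag in DEICTIC)
    return (formal - deictic + 100.0) / 2.0


if __name__ == "__main__":
    # "The cat chased the small mouse", tagged by hand for illustration.
    tags = ["article", "noun", "verb", "article", "adjective", "noun"]
    print(round(f_score(tags), 2))
```

Under this assumed definition, higher values indicate a more formal text. Words tagged with any other category count toward the total but contribute to neither sum.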
While at Penn State, I worked on the CyDAR project; some relevant details and pointers can be found here.
While working on the ChemXSeer project at Penn State, I redesigned the Gaussian file search engine.
I did my 2011 summer internship at NEC Labs, Princeton, NJ.
My mentors were Dr Christopher Malon and Dr Bing Bai.
We worked on multiple-choice question answering.
My 2010 summer internship was at IBM India Research Lab in New Delhi, under Sachindra Joshi.
We worked on topic modeling of call center chats, in collaboration with Kumar Avinava Dubey and Dr Shantanu Godbole.
I completed my Bachelor of Engineering in Computer Science and Engineering at Jadavpur University, Kolkata, India, in Spring 2008.
My advisor was Prof. (Dr.) Sivaji Bandyopadhyay, a well-known Indian NLP researcher.
I attended Howrah Zilla School (1992-2004) and Cowley Memorial St. Monica's Primary School (1991-1992), both in Howrah, West Bengal, India.
Shibamouli Lahiri, Carmen Banea, Rada Mihalcea: Matching Graduate Applicants with Faculty Members. SocInfo 2017.
Shanta Phani, Shibamouli Lahiri, Arindam Biswas: A Supervised Learning Approach for Authorship Attribution for Bengali Language Literary Texts. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2017.
Shibamouli Lahiri, Rada Mihalcea, Po-Hsiang Lai: Keyword Extraction from Emails. Journal of Natural Language Engineering (JNLE), 2016.
Saeid Parvandeh, Shibamouli Lahiri, Fahimeh Boroumand: PerSum: Novel Systems for Document Summarization in Persian. International Journal of Asian Language Processing. Journal version, ArXiv e-print. 2016.
Shanta Phani, Shibamouli Lahiri, Arindam Biswas: A Machine Learning Approach for Authorship Attribution for Bengali Blogs. IALP 2016. [data]
Shanta Phani, Shibamouli Lahiri, Arindam Biswas: Authorship Attribution in Bengali Language. ICON 2015. [data]
Shibamouli Lahiri: SQUINKY! A Corpus of Sentence-level Formality, Informativeness, and Implicature. ArXiv e-print (2015). [data]
Vanessa Loza, Shibamouli Lahiri, Rada Mihalcea, Sean Lai: Building a Dataset for Summarization and Keyword Extraction from Emails. LREC 2014.
Shibamouli Lahiri: Complexity of Word Collocation Networks: A Preliminary Structural Analysis. EACL 2014 Student Research Workshop. ArXiv e-print. [code and data] [EACL 2014 slides]
Shanta Phani, Shibamouli Lahiri, Arindam Biswas: Inter-rater Agreement Study on Readability Assessment in Bengali. ICONACC 2014.
Shibamouli Lahiri, Sagnik Ray Choudhury, Cornelia Caragea: Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks. ArXiv e-print (2014).
A demo of our system, designed by Sagnik Ray Choudhury, was available here. The link also contained code, data and supplementary material for the paper. Currently, we have a Google Drive link for code + data.
Shibamouli Lahiri, Rada Mihalcea: Authorship Attribution Using Word Network Features. ArXiv e-print (2013).
Shibamouli Lahiri, Rada Mihalcea: Using N-gram and Word Network Features for Native Language Identification. BEA-8 Workshop (2013). [poster]
Shanta Phani, Shibamouli Lahiri, Arindam Biswas: Culturomics On A Bengali Newspaper Corpus. IALP 2012. [code and data]
In Figure 2(a) of this paper, the frequency of the political party "Congress" is inflated because it includes occurrences of "Trinamool Congress", which is a different political party. In effect, the "Congress" count covers both "Congress" and "Trinamool Congress". We are thankful to Srayan Datta for pointing out this discrepancy. - February 14, 2014
Shibamouli Lahiri, Xiaofei Lu: Inter-rater Agreement on Sentence Formality. ArXiv e-print (2011). [code and data]
Shibamouli Lahiri, Juan Pablo Fernández Ramírez, Shikha Nangia, Prasenjit Mitra, C. Lee Giles, Karl T. Mueller: ChemXSeer Digital Library Gaussian Search. ArXiv e-print (2011).
Shibamouli Lahiri, Prasenjit Mitra, Xiaofei Lu: Informality Judgment at Sentence Level and Experiments with F-score. CICLing 2011. [This is my Master's paper.]
I acknowledge Prof. Christos Faloutsos for ideas that led to Sections 3.2 and 3.3 of this paper. - August 11, 2018
Sumit Bhatia, Shibamouli Lahiri, Prasenjit Mitra: Generating Synopses for Document-element Search. CIKM 2009.