World Digital Libraries: An International Journal (WDL)
Vol.15(2) December 2022
Print ISSN : 0974-567X
Online ISSN : 0975-7597

Topic Modelling and its Application in Libraries: a review of specialized literature

M. Lamba, Department of Library and Information Science, University of Delhi, Delhi – 110007, (E): lambamanika07@gmail.com

Prof. M. Madhusudhan, Department of Library and Information Science
University of Delhi, Delhi – 110007, (E): madhusudhan@libinfosci.du.ac.in

DOI: 10.18329/09757597/2022/15207

Abstract

Text mining is one of the most actively researched areas in the social sciences. To date, library professionals’ knowledge of text mining tools and practice is limited; as a result, the library community poorly understands the full range of issues related to text mining. This article provides information on applying a text mining approach called topic modelling in the library and information science domain. Topic modelling is a text mining approach that determines a generative model for documents. This article maps how topic modelling can be used in libraries, drawing on literature from various databases for the 2009–22 period. It was found that the gap between information users and librarians can be bridged by text mining training. The study concludes that librarians must develop text mining skills to meet their patrons’ needs.

Introduction

The exponential expansion of data has become inevitable with the advancement of digital platforms, software, and media. Several social science and humanities disciplines employ computational approaches and methods to take advantage of this digital trace data and use it to explore various research problems. Nowadays, libraries use computational research to curate, manage, store, and make the generated information accessible by machines, algorithms, and applications.

Topic modelling is a text mining approach that determines a generative model for documents. This generative model revolves around probability distributions over words and helps process, manage, organize, and extract knowledge from the huge amounts of text data held in various databases. It is used to mine the content of a document. A topic can be defined as the main idea discussed in a text, that is, the theme or subject at different granularities. Topic modelling performs two main tasks: first, it discovers the major topics in the text data; second, it analyses which documents cover which topics (Figure 1). There are no machine-readable annotations that tell topic modelling programs the semantic meaning of the words in the text. Thus, it infers abstract topics based on similar word usage patterns in each document (Lamba and Madhusudhan 2022). It is a well-established research area in digital libraries, computer science, bioinformatics, recommender systems, digital humanities, and medicine. It has many applications, such as literature review, marketing, citation analysis, organizing documents, improving search results, browsing, exploratory analysis, keyword extraction, text categorization, tag recommendation, and similarity search, and can be applied more generally to various data types like images, survey data, biological data, and musical notes. There are multiple algorithms to conduct topic modelling, such as probabilistic latent semantic analysis, latent Dirichlet allocation (LDA), structural topic model (STM), correlation explanation (CorEx), hierarchical topic modelling, dynamic topic modelling, correlated topic modelling, lda2vec, BERTopic, contextualized topic modelling, and many more.
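To make the two tasks concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not taken from any of the reviewed studies: the toy corpus, the function name lda_gibbs, and the hyperparameter values (alpha, beta, number of iterations) are illustrative assumptions, and a real project would normally rely on an established topic modelling library.

```python
import random
from collections import Counter

# Toy corpus (hypothetical): three "library" texts and three "citation" texts.
DOCS = [
    "library catalogue book loan library".split(),
    "book loan catalogue reader library".split(),
    "reader library book catalogue loan".split(),
    "citation journal impact author citation".split(),
    "journal author impact citation index".split(),
    "index citation journal impact author".split(),
]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling (toy scale)."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # tokens per topic overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for word in doc:
            z = rng.randrange(num_topics)
            row.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    # Task 1: the major topics, shown as their most frequent words.
    top_words = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(num_topics)
    ]
    # Task 2: which documents cover which topics (smoothed proportions).
    doc_props = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return top_words, doc_props


top_words, doc_props = lda_gibbs(DOCS, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
for d, props in enumerate(doc_props):
    print(f"doc {d}: {[round(p, 2) for p in props]}")
```

The first returned value corresponds to topic discovery and the second to document coverage; because the sampler is stochastic, the topic numbering and exact proportions depend on the seed.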

Thus, this study will: (i) analyse the metadata of current literature on how topic modelling is used in libraries and (ii) present a descriptive and integrative literature review of the same.

Related Literature

The development of web and digital libraries has “made it easier to access a larger number of textual documents, which come together to develop useful data resources” (Salloum, et al. 2018). Owing to the rise of digital libraries, many videos, audio, music, books, archives, photographs, periodicals, monographs, and genealogies have been digitized over the past two decades with highly structured metadata records. “The research methods used by librarians have fundamentally changed by relying on the construction of digital libraries. When the construction reaches a certain scale, this document-focused management and service encounter bottlenecks. As a result, it becomes quite difficult for them to discover the relevant facts or knowledge from such vast resources” (Cuijuan, et al. 2018).

Various modern techniques such as text mining, machine learning, natural language processing, and the semantic web can analyze digital resources on a large scale to offer precise services focused on mining hidden knowledge. “Text mining is a specialized interdisciplinary field combining techniques from linguistics, computer science, and statistics to build tools that can efficiently retrieve and extract information from the digital text” (Lammey 2015). It is a computational research technique that uncovers patterns in an extensive collection of text-based datasets. It is the process of deriving information from machine-readable documents by extracting useful information and patterns from them. It is composed of four stages (Figure 2). Firstly, relevant documents are identified. Secondly, these documents are turned into a machine-readable format to extract the structured data. Thirdly, the extracted information is mined to test hypotheses, discover new knowledge, and identify new relationships. Text mining tools have the potential to (i) translate search strategies for different datasets, (ii) increase search accuracy, and (iii) reduce biases by improving the transparency, objectivity, and reproducibility of search strategies (McGowan 2021).
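The stages enumerated above (identify relevant documents, convert them to structured data, mine the result) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the sample records, the stopword list, and the function names identify, to_structured, and mine are hypothetical, and simple term co-occurrence stands in for whatever mining step a real project would use.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical raw records, as they might arrive from a repository export.
RAW_DOCS = {
    "d1": "<p>Topic modelling helps libraries organise digital collections.</p>",
    "d2": "<p>Opening hours of the campus cafeteria.</p>",
    "d3": "<p>Digital libraries apply topic modelling to improve search.</p>",
}

STOPWORDS = {"the", "of", "to", "a", "and"}


def identify(docs, keyword):
    """Stage 1: keep only the documents that mention the keyword."""
    return {doc_id: text for doc_id, text in docs.items() if keyword in text.lower()}


def to_structured(text):
    """Stage 2: strip markup and turn the text into a term-frequency table."""
    plain = re.sub(r"<[^>]+>", " ", text).lower()
    tokens = [t for t in re.findall(r"[a-z]+", plain) if t not in STOPWORDS]
    return Counter(tokens)


def mine(tables):
    """Stage 3: count term pairs that occur together in the same document."""
    pairs = Counter()
    for table in tables.values():
        pairs.update(combinations(sorted(table), 2))
    return pairs


relevant = identify(RAW_DOCS, "topic modelling")
tables = {doc_id: to_structured(text) for doc_id, text in relevant.items()}
pairs = mine(tables)
# Pairs seen in more than one document hint at a recurring relationship.
recurring = {pair for pair, count in pairs.items() if count > 1}
print(sorted(recurring))
```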

Text Mining in Libraries

Many text mining applications have been researched in the library and information science domain over the years. Nicholson (2003) coined the term ‘bibliomining’. It uses data mining and bibliometric tools to analyze data produced by library services. Some of the significant applications of text mining in libraries have been summarized below with examples from prominent studies:

Making ontologies: Ontologies have important applications, for example, query expansion for information retrieval, generation of grammars, and information extraction, but building comprehensive ontologies is a time-consuming procedure. Machine-learning methods can be used for ontology learning by substituting the semantic classes with clusters of similar words and using semantic relations (Biemann and Mehler 2014). Further, they suggested using computational linguistic methods to produce “a Dewey Decimal Classification-based topic classification scheme for digital libraries” (Lamba and Madhusudhan 2022).

Library services: Bibliomining supports an information service system that gives faster access to documents than manual cataloguing, matches user profiles to their needs, and acquires relevant resources targeted at those needs (Alunga 2016).

Library management: Text mining can be used to perform a major activity of library management—collection development. Cong (2017) studied this text mining application in libraries through book procurement.

Recommendation of resources based on reading/search habits: Text mining can be applied in libraries to recommend books to users based on their reading habits recorded during the circulation activity. Luo (2017) used a similar approach by performing a “cluster algorithm to assist the librarians in acquiring books based on the borrowing frequency and the type of favourites of all kinds of books for fanciers, and then recommend readers the appropriate resources according to their professional backgrounds, interests, hobbies, and other information.” Text mining can not only be used to recommend print resources (using bibliomining), it can also recommend electronic resources (using document clustering, topic proportion, or classification) according to users’ reading and searching preferences (Lamba and Madhusudhan 2019a).
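A minimal sketch of recommending titles from circulation records follows. Luo (2017) is described as using a cluster algorithm; this example deliberately swaps in a simpler neighbour-based approach (Jaccard overlap between patrons' borrowing histories) to show the general idea. The patron names, titles, and function names are hypothetical.

```python
# Hypothetical circulation records: patron -> set of borrowed titles.
LOANS = {
    "asha": {"Intro to Cataloguing", "Metadata Basics", "Digital Libraries"},
    "ben": {"Intro to Cataloguing", "Metadata Basics", "Linked Data"},
    "chen": {"Garden Design", "Bird Watching"},
}


def jaccard(a, b):
    """Overlap between two sets of titles, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(patron, loans, top_n=3):
    """Suggest titles borrowed by similar patrons that this patron has not borrowed."""
    own = loans[patron]
    scores = {}
    for other, titles in loans.items():
        if other == patron:
            continue
        weight = jaccard(own, titles)
        for title in titles - own:
            scores[title] = scores.get(title, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, score in ranked[:top_n] if score > 0]


suggestions = recommend("asha", LOANS)
print(suggestions)
```

For electronic resources, the same scoring could in principle be run on document-topic proportions instead of loan sets, which is the direction the paragraph above attributes to topic modelling.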

Job Advertisement: Yang, et al. (2016) showed that text mining could be used to analyze job advertisements published by libraries to identify the changes that have occurred in the profession over the years. Library school faculty can use findings from such studies to develop curricula to align with real-life requisites of the workplace. In contrast, library managers can use the results to measure the areas of the prospective requirement for organization planning.

Scientometrics: Text mining can be an instrumental methodology in scientometrics to solve a number of problems. It can be applied to perform topic modelling [Miyata, et al. (2020); Han (2020); Lamba and Madhusudhan (2019b); Sugimoto, et al. (2011)], co-word analysis, author–topic modelling, burst detection (Tattershall, et al. 2020), the correlation between citations and downloads, and much more.

Management and organization of electronic resources: Meta-tagging of electronic resources using topic modelling and topic proportion not only saves users’ time but also helps in organizing and managing those resources. It is an efficient way to organize and manage the resources of a library’s database, website, or repository. Because topic modelling tags a resource based on the concept or theme behind it, it can be a very useful text mining technique for librarians who want to increase the visibility and usability of their library resources (Lamba and Madhusudhan 2022).

Automatic classification of future resources: Text mining can be used to predict the future classification of resources based on the previous tagging using topic modelling.

Improved information retrieval and searching of electronic resources: Text mining aids in delivering a fast-searching experience and better resource retrieval using topic modelling.

Sentiment analysis, social media mining, and marketing of libraries: Text mining can not only be applied to full text documents or research articles but is equally applicable to short texts like tweets or comments. Lamba and Madhusudhan (2018) used ‘sentiment analysis as an experimental study to introduce new service for libraries’ users’.

Methodology

Many articles exist on the application of topic modelling in libraries, but the scope of this study is limited to articles published from 2009 to 2022. A search was conducted with a combination of key terms such as ‘topic modelling’, ‘LDA’, and ‘topic modelling in libraries’ in Library and Information Science Abstracts (LISA), the Emerald database, Science Direct, Google Scholar, and other databases, supplemented by searches on e-journal websites and search engines. In total, 43 publications were reviewed for the study.

Findings

This section will present the findings of the descriptive analysis using R, followed by a comprehensive literature review of the selected 43 articles.

Table 1 presents distinct types of documents used for the study. Five types of documents were identified where 65% were research articles, 21% were conference papers, 5% were book chapters, 5% were theses, and 5% were technical reports.

Table 2 summarizes the authorship patterns of the selected publications. Eight authorship types were identified. Most publications had either 1 or 2 authors, although publications with 10, 7, and 6 authors were also identified.

Figure 3 shows the year-wise distribution of the articles used for the study. It was found that most of the publications were from the years 2010, 2018, 2019, and 2020. Figure 4 depicts the title-wise distribution of the articles. Most papers were published in Scientometrics journal, followed by Information Processing & Management and Information Technology and Libraries journals.

Topic Modelling in Libraries

A comprehensive descriptive and integrative review of the selected publications is detailed in this section. Mehler and Waltinger (2009) presented a topic-classification model to help digital libraries process documents more quickly. They explored several classifiers where they used Open Archive Initiative metadata as the source and Dewey Decimal Classification (DDC) as the target for Wikipedia categories. They found a necessity to create support vector machine (SVM)-based DDC classifiers for massive training datasets for over two languages to get better F-measure values.

Newman, et al. (2010) investigated the human topic model evaluation to improve the user experience for discovering digital libraries’ contents and search interfaces. They presented methods to assess the coherence and interpretability of topics where more than 70 human subjects evaluated and scored 500 topics from various domains and genres. They used a scoring model that performed well at predicting human scores, which could be a crucial initial step in incorporating topic modelling in digital libraries.

Efron, et al. (2011) proposed a way to improve topic modelling for extensive federated digital library collections by removing documents that present weak topical information due to metadata inconsistencies during the training of topic models. They employed Content Aggregation on a corpus from the Institute of Museums and Library Services Digital Collections for their initial assessment. They demonstrated an improvement in word coherence within topics.

Hagedorn, et al. (2011) investigated the usefulness of topic modelling algorithms to library users. They used HathiTrust instances containing architecture, art, and art history entries from the beginning of 2010, with terms produced using topic modelling. They assessed usefulness in two ways: an un-moderated environment in which users navigated the instances without supervision, and one-on-one sessions in which they talked to expert users as those users navigated the instances. They found that the use of topic facets was high in the un-moderated testing environment, although satisfaction was somewhat low. In the one-on-one sessions, the expert users indicated that topics are best used in conjunction with additional subject terms such as LCSH.

Using LDA, Sugimoto, et al. (2011) analysed “3121 doctoral dissertations completed at North American LIS programs between 1930 and 2009. They found that the LIS field has changed substantially” from 1930–69 to 2000–09, where the use of the word ‘library’ and its related terms has diminished in the topics over time.

Yang, et al. (2011) used topic modelling in the collections of historical newspapers published in Texas from 1829 to 2008 to assist historical research. In their paper, they experimented with topic models to identify potential issues of interest for historians.

Using topic modelling, Zhao and Jiang (2011) compared Twitter with the New York Times (a traditional news source). They found that the New York Times and Twitter covered similar topics and categories but had different distributions of topic types and categories. Further, they identified Twitter-specific and NYT-specific topics. They discovered an interesting correlation between the proportions of opinionated retweets and tweets and topic types and categories.

Aletras, et al. (2014) compared different topic representations, textual phrases, topic words, and images for document retrieval tasks. They directed that the participants use pre-defined queries to retrieve relevant documents. They discovered that users could more easily understand text labels than image and keyword labels. They demonstrated that labelling methods are an effective alternative topic representation.

Riddell (2014) used topic modelling to explore a corpus of 22,198 journal articles and book reviews from four US-based German studies journals. He revealed disciplinary trends in the German studies journals and discussed the prospects of topic models in 19th-century research and intellectual history.

Choi, et al. (2015) proposed using an automatic topic discovery method from web-mined user-generated interpretations of songs. They used LDA on 24,436 well-known songs from songmeaning.com and the Million Song dataset. They also evaluated filtering techniques to identify high-quality topics using Normalized Point-wise Mutual Information. Their study demonstrated a strategy that shows opportunities for enriching subject metadata in music digital libraries.
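Normalized Point-wise Mutual Information (NPMI) scores how much more often two words co-occur than chance would predict, scaled to the range from -1 to 1: NPMI(a, b) = log(P(a, b) / (P(a) P(b))) / (-log P(a, b)). The sketch below averages NPMI over the top words of a topic as a coherence score. It is an illustration only, not Choi, et al.'s filtering procedure; the reference documents, the use of document-level co-occurrence, and the function names are assumptions.

```python
import math
from itertools import combinations

# Hypothetical reference corpus: each document reduced to its set of words.
REFERENCE_DOCS = [
    {"love", "heart", "loss"},
    {"love", "heart", "night"},
    {"war", "protest", "night"},
    {"love", "heart"},
]


def npmi(word_a, word_b, docs):
    """NPMI from document-level co-occurrence; ranges from -1 to 1."""
    n = len(docs)
    p_a = sum(word_a in d for d in docs) / n
    p_b = sum(word_b in d for d in docs) / n
    p_ab = sum(word_a in d and word_b in d for d in docs) / n
    if p_ab == 0:
        return -1.0  # the words never co-occur
    if p_ab == 1:
        return 1.0   # the words co-occur in every document
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_coherence(top_words, docs):
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


coherent = topic_coherence(["love", "heart"], REFERENCE_DOCS)
mixed = topic_coherence(["love", "war"], REFERENCE_DOCS)
print(round(coherent, 3), round(mixed, 3))
```

A topic whose top words score low on average could then be filtered out before its words are proposed as subject metadata.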

Olowookere, et al. (2015) presented a software program, UPH Digital Library Miner, to mine documents in digital libraries to identify the topical structure and topic-based similarities. It used topic modelling and inverted Kullback–Leibler divergence measure at the backend. They showed the software’s working by integrating it with Greenstone digital library system, which contained 628 publications from IEEE Transactions on Software Engineering and reported the results.
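Topic-based similarity of this kind can be sketched by comparing the documents' topic distributions with Kullback–Leibler divergence. The text does not define the "inverted" variant used by the software, so the example below falls back on the standard divergence and a symmetrised form; the topic proportions, the smoothing constant eps, and the function names are assumptions.

```python
import math

# Hypothetical document-topic proportions, e.g. as produced by a topic model.
DOC_TOPICS = {
    "paper_a": [0.8, 0.1, 0.1],
    "paper_b": [0.7, 0.2, 0.1],
    "paper_c": [0.1, 0.1, 0.8],
}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Symmetrised divergence, so the order of the two documents does not matter."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def most_similar(doc_id, doc_topics):
    """Return the other document whose topic mix diverges least from doc_id."""
    target = doc_topics[doc_id]
    candidates = [
        (symmetric_kl(target, topics), other)
        for other, topics in doc_topics.items()
        if other != doc_id
    ]
    return min(candidates)[1]


nearest = most_similar("paper_a", DOC_TOPICS)
print(nearest)
```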

Cain (2016) presented a case study on topic modelling in the preliminary description of declassified Bill Clinton presidential records. He demonstrated how topic modelling could be used to improve access to poorly described digital texts that are distributed to archives and libraries.

Hengchen, et al. (2016) used topic modelling on a huge subset of the European Commission’s digital archival records to understand the potential and limits of obtaining key themes automatically. They mapped the headings of the EUROVOC thesaurus to the topics as a proof of concept for the future possibility of representing discovered topics with a hierarchical search interface for the users.

Kim, et al. (2016) provided a new lens to analyse topics about citation sentences from 6360 full text articles from PubMed Central and selected the top 15 journals in the field of oncology. They used the Author–Journal–Topic (AJT) model to consider authors and journals for topic analysis. They found vital topics shared among researchers in oncology and the prominent journals and authors who lead knowledge exchange in the sub-disciplines of oncology.

Hashtag-LDA is an approach that Zhao, et al. (2016) presented for personalized recommendation by discovering global hashtags and the association between themes and hashtags based on latent topics in microblogs.

Chen (2017) considered the importance of blog topics at different times and used the time parameter to differentiate between variations of the same topic at different periods while analysing the blog topics. The time parameter further improved the performance of blog searching and helped rank the blogs based on their popularity.

Figuerola, et al. (2017) used bibliographical references (title and abstract) from 92,705 papers from 1978 to 2014 from the LISA database to determine the key topics and categories in the domain of LIS using LDA. They discovered 19 topics that might be divided into four major categories: (i) information technology, (ii) processes, (iii) libraries, and (iv) specific regions of information application.

Gao and Wallace (2017) analysed faculty publications across campus using topic modelling in addition to similarity analysis and visualization techniques. They showed how librarians could use these tools to monitor research trends for collection development and customize research support for targeted faculty members.

Fang, et al. (2018) conducted regression analysis on document–topic distributions to determine cold and hot topics after using LDA to extract themes from research abstracts. From the top six accounting journals, 3737 articles were used between 1992 and 2014. Thirty-two topics were identified, of which seven were hot topics and six were cold topics.
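The idea of regressing topic shares on time to separate hot from cold topics can be sketched as follows. This is a simplified illustration, not Fang, et al.'s procedure: the yearly topic shares are invented, the topic names are placeholders, and a fixed slope threshold stands in for whatever significance test a real analysis would apply.

```python
# Hypothetical mean topic proportions per year, averaged over that year's abstracts.
YEARS = [2010, 2011, 2012, 2013, 2014]
TOPIC_SHARE = {
    "topic_a": [0.05, 0.08, 0.11, 0.14, 0.17],
    "topic_b": [0.20, 0.20, 0.20, 0.20, 0.20],
    "topic_c": [0.30, 0.26, 0.22, 0.18, 0.14],
}


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def label_trend(years, shares, threshold=0.01):
    """Label a topic by the direction of its yearly trend (threshold is an assumption)."""
    b = slope(years, shares)
    if b > threshold:
        return "hot"
    if b < -threshold:
        return "cold"
    return "stable"


trends = {topic: label_trend(YEARS, shares) for topic, shares in TOPIC_SHARE.items()}
print(trends)
```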

Lamba and Madhusudhan (2018) used topic modelling for Indian LIS ETDs from the Shodhganga database for 2013–17. They identified five core topics: (i) user studies, (ii) information literacy, (iii) library resources, (iv) scientometrics, and (v) library services, and built a prediction model.

A strategy for rating research institutes based on topic distribution was discussed by Ma, et al. (2018) in their paper. The distribution of topics and papers was determined using LDA, and the impact of papers (measured in terms of the number of citations) was assigned to research topics. The institution–paper matrix was then used to determine the competitive research institution for each research topic.
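One way to read this strategy is sketched below: each paper's citation count is split across topics in proportion to its topic distribution, and the institution–paper mapping is used to total the impact per institution and topic. The exact weighting used by Ma, et al. is not given here, so the proportional split, the data, and the function names are all assumptions.

```python
# Hypothetical inputs: topic proportions per paper, citations, and authorship.
PAPER_TOPICS = {"p1": [0.9, 0.1], "p2": [0.2, 0.8], "p3": [0.5, 0.5]}
CITATIONS = {"p1": 10, "p2": 30, "p3": 4}
INSTITUTION_PAPERS = {"inst_x": ["p1", "p3"], "inst_y": ["p2"]}


def topic_impact(institution_papers, paper_topics, citations):
    """Total citation impact per institution and topic."""
    num_topics = len(next(iter(paper_topics.values())))
    scores = {}
    for institution, papers in institution_papers.items():
        totals = [0.0] * num_topics
        for paper in papers:
            for k, share in enumerate(paper_topics[paper]):
                # Assumption: citations are split in proportion to topic share.
                totals[k] += share * citations[paper]
        scores[institution] = totals
    return scores


def leaders(scores):
    """For each topic, the institution with the highest accumulated impact."""
    num_topics = len(next(iter(scores.values())))
    return [
        max(scores, key=lambda institution: scores[institution][k])
        for k in range(num_topics)
    ]


scores = topic_impact(INSTITUTION_PAPERS, PAPER_TOPICS, CITATIONS)
top = leaders(scores)
print(scores, top)
```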

Momtazi (2018) proposed a classification method that used LDA to classify questions in community-based question–answering, which exploited latent semantics from unlabelled data.

Sun (2018) proposed a conceptual framework to understand the information-seeking behaviour processes, associated information, and cognitive barriers of issue-based knowledge crystallization (IBKC). He used topic modelling to find “how people should be assisted in dealing with cold-start problems when social clues are lacking?” by generating an initial information landscape.

Timakum, et al. (2018) examined the full text journal articles from “six of the leading journals in the field of library science”. They pinpointed changes in the field’s knowledge patterns between 1997 and 2016 using topic modelling, co-word analysis, and text summarization.

Bainbridge, et al. (2019) reported a case study to reproduce and create a topic model obtained from a corpus of documents. They covered the necessary steps, challenges, and recommendations based on lessons learned to use the virtual machine Data Capsule platform at the HathiTrust Research Center to replicate the work.

Topic modelling was creatively used by Chen, et al. (2019) to analyse research corpora that users accessed via a proxy server at the library. They also discussed the advantages and disadvantages of using library proxy log data for learning analytics research.

Goodman (2019) tested and discussed the potential application of topic modelling tools for archivists and integration into processing workflows, especially appraisal.

LDA was utilized by Lamba and Madhusudhan (2019a) to examine 928 full text articles from the 1981–2018 issues of the DESIDOC Journal of Library and Information Technology. They discovered 50 topics in all, including 26 unique topics, where bibliometrics, information retrieval, ICT, and user studies were the most extensively studied in India at the time.

In another study, Lamba and Madhusudhan (2019b) applied topic modelling on LIS ETDs ‘from ProQuest Dissertations and Theses Global database from 2014 to 2018’ using the RapidMiner tool. They performed metadata analysis first to find the association between ETDs and different entities like universities, departments, and degree types. They then identified eight key topics: (i) academic library, (ii) children’s literature, (iii) information retrieval, (iv) user study, (v) archival science, (vi) library leadership, (vii) digital library, and (viii) digital communication. Finally, they built a prediction model to classify the future untagged ETDs in the database.

Callaway, et al. (2020) performed topic modelling over 300 extended bibliographies of an edited collection of Defining Digital Humanities, which describes the ‘pull’ and ‘push’ of “digital humanities as a negotiation between gatekeeping and warm invitation. They analysed the metadata to explore the push and pull manifestation across different demographics”.

Han (2020) studied the evolution of the LIS discipline by analysing 14,035 LIS journal articles using LDA for the period 1966–2019. He discovered that (i) research on library science topics has declined over time, (ii) bibliometrics research, particularly citation analysis, has been extremely steady throughout, (iii) information retrieval research has consistently dominated, with a general transition to model-based text processing, (iv) research on information seeking behaviour is also steady and is distributed amongst different topics instead of being represented as a distinct subject, (v) research on information technology has increased over time, and (vi) research in information systems and organizational activities has a closer relationship with e-commerce.

Miyata, et al. (2020) identified topics from 1648 full text research articles from 5 representative peer-reviewed LIS journals. They used LDA to visualize the knowledge structure and its transitions from 2000–02 to 2015–17. They found 30 topics in each period and plotted them on a 2-D map using LDAvis.

Zamani, et al. (2020) proposed dynamic content-specific LDA to identify domains in COVID-specific discourse to monitor societal changes in concerns or viewpoints. They showed reliable tracking of topic evolution, and those dynamics were then used to forecast changes in real-world outcomes like mobility and unemployment. They also concluded that this model might be used to monitor any discussion that changes over time.

Junlabuddee and Tuamsuk (2021) categorized and analysed information science data from journals between 2013 and 2019 using topic modelling—30,571 research articles were analysed and 30 topics were identified. The five most frequently researched topics were data management, competency development, social media analytics, bioinformatics, and public and community services.

Koh and Fienup (2021) examined the outcomes of several topic modelling strategies to examine chat reference data gathered from academic libraries between April 2015 and May 2019. They found that Probabilistic Latent Semantic Analysis (pLSA) performed the best and produced more accurate and interpretable topics. Furthermore, they found that whole-chat datasets, which included both ends of the conversation between library users and librarians, performed better than question-only datasets, which only contained the user’s initial inquiry.

Shadrova (2021) observed that, as topic modelling gains popularity in the social sciences and digital humanities, the epistemological concerns centred on linguistic concepts, topic modelling, and the argumentative embedding of evidence resulting from topic modelling merit study. She concluded that topic modelling, in its current form, does not meet the criteria for an independent research approach: it is based on irrational presumptions and cannot be tested against alternative theories.

Taleqani, et al. (2021) examined public opinion towards transit during COVID-19 by using social media posts on Twitter through topic modelling. In addition to sharing the lessons they learnt while creating the methodology and problem statement, they found discussions about public transit changes through the initial several months of 2020.

Urs and Minhaj (2021) used topic modelling and other text mining techniques to analyse the data science curriculum from 32 iSchools. They showed that the data science programmes exhibited a bias towards data visualization, machine learning, data mining, NLP, and AI, were slanted towards ontologies and health informatics, went light on statistics, and had minimal thrust towards research data management.

Vu (2021) analysed theses of the Finnish Universities of Applied Sciences from the Theseus database using topic modelling for the 2009–20 period. He found that the LDA 8-topic model and the DTM 5-topic model were the most promising. Moreover, he demonstrated a small test that used the model to develop a thesis supervisor finder for students.

Banerjee (2022) fine-tuned language models like BERT and SciBERT to capture specialized vocabulary and used chapter-level labels to understand the topic discussed in the chapter presented in the electronic theses and dissertations (ETDs).

Topic modelling was utilized by Glowacka-Musial (2022) to speed up the process of assigning subject headings to the digital collections of news releases from New Mexico State University published between 1958 and 2020.

Hennesy and Naughton (2022) demonstrated the application of topic modelling to 7773 articles published in the Library Quarterly journal from 1931 to 2015. They examined two topics, Women Librarians and Great Men, that suggested differences in gender representation in the journal and supported a new hypothesis of the historical inclusion of gendered objects in LIS literature.

Lamba and Madhusudhan (2022) provided a thorough theoretical foundation for topic modelling and examples of its tools and several visualization techniques. Through a case study employing three distinct tools, they further illustrated the use of topic modelling in libraries.

Conclusion

The article reviewed different practical and theoretical aspects of the use and application of topic modelling in the discipline of LIS. It discussed various problems, challenges, uses, advantages, and disadvantages related to topic modelling applications in LIS. The studies reviewed in this article will guide librarians in effectively implementing topic modelling in their libraries and keep them abreast of current research on, and applications of, topic modelling and its advantages. The study has one major limitation: it covers only studies published in the English language.

Text mining is still a budding field in libraries and can produce insights from extensive library collections. It has a vast number of applications and offers libraries possibilities to improve their services (for example, selective dissemination of information (SDI), current awareness service (CAS), reference service, recommendation service), improve their collections, and add value to documents. To date, library professionals’ knowledge of text mining tools and practices, such as topic modelling, is limited. Consequently, the library community poorly understands the full range of issues related to text mining. The future of libraries and the LIS field can be greatly improved by applying text mining methods like topic modelling to library collections and by librarians and LIS students developing text mining skills to meet the needs of today’s patrons.

References

Aletras, N., et al. 2014. Representing topics labels for exploring digital libraries. In IEEE/ACM Joint Conference on Digital Libraries, pp. 239–248. DOI: https://doi.org/10.1109/JCDL.2014.6970174

Alunga, A. J. 2016. A conceptual data mining model (DMM) used in selective dissemination of information (SDI): a case study of Strathmore University library. Regional Journal of Information and Knowledge Management 1 (2). Details available at <https://docplayer.net/199323345-A-conceptual-data-mining-model-dmm-used-in-selective-dissemination-of-information-sdi-a-case-study-of-strathmore-university-library.html>, last accessed on 3 October 2022

Bainbridge, D., et al. 2019. Using the HTRC Data Capsule Model to promote reuse and evolution of experimental analysis of digital library data: a case study of topic modeling. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 463–464. Details available at <https://doi.org/10.1109/jcdl.2019.00124>

Banerjee, B. 2022. Opening scholarly documents through text analytics. In JCDL ’22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries. Details available at <https://dl.acm.org/doi/abs/10.1145/3529372.3530948>, last accessed on 3 October 2022

Biemann, C. and A. Mehler. 2014. Text Mining: from ontology learning to automated text processing applications. Springer International Publishing. Details available at <https://link.springer.com/book/10.1007/978-3-319-12655-5>, last accessed on 3 October 2022

Cain, J. O. 2016. Using topic modeling to enhance access to library digital collections. Journal of Web Librarianship 10 (3): 210–225. DOI: https://doi.org/10.1080/19322909.2016.1193455

Callaway, E., et al. 2020. The Push and Pull of Digital Humanities: Topic Modeling the “What is digital humanities?” Digital Humanities Quarterly 14 (1). Details available at <http://www.digitalhumanities.org/dhq/vol/14/1/000450/000450.html>

Chen, L-C. 2017. An effective LDA-based time topic model to improve blog search performance. Information Processing & Management 53 (6): 1299–1319. DOI: https://doi.org/10.1016/j.ipm.2017.08.001

Chen, Y., et al. 2019. Using probabilistic topic modeling of library access records to identify learning trends in educational research. Details available at <https://doi.org/10.7916/d8-acrt-re15>, last accessed on 3 October 2022

Choi, K., et al. 2015. Topic modeling users’ interpretations of songs to inform subject access in music digital libraries. In Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 183–186. DOI: https://doi.org/10.1145/2756406.2756936

Cong, D. 2017. Application of text mining in library book procurement. In MATEC Web of Conferences. DOI: https://doi.org/10.1051/matecconf/201710002044

Cuijuan, X., et al. 2018. Implementation of a linked data-based genealogy knowledge service platform for digital humanities. Data and Information Management 2 (1): 15–26. DOI: https://doi.org/10.2478/dim-2018-0005

Doig, C. 2015. Introduction to topic modeling in python. Details available at <https://chdoig.github.io/pygotham-topic-modeling/#/>, last accessed on 3 October 2022

Efron, M., et al. 2011. Building topic models in a federated digital library through selective document exclusion. Proceedings of the American Society for Information Science and Technology 48 (1): 1–10. DOI: https://doi.org/10.1002/meet.2011.14504801048

Fang, D., et al. 2018. Discovering research topics from library electronic references using latent Dirichlet allocation. Library Hi Tech 36 (3): 400–410. DOI: https://doi.org/10.1108/LHT-06-2017-0132

Figuerola, C. G., et al. 2017. Mapping the evolution of library and information science (1978–2014) using topic modeling on LISA. Scientometrics 112 (3): 1507–1535. DOI: https://doi.org/10.1007/s11192-017-2432-9

Gao, W. and L. Wallace. 2017. Data mining, visualizing, and analyzing faculty thematic relationships for research support and collection analysis. Details available at <https://uh-ir.tdl.org/handle/10657/4343>, last accessed on 3 October 2022

Glowacka-Musial, M. 2022. Applying topic modeling for automated creation of descriptive metadata for digital collections. Information Technology and Libraries 41 (2). DOI: https://doi.org/10.6017/ital.v41i2.13799

Goodman, M. M. 2019. What is on this disk? An exploration of natural language processing in archival appraisal. Details available at <https://cdr.lib.unc.edu/downloads/wm117s91s?locale=en>, last accessed on 3 October 2022

Hagedorn, K., et al. 2011. A new way to find: testing the use of clustering topics in digital libraries. D-Lib Magazine 17 (9/10). Details available at <https://doi.org/10.1045/september2011-hagedorn>

Han, X. 2020. Evolution of research topics in LIS between 1996 and 2019: an analysis based on latent Dirichlet allocation topic model. Scientometrics 125 (3): 2561–2595. DOI: https://doi.org/10.1007/s11192-020-03721-0

Hengchen, S., et al. 2016. Exploring archives with probabilistic models: Topic modeling for the valorisation of digitised archives of the European Commission. In 2016 IEEE International Conference on Big Data (Big Data), pp. 3245–3249. DOI: https://doi.org/10.1109/BigData.2016.7840981

Hennesy, C. and D. Naughton. 2022. Computational topic models of the library quarterly. Portal: Libraries and the Academy 22 (3): 745–768. Details available at <https://preprint.press.jhu.edu/portal/sites/ajm/files/hennesy.pdf>

Junlabuddee, S. and K. Tuamsuk. 2021. Analysis of research data in information science using the topic modeling method. Journal of Mekong Studies 17 (1): 89–109. Details available at <https://so03.tci-thaijo.org/index.php/mekongjournal/article/view/251779>

Kim, H. J., et al. 2016. Exploring the leading authors and journals in major topics by citation sentences and topic modeling. In Proceedings of the Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL), pp. 42–50. Details available at <https://aclanthology.org/W16-1506>

Koh, H. and M. Fienup. 2021. Topic modeling as a tool for analyzing library chat transcripts. Information Technology and Libraries 40 (3). DOI: https://doi.org/10.6017/ital.v40i3.13333

Lamba, M. and M. Madhusudhan. 2018. Metadata tagging of library and information science theses: Shodhganga (2013–17). In ETD 2018: Beyond the Boundaries of Rims and Oceans Globalizing Knowledge with ETDs, Taipei, Taiwan. DOI: https://doi.org/10.5281/zenodo.1475795

Lamba, M. and M. Madhusudhan. 2019a. Mapping of topics in DESIDOC Journal of Library and Information Technology, India: a study. Scientometrics 120 (2): 477–505. DOI: https://doi.org/10.1007/s11192-019-03137-5

Lamba, M. and M. Madhusudhan. 2019b. Mapping of ETDs in ProQuest dissertations and theses (PQDT) global database (2014–18). Cadernos BAD 1: 169–182. Details available at <http://hdl.handle.net/20.500.11959/brapci/134567>

Lamba, M. and M. Madhusudhan. 2022. Topic modeling. In Text Mining for Information Professionals, pp. 105–137. Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-85085-2_4

Lammey, R. 2015. Crossref text and data mining services. Insights 28 (2): 62–68. DOI: https://doi.org/10.1629/uksg.233

Luo, L. 2017. Application of data mining in library-based personalized learning. International Journal of Emerging Technologies in Learning 12 (12): 127–133. DOI: https://doi.org/10.3991/ijet.v12i12.7967

Ma, T., et al. 2018. Topic-based research competitiveness evaluation. Scientometrics 117 (2): 789–803. DOI: https://doi.org/10.1007/s11192-018-2891-7

McGowan, B. S. 2021. Using text mining tools to inform search term generation: an introduction for librarians. Portal: Libraries and the Academy 21 (3): 603–618. DOI: https://doi.org/10.1353/pla.2021.0032

Mehler, A. and U. Waltinger. 2009. Enhancing document modeling by means of open topic models: crossing the frontier of classification schemes in digital libraries by example of the DDC. Library Hi Tech 27 (4): 520–539. DOI: https://doi.org/10.1108/07378830911007646

Miyata, Y., et al. 2020. Knowledge structure transition in library and information science: topic modeling and visualization. Scientometrics 125 (1): 665–687. DOI: https://doi.org/10.1007/s11192-020-03657-5

Momtazi, S. 2018. Unsupervised latent dirichlet allocation for supervised question classification. Information Processing & Management 54 (3): 380–393. DOI: https://doi.org/10.1016/j.ipm.2018.01.001

Nicholson, S. 2003. The bibliomining process: data warehousing and data mining for library decision-making. Details available at <https://repository.arizona.edu/handle/10150/106392>, last accessed on 3 October 2022

Newman, D., et al. 2010. Evaluating topic models for digital libraries. In Proceedings of the 10th Annual Joint Conference on Digital Libraries - JCDL ’10, p. 215. DOI: https://doi.org/10.1145/1816123.1816156

Olowookere, T. A., et al. 2015. UPH digital library miner: a topic modeling-based software application for mining document collections of a digital library. International Journal of Computer Applications 132 (13): 1–8

Riddell, A. B. 2014. How to read 22,198 journal articles: studying the history of German studies with topic models. In Distant Readings: topologies of German culture in the long nineteenth century, L Tatlock and M Erlin (Eds), pp. 91–114. Boydell & Brewer. Details available at <https://www.cambridge.org/core/books/distant-readings/how-to-read-22198-journal-articles-studying-the-history-of-german-studies-with-topic-models/8CCA9DAE73D12598829F9CC4626D24DF>

Salloum, S. A., et al. 2018. Using text mining techniques for extracting information from research articles. In Intelligent Natural Language Processing: trends and applications, K Shaalan, A E Hassanien, and F Tolba (Eds), pp. 373–397. Springer International Publishing. DOI: https://doi.org/10.1007/978-3-319-67056-0_18

Shadrova, A. 2021. Topic models do not model topics: Epistemological remarks and steps towards best practices. Journal of Data Mining & Digital Humanities: 7595. DOI: https://doi.org/10.46298/jdmdh.7595

Sugimoto, C. R., et al. 2011. The shifting sands of disciplinary development: analyzing North American library and information science dissertations using latent dirichlet allocation. Journal of the American Society for Information Science and Technology 62 (1): 185–204. DOI: https://doi.org/10.1002/asi.21435

Sun, F. 2018. Supporting information seeking and sensemaking in issue-based knowledge crystallization. Details available at <https://etda.libraries.psu.edu/files/final_submissions/18417>, last accessed on 3 October 2022

Taleqani, A. R., et al. 2021. Using topic modeling to identify public opinion on public transportation during the COVID-19 pandemic. Details available at <https://rosap.ntl.bts.gov/view/dot/59793>, last accessed on 3 October 2022

Tattershall, E., G. Nenadic, and R. D. Stevens. 2020. Detecting bursty terms in computer science research. Scientometrics 122 (1): 681–699. DOI: https://doi.org/10.1007/s11192-019-03307-5

Timakum, T., et al. 2020. A data-driven analysis of the knowledge structure of library science with full text journal articles. Journal of Librarianship and Information Science 52 (2): 345–365. DOI: https://doi.org/10.1177/0961000618793977

Urs, S. R. and M. Minhaj. 2021. Evolution of data science and its education in iSchools: an impressionistic study using curriculum analysis. Journal of the Association for Information Science and Technology, pp. 1–17. DOI: https://doi.org/10.1002/asi.24649

Vu, M. 2021. Building topic modeling on theses abstracts data. Details available at <https://www.theseus.fi/bitstream/handle/10024/512538/Vu_Mai.pdf?sequence=2&isAllowed=y>, last accessed on 3 October 2022

Yang, T.-I., et al. 2011. Topic modeling on historical newspapers. In Association for Computational Linguistics (ACL) Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, Portland, Oregon, United States. Details available at <https://digital.library.unt.edu/ark:/67531/metadc83799/>, last accessed on 3 October 2022

Yang, Q., X. Zhang, X. Du, A. Bielefield, and Y. Q. Liu. 2016. Current market demand for core competencies of librarianship—a text mining study of American Library Association’s advertisements from 2009 through 2014. Applied Sciences 6 (2): 48. DOI: https://doi.org/10.3390/app6020048

Zamani, M., et al. 2020. Understanding weekly COVID-19 concerns through dynamic content-specific LDA topic modeling. In Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science, pp. 193–198. DOI: https://doi.org/10.18653/v1/2020.nlpcss-1.21

Zhao, F., et al. 2016. A personalized hashtag recommendation approach using LDA-based topic model in microblog environment. Future Generation Computer Systems 65: 196–206. DOI: https://doi.org/10.1016/j.future.2015.10.012

Zhao, W. X. and J. Jiang. 2011. An empirical comparison of topics in Twitter and traditional media. Details available at <http://www.mysmu.edu/faculty/jingjiang/papers/TechReport%28Zhao2011%29.pdf>, last accessed on 3 October 2022