Abstract
The rise of digital platforms has led to a massive influx of textual data. While traditional textual analysis techniques have been effective, analyzing large datasets is becoming impractical due to the required time and resources. To demonstrate the usefulness of text mining as an alternative, this study analyzed data extracted from an emergency remote learning (ERL) environment. Free-form responses from a series of cross-sectional surveys (2020–2022) were analyzed using word frequency, collocation, concordance, topic modeling, and sentiment analyses. According to the findings, the most commonly occurring unigram and bigram in the text corpus were “hard” and “mental health,” respectively. Three primary themes based on lived experiences were identified, namely individual, academic, and technological challenges, and another three themes emerged from coping strategies, including entertainment, relationship, and health-related mechanisms. Negative sentiment toward the ERL setup was also evident in the text corpus. Overall, the combination of text mining techniques allowed for a comprehensive exploration of the linguistic features of the corpus and provided a multifaceted understanding of the selected phenomenon. Consequently, this study endorses text mining as a methodology for analyzing large volumes of textual data.
Keywords: Online Learning Environment, Emergency Remote Learning, Text Mining, Student Experience, COVID-19
Introduction
Social scientists and humanities scholars have a long history of employing textual analysis in their research projects. This methodology has been recognized as a powerful tool for investigating the social world and understanding human behavior (e.g., Wanniarachchi et al., 2022; Yang & Sheu, 2019). In the context of qualitative research, Kuckartz (2014) defined textual analysis as the process of deriving meaningful information expressed within a document using data analysis techniques commonly applied in social sciences disciplines. Examples of such techniques include classical content analysis, grounded theory, and thematic analysis. Despite the abundance of empirical evidence supporting the effectiveness of these techniques, it has become increasingly impractical to analyze data using a manual text analysis approach (Rahman, 2017). For example, vast amounts of unstructured textual data became available for analysis due to the rise of the Internet, social media networks, and other digital communication channels (Alcober et al., 2020; Bringula et al., 2022; Macanovic, 2022). Applying traditional manual methods of text analysis to such large datasets is time-consuming and resource-intensive. Researchers are consequently recognizing the viability of computational text analysis (Wiedemann, 2016), especially when compared to manual methods (Syyrilä et al., 2021).
Computational text analysis, or the use of computer-assisted techniques in analyzing textual data, is often referred to as text mining. Ignatow and Mihalcea (2017) described text mining as a process of extracting consequential patterns from unstructured texts using techniques commonly applied in natural language processing, machine learning, information retrieval, sentiment analysis, named-entity recognition, concordance analysis, and computational linguistics. Examples of such techniques include keyword extraction, topic modeling, and concordance identification. Unlike textual analysis, text mining leverages the power of computational methods to automatically analyze large volumes of text data. For instance, a study conducted by Garcia (2020) analyzed 65,396 tweets related to the COVID-19 pandemic using sentiment and emotion analyses. The study highlights how text mining techniques can be used to quickly and accurately analyze large volumes of textual data that would be difficult, if not impossible, to scrutinize manually. Unsurprisingly, many researchers are starting to support various manual approaches to text analysis with text mining techniques (Macanovic, 2022; Wiedemann, 2016). Other researchers even explored the compatibility of using these text analysis approaches concomitantly (Inaba & Kakai, 2019; Muller et al., 2016; Yu et al., 2011).
Like other websites that are rich in texts (e.g., social media platforms), online learning environments also generate large volumes of textual data that can be analyzed using text mining techniques (Wahyono et al., 2021). These environments provide a variety of text-based communication channels (e.g., messaging systems and discussion forums) and allow students to produce textual data in the form of their course submissions (e.g., essays and assignments). Despite the potential of these platforms, researchers often tend to rely on social media networks as the source of data for text mining when analyzing phenomena related to online education (e.g., Aydin, 2021; Bozkurt, 2021; Zhou & Mou, 2022). By solely relying on social media data, researchers may be missing out on valuable insights that can only be gleaned from data generated within online learning platforms. For instance, the shift to online learning modality during the COVID-19 pandemic, also known as emergency remote learning (ERL), resulted in a significant increase in the volume of textual data generated within these environments (Fung et al., 2022). This scenario exemplifies how online learning environments can provide a fertile ground for applying text mining techniques to extract valuable educational insights and enhance learning experiences. Unfortunately, Bond (2020) reported that online surveys were the primary data collection tools, and quantitative nonexperimental designs were the main research methods used during the implementation of ERL. The application of text mining techniques in analyzing large qualitative data within these learning environments could have potentially extracted valuable insights for improving the overall learning experience of students in times of crisis.
This study explores the application of text mining techniques to analyze large-scale textual data generated in online learning environments. Given the substantial volume of data produced on these platforms during the COVID-19 pandemic, the period of ERL implementation presents an optimal source for such an analysis. From 2020 to 2022, our students participated in a series of cross-sectional surveys containing quantitative and free-form questions regarding their ERL experience. The findings from the quantitative data analysis have already been presented in a separate study (Garcia & Revano, 2022), allowing this research to focus specifically on the qualitative aspects of the data. With this pseudo-longitudinal dataset, not only is there an opportunity to demonstrate the capabilities of text mining techniques, but the study also addresses the limitation in online learning literature of not using data from different time points (e.g., Mirahmadizadeh et al., 2020; Syyrilä et al., 2021). Additionally, it expands the relatively scant body of research that has been conducted on the analysis of free-form comments in comparison to quantitative responses in student surveys (Alhija & Fresko, 2009; Brockx et al., 2012). This expansion is vital since this deficiency leaves a significant gap in our understanding of online learning environments and experiences, particularly as qualitative data are a rich but underutilized source of feedback crucial for the quality assessment of educational processes and outcomes (Gakhal & Wilson, 2019; Hoon et al., 2015). Given the importance of learning environments in shaping educational outcomes (Cayubit, 2022), understanding these settings is essential. Insights derived from their large-scale qualitative data, which is often underanalyzed, can inform more effective educational strategies and interventions. Therefore, this study will utilize text mining techniques to answer the following research questions (RQ):
- Using word frequency and collocation analyses, what are the most frequent unigrams and bigrams in student feedback on their ERL experiences?
- Using Key-Word-In-Context (KWIC) concordance analysis, what are the underlying meanings behind the most frequently occurring words in the corpus?
- Using latent Dirichlet allocation (LDA) topic modeling, what are the emergent topics from students' lived experiences and coping strategies?
- Using lexicon-based sentiment analysis, what is the general polarity of free-form responses generated in and extracted from an ERL environment?
Background of the Study
Online Learning Environments as a Data Goldmine
A learning environment refers to the setting or context in which education occurs, encompassing physical classrooms, digital platforms, and hybrid systems that blend both. In educational research, this domain is a critical area of focus because it significantly influences learner engagement, cognitive development, and educational outcomes (Cayubit, 2022; Rusticus et al., 2023). Theories of educational psychology, such as constructivism, emphasize the importance of the environment in shaping how students interact with content and each other, as well as how it affects their learning processes. With the increasing popularity of digital platforms during the COVID-19 pandemic, online learning environments have become a primary area of interest for educational researchers (Garcia, 2022; Ong & Quek, 2023; Salta et al., 2022). Driven by the necessity to continue educational operations remotely (Lin & Yeh, 2022), schools at various levels have adopted these platforms rapidly. This widespread adoption has made online learning environments a rich source of diverse data types, including, but not limited to, real-time interaction data, learning analytics, and feedback mechanisms. These data are collected through various activities such as gamification (Valderama et al., 2022), video lectures (Garcia & Yousef, 2023), interactive simulations (Ari et al., 2022), virtual classrooms (Islam et al., 2023), and more (Hadad et al., 2024). Each activity provides a detailed picture of student learning behaviors and engagement patterns. Therefore, these environments are goldmines for conducting empirical studies and testing educational theories in situ. The ability to gather and analyze such expansive data sets allows for a more comprehensive understanding of the effectiveness of various online teaching methods as well as student experience within virtual learning environments.
Student Experience in Emergency Remote Learning
The abrupt transition to online learning has had a considerable impact on the educational landscape, establishing ERL as a critical subject for scholarly investigation. This shift has unveiled unique challenges and opportunities, drawing research interest toward the impact on student interaction outside the traditional classroom environment. As educational institutions navigated this transition, they continually gathered feedback from their students. According to Aldridge and Bianchet (2022), leveraging student feedback about their learning environments is crucial for enabling students to participate actively in shaping and enhancing their educational settings. With these initiatives, there is a wealth of unstructured textual data available that can be effectively analyzed using text mining techniques. Unfortunately, analyses of student experiences during ERL have predominantly employed quantitative research methods, as evidenced by a systematic review (Bond, 2020). Wong (2015) asserted that although quantitative data provides a measurement of reality, qualitative data in the form of students' own words provides valuable insight into the "why" of their lived educational experiences (Tremblay et al., 2021; Vindrola-Padros et al., 2020). Other researchers have thereby proposed a more comprehensive approach to investigating student experience. For example, Klemenčič and Chirikov (2015) emphasized qualitative research methods as a viable tool for disentangling the complexities of the student experience in higher education. These methodologies provide authentic and trustworthy accounts of real-life experiences, the circumstances in which they occur, and the significance they hold for students that cannot be easily comprehended from numerical data (Gakhal & Wilson, 2019; Peñarrubia-Lozano et al., 2021; Rahman, 2017). 
Even in the causal explanation of events where a dismissal of qualitative research has been particularly virulent, Maxwell (2012) argued that it could make an essential contribution to causal inquiry in education by capturing the richness of individual experiences, interpretations, and meanings. Nevertheless, Klemenčič and Chirikov (2015) noted that qualitative research can be labor-intensive as it requires significant time and effort for both data collection and analysis. Exploration of digital approaches for qualitative data analysis is warranted to accommodate large datasets (Syyrilä et al., 2021; Wiedemann, 2016).
Text Mining Techniques for Qualitative Data Analysis
In qualitative research, key challenges such as efficiency and accuracy arise when analyzing large amounts of unstructured textual data. Traditional manual analysis methods can be time-consuming and often involve subjective judgments by the researchers, leading to potential bias in the analysis (Galdas, 2017; Klemenčič & Chirikov, 2015). Fortunately, with the rapid increase in digital content and the development of advanced computational techniques, text mining has emerged as a promising approach to tackle these challenges (Garcia & Cunanan-Yabut, 2022; Macanovic, 2022). Text mining techniques allow researchers to process vast amounts of qualitative data efficiently and systematically. This approach offers several advantages over manual analysis methods, including greater efficiency, reduced researcher bias, and the ability to identify patterns, themes, and relationships that may not be immediately apparent in traditional methods (Ignatow & Mihalcea, 2017). Many researchers (e.g., Syyrilä et al., 2021; Wiedemann, 2016) have therefore underscored the value of computational text analysis.
In the analysis of textual data, scholarly literature recommends various techniques. First, frequency-sorted word lists (i.e., word frequency) have been an essential process in computational text analysis due to their valuable representation of meaning for various purposes, such as text categorization and information retrieval. Word frequency refers to the number of times a word appears within a corpus of texts. Nonetheless, relying solely on this approach does not provide insight into the context in which a given term is utilized within the text (Ignatow & Mihalcea, 2017). Other studies have suggested collocation analysis as a complementary technique to word frequency as it can identify significant associations between words and reveal language use patterns (Cordeiro, 2019). Collocation analysis is a process of identifying the co-occurrence of specific words based on the idea that those that commonly co-occur have a certain level of semantic relatedness. Meanwhile, O'Donnell (2008) recommends KWIC concordance analysis to facilitate the detection of lexicogrammatical patterns. This text mining technique presents a list of all occurrences of a specific word or phrase, accompanied by the words and phrases that appear before and after it. The computational method of topic modeling is also recommended for the identification of word usage patterns and underlying themes by analyzing the semantic structure of the text. This unsupervised machine learning technique has been used in the analysis of different corpora, such as online reviews (Kwon et al., 2021) and political speeches (Miranda & Bringula, 2021). In the comparative analysis of topic modeling methods, Albalawi et al. (2020) discovered that LDA is one of the two methods that generate the most valuable outputs with diverse ranges and meanings. Sentiment analysis may also be leveraged to extract more human elements from a text corpus (Garcia, 2020). 
Although each text mining technique has its unique strengths and benefits, their combination has the potential to enhance the quality and depth of the analysis, leading to a more comprehensive understanding of the results.
Methods
Research Design
Within the field of social science text mining research, Ignatow and Mihalcea (2017) provided a framework that outlines five key design decisions for researchers to consider: research type, level of analysis, mode of analysis, data selection, and inferential logic. First, this study combined the philosophy of idiographic and nomothetic approaches for the research type. The idiographic approach was utilized to acquire an in-depth understanding of the unique experiences of students in the ERL environment. Concurrently, the nomothetic approach was used to identify common patterns and themes within the dataset. The combination of idiographic and nomothetic approaches allowed for comprehensive data analysis (Beltz et al., 2016). Second, the sociological level of analysis was selected in recognition of the critical role played by the social spaces in which the texts were produced. This decision was made to illuminate the connections between the textual data and the societal context in which they were situated (Nguyen et al., 2020). As noted by Ignatow and Mihalcea (2017), analyzing texts as social information provides valid and relevant insights into social reality. Third, a mixed-methods research design was adopted as the mode of analysis since the research questions necessitated both computational linguistics and qualitative data analysis techniques. According to Creswell (2014), most text mining research projects are best comprehended as mixed methods. Fourth, pre-existing survey data on student experience during ERL was strategically selected to avoid incurring additional time and cost. Using existing data is a commonly observed practice within the domain of text mining research (e.g., Alsayat & Ahmadi, 2022). Finally, abduction was selected as the inferential logic because it allows researchers to generate and test explanations or hypotheses using both qualitative and quantitative data, which aligns with the research questions and the chosen mixed-methods research design.
Data Collection and Procedure
The data were collected from cross-sectional surveys conducted at the end of every trimester in a Philippine higher education institution. These surveys focused on student experiences during ERL and were conducted from the academic year 2020–2021 until 2022. The survey questionnaire was distributed via the institution's online learning platform as a course assignment. It included both quantitative and qualitative items revolving around students' challenges, lived experiences, and coping strategies. The results of the quantitative data analysis were previously reported in a separate study (Garcia & Revano, 2022). For the qualitative section, the open-ended questions asked were: (1) "What challenges did you encounter during online learning in the pandemic?" (2) "How would you describe your overall experience with online learning?" and (3) "What strategies did you use to cope with these challenges?" Only free-form responses were extracted, anonymized, and exported from a database to a text file (original text file). Contents of the original text file were then copied, pre-processed, and saved as another text file (cleaned text file). Data pre-processing methods (e.g., text cleaning, lemmatization, and tokenization) were adopted from the studies of Garcia (2020) and Bringula et al. (2022), depending on the specific text mining technique employed. Finally, the original text file was used to report complete feedback in the paper, while the cleaned text file was used for text mining.
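To illustrate the kind of pre-processing described above, the following is a minimal sketch in Python (the study itself used R). The stop-word list and the `preprocess` function are hypothetical, and lemmatization is deliberately omitted because it would require an external language resource; this is not the pipeline adopted from Garcia (2020) or Bringula et al. (2022).

```python
import re

# Hypothetical stop-word list for illustration; a real analysis would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "i", "my",
              "is", "it", "in", "so", "too", "our"}

def preprocess(response: str) -> list[str]:
    """Lowercase a response, strip non-letter characters, tokenize, drop stop words."""
    text = response.lower()
    # Keep letters, whitespace, apostrophes, and hyphens; replace everything else.
    text = re.sub(r"[^a-z\s'-]", " ", text)
    # Split on whitespace and trim stray apostrophes/hyphens at token edges.
    tokens = [t.strip("'-") for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

# Sample response quoted in the paper (Student 66).
sample = "Our internet is too slow so I cannot participate during online class"
print(preprocess(sample))
```

Keeping the original text file untouched and writing the cleaned tokens elsewhere mirrors the two-file procedure described in this section.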
Data Analysis and Techniques
The data analysis techniques employed in this study are tailored to the research questions. For RQ1, the most relevant terms were extracted using word frequency analysis. In linguistics, the frequencies of words are used to examine how people communicate (Lijffijt et al., 2011). The statistical measure term frequency-inverse document frequency (TF-IDF) was used to evaluate how relevant a particular word is in the text corpus. Distinctive keywords from each question were also identified and visually presented using word clouds. Meanwhile, words that were used in different contexts (i.e., candidates for word sense disambiguation) were noted for further analysis (e.g., "I need support to learn the lesson" and "My parents need support for me to continue college"). Collocation analysis, which identifies words that commonly co-occur, was also performed to uncover hidden semantic structures by counting bigrams as one word. As a linguistic concept, collocation is considered a crucial component of semantic analysis (Barnbrook et al., 2013). For RQ2, the word context of the most frequent unigrams was examined using 15-word concordances (left and right) of the keyword of interest. By looking at the surrounding words, it becomes easier to interpret the theme of a search term in a corpus. KWIC concordance analysis can also assist in detecting lexicogrammatical patterns (O'Donnell, 2008). For RQ3, LDA was used to explore the inferred topics and themes further. LDA is a topic modeling technique that uncovers broader underlying themes and patterns within the entire corpus of qualitative data. Evidence suggests that LDA is superior to other statistical topic models in text mining (Liu et al., 2011). This analysis was conducted twice to cover both lived experiences and coping strategies. For RQ4, a sentiment analysis using a lexicon-based approach was performed to measure the polarity behind the textual data. 
According to Garcia and Cunanan-Yabut (2022), sentiment analysis can capture an overview of public opinion on various topics. The entire data analysis was carried out using R programming.
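As a rough illustration of the lexicon-based approach named for RQ4, the sketch below sums word-level polarity scores per response. It is written in Python for illustration (the study used R), and the tiny `LEXICON` and the function names are hypothetical; the paper does not state which sentiment lexicon was applied.

```python
import re

# Hypothetical mini-lexicon for illustration only; not the lexicon used in the study.
LEXICON = {
    "hard": -1, "stressful": -1, "anxious": -1, "worse": -1, "slow": -1,
    "safe": 1, "convenience": 1, "excited": 1, "good": 1, "comfort": 1,
}

def polarity(response: str) -> int:
    """Sum the lexicon scores of all tokens; unknown words contribute 0."""
    tokens = re.findall(r"[a-z']+", response.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)

def label(score: int) -> str:
    """Map a numeric polarity score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Fragment of a response quoted in the paper (Student 25).
example = "it is hard because I feel anxious"
print(polarity(example), label(polarity(example)))
```

A simple sum ignores negation and intensity ("not hard", "very hard"), which is one reason lexicon-based scores are usually read as an aggregate signal across a corpus rather than per response.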
Results
Of the 1,246 students who participated in the surveys, 978 textual responses (78.49%) were extracted and analyzed. These textual responses on online experience and coping strategies comprised 125,636 words (128.46 words per student and 26.2 words per sentence) and 95,267 words (97.41 words per student and 21.7 words per sentence), respectively. In terms of language, most students responded using English except for the 94 instances of 24 Filipino words (e.g., "mabagal" or slow, "trabaho" or employment, "pera" or money).
RQ1: Using word frequency and collocation analyses, what are the most frequent unigrams and bigrams in student feedback on their ERL experiences?
Table 1 presents the ten most frequent words in the corpus. Interestingly, "hard" was the most frequent word, indicating the challenging situation students faced during ERL. As has been recently reported by Treceñe (2022), students faced difficulties (e.g., adapting to the new learning mode) when attending online classes. Overall, these results suggest that students' feedback on their ERL experiences was characterized by a focus on a variety of topics, including the challenges and difficulties they encountered, the importance of technology and online resources, concerns about health and well-being, and the impact of ERL on personal relationships.
| Rank | Word | Frequency | Sample Feedback |
|---|---|---|---|
| 1 | hard | 689 | Sometimes I have a hard time catching up to lessons especially with many workloads stacking up. |
| 2 | internet | 654 | Very worrisome and stressful -- there's a lot of inconsistencies and you'd always worry about your Internet, deadlines and such. |
| 3 | health | 591 | I've found myself at the end of the line a few times, my mental health got worse and I've experienced breakdowns more often. |
| 4 | work | 457 | Family business got affected and my mother lost her work because of cost cutting it was hugely an adjustment for us because tuition fees were still expensive as usual. |
| 5 | sleep | 431 | While I get to sleep often and do my usual routine which is play games, attend class during school weeks and all that- I feel very pressured and anxious 24/7 because of online classes itself. |
| 6 | family | 425 | Due to this pandemic, my mental well-being is gradually declining due to shattered goals, plans, and separation from friends, family and other important people in my life. |
| 7 | convenience | 399 | At first, I was excited to experience online learning in the comfort of our own homes, its convenience and learning on my own pace, but over time it got stale, boring, and repetitive for me. |
| 8 | safe | 376 | The long quarantine greatly demotivated me in more ways than one but at least me and my family are safe. |
| 9 | friends | 326 | I have less interaction with people but I cope by playing computer games with friends and starting finding ways to help me grow and learn new things. |
| 10 | anxiety | 311 | Life during COVID is personally similar to my situation whenever I'm at home. Though it did get more difficult to stay positive with the experiences of loss and anxiety during the pandemic. |
Coincidentally, the first five most frequent words in the corpus are the same as the most frequent distinctive words in lived experiences (i.e., hard, Internet, health, work, and sleep). This result suggests that students commonly encounter challenges related to work, technology, and their physical and mental health in an ERL environment. In contrast, the five most frequent distinctive words in coping strategies were games, exercise, Facebook, sleep, and work. This result suggests that students engage in leisure activities and social media use. Meanwhile, "sleep" and "help" were the most frequent words that occurred in both sections. The most frequent distinctive words for both lived experiences and coping strategies are illustrated in Figure 1.
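One way to surface distinctive words per section is a TF-IDF weighting in which each section is treated as one document. The sketch below, in Python for illustration (the study used R), assumes the unsmoothed form tf × ln(N / df); the paper does not report the exact TF-IDF variant, and the toy token lists are made up from the words named above.

```python
import math
from collections import Counter

def tf_idf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each word in each document by relative frequency times ln(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each word once per document
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

# Toy token lists (hypothetical counts) standing in for the two survey sections.
sections = {
    "lived": ["hard", "internet", "health", "work", "sleep", "hard"],
    "coping": ["games", "exercise", "facebook", "sleep", "work", "games"],
}
scores = tf_idf(sections)
for name, word_scores in scores.items():
    print(name, max(word_scores, key=word_scores.get))
```

Under this weighting, a word that appears in every section receives a score of zero, so only section-specific vocabulary ranks as distinctive.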
In addition to unigrams, the most frequent bigrams were also identified using collocation analysis. According to the result (see Figure 2 for the text cloud), the most frequent two-word combinations were mental health (e.g., "Due to this pandemic, my mental health is gradually declining due to shattered goals, plans, and separation from friends, family, and other important people in my life"), internet connection (e.g., "there's a lot of inconsistencies and you'd always worry about your internet connection, deadlines and such"), financial support (e.g., "I'm torn between thinking of wanting to graduate as soon as I can to support my mom who is our main financial support and thinking of just stopping wasting too much money"), new normal (e.g., "I have realized that with all this time alone, I can focus on improving myself by choosing to be motivated to learn and adapt to this new normal"), and tuition fee (e.g., "with the pandemic and job constraints, the 60k tuition fee is very very tight and heavy on my parents."). Of the top unigrams, only "health" and "internet" appeared in the top bigram list. This finding suggests that "mental health" and "internet connection" are important topics for students in terms of their ERL experiences.
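The bigram counting behind this kind of result can be sketched as follows. This Python example (the study used R) counts adjacent word pairs within each response by raw frequency; it does not apply an association measure, and the toy responses are hypothetical paraphrases rather than survey data.

```python
from collections import Counter

def bigram_counts(responses: list[list[str]]) -> Counter:
    """Count adjacent token pairs, never pairing words across different responses."""
    counts = Counter()
    for tokens in responses:
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical tokenized responses for illustration.
responses = [
    ["my", "mental", "health", "got", "worse"],
    ["mental", "health", "is", "declining"],
    ["slow", "internet", "connection"],
]
counts = bigram_counts(responses)
print(counts.most_common(1))
```

Counting within each response, rather than over one concatenated string, avoids spurious bigrams formed by the last word of one student's answer and the first word of the next.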
RQ2: Using Key-Word-In-Context (KWIC) concordance analysis, what are the underlying meanings behind the most frequently occurring words in the corpus?
In search of a nuanced insight into the meanings behind the most frequently used words, KWIC concordance analysis was performed on the top five unigrams. The results indicated that the word "life" was more frequently paired with "hard" (56%) than with "school" (27%). This finding suggests that students may be more concerned with their overall life situation and circumstances than specifically with their educational experiences (e.g., "My life during this pandemic has not been the easiest and it is hard because I feel anxious" [Student 25]). This observation may provide a possible explanation for the observed decline in academic performance among students during the pandemic (Engzell et al., 2021). Students exhibited greater concern for daily survival than their academic studies (e.g., "Our status in life is now different because we need to earn more money plus it is hard catching up to lessons especially with many workloads stacking up" [Student 225]). Excessive workload intensified this dilemma since students' ability to manage time was already a recurring problem when learning online even before the pandemic, as confirmed by a qualitative research synthesis (Blackmon & Major, 2012).
Further insights into the reasons for this disparate impact on students can be gleaned from the subsequent unigrams in the list. Notably, the term "slow" was mentioned in 89% of occurrences of the term "internet," suggesting that bandwidth and connectivity issues may be significant challenges for students (e.g., "Our internet is too slow so I cannot participate during online class" [Student 66]). This finding is critical because the internet connectivity experience directly influences students' behavioral intention to use online learning. Poor internet access could lead to reduced opportunities to participate actively in online classes and less time to complete online activities. Consequently, researchers have suggested that educational institutions should adjust their online delivery methods in response to these technological constraints that many students face (Cullinan et al., 2021). Another possible explanation for the challenges faced by students during ERL relates to their well-being, with the term "mental" being mentioned in 95% of occurrences of the word "health" (e.g., "Mental health is deteriorating day by day but still trying to survive, it's hard but this won't last forever so I try to just go with the flow." [Student 531]). Furthermore, "mental health" emerged as the most frequent bigram in the corpus. In contrast, the word "physical" was only mentioned in 25% of occurrences of the term "health." These findings suggest that mental health may be a more salient issue for students in ERL than physical health (e.g., "I hope everyone values everyone's mental health because it's as important as our physical." [Student 243]). Researchers such as Lai et al. (2020), Yang et al. (2021), and Lee et al. (2021) noted the adverse impacts of ERL on students due to a heavy academic workload and a sense of disconnection from the school community, among others.
Regarding the term "work," it was observed that it was used in two distinct contexts, namely school deliverables and employment. Among the top unigrams, the words "group" (e.g., "still hard doing group work online when you do not know what the groupmates are dealing with" [Student 229]), "school" (e.g., "All of my work consists of doing it on the computer and not on the school itself" [Student 428]), and "activity" (e.g., "Being in a physical classroom makes me only focus on my work which is not case with activity via online" [Student 85]) were mentioned in 52%, 41%, and 32% of the occurrences of "work," respectively. Meanwhile, the unigrams "freelance" (e.g., "I realize that even though I'm still a student, I should think or plan for another sidelines like part time jobs or freelance work" [Student 711]) and "job" (e.g., "Family business got affected and my mother lost her job so she looked for extra work" [Student 391]) were identified in 42% and 67% of occurrences of the term "work," respectively. In addition to the previously established issue of academic workload, these findings highlight the adverse economic effects of the pandemic on students' and their parents' labor market participation. As a result, many students have experienced the loss of internships, jobs, or job offers, and those students from lower-income backgrounds have been more likely to postpone graduation than their higher-income peers (Aucejo et al., 2020). The subsequent unigram in the list, "sleep," may have been influenced by the term "work," as students mentioned the term "quality" in 76% of occurrences (e.g., "My sleeping schedule is off, and I can say that I do not get a quality sleep anymore" [Student 236]). This finding highlights the prevalence of poor sleep quality among students during the pandemic, which is a critical issue as sleep quality has been shown to mediate the relationship between perceived stress and dietary behaviors (Du et al., 2021). 
Overall, these insights have important implications for educators and institutions seeking to support students in the transition to ERL.
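The co-occurrence percentages and keyword-in-context (KWIC) snippets reported above can be approximated with a few lines of code. The following is a minimal sketch rather than the procedure used in the study: it assumes that co-occurrence is counted per response (the text does not specify the counting window), and the function names `cooccurrence_share` and `kwic`, as well as all but the first sample response, are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cooccurrence_share(responses, target, collocate):
    """Share of responses containing `target` that also contain `collocate`."""
    with_target = [set(tokenize(r)) for r in responses if target in tokenize(r)]
    if not with_target:
        return 0.0
    hits = sum(1 for tokens in with_target if collocate in tokens)
    return hits / len(with_target)


def kwic(responses, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence."""
    snippets = []
    for r in responses:
        tokens = tokenize(r)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                snippets.append((left, tok, right))
    return snippets


# Hypothetical mini-corpus; only the first response is quoted from the study.
responses = [
    "Our internet is too slow so I cannot participate during online class",
    "The internet at home is slow every afternoon",
    "I share one internet connection with my siblings",
    "Mental health is as important as physical health",
]

share = cooccurrence_share(responses, "internet", "slow")
print(f"'slow' appears in {share:.0%} of responses mentioning 'internet'")
for left, kw, right in kwic(responses, "internet"):
    print(f"{left:>25} [{kw}] {right}")
```

Counting per response keeps the percentage easy to interpret, but a fixed token window around the keyword would be an equally defensible choice and may yield different figures.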
RQ3: Using latent Dirichlet allocation (LDA) topic modeling, what are the emergent topics from students' lived experiences and coping strategies?
Upon investigation, it became apparent that certain words were utilized in varying contexts within the corpus. For instance, the unigram "support" was used to describe both financial assistance (e.g., "I'm torn between thinking of wanting to graduate as soon as I can to support my mom who is our main financial support and thinking of just stopping wasting too much money" [Student 623]) and educational guidance (e.g., "I would say that the support from my teachers helped me tremendously when it comes to learning while being in the comfort of my home and time" [Student 125]). LDA topic modeling was therefore employed to identify the emergent topics underlying these varying contexts. Additionally, because the lived-experience keywords identified through the KWIC analysis were distinctive, a separate analysis was conducted to cover the coping strategies.
| Topic | Words | Sample Sentences | Label |
|---|---|---|---|
| 1 | internet, wi-fi, computer, poor, connection, access, data, mobile, stable, issue | | Technological Factors |
| 2 | class, teacher, school, attend, project, time, instructions, hard, subject, good | | Academic Factors |
| 3 | hard, stress, anxiety, mental, lack, health, fear, survive, cope, physical | | Individual Factors |
Results from the LDA analysis indicate that the challenges experienced by students during the ERL implementation can be summarized into three distinct themes. As shown in Table 2, the topics revolved around technological factors (e.g., unreliable internet access and lack of devices), academic factors (e.g., heavy workload and request for academic calendar cancellation), and individual factors (e.g., isolation, mental health difficulties, financial distress within the household, and need for a part-time job). These outcomes align with the contextual interpretations obtained from the KWIC analysis, which supports their validity. These findings demonstrate that some of the challenges experienced by students were not necessarily a direct result of ERL but instead emerged due to the pandemic situation. Therefore, educational leaders can still leverage ERL as a platform to offer academic support programs that help students overcome these challenges.
| Topic | Words | Sample Sentences | Label |
|---|---|---|---|
| 1 | games, facebook, netflix, boring, tutorial, interaction, playstation, movies, youtube, fun | | Entertainment Strategies |
| 2 | family, help, friends, support, relationship, work, busy, fun, home, parents | | Relationship Strategies |
| 3 | exercise, busy, mental, videos, sleep, work, safe, health, physical, active | | Health Strategies |
In the context of coping strategies (see Table 3), the LDA analysis revealed three inferred topics: entertainment strategies, relationship strategies, and health strategies. Entertainment strategies include playing video games and watching movies; relationship strategies include connecting to friends via online platforms and getting more involved with the family; and health strategies include physical exercise and getting quality sleep. According to recent studies, these coping mechanisms are valid and effective strategies. For instance, Barr and Copeland-Stewart (2021) reported that playing video games during the pandemic has been associated with a positive impact on the players' perceived well-being. This effect is attributed to video games providing an enjoyable means of stress relief, maintaining social contact, and offering a mentally stimulating escape from the impact of lockdown. Meanwhile, Boursier et al. (2021) observed that watching television series can act as a recuperation tactic for managing emotional turmoil by enabling individuals to seek momentary refuge in imaginary worlds. Consequently, students were able to devise their coping strategies to endure the pandemic, independent of any educational establishment's influence. Nevertheless, despite the effectiveness of these coping mechanisms (Garcia & Revano, 2022), it is recommended that educational institutions design interventions and academic support initiatives that can more effectively assist students, particularly with respect to the psychoeducational ramifications of the pandemic. Such interventions can facilitate students' adaptation to the "new normal" and help them build resilience and coping skills that can serve them beyond the pandemic (Almeida, 2023; Lamsal, 2022; Tomé & Coelho, 2023).
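To illustrate how LDA assigns words to topics, the sketch below implements a small collapsed Gibbs sampler in pure Python. It is not the implementation used in the study, which does not report its software or hyperparameters: the values of `alpha`, `beta`, and `n_iter` are assumptions, and the toy documents are hypothetical responses assembled from the topic words in Table 2. In practice, an established library would normally be preferred.

```python
import random


def lda_gibbs(docs, n_topics=3, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns (vocab, topic_word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per doc/topic
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # tokens per topic/word
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=5):
    """Return the n most frequently assigned words for each topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda idx: -row[idx])
        result.append([vocab[idx] for idx in ranked[:n]])
    return result


# Hypothetical tokenized responses built from the Table 2 topic words.
docs = [
    "internet connection poor wifi data".split(),
    "internet access mobile data issue".split(),
    "class teacher school project subject".split(),
    "teacher class instructions time school".split(),
    "stress anxiety mental health fear".split(),
    "mental health cope stress lack".split(),
]

vocab, topic_word = lda_gibbs(docs)
for k, words in enumerate(top_words(vocab, topic_word), start=1):
    print(f"Topic {k}: {', '.join(words)}")
```

As in the study, the sampler only groups words; the topic labels (e.g., "Technological Factors") still have to be assigned by the researcher after inspecting the top words.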
RQ4: Using lexicon-based sentiment analysis, what is the general polarity of free-form responses generated in and extracted from an ERL environment?
In contrast to earlier research that reported positive sentiments (Lau & Sim, 2020) and perceptions (Khan et al., 2021), the results of the sentiment analysis in the present study revealed a predominantly negative experience among students concerning the ERL setup. The findings suggest that students are not satisfied with their experience of ERL despite the efforts made by educators and institutions to enhance their online learning environment (e.g., Ofosu-Ampong et al., 2024). The negative sentiment may be attributed to various factors, such as the lack of face-to-face interaction, limited socialization opportunities, and technical difficulties encountered during the learning process (Guillaume et al., 2022; Leahy et al., 2021). In addition, a chi-square test revealed an unequal number of positive and negative sentiments in the text corpus (χ²(1) = 37.71, p < .05), a difference that is unlikely to have resulted from sampling error. This finding suggests that the prevalence of negative sentiments in the text corpus is not merely a chance occurrence but rather reflects a genuine trend in the students' experiences of ERL. Table 4 shows some of the most frequently occurring words and sample sentences from the text corpus with positive and negative sentiments.
| Word | Positive | Negative |
|---|---|---|
| family | "The good thing about pandemic for me is I get to spend more time with my family and do fun stuff together." | "It is depressingly sad and feel lonely even though I have my family with me more frequently than before." |
| exercise | "I really hate the lockdown part of the pandemic since all gyms are closed and I cannot concentrate to exercise at home even before" | "What I hate about this pandemic is the lockdown because I cannot go outside to exercise, socialize, and go to school." |
| sleep | "I sleep longer now because our online class is flexible and many of our activities are asynchronous which is a good thing for me." | "My life became unstable because of this pandemic, I cannot sleep anymore properly because of many assignments." |
| help | "I have more time now to play computer games with friends and even find ways to help me grow and learn new things." | "Even as a privileged Filipino, I cannot help but worry about me and my family's well-being and the fear of having the virus." |
| facebook | "Good thing facebook was invented before pandemic because even though there is a lockdown, I can still contact my friends and talk to my family from other places." | "I am demotivated to study these days and I see myself scrolling the news feed of my facebook account for hours every day without doing any assignments." |
| safe | "Online class is the only solution I believe to continue education while staying safe as well in the comfort of our homes." | "I fear going out because it is not safe just to even breathe the same air with other people so I isolate myself in my room" |
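The chi-square test reported for RQ4 can be reproduced in outline as follows. This sketch assumes a goodness-of-fit test against equal expected frequencies for the two polarity classes, which is consistent with the single degree of freedom reported. The counts below are hypothetical because the study does not list the observed frequencies; only the statistic 37.71 is taken from the text. With one degree of freedom, the upper-tail p-value has the closed form erfc(√(x/2)), so no statistics library is needed.

```python
import math


def chi_square_gof(observed, expected=None):
    """Pearson chi-square goodness-of-fit statistic."""
    total = sum(observed)
    if expected is None:
        # Null hypothesis: every category is equally frequent.
        expected = [total / len(observed)] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical polarity counts, NOT taken from the study.
counts = {"positive": 400, "negative": 600}
stat = chi_square_gof(list(counts.values()))
print(f"chi2(1) = {stat:.2f}, p = {p_value_df1(stat):.3g}")

# Statistic reported in the text; check it against the .05 threshold.
reported = 37.71
print("significant at .05:", p_value_df1(reported) < 0.05)
```

The closed form applies only when there are two categories; for more categories, a general chi-square survival function from a statistics library would be required.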
Discussion
The advent of the Internet, social media networks, and other digital communication channels has resulted in the proliferation of substantial volumes of textual data. Despite the established effectiveness of traditional textual analysis techniques, it has become increasingly impractical to analyze such large datasets due to the necessitated time and resources (Galdas, 2017; Klemenčič & Chirikov, 2015). As a viable alternative to traditional methods, text mining is increasingly being acknowledged as an effective methodology for analyzing large volumes of textual data (Macanovic, 2022; Wiedemann, 2016). Like other textually rich websites, online learning platforms also contain significant amounts of textual content. The sudden transition of instructional delivery to ERL during the COVID-19 pandemic has intensified this trend of increased textual data production. Analyzing these data is an opportunity to prove the viability of text mining techniques in the education discipline. The present study thus examined data extracted from an ERL environment using text mining techniques, such as word frequency, collocation, KWIC concordance, topic modeling, and sentiment analysis. Assessing online learning environments provides a valuable opportunity to explore text mining as an educational evaluation methodology because these platforms are repositories of extensive and varied textual data.
According to the findings, the most frequent word in the text corpus was "hard," with a total of 689 instances constituting 70.45% of all occurrences. Interestingly, the word "life" was observed more frequently than "school," which suggests that students may experience personal difficulties more acutely than academic challenges. The analysis of text data indicated that mental health is a significant issue among students in ERL, as evidenced by the prevalence of this topic in top bigrams. Other researchers such as Lai et al. (2020), Yang et al. (2021), and Lee et al. (2021) have similarly raised pandemic-induced mental health concerns, such as stress, depression, and anxiety. More importantly, Tee et al. (2020) reported that students experience more severe psychological impacts and exhibit greater symptoms of mental health problems compared to those who are employed. Numerous factors have been proposed to account for poor mental health among students, including difficulties with maintaining focus during online classes, a rigorous academic curriculum, the demand for academic success, the physical distance from educational institutions, and financial constraints related to educational costs. Given the complex and varied nature of these challenges, educational institutions should undertake measures to address the pressing mental health crisis that is affecting students.
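The unigram and bigram frequencies discussed here can be obtained with a simple counting routine. The sketch below is illustrative only: the stopword list is a small hypothetical sample, the responses are invented, and the share is computed per response, which is an assumption about how the reported percentages were derived.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"a", "and", "at", "but", "i", "is", "it", "my", "the", "this", "to"}


def tokenize(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(responses, n=1):
    """Count n-grams of adjacent tokens, skipping any that contain a stopword."""
    counts = Counter()
    for r in responses:
        tokens = tokenize(r)
        for gram in zip(*(tokens[i:] for i in range(n))):
            if not any(tok in STOPWORDS for tok in gram):
                counts[" ".join(gram)] += 1
    return counts


def response_share(responses, word):
    """Fraction of responses that mention `word` at least once."""
    return sum(word in tokenize(r) for r in responses) / len(responses)


# Hypothetical responses, not drawn from the study's corpus.
responses = [
    "It is hard to focus and my mental health is suffering",
    "Online class is hard but my mental health matters",
    "Life is hard this year",
    "I miss my friends at school",
]

print(ngram_counts(responses, n=1).most_common(3))
print(ngram_counts(responses, n=2).most_common(3))
print(f"'hard' appears in {response_share(responses, 'hard'):.0%} of responses")
```

Filtering stopwords after forming the n-grams, rather than before, ensures that only words that were actually adjacent in a response are counted as a bigram.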
When it comes to academic challenges, the text mining analysis unveiled the heavy academic workload as a major issue. This conclusion was substantiated by the frequency of the word "work" in 46.73% of the text corpus, making it the fourth most frequent unigram. Conversely, Khalil et al. (2020) and Barrot et al. (2021) identified the greatest academic-related challenges for students to overcome as content understanding and learning environment, respectively. A plausible justification for the prevalence of excessive workload during the pandemic could be the insufficient pedagogical skills and training of educators for conducting purely online instruction (Jain et al., 2020). Despite their digital literacy skills, teachers may lack the necessary competence to effectively teach online courses. As such, teachers tend to provide more supplementary materials to support learning goals at home and, in some cases, additional assessments (e.g., assignments, reports, case studies) to measure the attainment of these goals. Although teachers express a willingness to minimize the workload of students (Lepp et al., 2021), online learning necessitates increased effort, energy, and responsibility from students. For example, the intensive use of synchronous videoconferencing platforms has led to the emergence of "Zoom fatigue" – a phenomenon that is more draining than face-to-face meetings due to the heightened requirement for sustained attention (Spataro, 2020). The management implication of this result suggests that educational institutions must strategize on how to alleviate the burden on students and ensure a conducive academic environment in times of crisis. It is also imperative that students are provided with a manageable workload and are instructed by teachers who are more understanding towards their students' current educational and mental conditions.
Consistent with recent findings (e.g., Cullinan et al., 2021), the present study similarly revealed technological barriers impeding the effective implementation of ERL despite the digital revolution of the past few decades. However, contrary to the results of a systematic review conducted by Rasheed et al. (2020) that identified technology use as the primary challenge, the findings of this study highlight inadequate internet access as the primary impediment. This finding is supported by the KWIC analysis: "internet" was the second most frequent word, appearing in 66.86% of the text corpus, and the word "slow" was mentioned in 89% of those occurrences (N = 582). The issue of inadequate access to quality internet service that is observed in this study is analogous to that experienced by other developing nations (Khan et al., 2021; Mpungose, 2020; Zalat et al., 2021). This finding suggests that the first-level digital divide (e.g., access to the Internet) is a more pressing problem than the second-level digital divide (e.g., technology use). For developed countries, there is evidence demonstrating comparable difficulties, particularly for students from low socioeconomic status backgrounds and rural regions (Peñarrubia-Lozano et al., 2021). Given that a student's intention to use learning management systems is directly impacted by their experience with internet connectivity, establishing a dependable network infrastructure is essential when relying exclusively on ERL as an instructional delivery platform. In the absence of such reliable infrastructure, alternative modalities such as radio and television must be considered.
Overall, the application of text mining techniques has profoundly enhanced our understanding of the impact of the sudden shift of instructional delivery from traditional to online modes of teaching. This transition placed many students at an extreme disadvantage, a conclusion reflected in the language they used in their free-form responses. Through the employment of various text mining methods, significant linguistic patterns were uncovered that highlighted the challenges faced by students in this new educational environment. Specifically, the most frequent unigrams and bigrams identified by word frequency and collocation analyses were strongly indicative of these challenges. The examination of contextual snippets of the most frequently used words using a KWIC concordance analysis presented a more nuanced understanding of language use. This analysis also facilitated the identification of patterns and relationships between words that may not be discernible through word frequency and collocation analyses. It also demonstrated the existence of certain words that were used in different contexts. The process of topic modeling helped identify the underlying thematic structure of the corpus and provided insight into the key topics that emerged from the data. From a macro perspective, sentiment analysis uncovered the overall polarity of the corpus. Collectively, the combination of these text mining techniques allowed for a comprehensive exploration of the linguistic features of the corpus and provided a multifaceted understanding of the lived experience and coping strategies of students. Consequently, this study endorses the use of text mining as an effective methodology for analyzing large volumes of textual data and understanding complex educational phenomena.
While the COVID-19 pandemic and the transition to ERL provided a context for this research, the focus of the study is not solely on these events. Instead, the primary objective is to demonstrate the potential of text mining as a method for analyzing large volumes of textual data—a task that traditional qualitative techniques may find challenging. As repeatedly pointed out in many research articles (Galdas, 2017; Klemenčič & Chirikov, 2015), a traditional qualitative analysis is often manual and time-consuming and requires iterative processes that can be impractical for handling large datasets. Conversely, text mining automates much of the data processing, enabling researchers to efficiently analyze vast quantities of text to identify trends, patterns, and sentiments that would be difficult to discern manually. This study contributes to the growing body of evidence suggesting that text mining approaches can support and, in some aspects, enhance the capabilities of a traditional qualitative data analysis (Janasik et al., 2008). By proving that text mining can address complex social science research questions effectively, this study reinforces the method's validity and applicability in academic and practical settings.
Despite the successful analyses using text mining techniques, there are limitations to the study that should be acknowledged. Primarily, while text mining provided significant insights into the language used by students, it did not explore the reasons behind the use of specific language. This limitation restricts our understanding to the surface level of textual expressions and omits deeper explorations of the underlying motivations, attitudes, and beliefs that shape these expressions. From the educational research perspective, relying solely on written text as the source of data can overlook critical aspects of the learning experience, such as verbal communication, nonverbal cues, and social interactions, which are often vital in fully understanding student engagement and learning dynamics. Future research could benefit from exploring the use of text mining techniques in combination with other data sources (e.g., video or audio recordings). Finally, the scope of the study was limited to text mining techniques. Future researchers are encouraged to consider using text mining techniques in combination with other textual analysis methods. Comparing the strengths and limitations of different approaches may help identify the most appropriate methods of data analysis.
Conclusion
This study aimed to test the efficacy of text mining as an educational evaluation methodology by analyzing student experience within online learning environments. It examined free-form comments from a series of cross-sectional surveys to understand how effectively text mining techniques can uncover insights from large volumes of textual data. Through the application of these techniques, the analyses revealed "hard" and "mental health" as the most frequent unigram and bigram, respectively. KWIC analysis was utilized to uncover the contexts and meanings behind these prevalent words. Additionally, topic modeling identified themes such as individual, academic, and technological challenges related to the learning experiences, alongside themes of entertainment, relationships, and health in coping strategies. The analysis also detected a predominantly negative sentiment toward the experiences within these online environments. Unlike the limited insights typically derived from quantitative and traditional qualitative methods, the findings of this study underscore the depth and uniqueness of information that can be extracted through computational text analysis. Furthermore, this research provides empirical evidence of the effectiveness of text mining techniques in leveraging computational methods to achieve a comprehensive understanding of complex phenomena within large-scale datasets. This approach not only enhances our comprehension of student experiences but also enriches the methodologies available for educational research.