Introduction

Ready or not, educational institutions were forced to shift from traditional face-to-face to online instruction during the lockdown period of the COVID-19 pandemic (Adedoyin & Soykan, 2020; Mishra et al., 2020). Fortunately, online learning systems were perceived positively and accepted both before (Fidalgo et al., 2020; Garcia, 2017) and during (Amir et al., 2020; Das et al., 2021; Khan et al., 2021) this global health crisis. Nevertheless, challenges in teaching online courses persist. In a synthesized literature review, Kebritchi et al. (2017) grouped these challenges into major topical themes: issues related to learners (e.g., expectations and readiness), instructors (e.g., the transition from face-to-face to online teaching and teaching styles), and content development (e.g., the role of instructional strategies and the integration of multimedia into content). Pedagogical patterns used in face-to-face lessons also need revision to meet the learning requirements of a virtual classroom (Ferri et al., 2020). Hence, institutions need to develop new multimedia learning materials (a pedagogical challenge), provide access to online learning infrastructure (a technological challenge), and help stay-at-home learners secure a conducive learning environment (a social challenge). For late adopters of online education, the suddenness of the shift to emergency remote learning made it difficult to respond to these challenges immediately. However, ensuring that learning never stops became the priority of the education sector (Mukhtar et al., 2020; Thomas & Rogers, 2020), hence the rapid relocation of all face-to-face courses to online learning systems.

As classrooms move from physical to virtual venues, learning style becomes a critical factor because of its association with student success in distance education (Battalio, 2009; Zapalska & Brozik, 2006). Theorists have argued that learning styles are a manifestation of individual differences in learning. Butler and Pinto-Zipp (2005) explored students’ learning styles in relation to online instructional methods. Using the Gregorc Learning Styles Delineator, they found that Concrete-Sequential (structured, predictable, practical, thorough) and Concrete-Random (original, intuitive, investigative) were the most frequent single learning styles. Further, online learners preferred instructional methods that emphasize convenience. This finding is consistent with the three-year study by Cole et al. (2014), in which convenience was the most frequently cited reason for student satisfaction with online instruction. Online learners likewise prefer the asynchronous mode because it is unrestricted by time, place, or other classroom constraints, thereby fulfilling the promise of learning “anytime and anywhere” (Shahabadi & Uplane, 2015). By comparison, synchronous online learning relies on time-bound activities and meetings in which students participate virtually according to a fixed schedule (Malik et al., 2017). This mode may be impractical for students facing technological barriers (e.g., limited access due to gadget sharing, unreliable internet connection), domestic barriers (e.g., the need to work for extra income, financial distress within the household), institutional barriers (e.g., excessive cognitive load and activities), or individual barriers (e.g., difficulty adjusting learning styles). As a solution, an asynchronous mode of content delivery has been proposed, especially for developing countries facing multifactorial and interrelated challenges (Baticulon et al., 2021; Garcia, 2022).

For asynchronous online learning, Koutsabasis et al. (2011) recommended multimedia content such as videos of lectures, demonstrations, and examples. Educators have used videos in various ways, including presenting situational challenges to encourage problem-solving, providing information in an engaging format, and producing material that supplements academic content (Malaina et al., 2018; Rasi & Poikela, 2016; Tsukuta et al., 2019). As video has become an integral part of traditional courses and a cornerstone of many blended courses, researchers have been searching for the formula for creating successful educational videos, an approach hereafter referred to as video-based learning (VBL). For instance, Guo et al. (2014) conducted what was then the largest-scale study of how video production affects student engagement and identified influential factors such as speaking rate and pre-production planning. Further, Bialowas and Steimel (2019) explored the ideal video length and found that short-form videos (roughly three minutes) could have a greater influence on student motivation and immediacy. Brame (2016), in turn, reviewed the literature to establish principles and guidelines for maximizing student learning and identified several elements to consider, such as cognitive load, student engagement, and active learning. Delivering lecture recordings and additional video materials has become a significant aspect of education because it allows more flexibility in the teaching process and encourages self-directed, self-paced learning.

Despite numerous studies, VBL still poses challenges that must be considered when designing asynchronous video lectures (hereafter, video lectures). One disadvantage is that video lectures encourage passive learning: the loss of direct contact with students does not promote active learner participation or collaborative learning (Yousef et al., 2015). In VBL and online learning, this is a significant point of inquiry for ascertaining the validity and completeness of existing principles and guidelines on constructing video lectures for online instruction. The literature has also reported other problems with video lectures. For example, the absence of teacher support causes learning difficulties for students (Homer et al., 2008). Thus, teachers need to offer such support to promote student comprehension of video lectures. Chin-Yuan et al. (2020) reported that one way to encourage deeper understanding, and thus better student performance, is to add video annotations. Another fundamental concern with online education is the sense of isolation that threatens students’ ability to learn (Borup et al., 2012). To overcome this challenge, researchers have recommended strengthening the sense of emotional connection in online learning environments and balancing technology use with the human touch. Kizilcec et al. (2015) determined that a teacher’s talking head offers social and other nonverbal cues that can help students focus and feel more connected. To assess the applicability of these proposed solutions to online instruction, we investigated the cognitive (i.e., learning performance) and affective (e.g., watching behavior, attitude, engagement, and satisfaction) effects of integrating annotations and talking heads into video lectures. Determining these effects may give academic institutions, curriculum developers, instructional designers, and educators a basis for maximizing the benefits of VBL in online education.
The remainder of the paper covers the theoretical underpinnings of online instructional strategies, how the data were collected and analyzed, a discussion of the findings, and the conclusions, implications, and recommendations.

Literature Review

Online Instructional Strategies and Asynchronous Learning

The promised benefits and efficacy of online instruction, from the convenience of online learning (student perspective) to the opportunity to offer additional courses (institutional perspective), are widely discussed in the literature. For example, a meta-analysis of online learning courses affirmed the importance of online learning as a strategy for improving course access and flexibility in educational institutions (Castro & Tumibay, 2021). However, Ferri et al. (2020) posited that the pedagogical patterns and instructional strategies used in face-to-face instruction must be amended to accommodate the learning requirements of a virtual environment. Instructional strategies are the methods and approaches that create the conditions under which learning goals and outcomes are accomplished. In light of the COVID-19 pandemic, Mahmood (2021) revisited various instructional strategies for delivering online education effectively, particularly in developing countries. One such strategy is the use of recorded video lectures, which give students anytime-anyplace access to learning materials and pave the way for an asynchronous learning mode (sometimes called self-paced learning).

For many years, the primary audience of online learning was students who deliberately chose this mode and could set up their own virtual learning environment. However, because of the global pandemic, students with few or no resources have been forced to adapt to this new type of learning (Garcia & Revano, 2022; Khusanov et al., 2022). The education sector must consequently rethink the most appropriate approach to online teaching and learning. For instance, online courses customarily require synchronous web conferences in which teachers and students must meet in a virtual space according to set schedules. For working students trying to survive the pandemic, attending synchronous online courses is nearly impossible (Aristovnik et al., 2020). Meanwhile, students staying at home may take on additional household responsibilities to support parents working tirelessly to provide for the family during the pandemic. These barriers, along with distractions from family members (e.g., younger siblings), may reduce the time available for school interaction and for reviewing learning materials. In developing countries, a growing concern among learners is unstable internet connectivity, which directly influences behavioral intention toward online learning (Garcia, 2017). Although asynchronous courses are not commonly accepted, educational institutions have resorted to them (e.g., the school provides recorded video lectures and students submit deliverables on their own schedules within given time frames) as a viable response to these challenges.

In reviewing asynchronous and distance learning in the age of COVID-19, Brady and Pradhan (2020) described two academic institutions that made curricular changes to accommodate the unforeseen cancellation of in-person didactics. Both institutions purchased Cisco WebEx for video conferencing and recorded online sessions to allow asynchronous playback for participants unable to join at the designated time. Brady and Pradhan (2020) also urged institutions making the shift to consider curricular assessment to ensure that students learn the desired content. In medical and allied health professional education, Gupta et al. (2020) explored assessment in asynchronous environments. Because assessment is an integral part of any teaching and learning system, especially during a pandemic (Fung et al., 2022), valid and fair asynchronous assessment methods are essential when transitioning to online instruction. Although the findings may not apply to other disciplines, the study identified several assessment methods suited to an asynchronous environment, such as problem-based questions and open-ended short-answer questions, among others. Rapanta et al. (2020), on the other hand, suggested assigning more asynchronous collaborative and individual work to offset the additional time teachers must devote to designing online learning materials. In another example, Ishak et al. (2020) used a mixed-methods approach to examine the role of asynchronous online video lectures in flipped-class instruction. Their analysis showed that such materials supported students’ intrinsic needs as described by self-determination theory, namely autonomy, relatedness, and perceived competence.

VBL and Asynchronous Video Lectures

Previous studies have evaluated VBL materials as a content-delivery tool in flipped, blended, and online classes. In a systematic review of VBL studies from 2008 to 2019, Sablić et al. (2020) categorized the literature into dimensions such as teachers’ reflections and feedback, professional development, and student learning outcomes. Microteaching, a faculty development technique in which teachers review recordings of their teaching sessions, is one of the earliest applications of VBL as a feedback tool. Tripp and Rich (2012) analyzed 63 studies and identified six key dimensions: (1) reflection tasks, (2) guiding reflection, (3) individual or collaborative reflection, (4) video length, (5) number of reflections, and (6) measuring reflection. From the teachers’ perspective, reviewing recorded teaching sessions allows them to learn from feedback (Christ et al., 2017) and increases their willingness to change their teaching style (Tripp & Rich, 2012). For students, videos create a stimulating learning environment that promotes a deeper understanding of a topic. To construct educational videos that maximize student learning, Brame (2016) underscores three elements. First, cognitive load (the amount of information that working memory can hold at one time) is managed by reducing extraneous load and enhancing germane load. Techniques for managing cognitive load include signaling (e.g., highlighting the most important keywords), segmenting (e.g., short videos), weeding (e.g., eliminating music), and matching modality (e.g., Khan Academy-style tutorial videos). The second element is engagement, which aims to increase the proportion of each video that students watch and to build a social partnership between students and teachers. Making multiple videos for a lesson (i.e., dividing a topic into subtopics) and using a first-person narrative are ways to achieve this element.

Lastly, videos should promote active learning to increase content knowledge, problem-solving abilities, and positive attitudes toward learning. Aside from the popular practice of using guide questions, another way to promote active learning is to provide interactive features that give students control, such as moving through the video and selecting key sections to review.

In the systematic review of VBL (Sablić et al., 2020), asynchronous video received comparatively little research attention. In contrast to live video, asynchronous video is pre-recorded and intended to be watched after production. Live videos can also be used asynchronously when they are recorded for later viewing. Cardall et al. (2008) conducted a cross-sectional survey comparing the student experience of live and video-recorded lectures. The results showed that live attendance remained the predominant way of watching lectures, for reasons including a lack of motivation to watch recorded videos, a desire to show appreciation to instructors, and a feeling of getting more for their tuition money. Almost a decade later, Bahnson and Olejnikova (2017) replicated the study and found that students “really like” recorded videos, but they found no evidence that students prefer them to live lectures. Moreover, substituting a self-paced, recorded module for live instruction did not improve student learning. During the COVID-19 era, Islam et al. (2020) repeated the study and found that students preferred pre-recorded video lectures to live Zoom lectures because of their flexibility and convenience. They also noted that learning through video lectures depends on student motivation, a barrier reported by Cardall et al. (2008) and a factor missing from the study of Bahnson and Olejnikova (2017). This impediment has long challenged educational institutions, and the pandemic has aggravated it, with students losing their motivation to learn (Bihu, 2022; Patricia Aguilera-Hermida, 2020; Tan, 2020). Nonetheless, students’ change of heart from live to recorded videos underscores the importance of continually rethinking and reevaluating how best to incorporate videos into education.

Common Presentation Styles of Recorded Video Lessons

In many VBL studies, ‘video’ is the generic term for educational video materials. However, there are various video styles (see Figure 1), and differences in form could affect the evaluation of educational interventions, creating inconsistencies in the literature. Exploring these lecture styles is therefore necessary to establish the characteristics of such video materials, distinguish how they differ from one another, and allow teachers to select the most appropriate style for their skills and preferences. Online lecture videos are rendered in various styles, including (1) narrated slide presentation, (2) presenter-only lecture, (3) live lecture capture, (4) picture-in-picture, (5) hand-drawn video, and (6) screencasting. The first style, narrated slide presentation, relies on slide presentation software (e.g., Microsoft PowerPoint) supplemented by the teacher’s voice-over explaining the information displayed on screen. Excellent verbal communication skills are required for this style because the voice is the only connection between teachers and students. In contrast, a presenter-only lecture features a talking head (similar to a television commentator), which is very effective for presentations that require an emotional connection. Beyond the art of communication, presenters must master visual cues (e.g., good posture, body language, and eye contact). Unlike the other styles, live lecture capture takes place in a traditional classroom with a live audience. The lecture is intended for the synchronous format but is recorded to allow asynchronous access. The main advantage of this style is that teachers can interact with students and students can raise questions, while absentees can still catch up with the discussion.

Another style is picture-in-picture, which combines the narrated slide presentation and the presenter-only lecture. Although it has the advantages of both, picture-in-picture is one of the most complex formats because it requires post-production (Chen & Wu, 2015), which in turn demands video editing skills and knowledge. Hand-drawn videos, on the other hand, are explainer-type media that rely heavily on animated learning graphics drawn by hand on a physical whiteboard or digital drawing board (e.g., Khan-style learning videos). This style offers several advantages, such as presenting information incrementally in sync with the linear flow of the narration, directing learners’ attention to the crucial parts of the lesson, and using hand motion as a social cue that encourages learners to work harder (Chen & Thomas, 2020). Lastly, screencasting (the digital recording of a computer screen) is one of the newest video styles and is used as a video walkthrough to explain how things work. Unlike the other styles, screencasting requires software capable of recording the screen and an energetic voice track to compensate for the lack of emotional connection.

Numerous studies have investigated the impact of video styles through comparative analysis. For instance, Chen and Thomas (2020) compared hand-drawn videos and narrated slide presentations in a laboratory setting that simulated an online learning environment; the 328 undergraduate participants rated the hand-drawn video as more engaging. Cross et al. (2013) obtained similar results: most students described hand-drawn videos as engaging and personal and PowerPoint presentations as clear and legible, qualities that add value during lectures and review, respectively. Chen and Wu (2015) compared narrated slide presentations with picture-in-picture and live lecture capture. In their experiment, both picture-in-picture and live lecture capture produced significantly better learning performance than the narrated slide presentation. However, the narrated slide presentation generated the most sustained attention and the highest cognitive load of the three styles. In another study, Sadik (2016) compared live lecture capture with screencasting as supplements to classroom lectures; students rated the screen recordings better than the live recordings in many aspects of video quality and usefulness. Apart from Chen and Wu (2015), few studies have evaluated picture-in-picture videos that show a talking head. Other studies refer to a teacher’s talking head but not in the context of video style. For instance, Mohamad Ali and Hamdan (2016) assessed the effects of a talking head in instructional materials by comparing real human characters with two-dimensional characters, but no video learning materials were involved. The closest prior evaluation to this paper is the observational field study by Kizilcec et al. (2015), which compared video lectures with and without the instructor’s face. The present study replicates part of their protocol, with two main differences: it was conducted in the context of emergency remote education, and it includes annotations in the video learning materials.

Methodology

Research Design

The present study followed an educational Cluster Randomized Controlled Trial (C-RCT) design, in which groups of individuals (in this case, class sections) rather than individual students are randomly assigned to treatments. Moberg and Kramer (2015) asserted that a C-RCT is ideal for testing interventions that are delivered at the group level and whose nature carries a high risk of contamination. One source of such contamination is frequent contact between participants, which is likely among students who share online channels. During the pandemic, many studies emphasized the importance of social relationships and student connectedness (Garcia et al., 2022; Hehir et al., 2021). Moreover, under its latest policies and enrollment procedures, prompted by the pandemic, the participating university did not permit randomization at the individual level. Treatments were nevertheless randomly assigned to the four groups (one control group and three experimental groups). The control group received regular videos (G1), which served as the baseline for comparison with the experimental groups, which received videos with face (G2), videos with annotation (G3), and videos with face and annotation (G4).
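The section-level allocation described above can be illustrated with a short sketch. It assumes four intact sections and a seeded random permutation of the four treatment codes; the section labels, the function name `assign_clusters`, and the seed are hypothetical, since the study does not report the mechanics of its random assignment.

```python
import random

# Hypothetical labels: the study does not name its four class sections.
SECTIONS = ["Section A", "Section B", "Section C", "Section D"]

# The four treatment arms described in the text; G1 is the control.
TREATMENTS = {
    "G1": "regular videos (control)",
    "G2": "videos with face",
    "G3": "videos with annotation",
    "G4": "videos with face and annotation",
}


def assign_clusters(sections, treatment_codes, seed=None):
    """Randomly map each intact section (cluster) to exactly one treatment.

    Every student in a section receives that section's treatment, so
    randomization happens at the cluster level, not the student level.
    """
    if len(sections) != len(treatment_codes):
        raise ValueError("need exactly one treatment per section")
    rng = random.Random(seed)  # seeded so the allocation can be reproduced and audited
    shuffled = list(treatment_codes)
    rng.shuffle(shuffled)      # random permutation of the treatment codes
    return dict(zip(sections, shuffled))


allocation = assign_clusters(SECTIONS, list(TREATMENTS), seed=2021)
for section, code in allocation.items():
    print(f"{section}: {code} ({TREATMENTS[code]})")
```

Seeding a dedicated `random.Random` instance, rather than the global generator, keeps the allocation reproducible for an audit trail without affecting other code.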

In addition to the C-RCT protocol, we drew on concepts from theoretical frameworks such as the Cognitive Theory of Multimedia Learning (CTML) (Mayer, 2005) and its extension, the Cognitive-Affective Theory of Learning with Media (CATML) (Moreno, 2005). First, CTML provides guidelines for creating video lectures. According to this theory, video lectures should be designed to avoid extraneous processing demands, and the theory suggests several guiding principles to follow, such as coherence, signaling, segmenting, embodiment, and modality. CATML, in turn, offers a basis for evaluating the intervention. This theory states that motivational factors mediate the cognitive processes involved in learning from multimedia materials, with affect acting as an on/off switch. Consequently, in addition to student learning performance, the present study investigated affective factors, namely video watching behavior, engagement, attitude, and satisfaction, as part of the evaluation of treatments.

Setting and Sample

This educational intervention study was carried out over one semester, from January to April of the 2020-2021 academic year, at one of the largest universities in the Philippines. Like other educational institutions in the country, this university switched to emergency remote education in response to the challenges of the COVID-19 pandemic. A distinctive feature of its online learning platform is the provision of recorded video lectures for all courses in all undergraduate degree programs. These video lectures are created for asynchronous access to accommodate students who cannot attend synchronous meetings for the reasons discussed in the literature review. However, narrated slide presentation was the only video lecture style available, and specific professors were assigned to create the videos for each course according to their specializations. The participant pool (n = 200) was one masterclass of four sections, each with 50 students, enrolled in an introductory website design and development course. Three full-time professors handled the masterclass, and collaborative teaching was the primary method of instruction. Synchronous lecture and laboratory meetings were held twice a week, two hours per meeting. Nonetheless, students were not required to attend synchronous meetings (except the orientation) because the video lectures were already available in the learning management system. Enrollment per class section was not controlled in this study but followed the procedures mandated by the university.

Video Lectures

Although video lectures in the narrated slide presentation style were already available, new video lectures were created from scratch to ensure uniformity across all treatments. Otherwise, the production style of the existing videos would have differed from that of the other treatments, which could have affected the evaluation. The development of the video lectures followed the applicable guiding principles of CTML: segmenting (information is presented in small, learner-paced segments), pre-training (key terms are presented before the actual lesson), coherence (non-essential information is removed), modality (spoken narration is used for the discussion), embodiment (a real human serves as the agent in videos with a teacher’s face), personalization (lessons are presented in a conversational style), and voice (a real human voice is used instead of a robot-like text-to-speech voice). The video lectures were recorded in the picture-in-picture style, with the screen (PowerPoint presentation) and the video (talking head) recorded separately. The screen recording of the PowerPoint presentation, without the talking head, served as the regular videos for G1. Combining the screen and talking-head recordings in post-production produced the videos with face for G2. In another round of post-production, annotations were added to the regular videos to produce the videos with annotation for G3. Finally, the talking-head recordings were combined with the G3 videos to produce the videos with face and annotation for G4. All videos went through a review and approval stage with other subject matter experts teaching the same and related courses to ensure the correctness, completeness, and quality of the video materials. Figure 2 shows screenshots of each treatment at the same timestamp.

The syllabus of the introductory web design and development course consists of seven modules covering three web languages: HTML, CSS, and JavaScript. Each module was divided into two to four subtopics. Dividing a lesson into segments follows the segmenting principle of CTML, which recommends that video lectures avoid imposing unnecessary processing demands (Mayer, 2005). A total of 18 subtopics and 72 videos (four videos per subtopic) were created for this course. The mean video duration was 258 seconds (range = 193 to 318 seconds), or about 4.3 minutes, which is roughly 78 seconds longer than the ideal length of about three minutes recommended by Bialowas and Steimel (2019). Given the nature of the course, most videos were live coding demonstrations and hands-on exercises.

Research Instruments

We evaluated cognitive and affective factors using several research instruments. Throughout the 14-week academic term, students took ungraded formative assessments after every lesson, graded summative assessments after every two lessons, and a comprehensive final examination. Course grades, which served as the measure of learning performance, were computed from the summative assessments (75%) and the final examination (25%). The content of the final examination mirrored that of the pre-test given during course orientation. All departmental assessments were designed and developed by a pool of professors (n = 5) who served as subject matter experts. Assessment schedules and grading systems were determined by the university. For watching behavior, we collected several video metrics: view-through rate (the percentage of students who watched a video in its entirety), view count (the total number of video views), heatmap (which parts of a video a student played), and watch time (how long a student watched a video). To capture these metrics automatically, we developed a custom Google Chrome extension to track, monitor, and save the activities students performed. This approach ensured that we captured the data we needed and that the data were not stored in an external company’s database. All participating students agreed to use the Google Chrome browser, turn on developer mode, and install the extension. For each visit, the extension recorded the corresponding activities (e.g., clicking the play button) and other relevant data (e.g., length of stay) and transmitted them to our database. To protect privacy, all data were encrypted to prevent the identification of students at any level of use. We also limited data collection to the video landing pages under a single domain name.
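To illustrate how the four watching-behavior metrics could be derived from raw playback events, the sketch below assumes a hypothetical log in which each record is one contiguous stretch of playback (student, view number, start and end position in seconds). The record layout, the function `video_metrics`, and the rule that a student has watched a video in its entirety when their combined playback covers every second are all assumptions; the extension’s actual data format is not described in the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaySegment:
    """One contiguous stretch of playback (hypothetical log schema)."""
    student_id: str
    view_id: int   # increments each time the student opens the video
    start_s: int   # playback position where the stretch began, in seconds
    end_s: int     # playback position where it stopped, in seconds (exclusive)


def video_metrics(segments, duration_s, enrolled):
    """Summarize watching behavior for a single video.

    view_count: distinct (student, view) pairs
    watch_time_s: seconds of playback per student
    heatmap: how many times each second of the video was played
    view_through_rate: % of enrolled students whose combined playback
        covered every second of the video (an assumed definition)
    """
    views = {(s.student_id, s.view_id) for s in segments}
    watch_time = defaultdict(int)
    covered = defaultdict(set)
    heatmap = [0] * duration_s
    for s in segments:
        lo, hi = max(0, s.start_s), min(duration_s, s.end_s)  # clip to the video length
        watch_time[s.student_id] += max(0, hi - lo)
        for t in range(lo, hi):
            heatmap[t] += 1
            covered[s.student_id].add(t)
    completers = [sid for sid, secs in covered.items() if len(secs) == duration_s]
    return {
        "view_count": len(views),
        "watch_time_s": dict(watch_time),
        "heatmap": heatmap,
        "view_through_rate": 100 * len(completers) / enrolled,
    }


# Toy log: s2 watches the whole video in two sittings; s3 watches only part of it.
log = [
    PlaySegment("s1", 1, 0, 10),
    PlaySegment("s2", 1, 0, 4),
    PlaySegment("s2", 2, 4, 10),
    PlaySegment("s3", 1, 2, 6),
]
metrics = video_metrics(log, duration_s=10, enrolled=4)
print(metrics["view_count"], metrics["view_through_rate"])  # 4 50.0
```

Counting coverage across sittings treats a video watched in two sessions as complete; a stricter definition would require one uninterrupted view, which would change the view-through rate for students like `s2`.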
The remaining affective factors (engagement, satisfaction, and attitude) were evaluated using a single survey instrument covering these three dimensions. Using the expert judgment approach, the same pool of professors reviewed the initial instrument for accuracy, completeness, and readability to enhance content validity. Content validity index testing was used to determine whether each item in each scale was congruent with its construct; the average congruency percentage was 91%, above the 90% threshold. A pilot test was also conducted with students from the other masterclass of the same course (n = 28) to assess the reliability and validity of the instrument. Cronbach’s alpha was 0.74 for engagement, 0.84 for attitude, and 0.78 for satisfaction. All values exceeded the 0.7 cutoff, indicating that the instrument was internally consistent. Sample items include “I think that the video lectures improve my learning” for attitude, “When video lectures are available, the online class experience is much better” for satisfaction, and “I was fully concentrated while watching the video” for engagement.
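The two instrument statistics reported above can be sketched in a few lines. The sketch assumes the conventional definitions: the average congruency percentage is the share of items each expert judges congruent with the construct, averaged across experts, and Cronbach’s alpha for k items is k/(k - 1) × (1 - sum of item variances / variance of total scores). The function names and toy ratings are hypothetical and are not the study’s data.

```python
from statistics import variance


def average_congruency_percentage(judgments):
    """Average congruency percentage (ACP) for a panel of experts.

    judgments[e][i] is True if expert e judged item i congruent with its
    construct. Each expert's percentage of congruent items is computed
    first, then averaged across experts.
    """
    per_expert = [100 * sum(items) / len(items) for items in judgments]
    return sum(per_expert) / len(per_expert)


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses[r][i] is respondent r's score on item i (e.g., a Likert rating).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = [[float(row[i]) for row in responses] for i in range(k)]  # one list per item
    totals = [float(sum(row)) for row in responses]                    # each respondent's scale score
    item_var = sum(variance(col) for col in items)                     # sample variances (n - 1)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Toy data only: two experts rating four items; four respondents on a three-item scale.
acp = average_congruency_percentage([[True, True, True, True], [True, True, True, False]])
alpha = cronbach_alpha([[3, 4, 3], [4, 4, 5], [2, 3, 2], [5, 5, 4]])
print(acp, round(alpha, 2))  # 87.5 0.9
```

Using sample variances for both the items and the totals keeps the ratio consistent; switching both to population variances would give the same alpha.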

Data Collection and Analysis

During course orientation, students took a pre-test to check whether prior knowledge of the subject matter differed significantly among the groups. We also used the pre-test results in a within-group comparison to determine whether final examination (post-test) scores increased significantly. Students completed the pre-test on January 14, 2021, and the post-test on April 9, 2021. During the post-test period, students also answered the survey questionnaire on the three affective constructs (attitude, satisfaction, and engagement), which were subjected to between-group analysis. Assessment scores were collected for the analysis of learning performance (the cognitive effect) across all treatments. All students submitted a confidentiality undertaking and informed consent before starting their first lesson. The data were analyzed using IBM SPSS Statistics 26.0 (IBM Corporation, USA). Demographic information and data distributions were summarized using descriptive statistics. We used a paired t-test for the within-group comparison of pre-test and post-test scores, a one-way Analysis of Variance (ANOVA) for the between-group comparison of pre-test scores, and a Multivariate Analysis of Variance (MANOVA) for the between-group comparison of learning performance across all recorded assessments.
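For readers who want to see what the two univariate tests compute, the sketch below derives the paired t statistic and the one-way ANOVA F statistic from raw scores using only the Python standard library. It is an illustration under simplifying assumptions, not the SPSS procedure used in the study: it returns test statistics and degrees of freedom but not p-values (SciPy’s `scipy.stats.ttest_rel` and `scipy.stats.f_oneway` provide those), and it does not cover the MANOVA. The function names and sample scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for the post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # mean difference / its standard error
    return t, n - 1


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy scores only, not the study's data.
t, df = paired_t(pre=[1, 2, 3, 4], post=[3, 3, 6, 6])
f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(t, 3), df, round(f, 1), df_b, df_w)  # 4.899 3 27.0 2 6
```

With four groups of 50 students, as in this study, the pre-test ANOVA would have (3, 196) degrees of freedom.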

Results and Discussion

Table 1. Mean (SD) scores of the four groups in the summative assessments and final examination

Dependent Variable  G1 Mean (SD)   G2 Mean (SD)   G3 Mean (SD)   G4 Mean (SD)   Significance
Summative 1 (S1)    64.13 (9.56)   76.29 (5.23)   66.91 (8.11)   73.24 (6.61)   < .001
Summative 2 (S2)    43.29 (5.43)   53.29 (7.10)   78.82 (7.91)   79.43 (8.22)   < .001
Summative 3 (S3)    44.42 (6.21)   49.91 (9.98)   62.89 (7.37)   68.26 (8.49)   < .001
Summative 4 (S4)    52.98 (7.23)   69.13 (9.56)   78.26 (6.49)   79.18 (9.51)   < .001
Final Exam (FE)     63.29 (7.22)   67.82 (5.37)   81.29 (8.21)   82.97 (7.69)   < .001

The primary objective of this study was to examine the cognitive and affective effects of video lectures with annotations and talking heads in an asynchronous mode of learning. Using a C-RCT design, four groups of students received different treatments throughout the 14-week intervention. We analyzed their learning performance, watching behavior, satisfaction, engagement, and attitude to measure the effectiveness of annotations and talking heads. A demographic survey revealed that the participants were predominantly male (89.23%), with a mean age of 18.92 years. The mean pre-test scores of the four groups ranged from 38.6 to 54.2, and a one-way ANOVA showed no significant difference in prior knowledge among the groups before the intervention (F = .492, p = .765).

Learning Performance

The first analysis of the cognitive effects of the treatments compared pre-test and post-test scores within each group. Paired t-tests revealed that the mean score of G1 improved from 34.24 ± 5.21 to 63.29 ± 7.22 (p < .001), G2 from 43.16 ± 7.28 to 67.82 ± 5.37 (p < .001), G3 from 42.11 ± 7.11 to 81.29 ± 8.21 (p < .001), and G4 from 38.92 ± 8.31 to 82.97 ± 7.69 (p < .001). These within-group results are consistent with the current literature supporting the positive impact of VBL (Sablić et al., 2020). The learning performance of all groups in the summative assessments and the comprehensive final examination was also compared. MANOVA revealed a significant difference between treatments (see Table 1). G2 obtained the highest score in S1 (Introduction to Web Technologies), while G4 attained the highest scores in S2 (HTML), S3 (CSS), S4 (JavaScript), and FE. In addition, G1 obtained the lowest scores in all assessments.

Although all groups improved significantly after the video-based intervention, it is noteworthy that G4 outperformed the other groups in most of the recorded assessments. This finding suggests that combining annotations and talking heads in video lectures has a greater positive impact on learning performance than using either technique alone or neither. The nature of the course, and how it is commonly positioned as a programming course in computing curricula even though it technically is not (Park & Wiedenbeck, 2011), may help explain this finding. In a standard introductory web design and development course, students learn to code in web languages such as HTML, CSS, and JavaScript. Students often mistake HTML and CSS for programming languages, and the inclusion of an actual programming language (i.e., JavaScript) and coding activities may explain why the course is regarded as a programming course. In computer programming education, novice students often experience a ‘fear of coding’. This phenomenon leads to negative attitudes and low academic achievement, especially when students navigate this uncharted territory alone (Garcia, 2021). Thus, the cognitive and affective support that teachers provide in the form of annotations and talking heads may play an important role in learning complex topics online (S2 to S4), but less so in introductory lessons (S1).

As we shift to online instruction, the learning environment presents an opportunity to promote independence and a sense of responsibility among students. However, the loss of human interaction, which is fundamental to them, becomes a critical concern. One of the external factors influencing negative feelings and perceptions towards the course is the availability of teachers. According to Rogerson and Scott (2010), teachers play a critical role in the student learning experience with respect to this fear factor. In a study by Ferri et al. (2020) on remote teaching during a pandemic, students asserted that they “need to feel emotions, and that can not be given by a 100% remote experience”. While we acknowledge that there is no substitute for genuine teacher-student interaction, the inclusion of talking heads in video lectures may have mitigated this problem. A familiar talking head may have fostered a parasocial interaction that fulfills students' social needs and reduces their loneliness. This social surrogacy is comparable to the illusory parasocial relationship between television personalities and viewers commonly discussed in Uses and Gratifications Theory (Rubin, 2008). Such video lectures may thus induce social presence even in a virtual environment, leading to a more inviting online learning experience and reduced transactional distance. This is an important finding because a conducive digital learning space is a requirement, especially in a pandemic context (Lamsal, 2022). Similarly, Kizilcec et al. (2015) reported increased social presence when students watched video lectures showing their teacher's face. This video style is also associated with increased engagement and a positive attitude, as discussed below.

Another important aspect to explore is how teacher-generated annotations improved learning in an introductory web design and development course. To our knowledge, no prior study has examined video lectures with annotations in this course or even in computer programming education. We therefore interpreted the improved learning performance through the lens of computer languages, which share commonalities with human languages (Connolly, 2001), a domain in which annotations have been thoroughly analyzed. Similar to practices in language education, the present study incorporated various annotation techniques and styles, such as digitally written notes, explanations, comments, drawings, and other visual remarks (e.g., underlining parts of the code or highlighting sections of a web page). Past studies have explored the effects of various multimedia annotations on second language acquisition, where such annotation is regarded as a form of computer-mediated communication that offers access to authentic language input (Akbulut, 2007; Yeh et al., 2017). In computer programming, the top-down approach is a learning method in which students use code snippets to acquire language ability before moving on to the details of the language (i.e., grammar, data definition, vocabulary) (Saito & Yamaura, 2013). An interesting finding from the MANOVA results that supports this similarity to language learning is that teacher-generated annotations worked considerably better on topics involving web languages (S2, S3, S4) than on foundational concepts (S1). This also aligns with the fundamental premise of CTML that learning occurs through a dual-coding process (i.e., the combination of verbal and non-verbal processing).

Video Watching Behavior

Throughout the experiment, the collected data reached 42,425 total page views (212.13 page views per student) across all web browsing activities within the learning management system. Of these, 16,935 page views (39.92%) were on video pages, accounting for a total of 47,665 minutes of watch time. In addition, a total of 16,245 web browsing sessions were recorded using one-hour idle session delimiters. ANOVA showed a significant difference in watch time among the groups (F = .515, p < 0.001). G4 accumulated 15,256 minutes of watch time with a view-through rate of 92%, followed by G3 with 13,639 minutes and an 87% view-through rate, G2 with 10,245 minutes and a 67% view-through rate, and G1 with 8,525 minutes and a 57% view-through rate. This may be explained by the fact that a talking head is more engaging (Guo et al., 2014) and that annotations made students pause and/or replay the video materials (Tseng, 2021). Figure 3 shows how students played a video, where drop-offs indicate where they stopped paying attention and large spikes indicate sections of the video compelling enough to watch and replay. All groups started with 100% attention in the first few seconds. However, G1 lost engagement in the middle part of the video and possibly returned at the end to watch the summary and conclusion of the lesson. G2 and G3 showed similar patterns, whereas G4 retained attention through most parts of the video.
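The one-hour idle session delimiter and the retention curve described for Figure 3 can be illustrated with a short Python sketch. The log format (pairs of student ID and timestamp), the segment representation (start and end seconds, end exclusive), the function names, and the rule that a gap of exactly one hour stays in the same session are all assumptions made for illustration; the learning management system used in the study may compute these measures differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sessions are split when a student is idle for longer than this gap.
IDLE_LIMIT = timedelta(hours=1)


def count_sessions(page_views, idle_limit=IDLE_LIMIT):
    """Count browsing sessions in a log of (student_id, datetime) page views.

    Assumption: a gap of exactly idle_limit still belongs to the same session.
    """
    by_student = defaultdict(list)
    for student_id, viewed_at in page_views:
        by_student[student_id].append(viewed_at)

    total = 0
    for stamps in by_student.values():
        stamps.sort()
        total += 1  # a student's first view always opens a session
        for previous, current in zip(stamps, stamps[1:]):
            if current - previous > idle_limit:
                total += 1  # idle too long: a new session starts
    return total


def retention_curve(viewer_segments, video_length):
    """Percentage of viewers watching each second of a video.

    viewer_segments holds one list of (start, end) second ranges per viewer,
    end exclusive. Replayed ranges are counted again, so values can exceed 100.
    """
    counts = [0] * video_length
    for segments in viewer_segments:
        for start, end in segments:
            for second in range(max(0, start), min(end, video_length)):
                counts[second] += 1
    return [100.0 * c / len(viewer_segments) for c in counts]


if __name__ == "__main__":
    log = [
        ("s1", datetime(2021, 2, 1, 9, 0)),
        ("s1", datetime(2021, 2, 1, 9, 30)),
        ("s1", datetime(2021, 2, 1, 11, 0)),  # 90-minute gap: new session
        ("s2", datetime(2021, 2, 1, 10, 0)),
    ]
    print(count_sessions(log))  # 3
    # Viewer 2 replays second 1, producing a spike above 100%.
    print(retention_curve([[(0, 4)], [(0, 2), (1, 2)]], 4))  # [100.0, 150.0, 50.0, 50.0]
```

Counting replayed seconds again is one way a retention curve can show spikes above the starting level, which matches the reading of spikes in Figure 3 as sections that students chose to replay.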

Satisfaction, Engagement, and Attitude

For the affective factors, the between-group analysis (Figure 4) shows that G4 had the highest mean score (4.27 ± 0.87), followed by G2 (3.91 ± 0.52) and G3 (3.90 ± 0.52) with almost identical mean scores, and G1 with the lowest mean score (2.82 ± 1.06). Among these factors, only attitude did not differ significantly between groups. First, the positive impact on satisfaction is consistent with existing studies showing that educational videos are a vital instructional material that enhances learning satisfaction compared to traditional education (El-Sayed & El-Sayed, 2013) and text-based, video-free online learning (Choi & Johnson, 2007). Further, talking heads in lecture videos may have compensated for the lack of interaction in online instruction, which is the most cited reason for dissatisfaction with online learning (Cole et al., 2014). The teacher's additional effort in adding video annotations may have led students to appreciate the online course, similar to the findings of Draus et al. (2014), where students expressed appreciation for teachers who devote more effort to an asynchronous online class. Regarding engagement, the results corroborate previous studies showing that talking heads cause students to like the lectures better (Kizilcec et al., 2015) and that video annotations can enhance student learning engagement (Tseng, 2021). However, Tseng (2021) also reported that annotations distracted some students from watching the videos, which did not appear to be the case in this study. Further research is warranted to verify the impact of video annotations in teaching other courses. Lastly, the results for the attitude factor were not significant despite the positive acceptance of online learning before and during the pandemic. This also contradicts existing studies that demonstrate the significant positive impact of VBL (Sablić et al., 2020), video annotations (Chiu et al., 2018), and the teacher's talking head (Kizilcec et al., 2015). This finding may be explained by the observation, from a global perspective, that the transition from onsite to online lectures due to the COVID-19 crisis had a stronger negative effect, particularly on male students from less developed regions (similar to the participants in this study), as reported by Aristovnik et al. (2020). Another study should be conducted after the pandemic to verify this explanation. Overall, both talking heads and annotations had advantageous effects on the affective domain, particularly on satisfaction and engagement.

Conclusion

In this paper, we investigated the cognitive (i.e., learning performance) and affective (i.e., watching behavior, attitude, engagement, and satisfaction) effects of integrating annotations and talking heads in video lectures. Following an education-based cluster randomized controlled trial approach, four cohorts of students received different treatments (regular videos, videos with face, videos with annotation, or videos with face and annotation) within a 14-week academic period. Our major findings were as follows: (1) videos with talking heads and annotations yielded the highest learning performance, (2) the watch time of videos with talking heads and annotations was significantly longer, and (3) students in the G4 cohort reported the highest satisfaction, engagement, and attitude scores, although the between-group difference in attitude was not significant. Ultimately, these findings suggest a valuable opportunity for academic institutions, curriculum developers, instructional designers, and teachers who are moving or will move face-to-face courses to online learning management systems to maximize the use of VBL, especially in a time of global crisis. Because pedagogical patterns used in face-to-face lessons require revisions to accommodate the learning requirements of a virtual classroom, these findings lead to several recommendations for designing and creating video lectures. First, we recommend including a teacher's talking head to counter the sense of isolation that threatens students' ability to learn. This video style increases the feeling of emotional connection in an online learning environment and balances the use of technology with the human touch. We also recommend incorporating annotations to promote better comprehension in video lessons. This technique may compensate for the absence of direct teacher support, which can cause students to experience learning difficulties.
In the case of student attitude, it may be necessary for institutions to offer various learning options, address learners’ emotions directly, and foster intrinsic motivation through activities that encourage exploration.

Success notwithstanding, our findings should be interpreted in light of their limitations. First, the recruitment of participants was subject to the temporary policies and student enrollment procedures precipitated by the pandemic. This restriction resulted in a small sample size that may limit the generalizability of the quantitative results. Furthermore, the topics covered by our video materials followed the syllabus of an introductory web design and development course, and the experiment may produce different effects in other courses. Future studies could replicate our experiment in other disciplines to further validate the results. It is also important to note that the creator of our video lectures is proficient in video production and editing, resulting in high-quality, professional videos. Guseva and Kauppinen (2018) highlight the competencies needed to produce effective educational videos, such as video and audio quality, presentation skills, content, visuals, and an understanding of the video production process. These competencies suggest that teachers who lack such professional skills may need comprehensive training. In addition, faculty time requirements may be prohibitive, as creating high-quality asynchronous content can be more time- and labor-intensive than preparing traditional didactic lectures (Kraut et al., 2019). Therefore, teachers need to evaluate specific student needs to determine the right balance between the effort spent on creating lecture videos and the potential learning gains. One potential way to relieve teachers of these additional tasks is to hire external video editors. However, close supervision and collaboration with subject matter experts are necessary to ensure the correctness and quality of the video materials. Another consideration is how to present the talking head. In our study, the talking head video showed only the upper body, from head to shoulders. The experiment could yield a different result if the full body were presented, because of more life-like behaviors (e.g., gestures) that serve as visual cues. Future studies could compare different presentations of talking heads to determine which is the most effective. Meanwhile, one issue we faced with talking heads was their location on the video screen. The amount of content (e.g., source code) that we needed to present on screen forced us to reposition the talking head depending on the slide. This inconsistency could be distracting for some students and warrants further solutions. Finally, as mentioned in the methodology, some students still attended synchronous meetings and did not rely exclusively on video lectures, which could have affected their learning performance.

Despite the teaching and learning difficulties precipitated by the pandemic, this global health crisis has pushed educational innovation to the core of every academic institution. It also presents an opportunity to identify new strategies and approaches that could accelerate progress and address the issues of these challenging times. Ready or not, academic institutions will move forward by adjusting to a new educational environment.