Introduction

In the interdisciplinary subfield of computational linguistics and computer science, Automatic Speech Recognition (ASR) is defined as the automated process of converting voice input into its corresponding transcript. It is an important research domain in human-computer interaction (HCI) because of its many real-world applications, which continue to alter the way people live (Fendji et al., 2022; Hu et al., 2011; Proksch et al., 2019). ASR technologies have been classified according to how they strengthen human-human communication (HHC) and human-machine communication (HMC) (Yu & Deng, 2015). One example in HHC is the obstacle posed by monolingualism when people communicate with members of another language group. ASR alleviates this difficulty and helps overcome the language barrier through automated speech-to-speech translation (e.g., between Asian languages and English (Nakamura et al., 2006)). In terms of HMC, voice assistants (e.g., Siri, Cortana, and Alexa) have become mainstream technologies because they are built into numerous devices, including smartphones and computers. The ubiquity of ASR is most apparent in the continued proliferation of voice assistants in many sectors, such as health (Miner et al., 2020; Schulte et al., 2020; Seródio Figueiredo et al., 2022), business (Chang, 2000; Hegdepatil & Davuluri, 2021; Rabassa et al., 2022), education (Al Shamsi et al., 2022; Fox Carly et al., 2021; Russell et al., 1996), and others.

The reputation of computer programming as an intellectually challenging course has given rise to a phenomenon known as programming anxiety. Sometimes referred to as fear of coding, it has been studied by many researchers, who have examined the reasons for its occurrence and prospective solutions because of its negative effects on the learning process. It has been posited that this psychological state occurs because students misjudge their ability to learn programming, a problem compounded by the underdevelopment of requisite skills (Connolly et al., 2009). Given this lack of self-efficacy and sense of control, it is crucial to formulate strategies that foster student motivation and confidence. Meanwhile, fear was examined as a descriptor that symbolizes a lack of appreciation of or interest in computer programming as a discipline (Rogerson & Scott, 2010). Accordingly, feelings of discomfort or apprehension are prevalent among tertiary students. In addition to internal factors (e.g., attitude and motivation), external factors (e.g., teachers and their strategies) were also emphasized as catalysts for the inhibiting force of fear. In an attempt to reduce, if not eliminate, programming anxiety, different strategies have been proposed, such as utilizing online interactive coding platforms (Figueroa Jr. & Amoloza, 2015), gamified online courses (Garcia & Revano, 2021), technology-supported interactive strategies (Jiang et al., 2020), group learning approaches (Garcia, 2021), educational programming languages (Demir, 2022), and others. A common denominator among these pedagogies is the provision of a more engaging and active learning experience.

Almost four decades ago, one study conducted a controlled experiment to compare voice and keyboard as input modalities in writing computer programs (Leggett & Williams, 1984). That study was motivated by the promise of voice input for crafting effective user interfaces that minimize keyboard operations and maximize real-time naturalistic HCI. Using measures of accuracy, speed, and efficiency, the experiment found that voice competed reasonably well with the keyboard. Nonetheless, it was further hypothesized that voice could have outperformed the keyboard in all aspects had the computer processing power been sufficient and had participants been more experienced with voice-enabled devices. This assumption warrants further investigation. Following recent advancements in ASR technologies, this study replicates the experiment and extends the evaluation by assessing the pedagogical effectiveness of a voice-driven coding approach. Understanding the effectiveness of voice inputs using ASR technologies may present new and exciting opportunities for teaching and learning computer programming. Accordingly, the findings of this experiment will benefit teachers and students of computer programming in their intended academic outcomes.

In this study, we refer to the use of voice instead of a keyboard to write source code as voice programming. This programming-by-voice approach is a relatively underdeveloped research area and deserves further investigation in light of recent developments in ASR technologies and HCI concepts. There are also significant health concerns about computer-related injuries caused by excessive typing. In addition, some motor disabilities prevent computer users from using a keyboard and mouse. Thus, the significance of our study is not confined to programming pedagogy but extends to health and disability inclusion. The following research questions (RQs) guided our study:

  • Is there a significant difference in terms of attitude and self-efficacy before and after the activities?
  • Is there a significant difference in the code correctness between voice and typed input?
  • Is there a significant difference in the coding speed in terms of modality and difficulty of machine problems?

Background of the Study

Automatic Speech Recognition

For human beings, verbal communication is the most natural form of communication. Therefore, teaching computers to learn and understand natural human languages (e.g., natural language processing) is an unsurprising idea. Supported by existing voice and speech technologies, a naturalistic interaction between users and computers and other digital devices using verbal commands is no longer a fictional scenario (Alharbi et al., 2021). This interaction is largely attributed to the continuous advancements in the field of speech signal processing, particularly in ASR technologies.

Voice User Interface

There has been a growing number of devices that integrate a voice-user interface (VUI) making ASR more convenient. VUI is a technology that provides ASR capabilities to assist users in interacting with their electronic devices using voice commands. Some applications include home automation systems (Roy et al., 2021), smart speakers (Baimirov et al., 2022), service robots (Stavropoulou et al., 2020), and others. In education, VUI can also support the teaching and learning process. For instance, schools can enhance their student support services through VUI-enabled applications. One example is the chat system that assists computer science students in their academic concerns, including referencing, academic writing, and programming (Seeroo & Bekaroo, 2021).

Computer-Related Injuries

Computer-related injuries (e.g., carpal tunnel syndrome) are a prevalent health issue among computer users. Daily computer use of at least four hours significantly increased frequent health complaints, especially in the hands, fingers, and wrists (Hakala et al., 2010). There is also extensive evidence that keyboard users are susceptible to Repetitive Strain Injury (also known as overuse syndrome) (Keller et al., 1998). For computer programmers, who use keyboards for prolonged periods, this health issue is a serious problem. Ergonomic keyboards are a solution that has been extensively used by typists (Ripat et al., 2010). Although this product is also applicable to coders, eliminating keyboards and replacing them with VUI devices is worth exploring. Some studies have already examined a hands-free computer interface approach in voice-assisted software modeling (Black et al., 2019) and speech-based integrated programming environments (Begel & Graham, 2006; Elmaghraby, 1989).

Students with Disabilities

VUIs are also appealing to users who experience difficulties with the conventional graphical user interface (Vacher et al., 2015). For instance, a series of co-design workshops was conducted to develop VUIs with and for visually impaired students (Metatla et al., 2019). The inspiration for that study was the proliferation of voice-based personal assistant devices. It found that VUIs possess significant potential for creating accessible and inclusive interactions in an academic setting. Further support for this claim is the utilization of a “programming by voice” approach for children with motor impairments (Cordero et al., 2021; Okafor, 2022; Wagner et al., 2012). The empirical findings from prior works have propelled ASR technologies and VUI devices into the education sector as a potential instructional technology intervention.

Methodology

Research Design

This cross-sectional experimental research with a one-group pretest-posttest design is an empirical investigation of ASR and voice programming. In recent years, ASR technologies through VUI devices have been catapulted into many sectors of society, making them a subject of interest among researchers. Rather than adopting the exploratory approach of previous investigations, we purposely selected an experimental methodology to examine cause-effect relationships. Driven by the same notion that voice is the most natural form of human communication, we replicated a controlled experiment conducted almost four decades ago (Leggett & Williams, 1984). That experiment evaluated voice versus keyboard as modalities for writing computer programs. We extended it by assessing the efficacy of voice programming in terms of attitude, self-efficacy, code correctness, and coding speed.

Setting and Sample

Students from an institute of technology in the capital region of the Philippines were randomly selected to join the study. This non-sectarian, private higher educational institution offers four-year information technology and computer science programs. At the time of our research, the Bachelor of Science in Information Technology (BSIT) had four specializations, namely Animation and Game Development (BSIT-AGD), Digital Arts (BSIT-DA), Business Analytics and/or Service Management (BSIT-SMBA), and Web and Mobile Applications (BSIT-WMA). Regardless of specialization, the BSIT degree program positions the computer programming courses as a strong foundation not only for academic development but also for a future computing career and employability prospects. This standpoint is noticeable in the numerous programming courses in the curricula. The sample size was computed using Slovin’s formula, n = N ÷ (1 + Ne²), where N is the population size and e is the margin of error. The inclusion criteria were (1) a passing grade in the introductory programming course and (2) enrollment in any of the subsequent courses (e.g., Object-Oriented Programming). With a population of 126 programming students, the preferred sample size was 96. All selected students agreed to participate and submitted an informed consent form.
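The sample-size arithmetic can be checked with a short sketch. The population of 126 comes from the text; the margin of error of 0.05 is an assumption made here for illustration, since the text does not state the value used, and the function name is hypothetical.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole participant."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# N = 126 is reported in the text; e = 0.05 is an assumed margin of error.
print(slovin_sample_size(126, 0.05))  # 126 / 1.315 = 95.8..., rounded up to 96
```

Under that assumed margin of error, the result agrees with the preferred sample size of 96 reported above.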

Measurement and Data Collection

In our data collection, we utilized a survey questionnaire that has been similarly used in another programming study (Garcia, 2021). This questionnaire contains demographic information and measures of programming attitude and self-efficacy using validated scales, namely the Attitude Scale of Computer Programming Learning (ASCOPL) and the Computer Programming Self-Efficacy Scale (CPSES), respectively. The demographic information section is composed of students’ age, gender, program specialization, and prior programming grade. The ASCOPL is a five-point Likert scale with three constructs (i.e., willingness, negativity, and necessity) measuring attitude toward learning computer programming. The CPSES is an evaluation tool with five constructs (i.e., algorithm, logical thinking, debug, control, and cooperation) that measure students’ beliefs in their capability to perform well in computer programming courses. The questionnaire was distributed before and after the experiment for the pretest and posttest comparison. The experiment comprised programming activities with three difficulty levels. The easy level includes Input and Output, Arithmetic Operation, and Variable Manipulation. The average (moderate) level includes Conditional Statements, Looping Structures, and Standard Library Functions. Finally, the difficult level includes Array Data Structures and User-Defined Functions. Some of the programming activities we used for each level are as follows:

Easy: Write a program that will prompt users to input their name, section, and other custom information about them as well as class schedules (at least five courses with complete details such as the time, room, teacher, etc.). After students have encoded all their details, clear the whole screen and then display the output in an organized, COR-inspired layout.

Moderate: Write a program that reads a date from the user and computes its immediate successor. For example, if the user enters values that represent 2022-11-18 then your program should display a message indicating that the day immediately after 2022-11-18 is 2022-11-19. If the user enters values that represent 2022-11-30 then the program should indicate that the next day is 2022-12-01. If the user enters values that represent 2022-12-31 then the program should indicate that the next day is 2023-01-01. The date will be entered in numeric form with three separate input statements: one for the year, one for the month, and one for the day. Never mind the leap year for now.

Difficult: Write a program that can simulate an ATM. In this program, you should be able to integrate different programming techniques such as conditional statements, looping constructs, and functions. It is also important to use a multidimensional array for the card and PIN (login).
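To give a sense of the scale of these tasks, the following is one possible solution sketch for the Moderate activity above. The text does not state which programming language participants used, so Python is an assumption here, and the function names are hypothetical. The three separate input statements are omitted so that the logic can be exercised directly.

```python
# Days per month, ignoring leap years as the activity permits (February = 28).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def next_day(year: int, month: int, day: int) -> tuple[int, int, int]:
    """Return the date immediately after (year, month, day), ignoring leap years."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    last_day = DAYS_IN_MONTH[month - 1]
    if not 1 <= day <= last_day:
        raise ValueError("day is out of range for the given month")
    if day < last_day:
        return year, month, day + 1   # same month
    if month < 12:
        return year, month + 1, 1     # roll over to the next month
    return year + 1, 1, 1             # roll over to the next year


def format_date(year: int, month: int, day: int) -> str:
    """Format a date as YYYY-MM-DD."""
    return f"{year:04d}-{month:02d}-{day:02d}"


# The three examples given in the activity description.
for date in [(2022, 11, 18), (2022, 11, 30), (2022, 12, 31)]:
    print(f"The day immediately after {format_date(*date)} "
          f"is {format_date(*next_day(*date))}")
```

Even this short solution requires nested conditions and boundary checks, which is consistent with its placement above the easy input-and-output tasks.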

Data Analysis

The collected data were analyzed using IBM SPSS Statistics 26.0. We utilized descriptive statistics to report the demographic information. Since ordinal data were produced by the ASCOPL and CPSES instruments, we used the Wilcoxon signed-rank test for the comparison of pretest and posttest scores of attitude and self-efficacy (RQ1). Rather than the efficiency of the algorithm, we graded the programming activities based on code correctness (i.e., code is running and achieves the correct output) making it a dichotomous variable. Thus, McNemar's test was utilized via a cross-over design where programming students write solutions to machine problems using both keyboard and voice (RQ2). For RQ3, we employed MANOVA to determine whether the coding speed varies in terms of difficulty levels of activities. All levels have ten pre-made activities and each activity was written in three different formats (n = 30) for randomization purposes.
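For readers unfamiliar with the pretest-posttest comparison used for RQ1, the sketch below implements the Wilcoxon signed-rank test with the normal approximation and a tie correction, using only the Python standard library. It is an illustration rather than a reproduction of the SPSS procedure; the function name and the scores are hypothetical and are not taken from the study's data.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Return (W, z, two-sided p) using the normal approximation with tie correction.

    Zero differences are dropped; tied absolute differences share an average rank.
    No continuity correction is applied.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p


# Hypothetical five-point Likert scores for eight students (not study data).
pretest = [4, 5, 3, 4, 5, 4, 3, 5]
posttest = [3, 4, 3, 3, 4, 4, 2, 4]
w, z, p = wilcoxon_signed_rank(pretest, posttest)
print(f"W = {w}, z = {z:.3f}, p = {p:.3f}")
```

The test is suited to ordinal Likert data because it relies only on the ranks of the paired differences, not on their magnitudes being normally distributed.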

Results

Demographic Profile

Table 1 presents the profile of the respondents. A total of 96 programming students joined the experimental study. The mean age was 18.78 ± 0.81 years, and the majority were male (68.75%); the largest specialization group was BSIT-WMA (40.63%). In the earlier programming course, the most common grade range was 81-85% (mean = 83%).

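Table 1. Profile of the respondents

| Profile | Classification | f | % |
|---|---|---|---|
| Age | Less than 18 years old | 13 | 18.75 |
| | 18 years old and above | 83 | 81.25 |
| Gender | Male | 66 | 68.75 |
| | Female | 30 | 31.25 |
| Specialization | BSIT-DA | 13 | 13.54 |
| | BSIT-WMA | 39 | 40.63 |
| | BSIT-AGD | 25 | 26.04 |
| | BSIT-SMBA | 19 | 19.79 |
| Previous Programming Grade | 70-75 | 7 | 7.29 |
| | 76-80 | 19 | 19.79 |
| | 81-85 | 37 | 38.54 |
| | 86-90 | 18 | 18.75 |
| | 91-95 | 10 | 10.42 |
| | 96-100 | 5 | 5.21 |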

Attitude and Self-Efficacy

The evaluation of programming input modalities in terms of attitude and self-efficacy exhibited mixed findings according to the Wilcoxon signed-rank test (Table 2). In terms of attitude, negativity was the only significant construct, decreasing from 3.85 ± 1.44 to 3.35 ± 1.15 (p = 0.044). Although the scores decreased, this is still a positive result, as it implies that negative perceptions of computer programming were reduced after the experiment. Both willingness (from 3.11 ± 1.50 to 3.49 ± 1.15) and necessity (from 3.22 ± 1.42 to 3.40 ± 1.18) increased but not significantly (p = 1.000). When it comes to self-efficacy, control was the only significant construct, decreasing from 4.01 ± 0.81 to 3.36 ± 1.12 (p = 0.011). Unfortunately, this finding indicates that students felt more in control when using the keyboard than when using voice inputs.

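Table 2. Wilcoxon signed-rank test results for attitude and self-efficacy

| Factor | Construct | Pretest (M ± SD) | Posttest (M ± SD) | p-value |
|---|---|---|---|---|
| Attitude | Willingness | 3.11 ± 1.50 | 3.49 ± 1.15 | 1.000 |
| | Negativity | 3.85 ± 1.44 | 3.35 ± 1.15 | 0.044 |
| | Necessity | 3.22 ± 1.42 | 3.40 ± 1.18 | 1.000 |
| Self-Efficacy | Logical Thinking | 3.43 ± 1.50 | 3.10 ± 1.40 | 1.000 |
| | Algorithm | 2.53 ± 1.17 | 2.69 ± 1.15 | 1.000 |
| | Debug | 3.58 ± 1.11 | 3.98 ± 0.85 | 0.823 |
| | Control | 4.01 ± 0.81 | 3.36 ± 1.12 | 0.011 |
| | Cooperation | 3.65 ± 1.16 | 4.05 ± 0.83 | 1.000 |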

Code Correctness

The evaluation of programming input modalities in terms of code correctness likewise revealed mixed findings according to McNemar's test. Both easy (p = 0.031) and moderate (p = 0.008) activities yielded significant changes when students switched to voice as the input modality. For the difficult level, switching the modalities did not prompt significant changes (p = 0.250). In the crosstabulation (Table 3), it was shown that 29 students initially got incorrect solutions but correctly answered the easy problems after switching to voice programming. The same results can be observed in average problems, where 20 students originally got incorrect answers but made correct solutions after using voice as the input modality. Finally, there were more negative changes in the difficult level (correct to incorrect) albeit not significantly.

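Table 3. Crosstabulation of code correctness by input modality (McNemar's test)

| Difficulty Level | Keyboard | Voice: Incorrect | Voice: Correct | p-value |
|---|---|---|---|---|
| Level 1 – Easy | Incorrect | 9 | 29 | 0.031 |
| | Correct | 6 | 52 | |
| | Total | 15 | 81 | |
| Level 2 – Average | Incorrect | 15 | 20 | 0.008 |
| | Correct | 8 | 53 | |
| | Total | 23 | 73 | |
| Level 3 – Difficult | Incorrect | 15 | 8 | 0.250 |
| | Correct | 13 | 60 | |
| | Total | 28 | 68 | |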
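McNemar's test depends only on the discordant pairs, that is, students whose result changed between modalities. The sketch below shows the exact (binomial) two-sided variant using the Python standard library. The discordant counts are read from Table 3; the function name is hypothetical, and the resulting values are illustrative only and are not claimed to match the SPSS figures reported above, which may rest on a different variant or data layout.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: pairs that changed from incorrect (keyboard) to correct (voice)
    c: pairs that changed from correct (keyboard) to incorrect (voice)
    Under the null hypothesis, each discordant pair falls in either cell
    with probability 0.5, so the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant counts per difficulty level, read from Table 3.
discordant = {
    "Level 1 - Easy": (29, 6),
    "Level 2 - Average": (20, 8),
    "Level 3 - Difficult": (8, 13),
}
for level, (b, c) in discordant.items():
    print(f"{level}: b = {b}, c = {c}, exact p = {mcnemar_exact(b, c):.4f}")
```

Students who were correct or incorrect under both modalities do not enter the calculation, which is why the test suits a cross-over design with a dichotomous outcome.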

Coding Speed

According to the one-way MANOVA (Table 4), the coding speed of the keyboard and voice input modalities differed significantly across difficulty levels (p = 0.039; Wilks' Λ = 0.542, partial η² = 0.434). Interestingly, voice is faster (167.03 ± 81.06 seconds) than keyboard (173.30 ± 76.45 seconds) when the difficulty level of the machine problem is easy. On the other hand, students finish their moderate and difficult activities faster when they use a keyboard (538.76 ± 249.16 and 886.80 ± 172.24 seconds) instead of the voice programming technique (545.07 ± 253.09 and 900.80 ± 176.68 seconds). This finding indicates that voice programming may only be useful for easy activities.

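Table 4. Coding speed in seconds by input modality and difficulty level

| Difficulty Level | Statistic | Keyboard | Voice |
|---|---|---|---|
| Level 1 – Easy | M ± SD | 173.30 ± 76.45 | 167.03 ± 81.06 |
| | Min – Max | 43 – 292 | 30 – 294 |
| Level 2 – Average | M ± SD | 538.76 ± 249.16 | 545.07 ± 253.09 |
| | Min – Max | 107 – 994 | 112 – 997 |
| Level 3 – Difficult | M ± SD | 886.80 ± 172.24 | 900.80 ± 176.68 |
| | Min – Max | 608 – 1196 | 631 – 1314 |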
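The direction and size of the differences in mean coding speed can be read directly from the reported means. The short sketch below does only that arithmetic; the means are copied from Table 4, the function name is hypothetical, and no inferential statistics are recomputed.

```python
# (keyboard mean, voice mean) in seconds, copied from Table 4.
MEAN_SECONDS = {
    "Level 1 - Easy": (173.30, 167.03),
    "Level 2 - Average": (538.76, 545.07),
    "Level 3 - Difficult": (886.80, 900.80),
}


def speed_gap(keyboard: float, voice: float) -> tuple[float, float]:
    """Return (voice - keyboard) in seconds and as a percentage of the keyboard mean."""
    difference = voice - keyboard
    return difference, 100 * difference / keyboard


for level, (keyboard, voice) in MEAN_SECONDS.items():
    difference, percent = speed_gap(keyboard, voice)
    faster = "voice" if difference < 0 else "keyboard"
    print(f"{level}: {difference:+.2f} s ({percent:+.1f}%), "
          f"faster on average: {faster}")
```

A negative gap means that voice was faster on average, which occurs only at the easy level.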

Discussion

In this experimental study, we examined voice programming as a coding approach in terms of attitude, self-efficacy, coding speed, and code correctness among programming students. This is a replication and extension of an experiment on using voice as an input modality in computer programming (Leggett & Williams, 1984). A total of 96 randomly selected students from an institute of technology in the capital region of the Philippines participated in a series of programming activities with three difficulty levels for both voice and keyboard modalities. Overall, our study showed mixed findings.

According to our findings, the negativity towards computer programming as a discipline significantly decreased after using the voice modality. Computer programming has a reputation for being a difficult course and many students are afraid of learning it because of this perception (Garcia, 2021). More importantly, students are more likely to believe computer programming is difficult when they have negative impressions of the subject. The fear factor is also detrimental because it diminishes intrinsic motivation and negatively influences student attitudes (Rogerson & Scott, 2010). Therefore, the effect of voice programming opens positive opportunities for teachers to engage their students who have a negative attitude toward this subject. One possible explanation for the reduction of negativity could be attributed to the enjoyment when using voice inputs. In a coding workshop, it was found that fun programming activities have a positive effect on student attitudes towards coding (Tisza & Markopoulos, 2021).

Despite the positive effect of voice programming on attitude, the opposite is evident in self-efficacy. According to students, using a voice interface significantly decreased their perceived control. This finding is consistent with the original experiment (Leggett & Williams, 1984), where a lower input task completion rate was recorded for the voice editor than for the key editor. One possible explanation is that the keyboard has been the primary text input device since the introduction of computers, and so people are more accustomed to it than to voice and speech recognition technologies. The positive and negative impacts of a voice programming approach on attitude and self-efficacy, respectively, require teachers to balance their implementation of this programming pedagogy. Importantly, it was found that perceived academic control is a crucial predictor of dropout intention through the mediation of anxiety (Respondek et al., 2017).

In terms of code correctness, it appears that voice modality benefits students only when solving easy and moderate machine problems. This finding could be attributed to the small cognitive load demanded by these activities, allowing students to enjoy the voice interface more. However, when the activity is difficult and requires strict concentration, using a voice modality did not elicit significantly positive changes. Thus, it would be more beneficial to introduce the voice programming approach in the early stage of students’ coding journey. This tactic is also compatible with the necessity to teach basic concepts first (e.g., syntax, variables, expressions, and operators) before complex algorithms (Garcia et al., 2022).

The coding speed also varied depending on the difficulty of the programming activities, which is expected because easy ones require less time to finish and vice versa. However, students were able to use the voice interface to finish easy activities faster but not moderate and difficult ones. For easy activities, this finding is similar to previous studies that found the speech input method faster than keyboard input (Hauptmann & Rudnicky, 1990; Ruan et al., 2018). It contradicts the original experiment (Leggett & Williams, 1984), although its authors claimed that voice input would have been more competitive with the keyboard had the technologies been sufficient. The availability of more advanced ASR technologies may explain why voice inputs can now be faster than keyboard inputs. This finding is also in line with our result regarding code correctness, indicating the benefits of voice programming for easy machine problems.

Our results offer considerable implications for programming teachers and students given the widespread belief that computer programming is a challenging subject (Connolly et al., 2009; Garcia, 2021). With the positive influence of a voice programming approach on student attitudes, teachers may incorporate the voice interface into their laboratory activities. When students do not have a negative attitude toward the course, they are more likely to develop their self-efficacy and academic performance (Tisza & Markopoulos, 2021). However, caution must be taken in terms of the timing of implementation. According to our results, the voice programming approach is only advantageous when the activities are easy and short. Thus, implementing it in activities that impose a higher cognitive load may have negative academic effects. Finally, previous studies underscored the need for ASR technologies with adequate technical capabilities to ensure the success of a voice-driven programming approach (Leggett & Williams, 1984; Ruan et al., 2018). This underscores the need for schools to invest in equipment and training to ensure the success of its implementation.

There are limitations to our study that present future avenues for research. First, our study was limited to students enrolled in a computing degree, which means that we did not include non-computing degrees (e.g., engineering) that also offer computer programming courses. For coding-based events, the importance of involving students from different programs has been highlighted, as they may have different perceptions of the activity (Garcia, 2022). Second, the attitude and self-efficacy instrument was self-administered, which may result in response bias, including social desirability bias. Third, the experiment was reliant on voice technologies, which means that schools must modify their classrooms and ensure that a voice-enabled development environment and a microphone are available. It may also be challenging to have all students talking at the same time, a potential issue that we avoided because of the mandatory online education during the pandemic. Finally, future researchers may consider replicating the experiment with the participation of programmers with different levels of ability. It is possible that more advanced programmers who are used to writing code and able to formulate logic correctly may find the voice programming approach more beneficial.

Conclusion

In this paper, we investigated the pedagogical potential of a voice programming approach concerning attitude, self-efficacy, code correctness, and coding speed. Our results demonstrate that although voice as an input modality decreases negativity, it also decreases control. This opposite effect reveals that both attitude and self-efficacy factors are positively and negatively affected, respectively, by the voice programming approach. Using a voice interface also allows students to code faster when the activities are easy but not when they are moderate or difficult. In our code correctness analysis, we found that utilizing voice input is only desirable for easy and moderate machine problems. Overall, our study upholds the pedagogical potential of utilizing voice as an input modality in writing computer programs. Considering these results, future researchers may explore the best way to integrate voice technologies either to replace or supplement keyboards.