Research comparing the effectiveness of aural and audiovisual input in incidental second language vocabulary learning (IVL) is limited. The current study aims to determine which mode of input, viewing or listening, is more conducive to IVL, and whether the level of visual aid for the words leads to a significant difference in word gains. English Language Teaching majors (N = 58) were randomly assigned either to watch animated educational videos or to listen to the audio of the same videos. Both groups completed a pretest and a posttest of meaning recognition and recall. The results suggest that both modes of input are equally effective for vocabulary learning and that the role of visual aids may not be as significant as previously thought. The study also found that prior vocabulary knowledge was positively correlated with vocabulary gains in the viewing group, but not in the listening group. The results are discussed with reference to the multimedia, temporal contiguity, and coherence principles of the cognitive theory of multimedia learning, and to dual-coding theory.