Over 60% of human languages use pitch patterns (i.e., tones) to distinguish lexical meanings or grammatical forms of words. In speech, tones and segments belong to different acoustic dimensions, and their relative cue weighting in spoken word recognition remains a matter of debate in the literature. Recently, more fine-grained techniques, such as eye-tracking and EEG, have demonstrated the crucial role of lexical tones in spoken word recognition. In this talk, we report our recent findings from a series of eye-tracking experiments, supported by the Australian Research Council over the last four years. We first demonstrate the incrementality of tone processing in spoken word recognition and highlight tonal listeners’ high sensitivity to F0 in real-time language processing. Second, we show that lexical tones play an obligatory role in bilingual spoken word recognition, whether or not they are present in the acoustic input. At the theoretical level, these empirical efforts show that lexical tones, as part of the phonological representations of tonal languages, play a role comparable to that of segments in speech perception and spoken word recognition in both monolingual and bilingual populations.