It is commonly argued that music originated in human evolution as an adaptation in its own right, shaped by specific selective pressures. In this paper we present an alternative account in which music emerged from a more general adaptation known as Theory of Mind (ToM). ToM allows an individual to recognise the mental and emotional states of conspecifics, and it is pivotal in the cultural transmission of knowledge. We propose that a specific form of ToM, Affective Engagement, provides the foundation for the emergence of music. Underpinned by the mirror neuron system of empathy and imitation, music achieves engagement by drawing on pre-existing functions across multiple modalities. As a multimodal phenomenon, music generates an emotional experience by activating a broader range of channels, each of which can be empathically matched by the audio-visual mirror neuron system.