Voice Assistant (VAs) are increasingly popular for human-computer interaction (HCI) smartphones. To help users automatically conduct various tasks, these tools usually come with high privileges and are able to access sensitive system resources. A comprised VA is a stepping stone for attackers to hack into users’ phones. Prior work has experimentally demonstrated that VAs can be a promising attack point for HCI tools. However, the state-of-the-art approaches require ad-hoc mechanisms to activate VAs that are non-trivial to trigger in practice and are usually limited to specific mobile platforms. To mitigate the limitations faced by the state-of-the-art, we propose a novel attack approach, namely Vaspy, which crafts the users’ “activation voice” by silently listening to users’ phone calls. Once the activation voice is formed, Vaspy can select a suitable occasion to launch an attack. Vaspy embodies a machine learning model that learns suitable attacking times to prevent the attack from being noticed by the user. We implement a proof-of-concept spyware and test it on a range of popular Android phones. The experimental results demonstrate that this approach can silently craft the activation voice of the users and launch attacks. In the wrong hands, a technique like Vaspy can enable automated attacks to HCI tools. By raising awareness, we urge the community and manufacturers to revisit the risks of VAs and subsequently revise the activation logic to be resilient to the style of attacks proposed in this work.