Because voice input is inherently open, voice assistant (VA) systems (e.g., Google Home and Amazon Alexa) are exposed to security threats and privacy leakage (e.g., of credit card numbers and passwords), especially when users issue critical commands such as large purchases or important calls. Although existing VA systems may use voice features to identify users, they remain vulnerable to acoustic attacks (e.g., impersonation, replay, and hidden command attacks). In this work, we propose WearID, a training-free voice authentication system that leverages the cross-domain speech similarity between the audio domain and the vibration domain to provide enhanced security for the ever-growing deployment of VA systems. In particular, when a user gives a critical command, WearID uses the motion sensors on the user's wearable device to capture the aerial speech in the vibration domain and verifies it against the speech captured in the audio domain by the VA device's microphone. Compared to existing approaches, our solution is low-effort and privacy-preserving, as it requires neither active user input (e.g., replying to messages/calls) nor the storage of users' privacy-sensitive voice samples for training. In addition, our solution exploits the distinct vibration sensing interface and its short sensing range for sound (e.g., 25 cm) to verify voice commands. Examining the similarity of the two domains' data is not trivial. The large sampling rate gap (e.g., 8000 Hz vs. 200 Hz) between the audio and vibration domains makes it hard to compare the two domains' data directly, and even a small amount of noise can be magnified and cause authentication failures. To address these challenges, we investigate the complex relationship between the two sensing domains and develop a spectrogram-based algorithm that converts the microphone data into lower-frequency "motion sensor data" to facilitate cross-domain comparisons. 
We further develop a user authentication scheme that uses the cross-domain speech similarity of the received voice commands to verify that a command originates from the legitimate user. We report on extensive experiments that evaluate WearID under various audible and inaudible attacks. The results show that WearID can verify voice commands with 99.8% accuracy under normal conditions and detect 97.2% of fake voice commands from various attacks, including impersonation/replay attacks and hidden voice/ultrasound attacks.
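
To make the idea of a cross-domain comparison concrete, the following is a minimal sketch and not the WearID algorithm itself. It assumes that both signals can be reduced to magnitude spectrograms over a shared low-frequency band (here 10 to 90 Hz, below the Nyquist limit of a 200 Hz motion sensor) and compared with a Pearson correlation. It ignores aliasing and the sensor's frequency response, which a real conversion would have to model. All function names, the frame length, and the band limits are hypothetical choices made for illustration.

```python
import cmath
import math


def band_spectrogram(signal, rate, frame_sec=0.1, max_hz=90.0, step_hz=10.0):
    """Magnitude spectrogram restricted to the bins step_hz..max_hz.

    Each frame is evaluated with a direct DFT at the chosen frequencies only,
    so signals with very different sampling rates give same-shaped outputs.
    """
    frame_len = int(round(rate * frame_sec))
    freqs = [step_hz * k for k in range(1, int(max_hz / step_hz) + 1)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        row = []
        for f in freqs:
            acc = sum(x * cmath.exp(-2j * math.pi * f * n / rate)
                      for n, x in enumerate(frame))
            row.append(abs(acc) / frame_len)
        frames.append(row)
    return frames


def pearson(a, b):
    """Pearson correlation of two sequences, truncated to the shorter one."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cross_domain_similarity(audio, audio_rate, vibration, vibration_rate):
    """Correlate the low-band spectrograms of the two recordings."""
    spec_a = band_spectrogram(audio, audio_rate)
    spec_v = band_spectrogram(vibration, vibration_rate)
    flat_a = [m for row in spec_a for m in row]
    flat_v = [m for row in spec_v for m in row]
    return pearson(flat_a, flat_v)


def make_tones(tone_hz_per_frame, rate, frame_sec=0.1):
    """Synthetic stand-in for speech: one pure tone per frame."""
    frame_len = int(round(rate * frame_sec))
    total = frame_len * len(tone_hz_per_frame)
    return [math.sin(2 * math.pi * tone_hz_per_frame[n // frame_len] * n / rate)
            for n in range(total)]


if __name__ == "__main__":
    command = [30, 60, 30, 80]   # tone sequence "spoken" by the user
    other = [80, 30, 60, 30]     # a different sound reaching only the microphone
    vibration = make_tones(command, 200)
    same = cross_domain_similarity(make_tones(command, 8000), 8000, vibration, 200)
    diff = cross_domain_similarity(make_tones(other, 8000), 8000, vibration, 200)
    print(f"matching command: {same:.3f}, mismatched command: {diff:.3f}")
```

In this toy setting, the microphone and motion-sensor recordings of the same tone sequence produce nearly identical band spectrograms, while a mismatched sequence correlates poorly. Any acceptance threshold applied to such a score would need to be chosen empirically and is not taken from the paper.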