In online karaoke, the decision process in choosing a song differs from that in music radio: users usually prefer songs that match their vocal competence as well as their tastes. Traditional music recommendation methods typically model users' personalized preferences for songs in terms of content and style. However, they can be improved by considering the degree to which a user's vocal competence (e.g., pitch, volume, and rhythm) matches the vocal requirements of a song. To this end, in this paper, we develop a karaoke recommender system that incorporates vocal competence. Along this line, we propose a joint modeling method named CBNTF that exploits the mutual enhancement between non-negative tensor factorization (NTF) and a support vector machine (SVM). Specifically, we first extract vocal ratings (i.e., pitch, volume, and rhythm) of a user for a song from his/her singing records. Since these ratings encode users' vocal competence along three dimensions, we organize them as a tensor and apply NTF to learn latent features of users' vocal metrics. The factorized features are then fed into an SVM classifier, and the trained classifier is used to predict the overall rating of a user for a song. In addition, we propose an enhanced objective function that couples NTF and SVM so that the two components reinforce each other, and devise an effective method to solve this objective as a coupled least-squares optimization problem within a maximum margin framework. With the estimated model, we compute the similarity between users and songs in terms of pitch, volume, and rhythm, and recommend songs accordingly. Finally, we conduct extensive experiments on real-world online karaoke data; the results demonstrate the effectiveness of our method.
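The pipeline stage described above (tensor of vocal ratings → NTF → SVM → similarity-based recommendation) can be sketched as follows. This is a minimal illustration, not the paper's method: the tensor data, dimensions, rank, and like/dislike labels are synthetic assumptions, the non-negative CP factorization uses standard multiplicative updates rather than the paper's coupled objective, and NTF and SVM are trained sequentially instead of jointly.

```python
# Sketch of the karaoke-recommendation pipeline: factorize a non-negative
# (users x songs x metrics) vocal-rating tensor, then feed the latent user and
# song features to an SVM classifier, then rank songs by user-song similarity.
# All data and hyperparameters below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def khatri_rao(P, Q):
    """Column-wise Kronecker product; rows ordered as p * Q.shape[0] + q."""
    R = P.shape[1]
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, R)

def ntf_cp(X, rank, n_iter=200, eps=1e-9):
    """Non-negative CP factorization X[i,j,k] ~ sum_r U[i,r] S[j,r] M[k,r]
    via multiplicative updates; returns factors and Frobenius-error history."""
    I, J, K = X.shape
    U = rng.random((I, rank))
    S = rng.random((J, rank))
    M = rng.random((K, rank))
    X1 = X.transpose(0, 2, 1).reshape(I, K * J)  # mode-1 unfolding
    X2 = X.transpose(1, 2, 0).reshape(J, K * I)  # mode-2 unfolding
    X3 = X.transpose(2, 1, 0).reshape(K, J * I)  # mode-3 unfolding
    errs = []
    for _ in range(n_iter):
        KR = khatri_rao(M, S); U *= (X1 @ KR) / (U @ (KR.T @ KR) + eps)
        KR = khatri_rao(M, U); S *= (X2 @ KR) / (S @ (KR.T @ KR) + eps)
        KR = khatri_rao(S, U); M *= (X3 @ KR) / (M @ (KR.T @ KR) + eps)
        X_hat = np.einsum('ir,jr,kr->ijk', U, S, M)
        errs.append(np.linalg.norm(X - X_hat))
    return U, S, M, errs

# Toy tensor: 30 users x 20 songs x 3 vocal metrics (pitch, volume, rhythm).
n_users, n_songs, rank = 30, 20, 4
X = rng.random((n_users, n_songs, 3))
U, S, M, errs = ntf_cp(X, rank)

# Synthetic overall ratings: a user "likes" a song if the mean vocal rating
# exceeds the global median (stand-in for real like/dislike labels).
mean_rating = X.mean(axis=2)
y = (mean_rating > np.median(mean_rating)).astype(int).ravel()

# SVM on concatenated latent user and song features for every (user, song) pair.
F = np.hstack([np.repeat(U, n_songs, axis=0), np.tile(S, (n_users, 1))])
clf = SVC(kernel='rbf').fit(F, y)
pred = clf.predict(F)

# Recommend: rank songs for each user by latent-feature similarity.
scores = U @ S.T                              # (n_users, n_songs)
top5 = np.argsort(-scores, axis=1)[:, :5]     # top-5 song indices per user
```

In the paper's joint formulation the factorization and the classifier share one objective, so the latent features are shaped by both reconstruction error and the maximum-margin loss; the sequential version above is only the simplest baseline arrangement of the same components.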