TY - GEN
T1 - Representation learning via semi-supervised autoencoder for multi-task learning
AU - Zhuang, Fuzhen
AU - Luo, Dan
AU - Jin, Xin
AU - Xiong, Hui
AU - Luo, Ping
AU - He, Qing
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/1/5
Y1 - 2016/1/5
N2 - Multi-task learning aims at learning multiple related but different tasks. In general, there are two ways for multi-task learning. One is to exploit the small set of labeled data from all tasks to learn a shared feature space for knowledge sharing. In this way, the focus is on the labeled training samples while the large amount of unlabeled data is not sufficiently considered. Another way has a focus on how to share model parameters among multiple tasks based on the original features space. Here, the question is whether it is possible to combine the advantages of both approaches and develop a method, which can simultaneously learn a shared subspace for multiple tasks and learn the prediction models in this subspace? To this end, in this paper, we propose a feature representation learning framework, which has the ability in combining the autoencoders, an effective way to learn good representation by using large amount of unlabeled data, and model parameter regularization methods into a unified model for multi-task learning. Specifically, all the tasks share the same encoding and decoding weights to find their latent feature representations, based on which a regularized multi-task softmax regression method is used to find a distinct prediction model for each task. Also, some commonalities are considered in the prediction models according to the relatedness of multiple tasks. There are several advantages of the proposed model: 1) it can make full use of large amount of unlabeled data from all the tasks to learn satisfying representations, 2) the learning of distinct prediction models can benefit from the success of autoencoder, 3) since we incorporate the labeled information into the softmax regression method, so the learning of feature representation is indeed in a semi-supervised manner. Therefore, our model is a semi-supervised autoencoder for multi-task learning (SAML for short). Finally, extensive experiments on three real-world data sets demonstrate the effectiveness of the proposed framework. Moreover, the feature representation obtained in this model can be used by other methods to obtain improved results.
AB - Multi-task learning aims at learning multiple related but different tasks. In general, there are two ways for multi-task learning. One is to exploit the small set of labeled data from all tasks to learn a shared feature space for knowledge sharing. In this way, the focus is on the labeled training samples while the large amount of unlabeled data is not sufficiently considered. Another way has a focus on how to share model parameters among multiple tasks based on the original features space. Here, the question is whether it is possible to combine the advantages of both approaches and develop a method, which can simultaneously learn a shared subspace for multiple tasks and learn the prediction models in this subspace? To this end, in this paper, we propose a feature representation learning framework, which has the ability in combining the autoencoders, an effective way to learn good representation by using large amount of unlabeled data, and model parameter regularization methods into a unified model for multi-task learning. Specifically, all the tasks share the same encoding and decoding weights to find their latent feature representations, based on which a regularized multi-task softmax regression method is used to find a distinct prediction model for each task. Also, some commonalities are considered in the prediction models according to the relatedness of multiple tasks. There are several advantages of the proposed model: 1) it can make full use of large amount of unlabeled data from all the tasks to learn satisfying representations, 2) the learning of distinct prediction models can benefit from the success of autoencoder, 3) since we incorporate the labeled information into the softmax regression method, so the learning of feature representation is indeed in a semi-supervised manner. Therefore, our model is a semi-supervised autoencoder for multi-task learning (SAML for short). Finally, extensive experiments on three real-world data sets demonstrate the effectiveness of the proposed framework. Moreover, the feature representation obtained in this model can be used by other methods to obtain improved results.
KW - Autoencoder
KW - Multi-task learning
KW - Representation Learning
KW - Semi-supervised Learning
UR - http://www.scopus.com/inward/record.url?scp=84963543323&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963543323&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2015.22
DO - 10.1109/ICDM.2015.22
M3 - Conference contribution
AN - SCOPUS:84963543323
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1141
EP - 1146
BT - Proceedings - 15th IEEE International Conference on Data Mining, ICDM 2015
A2 - Aggarwal, Charu
A2 - Zhou, Zhi-Hua
A2 - Tuzhilin, Alexander
A2 - Xiong, Hui
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Data Mining, ICDM 2015
Y2 - 14 November 2015 through 17 November 2015
ER -