TY - JOUR
T1 - S2OSC
T2 - A Holistic Semi-Supervised Approach for Open Set Classification
AU - Yang, Yang
AU - Wei, Hongchen
AU - Sun, Zhen Qiang
AU - Li, Guang Yu
AU - Zhou, Yuanchun
AU - Xiong, Hui
AU - Yang, Jian
N1 - Funding Information:
This work is supported by the National Natural Science Foundation of China under Grants 62006118, 62006119, 61836013, and 91746301, the Natural Science Foundation of Jiangsu Province of China under Grants BK20200460 and BK20190444, the CCF-Baidu Open Fund (CCF-BAIDU OF2020011), and the Baidu TIC Open Fund. Authors’ addresses: Y. Yang (corresponding author), H. Wei, G.-Y. Li, and J. Yang, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu, 210094, China; emails: {yyang, weihc, guangyu.li2017, csjyang}@njust.edu.cn; Z.-Q. Sun, Nanjing Normal University, 1 Wenyuanlu, Nanjing, Jiangsu, 210023, China; email: enderman19980125@outlook.com; Y. Zhou, Computer Network and Information Center, Chinese Academy of Sciences, 2 Dongshengnanlu, Beijing, 100083, China; email: zyc@cnic.cn; H. Xiong, Rutgers University, 1 Washington Park, Newark, New Jersey, 07102; email: hxiong@rutgers.edu. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2021 Association for Computing Machinery. 1556-4681/2021/08-ART34 $15.00 https://doi.org/10.1145/3468675
Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2022/4
Y1 - 2022/4
AB - Open set classification (OSC) tackles the problem of determining whether data are in-class or out-of-class during inference when only a set of in-class examples is available at training time. Traditional OSC methods usually train discriminative or generative models on the available in-class data and then use the pre-trained models to classify test data directly. However, these methods commonly suffer from the embedding confusion problem, i.e., some out-of-class instances are mixed with in-class ones of similar semantics, which makes them difficult to classify. To solve this problem, we incorporate semi-supervised learning into a novel OSC algorithm, S2OSC, which combines out-of-class instance filtering and model re-training in a transductive manner. In detail, given a pool of newly arriving test data, S2OSC first uses the pre-trained model to filter out the most distinct out-of-class instances and annotates them with a super-class label. Then, S2OSC trains a holistic classification model by combining the in-class and out-of-class labeled data with the remaining unlabeled test data in a semi-supervised paradigm. Furthermore, since data usually arrive in streaming form in real applications, we extend S2OSC into an incremental update framework (I-S2OSC) and adopt a knowledge memory regularization to mitigate catastrophic forgetting during incremental updates. Despite the simplicity of the proposed models, the experimental results show that S2OSC achieves state-of-the-art performance across a variety of OSC tasks, including an F1 score of 85.4% on CIFAR-10 with only 300 pseudo-labels. We also demonstrate that S2OSC can be extended effectively to the incremental OSC setting with streaming data.
KW - Open set classification
KW - embedding confusion
KW - incremental learning
KW - semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85114990151&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114990151&partnerID=8YFLogxK
U2 - 10.1145/3468675
DO - 10.1145/3468675
M3 - Article
AN - SCOPUS:85114990151
SN - 1556-4681
VL - 16
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 2
M1 - 34
ER -