Abstract
While self-supervised learning techniques are often used to mine hidden knowledge from unlabeled data by modeling multiple views, it is unclear how to perform effective representation learning in a complex and inconsistent context. To this end, we propose a new multi-view self-supervised learning method, namely the consistency and complementarity network (CoCoNet), which comprehensively learns global inter-view consistent and local cross-view complementarity-preserving representations from multiple views. To capture crucial common knowledge that is implicitly shared among views, CoCoNet employs a global consistency module that aligns the probabilistic distributions of views using an efficient discrepancy metric based on the generalized sliced Wasserstein distance. To incorporate cross-view complementary information, CoCoNet proposes a heuristic complementarity-aware contrastive learning approach, which extracts a complementarity factor that joins cross-view discriminative knowledge and uses it as the contrast to guide the learning of view-specific encoders. Theoretically, the superiority of CoCoNet is verified by our information-theoretic analyses. Empirically, our thorough experimental results show that CoCoNet outperforms state-of-the-art self-supervised methods by a significant margin; for instance, CoCoNet beats the best benchmark method by an average margin of 1.1% on ImageNet.
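The global consistency module described above aligns the probabilistic distributions of views with a sliced-Wasserstein-style discrepancy. As a rough illustration of that family of metrics (not the paper's generalized variant), the plain sliced Wasserstein distance can be estimated by projecting both sample sets onto random directions and comparing the sorted 1D projections; the function and parameter names below are illustrative assumptions, not CoCoNet's actual implementation:

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-p distance between
    two empirical distributions x and y, each of shape [n_samples, dim].
    Assumes x and y contain the same number of samples."""
    rng = np.random.default_rng(seed)
    dim = x.shape[1]
    # Sample random unit directions on the sphere.
    theta = rng.normal(size=(n_projections, dim))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto each direction (giving 1D distributions).
    proj_x = np.sort(x @ theta.T, axis=0)  # [n_samples, n_projections]
    proj_y = np.sort(y @ theta.T, axis=0)
    # For 1D empirical distributions of equal size, the Wasserstein-p
    # distance reduces to comparing sorted samples (matched quantiles).
    return np.mean(np.abs(proj_x - proj_y) ** p) ** (1.0 / p)
```

In a multi-view setting, a term like this could serve as a regularizer that pulls the representation distributions of different views toward one another; the generalized variant used by CoCoNet replaces the linear projections with more flexible ones.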
Original language | English (US) |
---|---|
Pages (from-to) | 7220-7238 |
Number of pages | 19 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 35 |
Issue number | 7 |
DOIs | |
State | Published - Jul 1 2023 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics
Keywords
- Unsupervised learning
- Wasserstein distance
- multi-view
- regularization
- representation learning
- self-supervised learning