
Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Domain adaptation refers to the process of using labeled source-domain data to learn a model that performs well in a target domain with limited or missing labels. Several domain adaptation methods that combine image translation and feature alignment have recently been proposed. However, such methods have two primary drawbacks. First, most assume that synthetic target images follow the same distribution as real target images and therefore train the target classifier only on synthetic target images, making the model's performance heavily dependent on the quality of the generated images. Second, most blindly align the discriminative content information by merging spatial and channel-wise information, thereby ignoring the relationships among channels. To address these issues, this study proposes a two-step approach that jointly applies a two-stream Wasserstein auto-encoder (WAE) and selective attention (SA) alignment, named J2WSA. In the pre-training step, the two-stream WAE maps the four domains onto a shared manifold structure by minimizing the Wasserstein distance between the distribution of each domain and the corresponding prior distribution. During the fine-tuning step, the SA alignment model, initialized by the two-stream WAE, automatically selects the style-related channels for alignment while suppressing alignment of the content-related channels via the SA block. Extensive experiments indicate that the combination of these two models achieves state-of-the-art performance on the Office-31 and digit domain adaptation benchmarks.
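The pre-training step described above matches each encoded distribution to a prior by penalizing a Wasserstein-type distance, which WAE-style training commonly approximates with a kernel maximum mean discrepancy (MMD) between encoder outputs and prior samples. The following is a minimal NumPy sketch of such an MMD penalty, not the authors' exact loss; the function names, kernel bandwidth, and sample sizes are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between two sample sets of shape (n, d).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(z, z_prior, sigma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between the
    # encoded codes z and samples z_prior drawn from the prior p(z).
    k_zz = rbf_kernel(z, z, sigma)
    k_pp = rbf_kernel(z_prior, z_prior, sigma)
    k_zp = rbf_kernel(z, z_prior, sigma)
    return k_zz.mean() + k_pp.mean() - 2.0 * k_zp.mean()

rng = np.random.default_rng(0)
codes = rng.normal(size=(256, 8))            # stand-in for encoder outputs
prior = rng.normal(size=(256, 8))            # samples from the prior p(z)
shifted = rng.normal(loc=3.0, size=(256, 8)) # codes from a mismatched encoder

print(mmd2(codes, prior))    # near zero: same distribution
print(mmd2(shifted, prior))  # clearly larger: distributions differ
```

Minimizing such a penalty for each of the domain streams pushes all encoded distributions toward the shared prior, which is what places the four domains on a common manifold.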
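The SA block's channel selection can be pictured as a learned per-channel gate in the spirit of squeeze-and-excitation attention: a global pooling step summarizes each channel, and a small bottleneck network emits a weight in (0, 1) per channel, so alignment pressure can be concentrated on the highly weighted (style-related) channels. This is a generic sketch under that assumption, not the paper's architecture; the weights `w1`, `w2` and the bottleneck width are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    # features: (C, H, W) feature map from one network stream.
    pooled = features.mean(axis=(1, 2))        # global average pool -> (C,)
    gate = sigmoid(w2 @ np.tanh(w1 @ pooled))  # two-layer bottleneck -> (C,) in (0, 1)
    return features * gate[:, None, None], gate

rng = np.random.default_rng(1)
C, H, W = 16, 4, 4
feats = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(4, C)) * 0.1  # squeeze: C -> 4 (hypothetical width)
w2 = rng.normal(size=(C, 4)) * 0.1  # excite: 4 -> C
reweighted, gate = channel_attention(feats, w1, w2)
print(gate.shape, reweighted.shape)
```

In a fine-tuning setup like the one described, the alignment loss would then operate on the gated features, so channels with gates near zero contribute little to the alignment term.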


Notes

  1. https://github.com/A-bone1/J2WSA.


Acknowledgements

This work was supported by the Opening Foundation of the State Key Laboratory (No. 2014KF06) and the National Science and Technology Major Project (No. 2013ZX03005013).

Author information

Corresponding author

Correspondence to Xinyu Jin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Chen, Z., Chen, C., Jin, X. et al. Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation. Neural Comput & Applic 32, 7489–7502 (2020). https://doi.org/10.1007/s00521-019-04262-1
