
Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Domain adaptation refers to the process of using labeled source-domain data to learn a model that performs well in a target domain with limited or missing labels. Several domain adaptation methods that combine image translation and feature alignment have recently been proposed. However, such methods have two primary drawbacks. First, most assume that synthetic target images follow the same distribution as real target images and therefore train the target classifier only on synthetic target images, making the model's performance heavily dependent on the quality of the generated images. Second, most blindly align the discriminative content information by merging spatial and channel-wise information, thereby ignoring the relationships among channels. To address these issues, this study proposes a two-step approach that jointly applies a two-stream Wasserstein auto-encoder (WAE) and selective attention (SA) alignment, named J2WSA. In the pre-training step, the two-stream WAE maps the four domains onto a shared manifold structure by minimizing the Wasserstein distance between the distribution of each domain and the corresponding prior distribution. During the fine-tuning step, the SA alignment model, initialized by the two-stream WAE, automatically selects the style-related channels for alignment while suppressing alignment of the content-related channels via the SA block. Extensive experiments indicate that the combination of these two models achieves state-of-the-art performance on the Office-31 and digit domain adaptation benchmarks.
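The pre-training step described above matches each encoded distribution to a prior by penalizing a Wasserstein-type distance, which WAE-style training commonly approximates with a kernel maximum mean discrepancy (MMD) between encoder outputs and prior samples. The following is a minimal NumPy sketch of such an MMD penalty, not the authors' exact loss; the function names, kernel bandwidth, and sample sizes are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between two sample sets of shape (n, d).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(z, z_prior, sigma=1.0):
    # Biased (V-statistic) estimate of the squared MMD between the
    # encoded codes z and samples z_prior drawn from the prior p(z).
    k_zz = rbf_kernel(z, z, sigma)
    k_pp = rbf_kernel(z_prior, z_prior, sigma)
    k_zp = rbf_kernel(z, z_prior, sigma)
    return k_zz.mean() + k_pp.mean() - 2.0 * k_zp.mean()

rng = np.random.default_rng(0)
codes = rng.normal(size=(256, 8))            # stand-in for encoder outputs
prior = rng.normal(size=(256, 8))            # samples from the prior p(z)
shifted = rng.normal(loc=3.0, size=(256, 8)) # codes from a mismatched encoder

print(mmd2(codes, prior))    # near zero: same distribution
print(mmd2(shifted, prior))  # clearly larger: distributions differ
```

Minimizing such a penalty for each of the domain streams pushes all encoded distributions toward the shared prior, which is what places the four domains on a common manifold.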
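The SA block's channel selection can be pictured as a learned per-channel gate in the spirit of squeeze-and-excitation attention: a global pooling step summarizes each channel, and a small bottleneck network emits a weight in (0, 1) per channel, so alignment pressure can be concentrated on the highly weighted (style-related) channels. This is a generic sketch under that assumption, not the paper's architecture; the weights `w1`, `w2` and the bottleneck width are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    # features: (C, H, W) feature map from one network stream.
    pooled = features.mean(axis=(1, 2))        # global average pool -> (C,)
    gate = sigmoid(w2 @ np.tanh(w1 @ pooled))  # two-layer bottleneck -> (C,) in (0, 1)
    return features * gate[:, None, None], gate

rng = np.random.default_rng(1)
C, H, W = 16, 4, 4
feats = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(4, C)) * 0.1  # squeeze: C -> 4 (hypothetical width)
w2 = rng.normal(size=(C, 4)) * 0.1  # excite: 4 -> C
reweighted, gate = channel_attention(feats, w1, w2)
print(gate.shape, reweighted.shape)
```

In a fine-tuning setup like the one described, the alignment loss would then operate on the gated features, so channels with gates near zero contribute little to the alignment term.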


Notes

  1. https://github.com/A-bone1/J2WSA.


Acknowledgements

This work was supported by the Opening Foundation of the State Key Laboratory (No. 2014KF06) and the National Science and Technology Major Project (No. 2013ZX03005013).

Author information

Corresponding author

Correspondence to Xinyu Jin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Chen, Z., Chen, C., Jin, X. et al. Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation. Neural Comput & Applic 32, 7489–7502 (2020). https://doi.org/10.1007/s00521-019-04262-1
