Dynamic parameterized learning for unsupervised domain adaptation


Runhua JIANG, Yahong HAN

1 College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

2 Tianjin Key Lab of Machine Learning, Tianjin University, Tianjin 300350, China

E-mail: ddghjikle1@gmail.com; yahong@tju.edu.cn

Abstract: Unsupervised domain adaptation enables neural networks to transfer from a labeled source domain to an unlabeled target domain by learning domain-invariant representations. Recent approaches achieve this by directly matching the marginal distributions of the two domains. Most of them, however, ignore the dynamic trade-off between domain alignment and semantic discrimination learning, and are therefore susceptible to negative transfer and outlier samples. To address these issues, we introduce the dynamic parameterized learning framework. First, by exploring domain-level semantic knowledge, the dynamic alignment parameter is proposed to adaptively adjust the optimization steps of domain alignment and semantic discrimination learning. Second, to obtain semantic-discriminative and domain-invariant representations, we propose to align the training trajectories on the source and target domains. Comprehensive experiments validate the effectiveness of the proposed methods, and extensive comparisons on seven datasets of three visual tasks demonstrate their practicability.

Key words: Unsupervised domain adaptation; Optimization steps; Domain alignment; Semantic discrimination

https://doi.org/10.1631/FITEE.2200631

CLC number: TP391

1 Introduction

Deep neural networks have achieved remarkable success in diverse computer vision tasks (Li S et al., 2021a; Zhang MY et al., 2021; Wu KH et al., 2023). Unfortunately, most of them rely on access to massive well-labeled training data, and collecting a large number of well-labeled samples is time-consuming and laborious. Moreover, neural networks tend to generalize poorly to new datasets due to the distribution discrepancy issue (Li S et al., 2021a). As a result, unsupervised domain adaptation (UDA) methods have been widely studied.

The mainstream methodology of UDA is to learn domain-invariant representations by optimizing alignment loss functions. For example, Zhang YC et al. (2019) proposed to learn domain-invariant representations by minimizing the multi-kernel maximum mean discrepancy metric. Long et al. (2017) enhanced the transferability of learned features by aligning the joint distributions of multiple domain-specific layers. Subsequently, several works found that semantic discriminability is also critical. Specifically, Dai et al. (2021) proposed to regularize the geodesic path among intermediate, source, and target domains, thus obtaining strong class discrimination ability. Wang XM et al. (2019) introduced two types of attention mechanisms to highlight discriminative and transferable samples. By emphasizing domain invariance or semantic discrimination, these loss functions can effectively enhance the representation ability of neural networks. However, considering these two characteristics independently makes networks susceptible to negative transfer and outlier samples.

In fact, for various domain adaptation (DA) scenarios, the degrees of alignment and discrimination should differ. Fig. 1 provides an intuitive example: apart from the bike itself, the image contains components of other categories. Excessive alignment will emphasize components with small distribution discrepancy, e.g., backpacks or helmets, while excessive pursuit of discriminability will make neural networks overfit to the real-world domain. Motivated by this observation, some methods have been introduced to find a dynamic balance. For instance, some works (Liu et al., 2021; Yang SQ et al., 2021; Balgi and Dukkipati, 2022) achieved this by abstracting UDA as the combination of two related tasks: supervised classification of labeled source data and discriminative clustering of unlabeled target data. Furthermore, Tian et al. (2022) proposed to regularize the relationship between domain alignment and discrimination with class-level weights, while Xiao and Zhang (2021) introduced a sample-weighting mechanism. However, because samples randomly drawn from the source and target domains cannot cover all valuable categories, these methods remain sensitive to outlier samples in a mini-batch.

Fig. 1 Taking an image of a bike in the real-world domain as an example, there exist components of other categories. Excessive alignment will force neural networks to concentrate on backpacks or helmets, which have smaller cross-domain discrepancies than other components

To address this issue, we introduce the dynamic parameterized learning (DPL) framework. On one hand, the dynamic alignment parameter (DAP) is calculated by recurrently estimating the consistency between the class distributions of the source and target domains. During optimization, the processes of domain alignment and discrimination learning are dynamically adjusted by taking DAPs as the weights of the loss functions. On the other hand, considering that these two processes cannot be completely separated, we propose the previously contrastive loss (PCL) and the adaptive learning parameter (ALP) to constrain the optimization trajectories (Cazenavette et al., 2022) on the source and target domains. In each iteration, PCL and ALP first estimate the variations between the previous and current parameters and then enforce the parameter trajectories on the target domain to match those on the source domain. With the above methods, UDA models can adaptively adjust their learning processes, thus becoming robust to outlier samples and negative transfer.

The main contributions of this work can be summarized as follows:

1. We introduce the DPL framework to address the issues of outlier samples and an improper trade-off between domain alignment and semantic discrimination learning.

2. In this framework, DAP is proposed to dynamically adjust the degrees of domain alignment and semantic discrimination learning. We also propose PCL and ALP to enable learning of cross-domain discriminative representations.

3. Comprehensive experimental analysis is provided to demonstrate the effectiveness of the proposed methods. Extensive comparisons are conducted on seven datasets of regression, classification, and person re-identification (Re-ID) tasks to validate their practicability.

2 Related works

UDA has been extensively studied in various visual tasks (Han YH et al., 2021; Li S et al., 2021a; Wu AM et al., 2022). Among these tasks, domain adaptation regression is a basic but challenging one due to its continuous solution space. Chen XY et al. (2021) found that regression methods are not robust to feature scaling; they therefore introduced a geometrical loss function, termed the representation subspace distance (RSD) loss, to learn transferable representations. Different from regression, classification generates discrete labels by normalizing multi-class probabilities. Most existing methods follow the methodology of aligning intermediate representations of convolutional neural networks; as a result, these features should possess both class discriminability and domain transferability. For discriminability, Li MX et al. (2020) incorporated an attention mechanism into the enhanced transport distance (ETD) and proposed an attention-aware transport distance with entropy-based regularization. Han ZY et al. (2022) demonstrated that neural networks always contain nontransferable parameters, which should not always be updated for domains with small distribution discrepancies. For transferability, the domain-adversarial neural network (DANN) (Ganin et al., 2016) achieves UDA classification with both generative adversarial networks and adversarial learning strategies. Dai et al. (2021) found that forming an intermediate domain can greatly enhance the representative ability of learned features. From these works, it can be concluded that enhancing the semantic discriminability or domain transferability of deep representations improves generalization performance.

However, how to balance semantic discriminability and domain transferability for effective UDA is still underexplored. Liu et al. (2021) demonstrated that cross-domain alignment should be achieved based on the underlying discriminative features of the source and target domains, rather than all features. Therefore, they introduced cycle self-training (CST) to learn these underlying discriminative features by self-supervised learning. In contrast, other works (Yang SQ et al., 2021; Balgi and Dukkipati, 2022) considered the domain adaptation task as supervised learning on the source domain and unsupervised learning on the target domain. Both of them first learned discriminative representations from the source domain and then fine-tuned these representations on the target domain with clustering schemes. Moreover, Li WK and Chen (2022) demonstrated that the target risk is bounded by both the intra-domain structure and the between-domain discrepancy. Wang XM et al. (2019) proposed transferable attention for domain adaptation (TADA), which consists of local attention generated by region-level domain discriminators and global attention generated by image-level discriminators. Although the motivations of the present work and the above methods are similar, there are two important differences. First, the proposed DPL pursues this motivation from the model optimization perspective: DPL dynamically weights the optimization steps on the source and target domains, whereas these methods generate sample-wise, class-level, or subdomain-level weights for random samples. Second, DPL is a flexible framework and thus can be used to solve not only DA classification but also regression and Re-ID.

3 Proposed methods

To introduce the proposed methods, we formally define the labeled source domain D_s = {(x_s, y_s)} and the unlabeled target domain D_t = {x_t}. At each iteration, a neural network with encoder f(·) and projection head g(·) takes x_s and x_t as inputs and the corresponding y_s as supervision. Without loss of generality, we assume the loss function for training the neural network takes the following form:

$$\mathcal{L} = \mathcal{L}_D\big(g(f(x_s)),\, y_s\big) + \alpha\, \mathcal{L}_A\big(f(x_s),\, f(x_t)\big), \tag{1}$$

where the discrimination loss L_D takes y_s and g(f(x_s)) to supervise semantic discrimination learning, the alignment loss L_A aims at enhancing domain transferability, and α is a predefined hyper-parameter weighting L_A.

3.1 Dynamic alignment parameter

As discussed earlier, optimizing Eq. (1) with a fixed α easily causes imbalance between domain alignment and semantic discrimination learning. Previous methods (Xiao and Zhang, 2021; Tian et al., 2022) therefore proposed to weight each sample or category according to its contribution. However, they are still sensitive to outlier samples in a mini-batch, and computing the domain discrepancy from limited samples tends to be inaccurate. We therefore propose to use domain-level knowledge to achieve a dynamic balance. In general, there are two kinds of methods for modeling domain-level knowledge (Pan YW et al., 2019; Tanwisuth et al., 2021). The first forms average latent features for all categories in each domain (Pan YW et al., 2019), while the other learns parametric representations (Tanwisuth et al., 2021). Specifically, Tanwisuth et al. (2021) demonstrated that, with optimization, the parameters in the mapping head are encouraged to form domain-invariant prototypes. Therefore, we can obtain prototypes for the source and target domains by optimizing two mapping heads, i.e., g_s(·) and g_t(·). In each iteration, they respectively handle features of the source or target domain. After being optimized, the parameters in g_s(·) and g_t(·), termed θ_s and θ_t respectively, are encouraged to form prototypes for each domain. The dynamic balance can thus be achieved by estimating the similarity between these prototypes: when the prototypes of different domains are similar, the representation space within the neural network is robust to the domain discrepancy, so the network should focus more on discrimination learning; otherwise, the optimization contribution of domain alignment should be emphasized. To this end, we define DAP as follows:

$$\mathrm{DAP} = \frac{\lambda}{N}\sum_{i=1}^{N}\cos\big(\theta_s^{(i)},\, \theta_t^{(i)}\big), \tag{2}$$

where N is the number of the parameters θ, cos(·) is the cosine distance, and λ is a hyper-parameter for regularizing the range of the alignment loss function. By taking DAP to replace α in Eq. (1), the optimization of domain alignment and discrimination learning can be adaptively controlled.
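To make the computation concrete, the following PyTorch sketch estimates DAP from the two mapping heads. It is a minimal sketch under the reconstruction of Eq. (2) above: the per-tensor flattening, the use of 1 − cosine similarity as the cosine distance, and the default value of lam are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def dynamic_alignment_parameter(head_s: torch.nn.Module,
                                head_t: torch.nn.Module,
                                lam: float = 0.001) -> torch.Tensor:
    # Average cosine distance between corresponding parameter tensors
    # (the prototypes theta_s and theta_t) of the two mapping heads,
    # which are assumed to share the same architecture.
    dists = []
    for theta_s, theta_t in zip(head_s.parameters(), head_t.parameters()):
        sim = F.cosine_similarity(theta_s.flatten(), theta_t.flatten(), dim=0)
        dists.append(1.0 - sim)  # cosine distance = 1 - cosine similarity
    # N in Eq. (2) is interpreted here as the number of parameter tensors
    return lam * torch.stack(dists).mean()
```

A large DAP (dissimilar prototypes) then up-weights the alignment loss, while a small DAP shifts the optimization toward discrimination learning.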

3.2 Cross-domain discriminability

As Fig. 2 presents, in each iteration, neural networks are optimized with a dynamic balance between domain alignment and discrimination learning. However, domain alignment and discrimination learning are two closely related processes. Therefore, we further consider jointly enhancing domain transferability and semantic discriminability for effective DA. It is straightforward to find that aligning class-level representations can achieve this purpose (Tian et al., 2022). However, estimating the transferability of each class is still challenging due to the unavailability of target labels. Therefore, we propose to pursue this goal from the optimization perspective.

The core of our method is to constrain the optimization trajectories on the source and target domains to be synchronized, such that the semantic knowledge within the learned representations is forced to be transferable. Specifically, as discussed in previous works (Liu et al., 2021; Yang SQ et al., 2021; Balgi and Dukkipati, 2022), domain adaptation can be seen as two processes: supervised learning on the source domain and unsupervised learning on the target domain. During the optimization of objective functions such as Eq. (1), the decision boundary on the source domain is formed according to the source labels, while the boundary on the target domain is obtained by minimizing the statistical distance between the source and target samples. By matching the optimization trajectories of the source and target domains, these two boundaries are encouraged to share similar discrimination. Thus, the semantic discriminability of the learned representations can be maintained as much as possible, while transferability is assured by cooperative learning. Accordingly, we propose PCL based on contrastive learning (Li QB et al., 2021). For supervising the optimization steps on the target domain, PCL is defined as follows:

$$\mathcal{L}_{\text{PCL-}t} = -\log \frac{\exp\big(\cos(f(x_t), f_p(x_t))/\tau\big)}{\exp\big(\cos(f(x_t), f_p(x_t))/\tau\big) + \exp\big(\cos(f(x_s), f_p(x_t))/\tau\big)}, \tag{3}$$

where f_p(·) denotes the previous encoder, cos(·) is the cosine similarity, and τ is a temperature parameter set to 0.5. By replacing x_t in the second term of the denominator with x_s, PCL can be used to supervise the optimization steps on the source domain. Following the methodology of contrastive learning, optimizing the proposed PCL forces the encoder f(·) to reduce domain-specific information in the latent space and to maximize the semantic consistency between the representations learned by the current and previous parameters.
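A minimal PyTorch sketch of PCL under the reconstructed form of Eq. (3) follows; the symmetric treatment of the two domains and the stop-gradient on the previous encoder are assumptions. Calling pcl_loss(f, f_p, x_t, x_s) supervises the target trajectory, and swapping the two batches gives the source-domain variant described above.

```python
import torch
import torch.nn.functional as F

def pcl_loss(f, f_p, x_a, x_b, tau: float = 0.5) -> torch.Tensor:
    # Positive pair: current vs. previous encodings of the supervised
    # domain x_a; the second denominator term uses the other domain x_b.
    with torch.no_grad():
        z_prev = f_p(x_a)  # previous-encoder features, no gradient
    pos = F.cosine_similarity(f(x_a), z_prev, dim=1) / tau
    neg = F.cosine_similarity(f(x_b), z_prev, dim=1) / tau
    # -log( exp(pos) / (exp(pos) + exp(neg)) ), averaged over the batch
    return -(pos - torch.logsumexp(torch.stack([pos, neg]), dim=0)).mean()
```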

Fig. 2 Illustration of the proposed methods. In each iteration, the optimization steps (Cazenavette et al., 2022) on the source and target domains are constrained to be semantically similar. Meanwhile, the loss functions of domain alignment and discrimination learning are dynamically balanced by the proposed dynamic parameterized learning

We hope that the proposed PCL can be dynamically weighted during optimization. Therefore, we further propose ALP as the confidence of PCL. This is achieved by measuring the representative ability of the learned features (Yang YC and Soatto, 2020; Wang W et al., 2022). If these features can effectively represent the distributions of the input samples, the value of ALP becomes large, thus encouraging cross-domain transferability; otherwise, ALP suppresses the contribution of PCL. Concretely, given x_s and f(x_s), the frequency distribution of the input image x_s is obtained by the Fourier transform and compared with the distribution of f(x_s). Formally, the above operations can be defined as follows:

$$\mathrm{ALP}_s = \gamma \cdot \cos\Big(\mathrm{Pool}\big(\mathrm{FFT}(x_s)\big),\, \mathrm{Pool}\big(\mathrm{FFT}(f(x_s))\big)\Big), \tag{4}$$

where Pool(·) and FFT(·) denote the max-pooling operation and the Fourier transform, respectively, and γ is a predefined parameter for regularizing the range of ALP. We can also obtain ALP_t by replacing x_s with x_t.
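The sketch below implements this comparison with PyTorch's FFT utilities. It is only one plausible reading of Eq. (4): the use of the amplitude spectrum, the pooled signature length, and the cosine comparison are assumptions.

```python
import torch
import torch.nn.functional as F

def freq_signature(t: torch.Tensor, k: int = 64) -> torch.Tensor:
    # Amplitude spectrum of the flattened signal, max-pooled to length k
    amp = torch.fft.fft(t.flatten(1)).abs()
    return F.adaptive_max_pool1d(amp.unsqueeze(1), k).squeeze(1)

def adaptive_learning_parameter(x: torch.Tensor, feat: torch.Tensor,
                                gamma: float = 1.0) -> torch.Tensor:
    # Compare the frequency distributions of the inputs and their learned
    # features; the result is used as a detached scalar confidence weight.
    sim = F.cosine_similarity(freq_signature(x), freq_signature(feat), dim=1)
    return gamma * sim.mean().detach()
```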

3.3 Overall training objective

The overall training objective of this work includes the discrimination loss L_D, the alignment loss L_A weighted by the proposed DAP, and the proposed PCL weighted by ALP. Formally, we propose to optimize the following function:

$$\mathcal{L} = \mathcal{L}_D + \mathrm{DAP}\cdot\mathcal{L}_A + \mathrm{ALP}_s\cdot\mathcal{L}_{\text{PCL-}s} + \mathrm{ALP}_t\cdot\mathcal{L}_{\text{PCL-}t}, \tag{5}$$

where the last two terms on the right-hand side match and supervise the optimization steps on the source and target domains. As the objective shows, the proposed framework can be combined with different alignment losses L_A and various discrimination losses L_D to solve different tasks.
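Putting the pieces together, one training iteration under Eq. (5) might look as follows. This is a sketch reusing the helper functions above; the task-specific losses loss_D and loss_A (e.g., MAE and RSD for regression), the handling of the target head g_t, and how the previous encoder f_p is snapshotted are all assumptions.

```python
def dpl_training_step(f, g_s, g_t, f_p, x_s, y_s, x_t,
                      loss_D, loss_A, optimizer,
                      lam: float = 0.001, gamma: float = 1.0) -> float:
    z_s, z_t = f(x_s), f(x_t)
    l_d = loss_D(g_s(z_s), y_s)                                # discrimination
    dap = dynamic_alignment_parameter(g_s, g_t, lam).detach()  # Eq. (2)
    l_a = loss_A(z_s, z_t)                                     # alignment
    alp_s = adaptive_learning_parameter(x_s, z_s, gamma)       # Eq. (4)
    alp_t = adaptive_learning_parameter(x_t, z_t, gamma)
    l_pcl_s = pcl_loss(f, f_p, x_s, x_t)                       # source trajectory
    l_pcl_t = pcl_loss(f, f_p, x_t, x_s)                       # target trajectory
    loss = l_d + dap * l_a + alp_s * l_pcl_s + alp_t * l_pcl_t  # Eq. (5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```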

4 Experiments

In this section, the proposed methods are evaluated and compared with several state-of-the-art UDA methods on three tasks. We first introduce the three tasks and the implementation details. Then, variants of the proposed methods are tested to validate their effectiveness. Finally, the proposed methods are compared with existing methods to demonstrate their practicability.

4.1 Tasks and datasets

4.1.1 UDA regression

UDA regression models take images as inputs and generate constants that describe the visual characteristics of the input images. In this work, the dSprites (Higgins et al., 2016) and MPI3D (Gondal et al., 2019) datasets are used for comparison. dSprites is a two-dimensional (2D) synthetic dataset for deep representation learning. It consists of three domains: Color (C), Noisy (N), and Scream (S). These three domains contain 737 280 images, which have five factors of variation. Following Chen XY et al. (2021), three factors are used for the regression experiments: scale, position X, and position Y, whose possible values are taken from [0.5, 1], [0, 1], and [0, 1], respectively. MPI3D is a simulation-to-real dataset of three-dimensional (3D) objects. There are three domains in this dataset: Toy (T), Realistic (RC), and Real (RL). Each domain has 1 036 800 images with seven factors. Following previous works (Gondal et al., 2019; Chen XY et al., 2021), the values of the horizontal and vertical axes are evaluated for regression. The mean absolute error (MAE) is used as the comparison metric.

4.1.2 UDA classification

For the classification task, the popular Office-Home (Venkateswara et al., 2017), Office-31 (Saenko et al., 2010), and DomainNet (Peng et al., 2019) datasets are used. The Office-Home dataset contains 15 500 images of 65 categories. These images are split into four distinct domains: Artistic images (Ar), Clip art (Cl), Product images (Pr), and Real-World images (Rw). Therefore, 12 adaptation tasks are formed to evaluate the proposed methods. Office-31 is a real-world benchmark dataset for DA classification. Its 4110 images of 31 categories are split into three domains: Amazon (A), Webcam (W), and DSLR (D). As a result, six adaptation tasks are set up for comparison.

4.1.3 UDA person Re-ID

In this work, three datasets are used for UDA Re-ID comparisons, as done by Dai et al. (2021). Specifically, the Market-1501 dataset, which contains over 32 000 images, is taken first (Zheng et al., 2015). Each identity in this dataset has multiple images captured by different cameras, making the dataset quite challenging. Then, DukeMTMC-ReID (Ristani et al., 2016) is adopted as a different domain. This dataset contains more than 2×10^6 frames of 1080p video, taken from 60-FPS (frames per second) videos captured by eight cameras. Finally, the real-world MSMT17 dataset (Wei LH et al., 2018) is used. It contains 4101 identities and 126 441 bounding boxes. Following Dai et al. (2021), four adaptation tasks are formed: DukeMTMC-ReID→Market-1501, Market-1501→DukeMTMC-ReID, Market-1501→MSMT17, and DukeMTMC-ReID→MSMT17. Popular metrics, including mean average precision (mAP) and Rank-1/5/10 (R1/5/10) of the cumulative match characteristic (CMC) curve, are used to evaluate performance.

4.2 Implementation details

4.2.1 UDA regression

Following Chen XY et al. (2021), all images are resized to 224×224 pixels and no data augmentation is used. During training, the batch size is fixed at 128. The learning rates of f(·) and g(·) are set to 0.1 and 1.0, respectively. For updating the network parameters, mini-batch stochastic gradient descent (SGD) with a momentum of 0.95 and a weight decay of 0.000 05 is adopted. A pretrained ResNet-18 (He et al., 2016) followed by two fully connected layers is taken as the model. For optimization, the discrimination loss L_D and alignment loss L_A are defined as the MAE and RSD functions, respectively (Chen XY et al., 2021). All experiments are run on a single GeForce RTX 2080Ti GPU and trained for about 20 000 iterations.
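For reference, the optimizer setup described above could be instantiated as follows; `encoder` and `head` are assumed handles to f(·) and g(·).

```python
import torch

optimizer = torch.optim.SGD(
    [{"params": encoder.parameters(), "lr": 0.1},   # f(.)
     {"params": head.parameters(), "lr": 1.0}],     # g(.)
    lr=0.1, momentum=0.95, weight_decay=5e-5)       # 5e-5 = 0.000 05
```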

4.2.2 UDA classification

Following the standard protocol for UDA classification (Li S et al., 2021a; Xiao and Zhang, 2021), we use all the labeled source data and unlabeled target data as training samples and evaluate on the unlabeled target data. For fair comparison with existing methods, we use a pretrained ResNet-50 or ResNet-101 (He et al., 2016) as the encoder. The discrimination loss and alignment loss are defined as in previous research (Li S et al., 2021a). All input images are cropped to 224×224 pixels. The mini-batch SGD optimizer with a momentum of 0.9 and the learning rate strategy described by Ganin and Lempitsky (2015) are used to learn the parameters. All experiments are conducted on a single GeForce RTX 2080Ti GPU with 20 000 iterations in total.
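Ganin and Lempitsky (2015) anneal the learning rate as lr_p = lr0 / (1 + α·p)^β, with p the training progress; a sketch of that schedule follows. The constants lr0 = 0.01, α = 10, and β = 0.75 are the values reported in their paper and are assumed to carry over here.

```python
def annealed_lr(step: int, total_steps: int = 20000, lr0: float = 0.01,
                alpha: float = 10.0, beta: float = 0.75) -> float:
    # lr_p = lr0 / (1 + alpha * p) ** beta, with progress p in [0, 1]
    p = step / total_steps
    return lr0 / (1.0 + alpha * p) ** beta
```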

4.2.3 UDA person Re-ID

For all UDA Re-ID experiments, the input images are resized to 256×128, and common data augmentation operations are applied. During training, each batch consists of 64 source images with 16 identities and 64 target images with 16 pseudo identities. The initial learning rate is set to 3.5×10^-4 and is divided by 10 at the 20th and 40th epochs. The Adam optimizer with a weight decay of 5×10^-4 and a momentum of 0.9 is adopted. Similar to Dai et al. (2021), a ResNet-50 pretrained on ImageNet with domain-specific batch normalization layers is adopted as the backbone network. All experiments of this task are run on two GeForce RTX 3090 GPUs. During testing, no post-processing techniques are applied.
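The stated schedule corresponds to the following PyTorch setup; Adam's first-moment coefficient beta1 = 0.9 plays the role of the stated momentum, and `model` is an assumed handle to the backbone.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4,
                             betas=(0.9, 0.999), weight_decay=5e-4)
# divide the learning rate by 10 at the 20th and 40th epochs
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20, 40], gamma=0.1)
```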

4.3 Experimental analysis

In this subsection, we present several experiments to demonstrate the effectiveness of the proposed methods. Because the regression task is more sensitive than the other two tasks, all experiments are conducted on the dSprites dataset. For comprehensive validation, we first conduct an ablation study based on Eq. (5). Then, we compare the variants of the proposed DAP. Finally, the benefits of matching the optimization trajectories on the source and target domains are discussed.

4.3.1 Ablation study

As Eq. (5) indicates, DAP serves as the coefficient of the alignment loss, while the proposed PCL and ALP supervise the optimization trajectories on the source or target domain. Therefore, the ablation study is conducted by removing one of these contributions at a time. The corresponding results are presented in Table 1, from which we find that removing any of these contributions degrades the final performance. First, comparing the top and bottom rows shows that removing the proposed DAP leaves neural networks easily affected by an inaccurate balance, leading to suboptimal performance. Similarly, without supervising and matching the optimization trajectories on the source or target domain, the model performance tends to be worse. In addition, the results in the second and fourth rows indicate that the source trajectory has a more important influence than the target trajectory, which verifies the motivation of cross-domain discriminability. Last but not least, the proposed ALP, which adaptively controls the contribution of PCL, is beneficial for the overall performance.

The above results alone do not demonstrate that DPL can address the issue of outlier samples. Therefore, we present three experiments to validate this ability. Specifically, DPL is trained with outlier samples generated by three methods: (1) perturbing source labels with random noise; (2) perturbing source images with random noise; (3) replacing the source or target images with samples from the other domain. As can be seen from Figs. 3a–3c, when the source labels are perturbed by noise randomly sampled from [0, 0.1], [0, 0.5], and [0.5, 1], the proposed DPL achieves similar MAEs. Similarly, it is robust to outlier images caused by random noise. Last but not least, when the source or target images are replaced, the proposed DPL still works well on S→C and S→N, indicating that the proposed methods can adaptively adjust domain alignment and discrimination learning.
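As an illustration, protocol (1) can be realized with a few lines of PyTorch; the exact sampling details are assumptions.

```python
import torch

def perturb_labels(y: torch.Tensor, ratio: float,
                   low: float, high: float) -> torch.Tensor:
    # Add uniform noise from [low, high] to a random fraction `ratio`
    # of the source regression labels.
    y = y.clone()
    idx = torch.rand(y.shape[0], device=y.device) < ratio
    y[idx] = y[idx] + torch.empty_like(y[idx]).uniform_(low, high)
    return y
```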

4.3.2 Analysis of dynamic alignment parameter

To further validate the effectiveness of the proposed DAP, we vary the hyper-parameter λ in Eq. (2). The corresponding results are presented in Fig. 4a. For analysis, we also report the performance after replacing DAP with a constant of 0.001 and after replacing cos(·) with the Kullback–Leibler (KL) divergence. As shown in Table 2, the performance in Fig. 4a always surpasses these results. From Fig. 4a, it is clear that with a small λ, the values of DAP become small, so the effect of the alignment loss is reduced, leading to inadequate alignment and worse performance (Xiao and Zhang, 2021). By comparing the results of the N→C and S→N scenarios, it can be found that the value of λ does not seriously influence the overall performance. This demonstrates that even without a suitable λ, the proposed DAP can still coordinate with alignment loss functions to achieve a dynamic trade-off and effective representations. In addition, with a suitable λ, methods with DAP remarkably outperform those using 0.001 as the hyper-parameter. Last but not least, by comparing Fig. 4a and Table 2, we conclude that the cosine distance adopted in Eq. (2) is more effective than the KL divergence. These results demonstrate that the proposed DAP can effectively estimate and promote the balance between domain alignment and discrimination learning.

Table 2 Experimental results related to the proposed dynamic alignment parameter

Table 1 Ablation study of the proposed methods

Fig. 3 Experimental results on whether dynamic parameterized learning can address the issue of outlier samples: (a–c) perturbing source labels by noise randomly sampled from [0, 0.1], [0, 0.5], and [0.5, 1]; (e–g) perturbing source images by noise randomly sampled from [0, 0.1], [0, 0.5], and [0.5, 1]; (d, h) replacing source or target images by samples from the other domain. Values on the horizontal axis denote the ratio of the number of perturbed samples to that of all samples

Fig. 4 Experimental results obtained by varying λ and γ: (a) the proposed DAP; (b) PCL with different weights; (c) influence of hyper-parameters in ALP_s; (d) influence of hyper-parameters in ALP_t. Note that in (c) and (d), only L_PCL-s or L_PCL-t is used with the proposed ALP. The proposed DPL is used only in (a)

4.3.3 Effectiveness of matching optimization trajectories

Based on the above experiments, we examine the effectiveness of matching the optimization trajectories by PCL and ALP. To this end, we first train PCL_s with different constants as hyper-parameters. From Fig. 4b, it can be found that these constant hyper-parameters always degrade the regression performance. By comparing Figs. 4b and 4c, we note that with the proposed ALP, it is not necessary to search for the optimal hyper-parameter of L_PCL-s through numerous experiments. Equipped with ALP, the proposed PCL can effectively constrain the optimization trajectory, leading to the best performance on most tasks. This phenomenon demonstrates that by matching the learning processes on the source and target domains, models can achieve better adaptation performance. In addition, as the best performance on each adaptation task is achieved with a different γ value, we conclude that the proposed PCL and ALP can be easily applied to various adaptation scenarios. We can further compare the influence of ALP_s and ALP_t by observing Figs. 4c and 4d: with the same γ, methods trained with ALP_s always work better than those with ALP_t. Therefore, the core idea of PCL and ALP, i.e., using the optimization trajectory on the source domain to guide the learning process on the target domain, is demonstrated to be empirically reasonable.

As PCL aims at matching the optimization trajectories on different domains, a natural question is whether the steps of the matched trajectories should be the same. To address this question, we implement ALP_s and ALP_t with different values of γ, so that the matched trajectories are controlled to have various steps. Table 3 illustrates the results. First, when the γ of ALP_s differs from that of ALP_t, the overall performance is drastically degraded. This validates that the steps of the matched trajectories should be similar. In addition, matching the short-term trajectories is clearly more effective than aligning the long-term steps. Overall, we demonstrate that the steps of the matched trajectories should be similar for learning effective representations.

4.4 Comparison

4.4.1 UDA regression

The proposed methods are first compared with regression methods on the dSprites and MPI3D datasets. By methodology, these methods fall into two categories: machine learning and deep representation learning. For example, transfer component analysis (TCA) (Pan SJ et al., 2011) and joint distribution optimal transport (JDOT) (Courty et al., 2017) are two classical machine learning methods, both of which focus on importance weighting. Among deep representation learning methods, RSD (Chen XY et al., 2021) is the latest approach, which concentrates on aligning orthogonal bases. The experimental results are presented in Table 4, from which we find that our method achieves state-of-the-art performance. Specifically, on the dSprites dataset, the proposed methods outperform ResNet-18 on all adaptation scenarios. Compared with DANN, the proposed methods tend to achieve better performance on difficult tasks such as C→N and N→S. On the MPI3D dataset, our methods outperform RSD on all scenarios except RC→RL. In addition, since RSD can be seen as the baseline model of this work, it is demonstrated that adaptively learning semantic discriminability and dynamically aligning domain-invariant knowledge improve the overall adaptation performance.

Table 3 Performance of matching different optimization trajectories on the source and target domains

Table 4 Experimental results on the UDA regression task

4.4.2 UDA classification

The results of UDA classification are presented in Tables 5–9. From Table 5, it can be found that this work outperforms all the compared methods on average accuracy. This is attributed to the improvements on tasks such as Ar→Cl, Ar→Pr, Cl→Ar, and Cl→Pr; on the other adaptation tasks, the proposed methods achieve comparable performance. This phenomenon demonstrates the benefits of the dynamic balance between domain alignment and discrimination learning. On the Office-31 dataset, almost all the compared methods perform poorly on difficult tasks such as D→A, while DPL achieves the best performance. In addition, the proposed methods can be extended to multi-source DA. As can be seen from Tables 8 and 9, DPL achieves the best performance on the multi-source DA cases. Compared with closed-set DA, multi-source, open-set, and partial DA are more difficult cases. Under the open-set and partial-set settings, DPL is sensitive to the mismatch of label spaces. In future research, we will address this issue by further considering the discrimination learning process of each class.

4.4.3 UDA person Re-ID

The results of UDA person Re-ID are summarized in Table 10. It can be found that the proposed methods achieve state-of-the-art performance. Specifically, compared with generative adversarial network (GAN) based transfer methods such as the pose disentanglement and adaptation network (PDA-Net) (Li YJ et al., 2019), the proposed methods tend to show better performance by considering the cross-domain discriminability between the source and target domains. For fine-tuning methods such as AD-Cluster (Zhai et al., 2020) and noise resistible mutual-training (NRMT) (Zhao F et al., 2020), the

Table 5 Experimental results on Office-Home

Table 6 Experimental results on Office-31

Table 7 Accuracy (%) on DomainNet for unsupervised domain adaptation

performance improvement achieved by our methods demonstrates that excessively or inadequately aligning the source and target domains is not beneficial for the high-level object Re-ID task. By comparing the joint training methods (Zhong et al., 2019, 2021) with our methods, it can also be validated that the dynamic alignment between different domains enhances the overall adaptation performance. Last but not least, the performance improvement over multi-source DA methods (Bai ZC et al., 2021) further validates that matching the optimization trajectories on different domains is beneficial for learning effective features. Overall, these experiments demonstrate that our methods achieve state-of-the-art performance on the UDA Re-ID task.

Table 8 Accuracy comparisons with multi-source domain adaptation methods on Office-31 dataset

Table 9 Accuracy comparisons with multi-source domain adaptation methods on Office-Home dataset

Table 10 Experimental results on UDA Re-ID

5 Conclusions

In this work, we explore the dynamic balance between semantic discrimination learning and domain alignment learning. We propose DPL to achieve a task-adaptive balance by modeling domain-level semantic knowledge, which is more robust and effective than previous methods based on sample-level or class-wise knowledge. In addition, we propose PCL and ALP to adaptively match the optimization trajectories on the source and target domains, thus obtaining cross-domain discriminative representations. Extensive experimental results validate the rationality and effectiveness of the proposed methods.

Contributors

Runhua JIANG designed the research and drafted the paper. Yahong HAN helped organize the paper. Runhua JIANG revised and finalized the paper.

Compliance with ethics guidelines

Yahong HAN is a corresponding expert of Frontiers of Information Technology & Electronic Engineering, and he was not involved with the peer review process of this paper. Runhua JIANG and Yahong HAN declare that they have no conflict of interest.

Data availability

The data that support the findings of this study are openly available in Transfer-Learning-Library at https://github.com/thuml/Transfer-Learning-Library.