Bayesian estimation‐based sentiment word embedding model for sentiment analysis

2022-05-28

Jingyao Tang|Yun Xue|Ziwen Wang|Shaoyang Hu|Tao Gong|Yinong Chen|Haoliang Zhao|Luwei Xiao

1 Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou, China

2 College of Mathematics and Informatics & College of Software Engineering, South China Agricultural University, Guangzhou, China

3 School of Foreign Languages, Zhejiang University of Finance & Economics, Hangzhou, Zhejiang, China

4 Educational Testing Service, Princeton, New Jersey, USA

5 School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, USA

Abstract Sentiment word embedding has been extensively studied and used in sentiment analysis tasks. However, most existing models fail to differentiate between high-frequency and low-frequency words. Accordingly, the sentiment information of low-frequency words is insufficiently captured, which results in inaccurate sentiment word embeddings and degrades the overall performance of sentiment analysis. A Bayesian estimation-based sentiment word embedding (BESWE) model, which aims to precisely extract the sentiment information of low-frequency words, is proposed. In the model, a Bayesian estimator is constructed based on the co-occurrence probabilities and sentiment probabilities of words, and a novel loss function is defined for sentiment word embedding learning. The experimental results based on the sentiment lexicons and the Movie Review dataset show that BESWE outperforms many state-of-the-art methods, for example, C&W, CBOW, GloVe, SE-HyRank and DLJT1, in sentiment analysis tasks, which demonstrates that Bayesian estimation can effectively capture the sentiment information of low-frequency words and integrate it into the word embedding through the loss function. In addition, replacing the embeddings of low-frequency words in the state-of-the-art methods with those from BESWE can significantly improve the performance of those methods in sentiment analysis tasks.

1|INTRODUCTION

The past decade has witnessed the flourishing of word embedding models together with their successful application in natural language processing research. Word embedding models have been widely used in the tasks of entity recognition [1], word sense disambiguation [2,3], dependency parsing [4], machine translation [5], sentiment analysis [6,7] and so on. Generally, the current word embedding models can be categorized into two types [8], namely the prediction-based models [9-13] and the count-based models [14-16]. The former type adopts language models to predict the next word based on its context, whereas the latter uses the global word co-occurrence counts to derive word embeddings. As such, the word embedding models are able to capture the context information and map semantically similar words to neighbouring points within the word embedding space [17]. Nevertheless, considering that semantically similar words may have opposite sentiment polarities [18], the widely used word embedding models have difficulty dealing with sentiment information. According to Yu et al. [19], among the top 10 semantically similar words to each target word, about 30% have a sentiment polarity opposite to that of the target word. For this reason, research is still ongoing to develop sentiment word embedding algorithms that can appropriately handle both semantic and sentiment information.

Recently, sentiment word embedding models have been put forward to handle sentiment information. Similar to the traditional word embedding models, the recently developed sentiment word embedding models can also be classified into prediction-based models and count-based models. On the one hand, in the prediction-based models [18,35-37], the word sentiment is taken as local information for model learning. That is, by using neural networks or predicting functions, the sentiment word embedding can be learnt and the sentiment polarities can be classified. On the other hand, for the count-based models [20,21], the sentiment word embedding is derived from the word-sentiment matrices of the labelled corpora, in which case the sentiment information is used as global statistics.

Despite these new models, the application of sentiment word embedding models is still limited. Current state-of-the-art models fail to specifically identify and process the low-frequency words in the texts, which results in the loss of low-frequency but significant information [22]. According to Zipf's law [23], a large number of words in any corpus are low-frequency words [24]. According to statistics, in the Stanford Sentiment Treebank (SST) corpus [25], the words with frequencies lower than 5 account for 72.35% of all the words, and those with frequencies lower than 10 account for 84.06%. In addition, because of the small number of low-frequency word samples, the learning process is prone to over-fitting. Notably, the low-frequency words carry little sentiment information and also noise that might affect model learning. Instead of treating the low-frequency words separately, many current sentiment word embedding models directly use the raw word information for sentiment word embedding learning. For example, in the labelled singular value decomposition (LSVD) model, the word-context matrix and the word-sentiment matrix are stitched together to obtain the sentiment word embedding through singular value decomposition. Likewise, the LGloVe model and the DLJT1 model use both matrices in the loss function of a least squares algorithm for model learning. In this way, these sentiment word embedding models fail to extract the precise sentiment of low-frequency words in the word representations. This unresolved issue results in low accuracy of sentiment polarity classification.

For the task of sentiment analysis, we propose a novel method for sentiment word embedding learning. We design and deploy a Bayesian estimation-based sentiment word embedding (BESWE) model. Inspired by GloVe, we introduce the sentiment probability and derive the corresponding loss function, so that the learnt word embedding conveys both the context information and the sentiment information. Since the co-occurrence probability in GloVe is computed via maximum likelihood estimation, which has deficiencies in processing small samples, we use Bayesian estimation to extract the sentiment information from the word-sentiment matrices for model learning. By introducing prior knowledge, Bayesian estimation can overcome the defect of insufficient sentiment information [26] and increase the robustness of the model [27]. Owing to its distinctive structure, our model is especially accurate in capturing the sentiment information of low-frequency words. On the one hand, we use a Bayesian estimator to compute the co-occurrence probabilities and the sentiment probabilities based on the word-context matrices and word-sentiment matrices from the corpus. On the other hand, we learn the sentiment word embedding in a novel way by constructing a dedicated loss function. We compare our method against other state-of-the-art models using identical experimental setups. In these configurations, the proposed model performs well in various natural language processing (NLP) tasks and is considerably better than classical approaches in low-frequency word sentiment identification.

The major contributions of this work are threefold and can be summarized as follows:

(1) Based on the Bayesian estimation principle, our sentiment probability computation method extracts the sentiment information of low-frequency words from the word-sentiment matrix.

(2) Our BESWE model is developed for the sentiment analysis of low-frequency words. It integrates sentiment information with word embeddings and achieves a higher accuracy in sentiment analysis tasks involving low-frequency words and low-frequency sentences.

(3) The conceptual framework of BESWE can be applied to other word embedding models. By capturing the sentiment of low-frequency words, integrating BESWE can largely improve the sentiment polarity classification accuracy of the current models.

The rest of this study is organized as follows. Section 2 presents the prerequisites needed for understanding the proposed model. Section 3 describes the proposed sentiment word embedding model. Section 4 presents the experimental results. Related work is reviewed in Section 5. Conclusions are drawn in Section 6.

2|PREREQUISITE

This section introduces the basic knowledge of the GloVe model and its parameter estimation principle, in order to facilitate the description of the subsequent model architecture.

2.1|GloVe

The GloVe model is a word embedding method that combines evidence from the local context and the global counts [28]. Typically, the method involves three distinct words $i$, $j$ and $k$. Both $i$ and $j$ are target words, while $k$ stands for a context word. Let $x$ be the matrix of word-word co-occurrence counts. The element $x_{ik}$ is the number of times word $k$ appears in the context of word $i$. Correspondingly, $x_i = \sum_k x_{ik}$ is the total number of occurrences of any word within the context of word $i$. Therefore, the co-occurrence probability of $k$ in the context of $i$ is:

$$P_{ik} = \frac{x_{ik}}{x_i} \tag{1}$$

Then, $P_{ik}/P_{jk}$ denotes the relation of $i$ to $k$ and $j$ to $k$. As long as $k$ has similar relations to $i$ and $j$, that is, it is relevant to both or irrelevant to both, the ratio $P_{ik}/P_{jk}$ is close to 1. The information within the ratio of co-occurrence probabilities can be formulated as:

$$F\left((w_i - w_j)^{\top}\tilde{w}_k\right) = \frac{P_{ik}}{P_{jk}} \tag{2}$$

where $w \in \mathbb{R}^n$ refers to the target word embedding and $\tilde{w} \in \mathbb{R}^n$ to the context embedding.
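The counting behind the co-occurrence probability can be sketched in a few lines of Python. This is a minimal illustration rather than the GloVe implementation: the toy corpus, the symmetric window of two tokens and the helper names `cooccurrence_counts` and `cooccurrence_prob` are assumptions made for the example.

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count x[i][k]: how often word k appears within `window` tokens of word i."""
    x = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for pos, target in enumerate(tokens):
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    x[target][tokens[ctx_pos]] += 1
    return x

def cooccurrence_prob(x, i, k):
    """P_ik = x_ik / x_i, with x_i the sum of x_ik over all context words k."""
    x_i = sum(x[i].values())
    return x[i].get(k, 0) / x_i if x_i else 0.0

# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "ice is cold and solid".split(),
    "steam is hot and gas".split(),
    "ice and steam are water".split(),
]
x = cooccurrence_counts(corpus, window=2)
p_ice_cold = cooccurrence_prob(x, "ice", "cold")
p_steam_cold = cooccurrence_prob(x, "steam", "cold")
# Ratio for a context word that is related to both targets in a similar way.
ratio_is = cooccurrence_prob(x, "ice", "is") / cooccurrence_prob(x, "steam", "is")
```

In this toy corpus "cold" never occurs near "steam", so its estimated probability is exactly zero and its logarithm is undefined, which is the small-sample weakness discussed in the next subsection.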

2.2|Parameter estimation principle

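Here, we show that the co-occurrence probability $P_{ik}$ can be derived by maximum likelihood estimation.

For every target word $i$, $x_i$ independent trials (generalized Bernoulli experiments) are conducted to draw a context word at random [29]. In each trial, there are $V$ possible outcomes. The number of occurrences of the $k$th outcome and its probability are denoted by $x_{ik}$ and $P_{ik}$, respectively.

If the random vector $X_i = (X_{i1}, \dots, X_{iV})$ stands for the occurrence counts of all possible outcomes, in which $X_{ik}$ is the number of occurrences of the $k$th one, then $X_i$ follows the multinomial distribution, which can be written as:

$$P(X_{i1} = x_{i1}, \dots, X_{iV} = x_{iV}) = \frac{x_i!}{\prod_{k=1}^{V} x_{ik}!} \prod_{k=1}^{V} P_{ik}^{x_{ik}} \tag{3}$$

The corresponding log-likelihood function is:

$$\ln L(P_{i1}, \dots, P_{iV}) = \ln \frac{x_i!}{\prod_{k=1}^{V} x_{ik}!} + \sum_{k=1}^{V} x_{ik} \ln P_{ik} \tag{4}$$

Maximizing the log-likelihood function in Equation (4) can be viewed as an equality-constrained optimization problem:

$$\max_{P_{i1}, \dots, P_{iV}} \sum_{k=1}^{V} x_{ik} \ln P_{ik} \quad \text{s.t.} \quad \sum_{k=1}^{V} P_{ik} = 1 \tag{5}$$

Accordingly, the corresponding Lagrangian function can be formulated as:

$$\mathcal{L} = \sum_{k=1}^{V} x_{ik} \ln P_{ik} + \lambda \left(1 - \sum_{k=1}^{V} P_{ik}\right) \tag{6}$$

Setting the partial derivative with respect to $P_{ik}$ to zero gives:

$$\frac{\partial \mathcal{L}}{\partial P_{ik}} = \frac{x_{ik}}{P_{ik}} - \lambda = 0 \tag{7}$$

Since $\sum_k P_{ik} = 1$, the multiplier $\lambda$ satisfies:

$$\lambda = \sum_{k=1}^{V} x_{ik} = x_i \tag{8}$$

Hence, the estimate of $P_{ik}$ is:

$$\hat{P}_{ik} = \frac{x_{ik}}{x_i} \tag{9}$$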

Evidently, the co-occurrence probability used in GloVe coincides with its maximum likelihood estimate. Maximum likelihood estimation, however, is a statistical method that relies on abundant samples. In contrast, Bayesian estimation is more effective when processing small samples [26].
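The contrast between the two estimators on a small sample can be illustrated as follows. This is a sketch under stated assumptions, not the estimator used later in the paper: it uses a symmetric Dirichlet prior with a hypothetical concentration `alpha` and reports the posterior mean, and the counts are invented for the example.

```python
def mle(counts):
    """Maximum likelihood estimate: relative frequencies x_ik / x_i."""
    total = sum(counts)
    return [c / total for c in counts]

def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean under a symmetric Dirichlet(alpha) prior (an assumed prior)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

# A low-frequency word observed only 3 times over 4 possible contexts (toy data).
counts = [2, 1, 0, 0]
p_mle = mle(counts)                                    # unseen contexts get probability 0
p_bayes = dirichlet_posterior_mean(counts, alpha=0.5)  # unseen contexts keep a small mass
```

With so few observations the maximum likelihood estimate assigns zero probability to every unseen context, whereas the prior keeps all probabilities strictly positive, so their logarithms remain finite.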

3|METHODOLOGY

This section introduces the architecture of our BESWE model and its working principle.

Notation:

· For $i, k \in \{1, \dots, V\}$, with $V$ the number of words, we define $x$ as the word-context matrix, with $x_{ik}$ and $P_{ik}$ as the occurrence count and the occurrence probability of word $k$ in the context of word $i$, respectively; these are the entries of the vectors $\vec{x}_i$ and $\vec{P}_i$.

· $x_i = \sum_k x_{ik}$ stands for the total occurrence count of any word within the context of word $i$.

· For $i \in \{1, \dots, V\}$, we define $t$ as the word-sentiment matrix, with $t_{i1}$ as the number of positive texts including word $i$ and $t_{i0}$ as the number of negative texts including word $i$, where $t_i = t_{i0} + t_{i1}$.

· For $i \in \{1, \dots, V\}$, we define $B_i$ as the probability of word $i$ being positive and $1 - B_i$ as the probability of word $i$ being negative.

· $w_i$ stands for the word embedding of word $i$ as target.

· $\tilde{w}_i$ stands for the word embedding of word $i$ as context.

· $s_i$ stands for the bias embedding of word $i$.
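The word-sentiment counts in this notation can be collected with a single pass over a labelled corpus. The sketch below is illustrative only: the function name `word_sentiment_counts`, the label encoding (1 for positive, 0 for negative) and the three toy texts are assumptions, not part of the original experiments.

```python
from collections import defaultdict

def word_sentiment_counts(labelled_texts):
    """Return t[word] = [t_i0, t_i1]: number of negative / positive texts containing the word."""
    t = defaultdict(lambda: [0, 0])
    for tokens, label in labelled_texts:
        for word in set(tokens):   # each text is counted once per word
            t[word][label] += 1
    return t

# Toy labelled corpus (hypothetical): label 1 = positive, label 0 = negative.
texts = [
    ("a moving and brilliant film".split(), 1),
    ("a dull and lifeless film".split(), 0),
    ("brilliant acting".split(), 1),
]
t = word_sentiment_counts(texts)
t_brilliant = t["brilliant"]          # [t_i0, t_i1]
t_i = sum(t["film"])                  # t_i = t_i0 + t_i1
```

Counting texts rather than token occurrences follows the definition of $t_{i1}$ and $t_{i0}$ as numbers of texts that include word $i$.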

3.1|Model architecture

The architecture of the BESWE model is shown in Figure 1. Our sentiment word embedding model is built on the foundation of GloVe. Since Section 2.2 shows that the parameters in GloVe are calculated by maximum likelihood estimation, we start by using the co-occurrence probability for word context learning and the sentiment probability for word sentiment learning. By traversing the corpus with sentiment labels, the co-occurrence counts $\vec{x}_i$ and sentiment counts $\vec{t}_i$ can be obtained. To deal with low-frequency words, the co-occurrence counts are sent to the Bayesian estimator for co-occurrence probability computation. Likewise, the sentiment probability is obtained in the same manner from the sentiment counts of the word. At this stage, the estimates of $\ln P_{ik}$ and $\ln B_i$ are obtained, which are denoted $c_{ik}$ and $e_i$, respectively. Based on these two outcomes, the loss function is constructed for sentiment word embedding learning, which is given as:

The loss function is minimized via the AdaGrad optimizer [30]. Through this process, $w_i$, $\tilde{w}_i$ and $s_i$ are learnt, where $w_i$ is the sentiment word embedding of the BESWE model.

More details of the Bayesian estimator and the loss function of sentiment word embedding learning are described as follows.

3.2|Bayesian estimator

Based on the working principle of the proposed model, the co-occurrence probability and the sentiment probability are calculated by the Bayesian estimator. Specifically, the estimation of the co-occurrence probability follows the existing D-GloVe model [16]. According to the parameter distribution assumption in Section 2.2, the co-occurrence probabilities of word $i$ are assumed to obey a Dirichlet prior distribution, as shown in Figure 2. Thus, the Bayesian estimation of $\ln P_{ik}$ is:

where $n_k$ is the number of occurrences of word $k$ in the learning samples, and $\lambda_1$ is the regulatory factor.

FIGURE 1 Model architecture

FIGURE 2 Probability distribution of parameter estimation of the D-GloVe model

where $m_k$ stands for the number of texts with the sentiment label $k$, and $\lambda_2$ is the regulatory factor for word sentiment learning.

FIGURE 3 Probability distribution of parameter estimation of BESWE model
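One way to make the idea of a Bayesian estimate of a log-probability concrete is the posterior expectation of $\ln P$ under a Dirichlet (or Beta) posterior, which has the closed form $\psi(a_k) - \psi(\sum_j a_j)$ with $\psi$ the digamma function. The sketch below rests on several assumptions that are not confirmed by the text: the squared-error (posterior mean) estimator, the prior parameters taken as $\lambda_1 n_k$ and $\lambda_2 m_k$, and all function names and toy numbers. It should not be read as the paper's exact estimator.

```python
import math

def digamma(x):
    """Digamma function for x > 0, via the recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def expected_log_probs(counts, prior):
    """E[ln P_k] under the Dirichlet posterior with parameters counts + prior."""
    post = [c + a for c, a in zip(counts, prior)]
    total = digamma(sum(post))
    return [digamma(p) - total for p in post]

def expected_log_positive(t_neg, t_pos, prior_neg, prior_pos):
    """E[ln B_i] under the Beta posterior Beta(t_pos + prior_pos, t_neg + prior_neg)."""
    return digamma(t_pos + prior_pos) - digamma(t_pos + prior_pos + t_neg + prior_neg)

# Assumed prior: regulatory factor times corpus-level counts (hypothetical values).
lam2 = 0.01
m_neg, m_pos = 100, 100            # numbers of negative / positive texts (toy values)
# A word seen in one positive text and no negative text.
e_i = expected_log_positive(0, 1, lam2 * m_neg, lam2 * m_pos)
c_i = expected_log_probs([2, 1, 0], [0.5, 0.5, 0.5])
```

For the single-occurrence word the maximum likelihood estimate would be $\ln 1 = 0$, that is, certainty of positive polarity; under the assumed prior the estimate is pulled back towards a less extreme value, which is the behaviour one would want for low-frequency words.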

3.3|Loss function

Following the characterization of semantic information in GloVe, we take the ratio of sentiment probabilities to capture the relationship of word sentiment. More details about the sentiment relation between words and the ratio are presented in Appendix 7.1. For words $i$ and $j$, the sentiment relation is expressed as $B_i/B_j$. The objective function is established as:

where $w_j$ and $s_j$ stand for the word embedding and bias embedding of word $j$, respectively.

Assuming that $F$ is a homomorphism between the groups $(\mathbb{R}, +)$ and $(\mathbb{R}_{>0}, \times)$, we incorporate the sentiment information into the word embeddings. Based on Equations (2) and (22), we get:

Considering the properties of group homomorphism, we transform the above equation using the commutative law of addition into:

In line with the basic theory of GloVe, the loss function of BESWE can be constructed in the same manner:

where $c_{ik}$ and $e_i$ are derived from Equations (14) and (18), respectively.
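The role of the homomorphism assumption can be illustrated with a short worked step. This is a sketch rather than the paper's own derivation: the scalar scores $u_i$ and $u_j$ are hypothetical placeholders for whatever embedding expression appears inside $F$, and continuity of $F$ is an added assumption.

```latex
% Homomorphism from (R, +) to (R_{>0}, \times): F(a + b) = F(a) F(b)
\begin{align*}
F(u_i - u_j) &= \frac{F(u_i)}{F(u_j)}
  && \text{(homomorphism property)} \\
F(u) &= \exp(c\,u)
  && \text{(continuous solutions; GloVe takes } c = 1\text{)} \\
\frac{B_i}{B_j} &= \frac{\exp(u_i)}{\exp(u_j)}
  && \text{(holds when } u_i = \ln B_i,\ u_j = \ln B_j\text{)}
\end{align*}
```

Under these assumptions, matching the ratio $B_i/B_j$ reduces to fitting a per-word score to $\ln B_i$, which is consistent with the loss function using the estimate $e_i$ of $\ln B_i$ as a regression target.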

4|EXPERIMENT

In this section, the performance of the proposed BESWE model is evaluated. Several state-of-the-art word embedding models, along with models that learn sentiment-specific word representations, are taken for comparison. First, the task of word similarity is carried out. Then, to assess the sentiment embeddings, word-level and sentence-level sentiment analysis using different models is performed. The sentiment analysis tasks are further divided into three subtasks, which verify the effectiveness on basic sentiment classification for all words, low-frequency word sentiment identification and integration with other baseline models. The outline of the experiments is shown in Figure 4.

4.1|Experiment settings

Dataset of word embeddings: The SST dataset is used for model training. SST carries five classes of annotations, namely ‘very negative’, ‘negative’, ‘neutral’, ‘positive’ and ‘very positive’. To simplify processing, the classes ‘positive’ and ‘very positive’ are taken to represent the positive polarity, while ‘negative’ and ‘very negative’ represent the negative polarity. These two categories are used in the experiments.
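The label merging described above can be sketched as a small mapping. The function name, the 0/1 encoding and the decision to drop ‘neutral’ samples are assumptions made for this illustration; the text only states that the two merged categories are used.

```python
def to_binary_label(sst_label):
    """Map a five-class SST label to 1 (positive), 0 (negative) or None (neutral)."""
    mapping = {
        "very negative": 0,
        "negative": 0,
        "positive": 1,
        "very positive": 1,
    }
    return mapping.get(sst_label)

# Toy samples (hypothetical); neutral samples are assumed to be discarded.
samples = [("a gem", "very positive"), ("fine", "neutral"), ("a mess", "negative")]
binary = [
    (text, to_binary_label(label))
    for text, label in samples
    if to_binary_label(label) is not None
]
```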

Baseline models: We compare the effectiveness of the proposed model with other widely used models. Specifically, the word embedding models C&W [10], CBOW [13] and GloVe [15], together with the sentiment embedding models SE-HyRank [18] and DLJT1 [21], are implemented. For all the models used in this work, the word representation dimension is 50 and the learning rate is set to 0.05. All other associated parameters are fine-tuned to get better results.

Task 1, Word similarity measure: The capacity of the word embedding models is verified on standard word similarity tasks. The average performance is obtained on the datasets EN-WS-353-ALL, EN-WS-353-SIM and SCWS. We first use the word embeddings to calculate the similarity score of each word pair. We then compute the correlation coefficient between these scores and the standard similarity scores provided by the dataset. The evaluation metric in this task is this correlation coefficient, which is detailed elsewhere [31].
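The evaluation procedure can be sketched as follows. Two choices here are assumptions, since the text defers the details to [31]: cosine similarity as the embedding-based score and Spearman's rank correlation as the coefficient. The two-dimensional vectors and human scores are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            r[order[j]] = mean_rank
        pos = end + 1
    return r

def pearson(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))

# Hypothetical embeddings and human similarity scores.
emb = {"good": [1.0, 0.1], "great": [0.9, 0.2], "bad": [-1.0, 0.1], "film": [0.0, 1.0]}
pairs = [("good", "great", 9.0), ("good", "film", 3.0), ("good", "bad", 1.5)]
model_scores = [cosine(emb[a], emb[b]) for a, b, _ in pairs]
human_scores = [s for _, _, s in pairs]
rho = spearman(model_scores, human_scores)
```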

Task 2, Word‐level sentiment analysis: Here, a support vector machine classifier is trained on the word vectors, with each word vector carrying the sentiment of its word. The popular sentiment polarity lexicons NRC and MPQA [32] are taken as the ground truth of the word sentiments. The numbers of positive and negative words are 2301 and 4151 for MPQA, and 2231 and 3324 for NRC. N-fold cross validation with N = 5 and N = 10 is performed on all datasets. Performance is measured by the classification accuracy of the models.
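The N-fold protocol can be sketched without committing to a particular classifier library. The helper below only produces the index splits; the shuffling seed and the function name are assumptions, and the classifier training itself is left out.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # shuffle so folds mix both polarities
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# MPQA lexicon size taken from the text: 2301 positive + 4151 negative words.
folds = list(k_fold_indices(2301 + 4151, n_folds=5))
```

The reported accuracy would then be the mean of the per-fold test accuracies, with the classifier refitted on each training split.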

The word-level sentiment analysis is carried out via the following three subtasks.

Task 2.1, Basic word‐level sentiment analysis: The usual way to evaluate performance is to test how well the embeddings capture the sentiment information of the words. Thus, we generate the word embeddings and apply them to the sentiment classification of the sentiment lexicon.

Task 2.2, Low‐frequency word sentiment analysis: Since the Bayesian estimation principle is able to deal with low-frequency words, we pick the words with frequencies lower than 5 and 10 from the SST corpus for investigation. In this way, the low-frequency word embeddings are collected for sentiment analysis.

Task 2.3, BESWE integration with other models: In order to improve the classification accuracy, we integrate BESWE with the baseline models. A specific word embedding set is built, which contains low-frequency word embeddings from BESWE and non-low-frequency word embeddings from one of the baseline models. The sentiment analysis is then carried out with these word embeddings.
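The integration step can be sketched as a dictionary merge keyed on word frequency. The function name, the dictionary-based embedding format and the toy vectors are assumptions for illustration; the threshold of 5 corresponds to the LF-5 setting.

```python
def integrate_embeddings(baseline, beswe, frequency, threshold=5):
    """Use BESWE vectors for words rarer than `threshold`, baseline vectors otherwise."""
    merged = {}
    for word, vector in baseline.items():
        is_low_frequency = frequency.get(word, 0) < threshold
        if is_low_frequency and word in beswe:
            merged[word] = beswe[word]
        else:
            merged[word] = vector
    return merged

# Toy one-dimensional embeddings and corpus frequencies (hypothetical values).
baseline = {"good": [1.0], "zesty": [0.2]}
beswe = {"good": [0.9], "zesty": [0.7]}
frequency = {"good": 120, "zesty": 2}
merged = integrate_embeddings(baseline, beswe, frequency, threshold=5)
```

Both embedding sets are assumed to share the same dimension, as in the experiments where every model uses 50-dimensional vectors.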

Task 3, Sentence‐level sentiment analysis: For sentence-level sentiment analysis, the Movie Review dataset [33] is used, which contains 10,622 samples with the two polarities in a 1:1 proportion. We use a convolutional neural network (CNN), namely text-CNN, with its online implementation [34]. The inputs of text-CNN are word embeddings. The samples are divided into training, validation and test sets with a ratio of 6:2:2. Training runs for 200 epochs with the default settings. Based on the validation outcomes, we apply the best model to the test set. The evaluation metrics are the classification accuracy and the F1 score.
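The two evaluation metrics can be computed from predicted and true labels as sketched below. Treating label 1 as the positive class and reporting the F1 score of that class (rather than a macro average) is an assumption, since the text does not specify the variant.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the `positive` class) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy predictions (hypothetical).
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```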

As in the word-level sentiment analysis, there are three subtasks.

Task 3.1, Basic sentence‐level sentiment analysis: As in Task 2.1, we take the words from Movie Review for sentiment analysis.

Task 3.2, Low‐frequency sentence sentiment analysis: In this task, we select the sentences in which low-frequency words account for more than 10% of the words as the low-frequency sentence samples. These low-frequency sentences are then taken for sentiment analysis.
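The selection rule for low-frequency sentences can be sketched as a simple filter. Measuring the share over tokens (rather than over distinct words) and treating out-of-vocabulary words as low-frequency are assumptions made for this example, as are the function name and the toy frequencies.

```python
def is_low_frequency_sentence(tokens, frequency, threshold=5, min_share=0.10):
    """True if words rarer than `threshold` make up more than `min_share` of the tokens."""
    if not tokens:
        return False
    # Words missing from `frequency` count as frequency 0, hence as low-frequency.
    rare = sum(1 for word in tokens if frequency.get(word, 0) < threshold)
    return rare / len(tokens) > min_share

# Toy training-corpus frequencies (hypothetical values).
frequency = {"the": 500, "film": 80, "is": 400, "zesty": 2}
selected = is_low_frequency_sentence("the film is zesty".split(), frequency)
rejected = is_low_frequency_sentence("the film is".split(), frequency)
```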

Task 3.3, BESWE integration with other models: Similar to Task 2.3, the integration of BESWE with each baseline model is devised to deal with both the low-frequency words and the non-low-frequency words in the sentences from the Movie Review dataset.

4.2|Experimental results

FIGURE 4 Workflow of experiments

Task 1, Word similarity measure: The outcomes of the word similarity task are given in Table 1. Our BESWE method outperforms the other methods on all three datasets, which verifies its capability of capturing sufficient semantic information. The sentiment word embeddings are generally more competitive than the basic word embeddings on the task of word similarity. The main reason is that, by exploiting the sentiment information, semantically more accurate word embeddings are obtained. Moreover, the Bayesian estimation principle is effective in tackling the large number of low-frequency words in the corpus, which leads to more accurate sentiment word embeddings. Accordingly, there is a considerable gap between our model and the baseline models.

TABLE 1 Word similarity results

Task 2.1, Basic word‐level sentiment analysis: The word-level analysis is carried out on the dataset of single-word entries. For basic word-level sentiment analysis, our model proves to be a competitive alternative to classical word embedding models (Table 2). Furthermore, the BESWE model obtains the best outcome with 10-fold cross validation on the NRC dataset.

Task 2.2, Low‐frequency word sentiment analysis: The BESWE model shows a better accuracy in tackling low-frequency words, as shown in Table 3. For the frequencies lower than 5 (LF-5) and 10 (LF-10), our model is more stable in sentiment polarity classification than the baseline models. As mentioned in the result of Task 1, the ability to process low-frequency words contributes to establishing precise sentiment word embeddings. A performance gap between BESWE and the other algorithms is also observed on the NRC dataset.

Task 2.3, BESWE integration with other models: In line with the results of Task 2.2, we combine the proposed model with the baseline models, not only to show its superiority, but also to improve the word-level sentiment analysis outcomes of the baseline models. Since the BESWE algorithm is capable of processing low-frequency words, the low-frequency word embeddings (i.e. LF-5 and LF-10 from BESWE) are incorporated into the non-low-frequency word embeddings from the baseline. The outcomes of the integration method are reported in Table 4. The classification accuracy is significantly improved for most models. The maximum performance gap of 4.44% is observed for the integration of DLJT1 and BESWE against the basic DLJT1 on the MPQA dataset.

Task 3.1, Basic sentence‐level sentiment analysis: The effectiveness of the proposed model is further evaluated on the sentence-level sentiment analysis tasks. From Table 5, we see that the BESWE model has a better accuracy than any of the baseline methods.

Task 3.2, Low‐frequency sentence sentiment analysis: In line with the outcome on low-frequency words, the sentiment analysis on low-frequency sentences exceeds the baselines in both evaluation settings. The highest classification accuracy of BESWE, based on the word frequencies lower than 5 (LF5), is 81.25%, as shown in Table 6. Meanwhile, the maximum performance gap of 9.75% is observed against DLJT1. Comparing Tables 5 and 6, the improvement in accuracy over the baselines is larger in Table 6 than in Table 5. One possible explanation is that the low-frequency sentences in Task 3.2 contain more low-frequency words. Since the sentiment word embeddings learnt via the Bayesian principle are considerably more informative, better performance in low-frequency word analysis is to be expected, as is the case. Besides, the accuracy on ‘LF5’ in Table 6 exceeds that on ‘LF10’, which further verifies the significance of our model.

TABLE 2 Basic word-level sentiment analysis results

Task 3.3, BESWE integration with other models: Similar to Task 2.3, since BESWE performs particularly well on low-frequency sentences, we integrate the corresponding word embeddings to improve the classification accuracy of the baselines. The outcomes of the integration methods, which combine each baseline with the proposed BESWE, are reported in Table 7. These outcomes do not exceed those from the direct use of the proposed model. That is, for the sentiment analysis of low-frequency words, BESWE consistently obtains the best results in the identification of sentiment polarity. Nevertheless, the application of BESWE narrows the performance gap. For the C&W model, a 10% improvement is observed.

Effects of λ1 and λ2: The hyperparameters in the BESWE model include the regulatory factors $\lambda_1$ and $\lambda_2$, which control the semantic and the sentiment information, respectively. In this experiment, to obtain the optimal settings, we vary the values of $\lambda_1$ and $\lambda_2$ over a set of candidate values to learn the BESWE model. In this way, we get 64 different BESWE models.

The results on the low-frequency sentence sentiment analysis against different hyperparameter settings are shown in Figures 5 and 6. The former indicates the accuracy on the lower-than-5-frequency words while the latter indicates that on the lower-than-10-frequency words. The variation of $\lambda_1$ does not cause a significant difference, while $\lambda_2$ is negatively correlated with the accuracy. According to Figure 5, the highest accuracy of the sentence-level sentiment analysis is 81.25% at the point λ1 and λ2 = 0.01. Likewise, in Figure 6, the optimal values of $\lambda_1$ and $\lambda_2$ are both 0.01, which lead to an accuracy of 80.50%.

TABLE 3 Low-frequency word sentiment analysis results

TABLE 4 Word sentiment analysis results of integrating models

TABLE 5 Basic sentence-level sentiment analysis results

To sum up, these experimental results demonstrate the effectiveness of the proposed sentiment word embedding. The BESWE model outperforms other state-of-the-art models in the word similarity measure. In the sentiment analysis at both word and sentence levels, our method shows comparable outcomes. Specifically, our model produces considerably better results than the baseline methods in capturing the sentiment information of both low-frequency words and low-frequency sentences. Moreover, by integrating the low-frequency word embeddings from BESWE into other models, the classification accuracies of the baseline models improve to a large extent.

TABLE 6 Low-frequency sentence sentiment analysis results

TABLE 7 Sentence sentiment analysis results of integrating models

FIGURE 5 The sensitivity of λ1 and λ2 on the BESWE (LF5) in low-frequency sentence sentiment analysis

FIGURE 6 The sensitivity of λ1 and λ2 on the BESWE (LF10) in low-frequency sentence sentiment analysis

5|RELATED WORK

5.1|Word embeddings

As pointed out in the Introduction section, both prediction-based word embedding models and count-based word embedding models are applied to learn word embeddings [8]. Fundamentally, Bengio et al. [9] establish a neural network language model to predict target words using preceding contexts, and thus to learn the word embeddings. Following this theory, Collobert and Weston [10,11] put forward a CNN to predict the target word, based on not only preceding but also succeeding contexts. As current cutting-edge prediction-based methods, the CBOW and skip-gram models [12,13] have a simple single-layer architecture. Both of these models can efficiently compute word embeddings from large-scale datasets. Besides, researchers also focus on using the global word-context co-occurrence counts in the corpus for learning word embeddings, which form the basis of count-based methods. Deerwester et al. [14] are the first to propose the latent semantic analysis model, which exploits word-document co-occurrence counts to learn word embeddings. As one of the most widespread models, GloVe [15] fits the word-word co-occurrence counts via a specific weighted least squares model. A consistent and competitive result can be obtained on the tasks of sentiment analysis.

5.2|Sentiment word embeddings

Likewise, both prediction-based and count-based sentiment word embedding models have been studied. In 2011, Maas et al. [35] apply logistic regression as the predictor to learn sentiment word embeddings. In the same year, an approach based on recursive autoencoders [36] is designed to learn the vector representations of phrases and full sentences, which exploits the vector representations at each node of the hierarchy and uses a softmax classifier for sentiment label prediction. Tang et al. [18] propose the hybrid ranking method for learning sentiment embeddings by adapting the traditional C&W model, which encodes sentiment information in the continuous representation of words. With the development of deep learning networks, Lan et al. [37] construct a CNN to detect the semantic and sentiment information. The two kinds of information are then integrated to generate sentiment word vectors. In terms of count-based sentiment word embedding models, Li et al. [21] incorporate the sentiment count into model learning by proposing a variety of count-based models (e.g. DLJT1) on the foundation of GloVe. Furthermore, LSVD and LGloVe [20] are developed as improvements of SVD and GloVe, respectively. In these models, the use of word-label counts facilitates the learning of sentiment word embeddings with label information.

Different from these works, our work mainly focuses on resolving the issues of low-frequency word sentiment analysis. Little attention is currently paid to this topic within the NLP domain. Our model aims to extract the sentiment information by obtaining a more accurate sentiment word embedding. Notably, our model is orthogonal to the aforementioned models. Our model outperforms the state-of-the-art models, while integrating our approach into these methods also results in better performance.

6|CONCLUSIONS

In this work, a novel BESWE model is designed and deployed on the tasks of sentiment analysis. To obtain both the semantic and the sentiment information, a Bayesian estimator is developed to compute the co-occurrence probability and the sentiment probability. Furthermore, a loss function for sentiment word embedding learning is constructed. We test our model on a variety of tasks to evaluate its performance. Experimental results indicate that the BESWE model is a comparable alternative to the state-of-the-art methods in word similarity identification and in word-level and sentence-level sentiment analysis. Specifically, our model outperforms other methods on low-frequency word and low-frequency sentence sentiment polarity classification, which demonstrates its efficacy. By integrating BESWE into the baselines, the classification accuracy can be improved considerably compared to the basic models.

This study offers a practical method for capturing both the semantic and the sentiment information. In particular, our model shows its superiority in dealing with low-frequency words and thus achieves a higher accuracy in sentiment analysis.

ACKNOWLEDGEMENTS

This work was supported by the National Statistical Science Research Project of China under Grant No. 2016LY98, the Science and Technology Department of Guangdong Province in China under Grant Nos. 2016A010101020, 2016A010101021 and 2016A010101022, the Characteristic Innovation Projects of Guangdong Colleges and Universities (Nos. 2018KTSCX049 and 2018GKTSCX069), the Science and Technology Plan Project of Guangzhou under Grant Nos. 201802010033 and 201903010013, and the Bidding Project of Laboratory of Language Engineering and Computing of Guangdong University of Foreign Studies (No. LEC2019ZBKT005).

ORCID

Jingyao Tang https://orcid.org/0000-0003-1651-8480

How to cite this article: Tang, J., et al.: Bayesian estimation-based sentiment word embedding model for sentiment analysis. CAAI Trans. Intell. Technol. 7(2), 144-155 (2022). https://doi.org/10.1049/cit2.12037

APPENDIX

SENTIMENT RELATION OF DIFFERENT WORDS USING BAYESIAN ESTIMATION

Specific sentiment relations calculated by Bayesian estimation are presented in Table 8.

TABLE 8 Sentiment relation of different words using Bayesian estimation