A Novel Collective User Web Behavior Simulation Method

2021-12-16 06:38HongriLiuXuZhangJingjingLiandBailingWang
Computers Materials&Continua 2021年3期

Hongri Liu, Xu Zhang,Jingjing Li and Bailing Wang,*

1College of Computer Science &Technology, Harbin Institute of Technology at Weihai, Weihai, 264209,China

2Cyberspace Security Research Center, Peng Cheng Laboratory, Shenzhen, 518000,China

3Research Institute of Cyberspace Security, Harbin Institute of Technology, Harbin,150001, China

Abstract:A collective user web behavior simulation is an import means for generating a large-scale user network behavior in a network testbed or cyber range.Existing studies almost focus on individual web behavior analysis and prediction,which cannot simulate human dynamics that widely exist in large-scale users’behaviors.To address these issues,we propose a novel collective user web behavior simulation method, in which an algorithm for constructing a connected virtual social network is proposed, and then a collective user web behavior simulation algorithm is designed on the virtual social network.In the simulation method, a new epidemic information dissemination algorithm based on the SIR model is proposed to drive the user web behavior with Breadth—First Search algorithm on the connected virtual social network.We specially build an experiment environment with 12 servers by using Docker container technology and then perform a wide range of experiments with different user scales to evaluate the method.The experimental results demonstrate that not only the degrees of the social network but also the time intervals of the collective users’ web behavior can be well fitted to a power-law distribution and show that our simulation method can well simulate a collective user web behavior.

Keywords: Web behavior simulation; virtual social network; human dynamics;power-law distribution

1 Introduction

Facing an increasingly severe network security issues in the Internet,researchers and organizations have constructed all kinds of large-scale of network testbeds to simulate the Internet and then conduct experiments for analyzing these issues,such as Emulab Network Testbed,NCR(National Cyber Range)[1]and StarBED[2].In large-scale network testbeds or cyber ranges, virtualization technologies are widely employed to simulate the hardware and network, which constitute the infrastructure of the network testbed.Meanwhile, the user network behavior on network testbed is generated by simulating the user behavior on the Internet [3,4].In recently years, for predicting or simulating individual behaviors on some applications, many user network behavior models and simulation methods have been proposed [5-11]and achieved promised results.However,these methods cannot be directly applied in collective user network behavior simulation because of the lack of interactions among individuals and other issues.

Based on human dynamics,many researchers have devoted the studies on behavior analysis[12-15]and have achieved fruitful results.However, a collective user network behavior simulation is rarely touched in current studies for the challenges in the scale and the method of simulation in a network testbed.Our interest arises out of the question of how to perform a collective user network behavior simulation that follows human dynamics in a real network environment.In the simulation, the key is how to drive the collective user to perform one kind of network behavior while following the laws of human dynamics.

In this paper,we study a collective user web behavior simulation method.In detail,we first construct a virtual social network in which each virtual node is modeled as an agent and the degrees of the agents follow a power-law distribution, a form of heavy-tailed distribution.Then, we simulate the information dissemination process among the agents based on an epidemic information dissemination model,in which all agents reply a message to a specific post in the forum after infected and then infects its neighbors.We finally validate that the time intervals between two consecutive replies follow a power-law distribution,which agrees with the laws of human dynamics.The experimental results demonstrate that our method is capable of simulating the collective user web behavior realistically.

The goal of our work is to study the method of simulating a collective user network behavior in a large network testbed or cyber range,instead of studying the mechanism of the heavy-tailed phenomenon in the real social network such as Twitter and Facebook.To the best of our knowledge,this work is the first study on the method of simulating a collective user network behavior in a real network environment.

In summary,the main contributions of our work are listed as follows:

1.We design a virtual social network construction method, which improves the PLOD (power-law out degree) [16] algorithm with the BA (Barabási-Albert) model [17,18] and has the ability to quickly establish a connected virtual social network whose degree distribution follows a power-law distribution.

2.An information dissemination algorithm,the key of the simulation method,is proposed to drive the individual user’s network behavior.We give the calculation method of the length of waiting time and define the process of the epidemic information dissemination in the algorithm.

3.We design a real network environment to evaluate the performance of the simulation method on the Docker container swarm.As an agent, each Docker container posts a reply to the specific post of the same forum by running our proposed algorithms in the connected social network.We validate that the time intervals of consecutive replies follow a power-law distribution.

The rest of the paper is organized as follows:Section 2 reviews related work,Section 3 introduces the approach,Section 4 presents our experiments and the result,and Section 5 discusses and concludes the paper.

2 Background

A collective user behavior simulation plays an essential role in multiple internet applications.Many researchers have focused on the prediction and simulation of individual network behavior.

2.1 User Network Behavior Modeling and Predicting

Kaderka et al.[19] build a tool named BeCos, which not only allows users to specify system and component behavior but also enables system engineers to clearly define behavior in a semantically rigorous manner with behavior ontology and scenario ontology.Yang et al.[20] analyzed the influence of multiple factors on email reply behavior and developed a model to predict whether a receiver will reply to the sender.Moreover, they characterized the influence of rich factors on email reply behavior.Incorporating structure, usage, and content data from a real website, Loyola et al.[8] proposed a new method to analyze web user behavior by using an Ant Colony Optimization algorithm, which can explain approximately 80% of real usage with a predefined similarity measure.Si et al.[21] studied the users’post and reply behaviors in Tianya, an internet forum, and found that users’ posts, replies, in-degrees and out-degrees of replied message distribution all follow a power-law distribution, which is consistent with the achievements of other researchers.

In terms of the analysis approaches, Rajabi et al.[9] defined user behavior categories using an unsupervised learning algorithm and then proposed a data-driven approach to analyze user behavior.Liu et al.[10] presented a new user behavior modeling method on the user’s clicking behavior of search engines, in which a convolutional neural network(CNN) architecture is constructed to build click models,and the parameters from traditional click models are used to restrict the meaning in the hidden layer of the CNN.This model is considered as the best with respect to the evaluation metric of click perplexity.Park et al.[7] analyzed the behavior of image search on a large-scale query log from Yahoo Image Search, which assumed that a user’s behavior is dependent on the query type.Aiming at revealing highlevel patterns in the behavior of game players, Sifa et al.[22] performed four experiments on a 6 million player dataset covering a total playtime of over 5 billion hours across more than 3000 games distributed via the Steam platform, which provided unique insights into the behavior of the game player.Xiao et al.[23] studied human behavior in Sina Microblog of China and found that not only on the individual level but also on the group level,the message releasing time intervals obey power-law distribution.

2.2 User Network Behavior Simulation

Agent-based models(ABM)[24,25]are computational models that simulate the actions and interactions of autonomous agents (individual or collective entities) to assess their effects within an environment.In company with the increase of computing power, ABM based user behavior simulation techniques (e.g.,BDI) have been widely used in social simulation [26-29], e.g., Kampmann et al.[29] proposed an automatic parameter identification approach for personality driving behavior models by comparing the behavior data of human and artificial drivers in the same virtual environment.This method can also be applied to compare human and artificial driving behavior.Also, Flamino et al.[30] concluded that human patterns of temporal preference are consistent and resilient by analyzing users of several real-world datasets.Further, they integrated those patterns into large-scale agent-based models and simulated the users’ activity to validate predictive accuracy.Xu et al.[31] build an agent-based model to simulate the knowledge transfer and integration processes among irrational individuals and then explored the effects of individual irrationality and network structure on collective performance.Although ABM is widely used to simulate human behavior in multiple research areas such as auto driving, the research on simulating user network behavior is quite rare.In [32], the BP-BDI model is proposed to simulate the user’s behavior on email, in which a behavior learning model based on BP neural network is used to learn the user’s mail behavior.Diallo et al.[33] first analyzed the web simulation by using the principles of service orientation,platform independence and interoperability, and then proposed a layered approach for the web simulation.In terms of Human-machine interaction (HMI) simulation, Amirkhanyan et al.[5] proposed a user behavior simulation scheme based on user behavior state graphs, which are capable to generate simple user behavior on Windows-family virtual machines.

To sum up,there are many researches on individual behavior modeling and simulation based on ABM and make some notable research results, whereas, in collective user network behavior simulation, these research results cannot be applied directly.On the one hand, different from traffic simulations and crowding behavior simulation in an emergency, the collective user’s network behavior is spontaneous and lasting for several days or even months.On the other hand, the individual network behavior simulation methods are not suitable for collective user behavior simulation for lack of interactions among users.

In this paper, we propose a novel collective user web behavior simulation method based on ABM.Firstly, we construct a connected undirected virtual social network, and then use epidemic information dissemination model to simulate the information dissemination process among the agents in the virtual social network to drive individual user’s web behavior simulation.The time calculation method makes the time intervals of two consecutive network behaviors follow a power-law distribution.It should be noticed that our work is different from [23], in which they simulated multi-agent interactions in a real-life social network topology to verify whether the results are consistent with empirical observations.In addition, their simulation happened in an emulation environment without any real network behavior.However, our work is to explore a simulation method that can construct a virtual social network and simulate a collective user’s web behavior that follows human dynamics in a real network environment.

3 Research Method

In this section,we first propose a connected virtual social network construction algorithm by combing the PLOD algorithm with the BA model.In the virtual social network,each node is modeled as one agent.We divide the agents into three categories,the susceptible agent,the infected agent and the recovered agent.The SIR is modified,named as s_SIR,to simulate the information dissemination among the agents,which drives individual network behavior simulation.Taking web behavior for example, one agent selected randomly from the social network replies to an specific post of the same forum after Δti(i∈[1, n]) length of time and then infects its neighbors.The infected agent will repeat the same process.The abstract simulation process of the collective user web behavior is demonstrated as Fig.1.

Figure 1: The abstraction process of collective user web behavior simulation

In Fig.1, (a) shows isolated agents, (b) a virtual social network constructed by our algorithm, (c) one agent is infected (individual web behavior simulation), (d) The neighbors are infected, and (e) all the agents are infected (collective user web behavior simulation complete).We assume one infected agent will recover if it has infected all its susceptible neighbors and the recovered agent will never be infected anymore.In the information dissemination process, two connected agents communicate with each other to simulate the infection process by using UDP.Once infected, the agent will perform web behavior simulation after Δti(i∈{1,2, …,n})length of time.

3.1 Virtual Social Network Construction Algorithm

There are two classical virtual social network generation algorithms,one is the PLOD,and the other is the BA model.The degree distribution resulting from the BA model follows a power-law distribution of the form P(k) ~k-3, which cannot generate other degree distributions directly.The PLOD algorithm can construct a virtual social network that the degrees of the agents obey power-law distribution by assigning a degree to each agent.However, it cannot guarantee that the virtual social network is connected.To construct a connected virtual social network with specified degree centrality, we improve the PLOD algorithm with the BA model, which makes the degrees of the agents follow a power-law distribution.The algorithm is described as Algorithm 1.

Algorithm 1.A connected social network construction with specified degree Input: n is the number of agents in the virtual social network.degree_max is the maximum degree of the social network.S1={n1,n2,…,nn},S2={},S3={},ni ,(i ∈{1,2,…,n})represent the agents in the social network.Output: a virtual connected social network in which the degrees of the agents follow power-law distribution.Description:Begin:Step 1.Generate a set of n integers{d1,d2,…,dn}that are consistent with the power-law distribution,where di is the degree of ni in S1 and max{d1,d2,…,dn}≤degree_max;Step 2.Randomly select an agent ni from S1 as the initial network and add it to S2,then delete ni from S1;Step 3.Randomly select an agent ni from S1 and link it to a random agent nk in S2 with probability α=(di+ dk)/ SUM(dj,where nj∈S1).di =di-1,dk= dk -1 (i∈[1,length(S1)] and k∈[1,length(S2)]);Step 4.if dg ==0(ng∈{ ni,nk }) then add ng to S3 ,else add agent ni to S2;Step 5.if S1 != {}then jump to Step 3 else jump to Step 6;Step 6.Random select two agents ni and nk from S2 whose degrees are not 0 and link them with probability α =(di+ dk)/SUM (dj,where nj∈S2).di= di-1, dk= dk -1;Step 7.if dg ==0(ng∈{ ni,nk }) then add dg to S3 ;Step 8.jump to Step 6 until S2= {};End

In Algorithm 1, the agents are divided into three non-intersecting collections, S1, S2, and S3.S1is the collection of isolated agents in the social network, S2is the collection of the connected agents whose degrees are not zero in the virtual network, and S3is the collection of agents whose degrees are 0.The virtual social network grows up until S1= {} and S2= {}.The α →1 when |S2|→0.Therefore, the virtual social network constructed is a connected network.Also, the time complexity of Algorithm 1 is O(n2).Furthermore, we implement the method proposed in [34] to calculate the set {di} for the degrees of the agents.

We assume that the{di}satisfies a power-law distribution p(di)∝di-β,and an upper limit M is assigned to degree_max in Algorithm 1.The purpose is to generate a set{di|di∈[1,M]∧diis an integer ∧p(di)∝di-β}.The cumulative distribution is

Accordingly,Eq.(1)can be approximated by a continuous form,as

The ratio of F(Δt)(Δt∈[1,M]) to F(M) is denoted by r,as

0 ≤r ≤1 and

If M →∞,then M1-β→0 and hence (4)can be approximated as

For different β, the average intervals avg(Δt) is different.To exclude the bias influenced caused by avg(Δt),a correction factor u(β)is introduced to Eq.(5),as

where u(β)satisfies

For simplicity but without loss of generality,we set u(1)= 1, hence

To sum up,given M and β,u(β)can be estimated directly.The set{di}can be generated by selecting a random r ∈[0,1]and then calculating Δtiby using Eq.(6).Furthermore,if Δt′is not an integer,we further rectify Δt′.Namely,Δt′=with probability Δt′-while Δt′=with probability-Δt′.The virtual relationship is established and maintained by the communication among agents.

3.2 Web Behavior Simulation Algorithm

We apply the SIR model to simulate the information dissemination process among the agents and meanwhile trigger the web behavior simulation, which is named the s-SIR algorithm as described in Algorithm 2.More specifically, the status of agents in the virtual social network is divided into three types,namely, susceptible agents,infected agents,recovered agents.

Before describing our s-SIR algorithm, we first give the following definitions:

1.In the initial state,each agent in the social network is susceptible.

2.Each agent is irrational; in other words,it will infect all susceptible neighbors.

3.Each infected agent will become a recovered agent after infecting all neighbors and will never be infected anymore.

The infection process is implemented by one agent transferring a message to its neighbors one by one in the virtual social network.The agent is viewed as infected once it receives a specified message from a neighbor,and then replies to the same post after waiting a specified length of time.

?

In Algorithm 2, each infected agent vi(i∈{1, 2,…, n}) will wait for the length of tibefore replying a message to an specific post of the same forum, and then send a message to each susceptible neighbors vjwith Breadth—First Search.At some point of time, multiple agents are performing web behavior simulations simultaneously.The length of waiting time ti(i = 1, …, n) also follows a power-law distribution.The calculation method of tiis given in the Eqs.(1)-(7).To simulate the web behavior realistically, we use an expansion factor γ to extend the web behavior simulation process.Namely, the length of the waiting time tiis expanded by γ times.

To simulate collective user web behavior,a construction method of a virtual social network is proposed,of which the degrees of nodes follow a power-lower distribution.Next,a web behavior simulation method named s-SIR based on SIR has been designed to drive each agent performing a reply to the post in the virtual social network.Not only the distribution of agents’ degrees but also the time intervals of post behaviors follow a power-law distribution.In other words, we abstract an undirected graph G = {vi, di, ti} to describe the agents in the virtual social network, where diis the degree of vi, and tiis the length of waiting time from being infected to perform the simulation.Once the G is determined, the collective user web behaviors can be simulated,and finally,the degree and waiting time intervals follow a power-law distribution.

4 Experiments

In our experiments,we use a Docker container swarm to simulate the collective user.We deploy 6000 and 10000 Docker containers respectively to evaluate our proposed simulation method.First, the virtual social network constructed method is validated,and then we perform web behavior simulation in the virtual social network with our proposed s-SIR algorithm.Finally, a power-law function is used to fit test data, and the standard error is employed to evaluate the realistic of the simulation method.Our experiments demonstrate that the method is capable of simulating the collective user web behavior realistically.

4.1Experiment Setup

We deploy 12 high-perfor mance servers (Intel Xeon E5-2650 V4, 64G memory, 2 NICs and 2T hard disk) to conduct the experiments, of which 10 servers running Docker containers are used to perform collective user web behavior simulation and another server serves as a web server where a forum is deployed and the 12th server is used to generate the virtual social network and calculate the time of replying to the post of the forum.Each agent replies a message to the specific assigned post of the forum in the web server.The agent is implmented by one Docker container, and the Weave Net networking toolkit is employed to connect all the agents.The physical and logical topologies of the experiment is shown in Fig.2.

Figure 2: (a)The physical topology of the experiment network.(b)The logical topology of the agent server

As shown in Fig.2, the servers are connected by a router in physical while the agents (Docker containers) are connected by Open vSwitch, a virtual switch, controlled by Weave Net in logical.The implementations of Algorithms 1 and 2 are deployed in the control server and the agents, respectively, to perform the simulation.We deploy 10000 agents at most to evaluate our proposed method, and all the system clocks of the servers are synchronized by NTP before each round of the experiment.Furthermore,the clock synchronization error of 1000 Docker containers in one server is tested and the maximum error is less than 1 s.

4.2 Calculation Time Comparison

First,we compare the calculation time between the PLOD algorithm and the improved PLOD algorithm(Algorithm 1).The number of nodes ranges from 100 to 1000 with the same degree distribution function.The experimental result is shown in Fig.3.

Figure 3: The comparison of the calculation time between the PLOD and improved PLOD

As shown in Fig.3, the calculation time increases as the number of nodes increase.However, the improved PLOD algorithm grows slowly compared with the PLOD.For the improved PLOD algorithm,the maximum calculation time is 0.15 s with 1000 nodes, which is much less than that of the PLOD algorithm.

4.3 Virtual Social Network Construction Experiment

Next, we evaluate the virtual social network construction method with 1000 agents where β = 2.1 in Algorithm 1.The degree distribution and generated graph are demonstrated in Fig.4.

Figure 4: (a) The distribution of the virtual social network log-log plot.(b) The graph of virtual social network

In Fig.4a,the x-axis is the degrees of the agents,and the y-axis represents the number of times of the same degree that appears in the experiment.In the experiment,there are 44 different degrees from{1,2,…,292} for the 1000 agents.Furthermore, most degrees of the nodes are smaller and follow heavy-tailed distribution in the log-log plot.The power-law function appears as a straight line with slope k= -2.1.

4.4 Time Interval Distribution Experiment

In this section, we construct a virtual network with 6000 and 10000 agents, respectively.Firstly, one agent is randomly selected as the source of the infection.Then we drive the web behavior simulation according to the network topology and the length of waiting time.In this section, we set all the infected probability pj= 1 (j = 1, …, n), and the length of each waiting time ti(i = 1, …, n) is multiplied by expansion factor γ (γ = 1500) to extend the waiting time.Finally, we calculate the time intervals of the reply to the post.The time intervals are indirectly obtained from the database of the webserver of the forum.The time interval distribution of 10000 users is illustrated in Fig.5.

Figure 5: (a)The distribution of time intervals of replies to the post with 10000 users.(b)The logarithmic plot of the time intervals of replies to the post with 10000 users

As in Fig.5,the x-axis is the time interval and the y-axis is the number of counts of the time interval.We can conclude:

1.Most of the time interval of agents are located in the tail part,

2.The smaller the time interval,the larger the counts and

3.The larger the time interval,the smaller the counts.

We fit the pow-law function, which is y = 6640 × x-2.1, and the probability density function using logarithmic operations is y= -2.1 ×x +3.8 with standard error 0.0153.

To compare with the simulation result of 10000 users, the experimental result of 6000 users is shown in Fig.6.

Figure 6: (a) The distribution of time intervals of replies to the post with 6000 users.(b)The logarithmic plot of the time intervals of replies to the post with 6000 users

As demonstrated in Fig.6a,the time intervals also exhibit a power-law distribution,which is similar to the 10000 user’s simulation.The probability density function using logarithmic operations is y=-2.0×x+3.7 with standard error 0.0156.Comparing Fig.6 with Fig.5,we can conclude that the standard error does not change obviously(less than 2%).Hence,the performance of the simulation method is stable.

4.5 Standard Error Comparisons Experiment

Finally,we perform extensive experiments to evaluate the standard error of the fitted pow-law function,in which β=3,4,5 in Algorithm 2 and the waiting time expansion factor γ=200,300,500,and 1500 with 6000 agents and 10000 agents,respectively.The experimental results are shown in Fig.7.

Figure 7: (a)The standard error comparison with different β and γ under 6000 users.(b)The standard error comparison with different β and γ under 10000 users

From Fig.7,we can conclude that the standard error does not increase obviously with the increase of the agents’ number.The max standard error appears when γ = 200 and β = 5 (Algorithm l) in both figures.Furthermore, the standard error grows as β increases when γ is fixed.The standard error decreases as γ increases when β is fixed.Furthermore, the simulation process lasts 37.1 h when β = 3 and γ = 1500 with 10000 agents, and 14.6 h with 6000 agents.So, the longer the waiting time, the better the fitting effect and the closer to the real collective user web behavior in a specific range of the length of waiting time.

In our experiments,we first validate the connected social network construction algorithm with specified degree.Compared with the PLOD algorithm, the calculation cost of our proposed method is smaller and ensures that the generated social networked is connected.Base on the virtual social network, we then deploy 6000 and 10000 Docker containers with ten servers to simulate the collective user web behavior with the s-SIR algorithm respectively.By using UDP, we simulate the infection process that occurs among virtual users.The time intervals of the web behavior of the collective user are fitted to pow-law function with a lower standard error, which fits the human dynamics.From the experiments, we can conclude that the standard error is stable when increasing the scale of the users, and the standard error grows when increases β or decreases γ.

5 Discussion and Conclusion

In this work,we have studied the collective user’s web behavior simulation method and validated our proposed method by driving 6000 and 10000 agents to reply to a specific post of the same forum.First, a construction method of a virtual social network whose degrees follow a power-law distribution is proposed.Then, we use a new data structure G = {vi, di, ti} to describe the agents in the virtual network.Based on this network, we design an algorithm named s-SIR to drive each agent to perform web behavior simulation.A calculation method, which can generate a data collection following a power-law distribution, is implemented to create the degrees and the waiting time of the agents in the virtual social work.The degree collection is employed to construct the virtual social network, and the waiting time collection is used to control the message dissemination process by our s-SIR algorithm.We have ignored the time of information dissemination and the replies to the post due to the expansion factor γ is much longer than those time.The experimental results show that the time intervals of the web behavior follow a power-law distribution.

To the best of our knowledge,this is the first work to study on the method of simulating the collective user web behavior in a real network environment.The study results can be widely employed in all kinds of network testbeds to construct a real network scenario in the construction of a large-scale users’ network behavior that follows human dynamics,such as mail behavior and web behavior.

Since our work is the first research in the area of collective user network simulation,it comes with certain limitations.One is that we only simulate 1,0000 users’ web behavior because of the limitation of the hardware.Another possible limitation is that we cannot directly calculate the collection of power-law distribution for degree and waiting time given the distribution function of collective user behavior time interval.In the future, we may want to study the power-law distributions calculation method of degree and waiting time, given the collective user behavior distribution.

Funding Statement:This research was supported by National Key Research and Development Plan under Grant 2017YFB0801804, Key Research and Development Plan of Shandong Province under Grant 2017CXGC0706, Peng Cheng Laboratory Project of Guangdong Province PCL2018KP004, frontier science and technology innovation of China under Grant 2016QY05X1002-2, national regional innovation center scientific and technological special projects Grant 2017QYCX14, University Coconstruction Project in Weihai City.

Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.