Digital Twin for Human-Robot Interactive Welding and Welder Behavior Analysis

2021-04-22
IEEE/CAA Journal of Automatica Sinica, 2021, Issue 2

Qiyue Wang, Wenhua Jiao, Peng Wang, Member, IEEE, and YuMing Zhang, Senior Member, IEEE

Abstract: This paper presents an investigation into prototyping a digital twin (DT) as the platform for human-robot interactive welding and welder behavior analysis. This human-robot interaction (HRI) working style helps to enhance human users’ operational productivity and comfort, while data-driven welder behavior analysis benefits the training of novice welders. The HRI system includes three modules: 1) a human user who demonstrates the welding operations offsite, with her/his operations recorded by the motion-tracked handles; 2) a robot that executes the demonstrated welding operations to complete the physical welding tasks onsite; 3) a DT system that is developed based on virtual reality (VR) as a digital replica of the physical human-robot interactive welding environment. The DT system bridges the human user and the robot through a bi-directional information flow: a) transmitting demonstrated welding operations in VR to the robot in the physical environment; b) displaying the physical welding scenes to human users in VR. Compared to existing DT systems reported in the literature, the developed one provides better capability in engaging human users in interacting with welding scenes, through an augmented VR. To verify the effectiveness, six welders, three skilled with certain manual welding training and three unskilled without any training, tested the system by completing the same welding job; the three skilled welders produced satisfactory welded workpieces, while the three unskilled ones did not. A data-driven approach combining fast Fourier transform (FFT), principal component analysis (PCA), and support vector machine (SVM) is developed to analyze their behaviors. Given an operation sequence, i.e., the motion speed sequence of the welding torch, frequency features are first extracted by FFT and then reduced in dimension through PCA, and are finally routed into the SVM for classification. The trained model demonstrates a 94.44% classification accuracy on the testing dataset. The successful pattern recognition in skilled welder operations should help accelerate novice welder training.

I. INTRODUCTION

WITH increasing demands on efficiency, quality, individualization, and flexibility, manufacturing is currently evolving from the traditional automation style to smart manufacturing [1]. Although there is no rigorous and consistent definition of “smart manufacturing”, there is a consensus that it is driven by data technologies including advanced sensing, the internet of things (IoT) [2], big data transmission, storage, and processing [3], artificial intelligence (AI)-enabled learning [4], and so on. These techniques significantly improve the observability, controllability, and customizability of manufacturing processes and systems.

The digital twin (DT), as a paradigm in IoT, aims to build digital replicas of physical systems from the big data obtained from sensors [5]. Such a digital replica simulates not only the same elements as contained in a physical system but also the same equipment running dynamics, such that it can run as the physical counterpart does in real time like a twin, which is where the name comes from [6], [7]. Through DTs, manufacturing process/system development and optimization can be moved from experimental trial-and-error studies to data-based virtual studies, such that the large-volume heuristic experiments that previously needed to be done can be avoided, reducing development time and cost. DT-driven process optimization has been investigated in additive manufacturing [8], [9] and in joining and assembling [10]. As DT systems can be easily scaled according to information availability, DTs are also used at the production line and factory levels for design and optimization. In the design stage of production lines or factories, DTs are built to simulate their running states without physically installing high-value, large-size, and complex equipment. Furthermore, continuous improvement is much easier thanks to the flexible and highly modular components in DTs. Some general blocks in manufacturing, including raw material supply, job scheduling, equipment maintenance, and product transportation, have been digitalized and used in the manufacturing of electronic devices [11], hollow glass [12], and 5G mobile edge computing [13] for architecture optimization. To respond to unpredicted disturbances in practical cases, DTs are also used as virtual experiment platforms for control algorithm development and verification [14], [15] before the remedial actions take effect in physical systems [16]. In such applications, the physical systems are protected by their DTs as an additional shield, improving robustness and reducing shut-down risk.

To build a DT, the information about the corresponding physical system is collected from installed sensors, which makes it possible to extract and analyze the information of interest to monitor the physical system in real time. In this way, the uncertainty in manufacturing, which is common due to environmental variance, can be captured in real time. The uncertainty can then be managed and addressed conveniently by optimization methods [17], [18]. Since all the information has been digitalized in DTs in formats suitable for storage, communication, and processing, DT-based monitoring methods have the following favorable characteristics: 1) high availability of large-volume data, especially with the enhancement from cloud storage technology; 2) easy fault detection without manual inspection; 3) high scalability and customizability because of modular design; and 4) low complexity when performing multiple tasks. As such, some studies have been conducted to develop DT-based monitoring systems for part variation detection [19], mass imbalance identification in rotating machines [20], fault diagnosis in distributed photovoltaic systems [21], and cloud personal health management [22]. In some complex systems where the information of interest is not explicit, advanced data processing and analysis have been applied. Tao et al. collect data from the DT of wind turbines and train a neural network for monitoring the gear status, including tooth wear, fatigue, and breakage, such that the wind turbines can be repaired/replaced before faults occur [23]. Zhou et al. build a DT for a power grid and train a neural network to identify critical criteria in the power grid, including critical clearing time (CCT), voltage stability, and low-frequency oscillation damping, within one second [24]. By integrating machine learning, DT-based monitoring methods can recognize more essential patterns underlying collected data and troubleshoot more accurately.

In currently developed DTs, humans are the observers of the physical systems and the information flow is unidirectional, i.e., human users receive the information from the physical systems or their DTs, but the physical systems cannot receive feedback or actions from human users. This unidirectional working mode is only effective for processes dominated by machines. For processes where human intelligence is needed, such as precise welding, spraying, and rescue, human-machine interaction (HMI) or human-robot interaction (HRI) [25] needs to be integrated with the DTs such that humans’ operative ability can be enhanced and their role changes from observers to active controllers. To facilitate HMI or HRI, we need to select an effective interface from the existing candidates such as joysticks [26], haptic gloves [27], gestures [28], speech [29], and virtual reality (VR) [30], [31]. Among them, VR builds a computer-generated virtual environment and offers immersive visualization and natural interaction to human users at lower cost and with a safer operating environment. Compared with other interfaces, it has the following advantages, which we believe make it suitable for HMI or HRI when integrated with DTs: 1) some consumer-grade VR systems have been commercialized with available application programming interfaces (APIs) for development, e.g., HTC VIVE, Facebook Oculus, and Sony PlayStation VR; 2) the immersive 3D virtual spaces in VR are well suited to visualizing elements and their dynamics in DTs; 3) VR offers natural and direct interaction through motion-tracking handles such that human users can demonstrate their operations as usual without additional adaptive practice. Hence, we select VR as the interface for human interaction and apply it to welding manufacturing as an application case.
Compared with other common manufacturing processes, welding involves complex thermo-mechanical-metallurgical reactions such that the quality depends on multiple factors and cannot be controlled easily. That is also the reason why some precise welding tasks must be completed by skilled welders rather than welding robots. Currently, there is no reported literature on a digital twin framework for human-robot interactive welding, which is the research gap this paper aims to fill. Furthermore, the human welder behaviors and corresponding welding results can be recorded easily for further analysis and for constructing a large industrial dataset.

In summary, this study prototypes a DT based on VR for HRI in welding manufacturing, and the paper is organized as follows. Section II introduces the system configuration of the developed DT for HRI, followed by its working principles in Section III. Section IV gives the principles of data-driven welder behavior analysis, while the experimental verification is done in Section V. Section VI discusses the analysis and modeling results. The conclusions and future work are given in Section VII.

II. SYSTEM CONFIGURATION

As shown in Fig.1, the developed DT for HRI in welding manufacturing is composed of a physical HRI part and its DT part. The whole system is developed based on the HTC VIVE VR system. In the physical HRI part, a human user holds a composite torch composed of a manual welding torch and a motion-tracking handle, as shown in Fig.2. This handle is part of the HTC VIVE, and its surface markers are captured by infrared cameras while its inertial data are measured simultaneously, such that it can be tracked in real time. The tracking accuracy can reach the submillimeter level in the static state [32], with 1–2 mm error in dynamic tracking, which is sufficient to successfully complete some precise grasping and handling tasks [33]. Also, even for the unsuccessful tasks, the main limiting factor is not tracking accuracy but force, as the human teleoperates the robot to reach the desired pose with iterative visual feedback regardless of the tracking error [33]. Furthermore, compared with the application tasks in these previous works, the movements in welding are relatively slow, which further decreases the effect of tracking error. Therefore, the tracking accuracy of the HTC VIVE can meet the welding application requirements. The attached handle endows the manual welding torch with interactive ability such that the human user can demonstrate her/his welding operations naturally and directly offsite. Via the VR system, the demonstrated welding operations are recorded. In the physical welding environment, a robot with six degrees of freedom (6DoF), UR-5, executes the human user’s operations to conduct the welding jobs, with a gas tungsten arc welding (GTAW) torch attached as its end-effector.

Fig.1. Developed DT for HRI in welding manufacturing (a) schematic diagram; (b) real diagram.

Fig.2. The human user demonstrates the welding operations via a composite torch which is composed of a manual torch and a motion-tracking handle.

In this DT, all the vital elements in the physical HRI, including the robot, welding scene, and human user, have digital replicas that precisely simulate their dynamics in the physical world. When building the essential elements in this DT, different strategies are applied according to their physical properties. For the rigid elements such as the robot, welding torch, and workpiece, the geometrical shapes remain static, and only their positions and orientations need to be updated. To decrease the data volume and computation cost, their 3D models are pre-built offline and loaded into the DT. For the deformable elements such as the weld pool and electrical arc, images are captured by a Point Grey FL3FW03S1C, an industrial camera that is laterally mounted on the robot. The weld pool images pass through a band-pass optical filter centered at 650 nm, where the arc is weak, to prevent the intense arc from overwhelming the welding scene. Then, the filtered images are projected on the workpiece surface under the welding torch in the DT system. Simultaneously, other key information determining welding quality, including welding current, arc voltage, arc length, and welding speed, is sensed and visualized as streaming text. The DT is developed using C# in Unity, a game engine supporting all mainstream VR hardware.

III. SYSTEM WORKING PRINCIPLES

The working principles of the developed DT for HRI are shown in Fig.3, where four spaces, including human space (H), digital space (D), robot space (R), and welding space (W), coexist in a layered architecture.

Fig.3. System working principles. Physical HRI covers human space, robot space and welding space and its DT exists in digital space.

In one working cycle, the human user demonstrates her/his welding operations $O_H$ in human space while observing and analyzing the information $I_H$ from the DT. This reaction model, $O_H = M_{IO}^{H}(I_H)$, is human user-dependent and characterizes the welding skill. The demonstrated welding operation vector $O_H$ includes the welding torch movement and the applied welding current

$$O_H = [P_H, Q_H, i_H]$$

where $P_H = [x_H, y_H, z_H]^T$ describes the torch position in 3D space, $Q_H = [Q_{wH}, Q_{xH}, Q_{yH}, Q_{zH}]$ is the quaternion characterizing the orientation of the welding torch in human space, and $i_H$ is the commanded welding current.

Then, the human operation is transmitted into the digital space as

$$O_D = M_{HD}(O_H)$$

where $O_D$ is the mapped welding operation in digital space, including position $P_D$, orientation $Q_D$, and welding current $i_D$; this transmission is completed via the motion-tracking handle and modeled as $M_{HD}$. More specifically, $P_H$ and $Q_H$ can be sensed in real time by the internal motion tracking function of the HTC VIVE with a 3D coordinate transformation

Fig.4. The human user controls the welding current by sliding the thumb position on touchpad.

The human user slides her/his thumb on the touchpad; the thumb position p is scaled to [–1, 1] and further mapped to the welding current range $[i_{min}, i_{max}]$ as the applied welding current
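As a minimal sketch of the touchpad-to-current mapping described above, assuming the mapping is linear (the text states only the input range [–1, 1] and the output current range), one possible form in Python is shown below. The function name and the example limits of 20 A and 100 A are hypothetical and chosen only for illustration.

```python
def thumb_to_current(p, i_min, i_max):
    """Map a touchpad position p in [-1, 1] onto [i_min, i_max] amperes.

    Assumption: the mapping is linear; the paper's exact formula is not
    reproduced here.
    """
    # Clamp so that sensor noise slightly outside [-1, 1] cannot command
    # a current outside the allowed range.
    p = max(-1.0, min(1.0, p))
    return i_min + (p + 1.0) / 2.0 * (i_max - i_min)


# Hypothetical current limits, for illustration only.
low = thumb_to_current(-1.0, 20.0, 100.0)
mid = thumb_to_current(0.0, 20.0, 100.0)
high = thumb_to_current(1.0, 20.0, 100.0)
print(low, mid, high)
```

Clamping the input is a design choice of this sketch rather than something stated in the text; it keeps the commanded current inside the configured range.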

A similar spatial coordinate transformation is done to transmit the welding operations from digital space to robot space

Since the UR-5 robot only accepts axis-angle to characterize its orientation, the quaternion is transformed to axis-angle by
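The quaternion-to-axis-angle step can be sketched with the standard conversion: for a unit quaternion $(q_w, q_x, q_y, q_z)$, the rotation angle is $\theta = 2\,\mathrm{atan2}(\lVert(q_x, q_y, q_z)\rVert, q_w)$ and the axis is the normalized vector part. The Python sketch below returns the axis scaled by the angle (a rotation vector), assuming this is the axis-angle form the robot expects. The function name is hypothetical, and any handedness conversion between the VR engine and the robot frame is outside this sketch.

```python
import math


def quaternion_to_rotation_vector(qw, qx, qy, qz):
    """Convert a quaternion to a rotation vector (unit axis times angle, rad)."""
    norm = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    if norm == 0.0:
        raise ValueError("zero quaternion does not encode a rotation")
    qw, qx, qy, qz = (c / norm for c in (qw, qx, qy, qz))
    # q and -q encode the same rotation; pick qw >= 0 so the angle is in [0, pi].
    if qw < 0.0:
        qw, qx, qy, qz = -qw, -qx, -qy, -qz
    s = math.sqrt(qx * qx + qy * qy + qz * qz)  # equals sin(theta / 2)
    if s < 1e-12:
        return (0.0, 0.0, 0.0)  # no rotation, the axis is arbitrary
    theta = 2.0 * math.atan2(s, qw)
    scale = theta / s
    return (qx * scale, qy * scale, qz * scale)


# A 90-degree rotation about the z axis.
half = math.pi / 4.0
r = quaternion_to_rotation_vector(math.cos(half), 0.0, 0.0, math.sin(half))
print(r)
```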

A local area network (LAN) is built for information communication between the physical and the digital worlds using the TCP/IP protocol. The robot receives the transformed operation $O_R = [P_R, r_R, i_R]$ and rotates its joints to reach the target pose $[P_R, r_R]$ such that the robot can execute the human welding operations in robot space. In the meantime, the welding current is transmitted to the welding power supply via an analog input/output (AIO) interface, such that the desired welding current is applied. With the movement of the welding torch attached to the robot, the welding process is conducted in the welding space with welding information generated processing, and storage is significantly reduced without sacrificing the onsite scene reconstruction accuracy.

In a VR environment, the displayed scenes come from computational simulation where no real elements are involved. This DT is built upon VR, but with the simulation models recursively updated by the real-time sensing data stream. For example, in the developed DT system, a simulation model of the robot is pre-loaded into the DT system, which is similar to VR. Then the robot movements are obtained from sensing data, such as the rotation angle of each robot joint. So, the difference between VR and DT lies in whether real-world information is used to update the simulation models in the virtual display.

On the other hand, the augmented reality (AR) environment is built upon the real world, adding virtual elements generated from computational simulations to the real-world scenes to enhance the users’ perception of the real world. Instead of being separated spatially as in DT, the real elements and virtual elements in the AR environment are coupled and presented to the users together.

Fig.5. The workflow of the proposed welding skill level classification model from demonstrated operation data.

IV. WELDER BEHAVIOR ANALYSIS

Then, the downloaded position sequences are split randomly to construct a training dataset and a testing dataset with a ratio of 9:1. For welding, the travel speed of the welding torch is a major parameter determining the heat input to the workpiece and the weld quality, so it is more important than the absolute position. Therefore, we transform the position sequences to speed sequences by
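A minimal sketch of the position-to-speed conversion is given below, assuming a first-order finite difference per axis scaled by the 90 Hz tracking rate mentioned in this section. The exact difference scheme used by the authors is not reproduced here, and the function name is hypothetical.

```python
def positions_to_speeds(positions, sample_rate_hz=90.0):
    """Differentiate a list of (x, y, z) positions into per-axis speeds.

    Assumption: a simple first-order difference between consecutive
    samples, so the output is one element shorter than the input.
    """
    speeds = []
    for previous, current in zip(positions, positions[1:]):
        speeds.append(tuple((c - p) * sample_rate_hz
                            for p, c in zip(previous, current)))
    return speeds


# Toy trajectory sampled at 2 Hz so the arithmetic is easy to follow.
track = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (3.0, 2.0, 0.0)]
print(positions_to_speeds(track, sample_rate_hz=2.0))
```

Keeping the three axes separate matches the later step in which frequency features are computed in x, y, and z and then combined.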

In order to augment the dataset size, downsampling is applied with the decimation factor M = 30

By downsampling, one original speed sequence with a sampling frequency of 90 Hz (with length N) can generate 30 speed sequences with a sampling frequency of 3 Hz (each with length N/30), which is adequate since the welding process is relatively slow. The sizes of the training dataset and the testing dataset are thus increased by 30 times.
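The augmentation by downsampling can be sketched as taking every M-th sample starting from each of the M possible offsets, which turns one 90 Hz sequence into M = 30 interleaved 3 Hz sequences. The Python sketch below illustrates this under that reading of the text; the function name is hypothetical.

```python
def polyphase_downsample(sequence, factor=30):
    """Split one sequence into `factor` interleaved subsequences.

    Subsequence m contains samples m, m + factor, m + 2 * factor, ...
    """
    return [sequence[offset::factor] for offset in range(factor)]


# A sequence of 1800 samples yields 30 subsequences of 60 samples each.
original = list(range(1800))
subsequences = polyphase_downsample(original, factor=30)
print(len(subsequences), len(subsequences[0]))
```

Note that the resulting subsequences come from the same demonstration and are strongly correlated, so they enlarge the dataset without adding independent demonstrations.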

The model training stage presents the principles of the developed FFT-PCA-SVM. Firstly, the downsampled speed sequences v[n] are transformed into the frequency domain by FFT, which is an efficient way to compute the discrete Fourier transform (DFT)

$$f_k = \left| \sum_{n=0}^{N-1} v[n] \, e^{-2\pi i k n / N} \right|, \quad k = 0, 1, \ldots, N-1$$

where $f = [f_0, f_1, \ldots, f_{N-1}]$ is the transformed frequency feature vector, and each element $f_k$ characterizes the spectrum magnitude of travel speed at a specific frequency $(k/N) \times F_{ori}$; N is the length of the downsampled sequence; $F_{ori}$ = 3 Hz is the sampling frequency.

We apply FFT to transform the speed sequences to frequency features for the following three reasons: 1) the welding skill difference between skilled and unskilled welders can be characterized/classified better in the frequency domain than in the original time domain, which has been verified by previous research works [37]–[39]; 2) each feature $f_k$ in the frequency feature vector f is independent of the others $f_l$ ($l \neq k$) since the basis functions $e^{-2\pi i k n/N}$ in FFT are orthogonal to each other. This independence meets the basic requirement of most machine learning models on training data; otherwise, a more complex sequential modeling technique is needed; and 3) there is no information loss after FFT since the basis function set $[e^{-2\pi i \cdot 0 \cdot n/N}, e^{-2\pi i \cdot 1 \cdot n/N}, \ldots, e^{-2\pi i (N-1) n/N}]$ is a complete set. In our application, the transformed frequency features are symmetric about zero since the speed sequences are real-valued, so we only keep the positive frequency features. Furthermore, the computed frequency features in x, y, and z are concatenated into the final feature vector, whose length is 3N/2 + 3
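The feature construction described above can be sketched in Python as follows. This is a minimal sketch rather than the authors' code: it evaluates the DFT directly with the standard library (an FFT routine such as `numpy.fft.rfft` computes the same values faster), keeps the N/2 + 1 non-negative-frequency magnitudes per axis, and concatenates x, y, and z. All function names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |DFT| for the non-negative frequencies k = 0 .. N/2."""
    n = len(signal)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def frequency_features(vx, vy, vz):
    """Concatenate the per-axis magnitude spectra into one feature vector."""
    return dft_magnitudes(vx) + dft_magnitudes(vy) + dft_magnitudes(vz)


def bin_frequency_hz(k, n, sample_rate_hz=3.0):
    """Physical frequency of bin k: (k / N) times the sampling frequency."""
    return k * sample_rate_hz / n


# Synthetic test tone (not welding data): 5 cycles over N = 60 samples.
N = 60
tone = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]
features = frequency_features(tone, tone, tone)
print(len(features), bin_frequency_hz(5, N))
```

With N = 60 (1800 samples downsampled by 30), each axis contributes N/2 + 1 = 31 magnitudes, giving 3N/2 + 3 = 93 features, which matches the feature dimension quoted later in the paper.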

In order to eliminate the unbalance among the features in the vector $F = [f_0, f_1, \ldots, f_{N'-1}]$ ($N' = 3N/2 + 3$), normalization is applied by

Firstly, the original features are transformed into a feature space by a transform function Φ

$$X = \Phi(F_R)$$

where Φ can be a linear or non-linear function that transforms the original features $F_R$ into a high-dimensional space as X. In model training and application, Φ is defined only implicitly; instead, a function characterizing the inner product of two transformed feature vectors is defined as the kernel function K by

$$K(F_R^{(i)}, F_R^{(j)}) = \Phi(F_R^{(i)}) \cdot \Phi(F_R^{(j)})$$

Fig.6. The principles of SVM.

In this transformed feature space, we aim to find a hyperplane, W×X + b = 0, that separates the data into two classes (labeled as Y, whose value can be +1 or –1) while maximizing the separating gap width d = 2/||W|| between the two margin hyperplanes W×X + b = ±1

To handle the situation where the data cannot be separated perfectly, a soft margin is used, adding a term ε as the penalty for imperfect separation

where λ is the trade-off parameter for margin distance and separability; L is the training data size, and this problem can be solved using quadratic programming [40].

V. EXPERIMENTAL VERIFICATION

In order to verify the effectiveness of the developed DT for human-robot interactive welding and the developed FFT-PCA-SVM for welder behavior analysis, six welders, three skilled with certain manual welding training and three unskilled without any training, are invited to complete a simple welding task, for which the welding and workpiece parameters are shown in Table I. Each welder completes the welding job ten times and no error occurs. During the experimental verification stage, it is found that the weld beads from the skilled welders are uniform and satisfactory on both sides, but those from the unskilled welders are not satisfactory, as shown in Fig.7.

TABLE I WELDING AND WORKPIECES PARAMETERS

Fig.7. The welded workpieces from the welders with different professional levels. (a) and (b) are the front and back sides from skilled welders, and the quality is satisfactory; (c) and (d) are the front and back sides from unskilled ones, and the quality is unsatisfactory.

The difference in the resultant welding quality comes from the demonstrated operations, which are downloaded from the cloud and shown in Fig.8. It can be seen that the welding operations demonstrated by the unskilled welders have larger fluctuations (circled in Fig.8) and larger instability in welding quality than those of the skilled welders. In order to verify the effectiveness of the developed FFT-PCA-SVM in analyzing and classifying the welder behaviors from the demonstrated operations, the demonstrated operation data are processed in the next section.

VI. RESULTS AND DISCUSSION

After each invited welder demonstrated the same welding task 10 times, all the demonstrated data are downloaded from the cloud. For each welder, one sequence is selected randomly for constructing the testing dataset, with the other nine sequences kept for model training. The speed sequences are generated from (9). Fig.9 shows the sequences generated from Fig.8; their length is 1800, which is the product of welding time and sampling frequency.

Fig.8. Facing the same welding task, the human users with different professional levels demonstrate different welding operations. X is the welding direction; Y is the lateral direction, and Z is vertical direction; (a) from a skilled welder; (b) from an unskilled welder.

Fig.9. The demonstrated speed sequences computed from position sequences (a) from a skilled welder; (b) from an unskilled welder.

Fig.10. The histogram of the computed criterion ($W \times \Phi(F_R) + b$) for classification in 5-fold cross-validation. (a) linear; (b) poly-2; (c) poly-3; (d) poly-4; (e) rbf; (f) tanh.

After downsampling, the sizes of the training dataset and testing dataset are 1620 and 180, respectively. The frequency feature vector F is obtained using FFT and its unbalance is eliminated by the normalization in (13). In data dimension reduction using PCA, the number of principal components M is identified as M = 24 by the maximum-likelihood estimation method proposed by Minka [41]. Therefore, the normalized frequency feature dimension is reduced from 93 to 24.

The trade-off parameter λ is set to 1 to train the SVMs using the scikit-learn library [42], with the unskilled operations labeled as –1 and the skilled ones labeled as 1. 5-fold cross-validation is done with different kernel functions, including linear, 2-degree polynomial (poly-2), 3-degree polynomial (poly-3), 4-degree polynomial (poly-4), radial basis function (rbf), and hyperbolic tangent (tanh). By applying different kernel functions, the data are transformed into different feature spaces and show different linear separability, as shown in Fig.10 and Table II.
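A kernel comparison of this kind can be sketched with scikit-learn as below. This is a hedged illustration, not the authors' script: `StandardScaler` stands in for the paper's normalization step, the trade-off parameter λ = 1 is assumed to correspond to scikit-learn's `C=1.0`, PCA uses `n_components="mle"` (Minka's estimator as implemented in scikit-learn), the tanh kernel is scikit-learn's `"sigmoid"`, and the data are random synthetic clusters rather than welding features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Kernel names as used in the text, mapped to scikit-learn SVC arguments.
KERNELS = {
    "linear": dict(kernel="linear"),
    "poly-2": dict(kernel="poly", degree=2),
    "poly-3": dict(kernel="poly", degree=3),
    "poly-4": dict(kernel="poly", degree=4),
    "rbf": dict(kernel="rbf"),
    "tanh": dict(kernel="sigmoid"),
}


def cross_validate_kernels(features, labels, folds=5):
    """Return the mean cross-validation accuracy for each kernel."""
    scores = {}
    for name, params in KERNELS.items():
        model = make_pipeline(
            StandardScaler(),                            # assumed normalization
            PCA(n_components="mle", svd_solver="full"),  # Minka's MLE
            SVC(C=1.0, **params),                        # assumed lambda -> C
        )
        scores[name] = cross_val_score(model, features, labels, cv=folds).mean()
    return scores


# Synthetic stand-in data: two Gaussian clusters with 20 features each.
rng = np.random.default_rng(0)
skilled = rng.normal(0.0, 1.0, size=(100, 20))
unskilled = rng.normal(1.0, 1.0, size=(100, 20))
features = np.vstack([skilled, unskilled])
labels = np.array([1] * 100 + [-1] * 100)

scores = cross_validate_kernels(features, labels)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, so the validation folds do not leak into the preprocessing.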

TABLE II 5-FOLD CROSS-VALIDATION PERFORMANCE WITH DIFFERENT KERNELS

Fig.10 shows the distribution of the computed criterion $W \times \Phi(F_R) + b$. $W \times \Phi(F_R) + b = 0$ is the hyperplane for classification and the area $|W \times \Phi(F_R) + b| < 1$ is the separating gap. It can be seen that there are two obvious clusters in the transformed feature space and the data show good separability when the kernel function is linear, rbf, or tanh. The validation accuracies of these models are also quite satisfactory, at over 90%. When the kernel functions are polynomials (poly-2, poly-3, and poly-4), there are no obvious separate clusters and the separability is worse, though the classification accuracy is good in the poly-3 case. Among all these models, the SVM with rbf as the kernel function achieves the best cross-validation performance, with a prediction accuracy of 94.32%. Therefore, rbf is applied as the kernel function to train the final SVM model, and the testing performance is shown in Fig.11 and Table III.

TABLE III TESTING CONFUSION MATRIX

From Fig.11, it can be seen that most testing data points (150 of 180) lie outside the separating gap. They appear as two separable clusters, which is preferable for classification. Table III shows the confusion matrix of the prediction results, in which 82 of 90 demonstrated operation sequences from the unskilled welders are predicted correctly and 88 of 90 demonstrated operation sequences from the skilled ones are predicted correctly. That makes sense since even unskilled welders can occasionally demonstrate some good operation sequences like the skilled welders, but the probability that skilled welders demonstrate non-professional operations is much lower. In total, the classification accuracy is 94.44%, which verifies the effectiveness of the developed method in identifying the professional level from the demonstrated operations.
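The reported accuracy can be checked directly from the counts quoted above. The short Python sketch below recomputes the overall accuracy and the per-class recall; the dictionary layout and variable names are illustrative, and the off-diagonal counts are simply the remainders 90 − 82 and 90 − 88.

```python
# Rows are the true class, columns the predicted class.
confusion = {
    "unskilled": {"unskilled": 82, "skilled": 8},
    "skilled": {"unskilled": 2, "skilled": 88},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[label][label] for label in confusion)
accuracy = correct / total

# Recall: fraction of each true class that is predicted correctly.
recall = {
    label: confusion[label][label] / sum(confusion[label].values())
    for label in confusion
}

print(f"accuracy = {accuracy:.2%}")
for label, value in recall.items():
    print(f"recall({label}) = {value:.2%}")
```

The lower recall for the unskilled class (82 of 90) compared with the skilled class (88 of 90) is the asymmetry discussed in the paragraph above.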

Fig.11. The histogram of the computed criterion ($W \times \Phi(F_R) + b$) for classification in testing data.

VII. CONCLUSIONS AND FUTURE WORK

By integrating VR as an HRI interface, the developed DT gains the ability to interact with human users and is applied to welding manufacturing successfully. In such a DT, all the key elements in the physical world have digital replicas, and these replicas run with the same dynamics as the physical ones through the information communicated between the physical and digital worlds. Human welders with different professional levels (skilled and unskilled) can complete the same welding job successfully but demonstrate operations with different patterns. The FFT-PCA-SVM algorithm is developed for identifying the professional levels from the demonstrated data by transforming the speed sequences into the frequency domain, reducing the dimension, and classifying in the transformed feature space. The final testing accuracy is 94.44%, which verifies the effectiveness of the developed method. In future work, we plan to investigate efficient novice welder training based on the developed human-robot interactive welding with the recognized patterns from skilled welders, and also to upgrade this system to support multi-robot collaboration such that more complex welding operations, such as metal-feed and laser-arc hybrid welding, can be completed by this system. As such, the system applicability can be increased greatly.