Ruofan Wu, Zhikai Yao, Jennie Si, and He (Helen) Huang
Abstract—We present a state-of-the-art reinforcement learning (RL) control approach that automatically configures robotic prosthesis impedance parameters to enable end-to-end, continuous locomotion for transfemoral amputee subjects. Specifically, our actor-critic based RL provides tracking control of a robotic knee prosthesis so that it mimics the intact knee profile. This is a significant advance over our previous RL based automatic tuning of prosthesis control parameters, which centered on regulation control with a designer-prescribed robotic knee profile as the target. In addition to presenting the tracking control algorithm based on direct heuristic dynamic programming (dHDP), we provide a control performance guarantee, including the case of constrained inputs. We show that our proposed tracking control possesses several important properties, such as weight convergence of the learning networks, Bellman (sub)optimality of the cost-to-go value function and control input, and practical stability of the human-robot system. We further provide a systematic simulation of the proposed tracking control using a realistic human-robot system simulator, OpenSim, to emulate how dHDP enables level ground walking, walking on different terrains, and walking at different paces. These results show that our proposed dHDP based tracking control is not only theoretically suitable, but also practically useful.
Powered lower limb prostheses hold great promise for helping amputees regain mobility in daily life. Their potential has been demonstrated in improving transfemoral amputees' walking ability [1], [2]. Such robotic devices rely on an impedance control framework, designed on the basis of human biomechanics, that mimics joint movements controlled by the central nervous system in order to provide a natural substitute for the lost limb functions. These devices require customization of the impedance parameters for each individual user. Currently, configuration of powered devices is performed in clinics by technicians who manually tune a subset of impedance parameters over a number of patient visits. This procedure is time and labor intensive for both amputees and clinicians. Therefore, an automatic approach to tuning powered prosthesis parameters is needed.
Automatically configuring the impedance parameter settings has been attempted over the past several years. One untested idea is to estimate the joint impedance from biomechanical measurements and a model of the unimpaired leg [3], [4]. This idea may not be practically useful, as the biomechanics and joint activities of amputees are fundamentally different from those of the able-bodied population. Another approach constrains the knee kinematics via the relationship between joint control and intrinsic measurements, which in turn requires careful modeling and thus may not be feasible [5], [6]. Such an approach relies on significant domain knowledge and is time-consuming to tune. A cyber expert system was proposed [7] to emulate the tuning decisions of human experts (prosthetists) in configuring the control parameters. This approach relies heavily on the experts' experience and is not expected to scale well to more joints, different users, or different tasks.
As these methods all have fundamental limitations in principle, new approaches to configuring the prosthesis control parameters are needed. RL based adaptive optimal control is a promising alternative, as such approaches have demonstrated their capability of learning from measured data in an online or offline manner in several realistic application problems, including large-scale control problems [8]–[13]. The core of RL methods is the idea of providing approximate solutions to the Bellman equation of optimal control problems. We have successfully developed several RL algorithms to configure impedance parameter settings, including actor-critic RL [14]–[17] and policy iteration based RL approaches [18]–[21], and have systematically tested them both in extensive simulations and in experiments with able-bodied and transfemoral amputee subjects.
All of our RL control approaches to date require a target knee motion profile, which can only be determined subjectively. Nonetheless, those results are important, as they provided the necessary understanding of the impedance control parameter configuration problem and of whether RL control is capable of solving it. Even though human locomotion has long been understood in detail at the neurological and biomechanical levels, an individual subject's locomotion dynamics still cannot be modeled accurately by mathematical descriptions, as individuals differ physically, biologically and neurologically. Additionally, different locomotion tasks, such as changing pace [22], sloped walking [23] and walking on uneven terrain [24], all have a significant influence on human gait behavior. As such, accurately prescribing each and every locomotion behavior for control purposes is not feasible.
Tracking the intact knee joint motion with a prosthetic knee is an intuitive idea, as the intact knee kinematics is the most natural and realistic target: it contains the actual inter-relational information of biological joints, which makes it a good candidate to replace a subjectively defined knee profile. Studies have shown that bilateral coordination between the two legs is needed in the regulation of bipedal walking to maintain stability, and that such interlimb cooperation can be accomplished at the spinal level. Since the spinal locomotor networks are symmetrically organized [25], sensory and muscle activities of both sides are involved in rhythmic walking. Amputees usually display asymmetrical walking, relying heavily on their intact limbs because of the loss of sensory feedback.
Tracking the intact knee was in fact explored years ago. Grimes et al. developed a mirror control scheme for the stance phase (not a complete gait cycle). It tracked the sound limb's knee trajectory in the stance phase, multiplied by a gain factor to avoid over-flexion, while a fixed trajectory was applied in the swing phase. Bernal-Torres et al. copied the full gait trajectory using a Kalman filter with a biomimetically designed prosthesis, but no human experiments or systematic simulations were reported [26]. Joshi et al. developed a control strategy that controlled the swing time to mirror the stride duration of the intact knee while the prosthesis was locked during the stance phase [27]. Sahoo et al. aimed at mirroring the step length by controlling the push-off force [28]. The above approaches focused on either part of a gait cycle or an outcome measure such as step length or stance time. None of them has shown the feasibility of tracking a complete gait cycle.
Virtual constraints were proposed to generate coordinated joint motions as target joint motion profiles for the robotic knee to track [29]. Biomimetic virtual constraints describe the joints' geometric relationships and are encoded by hybrid zero dynamics [30]. However, this approach has a few limitations. Virtual constraints require a simplified human model to establish the geometric relationship among joints, and such a model is difficult to establish for a human-prosthesis system. In a recent work on prosthesis control design based on virtual constraints [29], only a proportional gain was derived and applied, and the overall human-prosthesis performance during locomotion is yet to be demonstrated.
Tracking an intact knee motion poses additional challenges beyond regulation control. Individual prosthesis users demonstrate gait patterns that differ greatly from one another and from those of healthy subjects. This may be caused by an individual's physical conditions and/or biological factors, such as reduced or lost proprioception and the condition of socket fitting. Although it has long been believed that the intact knee motion pattern changes as the user learns to walk with a prosthesis (our data below also provide corroborating evidence of this), tracking such a moving target knee motion has never been demonstrated by any controller. Most existing tracking control designs are based on mathematical models of the human-prosthesis dynamics and of the reference trajectory, both of which are difficult or practically impossible to obtain. In data-driven adaptive optimal control, reports on tracking control are far fewer than those on regulation control. Most approaches use reinforcement learning to establish approximate solutions to the Hamilton-Jacobi-Bellman (HJB) equation, yet few of them have demonstrated feasibility in engineering problems, and even fewer provide systematic studies of the control design that demonstrate its applicability.
In this paper, we propose an RL tracking control scheme for the robotic knee to mimic the intact knee in different locomotion tasks. In a previous experiment [31], we successfully tested the pilot idea of RL tracking control to automatically configure impedance parameter settings. In this study, we formally formulate the tracking control problem, develop a complete tracking control algorithm based on dHDP, and provide an analytical framework that establishes the real-time control performance guarantee of the proposed scheme. The contributions of this work include the following.
1) We provide the first systematic demonstration of end-to-end, continuous walking enabled by automatic tracking control of a wearable lower limb robotic device, going beyond our previous regulation control results [14]–[21].
2) We show that our proposed tracking control possesses several important properties, such as weight convergence of the learning networks, Bellman (sub)optimality of the cost-to-go value function and control input, and practical stability of the human-robot system.
3) We provide a systematic evaluation of the proposed tracking control using a realistic human-robot system simulator to demonstrate level ground walking, slope walking at various slope angles, and walking at different paces. These results show that our proposed dHDP based tracking control is not only theoretically suitable, but also practically useful.
The remainder of this paper is organized as follows. Section II describes the human-prosthesis system and develops dHDP to solve the tracking problem. Section III gives the Lyapunov stability analysis of the system. Section IV presents the implementation using OpenSim simulations. Section V presents extensive simulation results. Discussions and conclusions are presented in Section VI.
Our proposed RL tracking control is built upon the finite state machine (FSM) impedance controller (IC) framework, which aims to mimic the torque-generating capability of biological joints to enable natural movement. Humans reportedly control muscle activity to adjust joint impedance in walking [32]–[34], and the compliant behavior of the legs is fundamental to human locomotion [35], [36]. Within this context, impedance represents an inherent property of a mechanical joint: it describes the relationship between external force and the motion produced; in other words, it is a dynamic property that governs the torque-motion relationship of a human joint. In our study, the prosthesis joint is therefore characterized by its stiffness ($K$), damping ($B$), and equilibrium position ($\theta_e$). The FSM-IC provides intrinsic control in the form of an adjustable control torque determined by the impedance parameters. The impedance parameter settings, as control inputs, have to be adjusted or adapted to meet individuals' needs, including their different physical conditions. Our proposed RL tracking control automatically provides such impedance parameter settings. In turn, the joint impedance, together with other factors, affects knee kinematics such as the peak knee angle. External forces such as the ground reaction force, as well as human reactions, also affect the knee angle and its peak value. However, many of these factors are difficult, or nearly impossible, to model. Gait duration is influenced by the impedance control parameters [17], as well as by the human's movement control of the residual hip and the intact limb. Meanwhile, the transitions between double stance and single stance also influence the duration, since the two have different dynamics. Therefore, the relationship between impedance and peak knee angle/duration is not deterministic and is difficult to model. These challenges motivate us to consider data-driven RL control.
The FSM-IC is commonly used for prosthesis intrinsic control, as studies have shown that humans control the stiffness of leg muscles, and therefore joint impedance, while walking, and that compliant leg behavior is instrumental for human walking. The impedance controller generates a torque input to the robotic knee based on the current knee kinematics and the knee joint impedance settings.
As shown in Fig. 1(b), a gait cycle is divided into four phases in the FSM-IC: stance flexion (STF, $m=1$), stance extension (STE, $m=2$), swing flexion (SWF, $m=3$) and swing extension (SWE, $m=4$). The phase transitions are determined by knee motion and gait events (heel strike and toe-off) obtained from the vertical ground reaction forces of both legs. In each phase of the FSM, three impedance parameters (stiffness $K$, damping $B$, and equilibrium position $\theta_e$) are provided as inputs to the FSM-IC for gait cycle $k$.
Fig. 1. Two control loops are coordinated to realize automatic control of the prosthetic knee based on an impedance control framework: i) an intrinsic impedance controller (leftmost panel of (a)) operating at 100 Hz, which creates a control torque according to (2) based on the knee kinematics and the impedance settings, and ii) an RL controller that is updated at each step $k$, whose output is the adjustment of the impedance setting to be used in the intrinsic controller. (a) Flow chart of the automatic robotic knee control parameter tuning scheme by dHDP. (b) The dHDP controller to adjust the impedance setting. (c) The structure of the critic network in the RL controller.
The RL controller adjusts these impedance parameters once per gait cycle, its output being an adjustment of the current impedance setting (Fig. 1), so that the updated impedance parameters are applied to the FSM-IC to generate the knee torque according to (2).
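As a rough illustration of the two loops described above, the Python sketch below computes the intrinsic knee torque for one FSM phase and applies the once-per-gait-cycle RL adjustment. Equation (2) is not reproduced in this text, so two things here are assumptions: the impedance law $\tau = K(\theta-\theta_e) + B\omega$ (with knee angle $\theta$ and angular velocity $\omega$), and the additive form of the update. The function names and numbers are hypothetical.

```python
import numpy as np

# FSM phase indices from the text: STF (m=1), STE (m=2), SWF (m=3), SWE (m=4).
PHASES = (1, 2, 3, 4)


def impedance_torque(theta, omega, impedance):
    """Knee torque for one FSM phase from its impedance vector [K, B, theta_e].

    Assumes the common impedance law tau = K * (theta - theta_e) + B * omega;
    the sign convention is an assumption and differs between devices.
    """
    K, B, theta_e = impedance
    return K * (theta - theta_e) + B * omega


def apply_rl_adjustment(impedances, m, u):
    """Return a copy of `impedances` with phase m's setting shifted by the RL
    action u = [dK, dB, dtheta_e] (additive update assumed, once per gait cycle)."""
    updated = dict(impedances)
    updated[m] = impedances[m] + np.asarray(u, dtype=float)
    return updated


# Hypothetical numbers, for illustration only.
impedances = {m: np.array([2.0, 0.05, 0.1]) for m in PHASES}
impedances = apply_rl_adjustment(impedances, 1, [0.1, 0.0, -0.02])
tau = impedance_torque(theta=0.3, omega=0.5, impedance=impedances[1])
```

The sketch keeps one impedance vector per phase to mirror the FSM structure. The 100 Hz loop only reads the vector for the current phase, and the RL loop writes to it once per gait cycle.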
Biomechanical studies have shown that intact knee joint movements, or profiles, change as amputees adapt to a prosthetic device [37]. We observed the same in our pilot study with two human subjects [31]. Fig. 2 illustrates the same phenomenon in simulation: the intact knee kinematic trajectories were recorded, and changes in profile features are clearly observed. The goal of the RL controlled robotic knee is therefore to track these time-varying intact knee profile features in each and every phase of each and every gait cycle. For gait cycle $k$, the robotic knee motion (Fig. 1(b)), characterized by the peak knee angle (degrees) and the phase duration (seconds), is measured. Let
Fig. 2. An illustration of how the intact knee profile changes as the robotic knee control parameters adapt during a level ground walking simulation session. The peak angles (blue) and the phase durations (orange) are shown in all phases.
Similarly, we measure the peak knee angle and duration of the intact knee, and let
We model the human-robot system, i.e., the amputee-prosthesis system, as a discrete-time nonlinear dynamic system with unknown nonlinearities. For ease of discussion, we let the dynamics be represented by
where the nonlinear mapping $F$ is Lipschitz continuous on its domain $Z \times U$, with $Z$ and $U$ compact sets of dimensions $N_Z$ and $N_u$, respectively. In the human-robot system under consideration, $F$ represents the kinematics of the robotic knee, which are affected by both the human wearer and the RL controller. Because of the human in the loop, an explicit mathematical model such as (7) is intractable or impossible to obtain.
For convenience, and without causing confusion, we drop the superscript $m$ ($m=1,2,3,4$) in the rest of the paper, because all four FSM-ICs and their respective RL controllers share the same structure. Note, however, that the RL controllers of the different phases are parameterized differently; in other words, the control policies differ from phase to phase even though they share the same structure. Then, the tracking error between the intact knee and the prosthetic knee is defined as
Fig. 1(a) depicts the RL based solution approach to automatically configure the impedance parameters of the robotic knee to track the intact knee joint motion within the FSM-IC framework. Each RL control block corresponds to one of the four FSM phases. As shown in Fig. 1(a), we develop a dHDP based RL tracking control with each of the four dHDP blocks providing impedance parameter settings for each of the four gait phases. Each dHDP block has an action network and a critic network, trained for the given FSM phase only.
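The per-phase features and the tracking error can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes three things: the knee angle is sampled at 100 Hz with an FSM phase label on each sample; the "peak" is the maximum angle in flexion phases and the minimum angle in extension phases; and the error is prosthetic minus intact (the sign does not matter for a quadratic cost). The helper names are hypothetical.

```python
import numpy as np

FS_HZ = 100.0            # assumed sampling rate (the intrinsic loop runs at 100 Hz)
FLEXION_PHASES = {1, 3}  # STF and SWF; STE (2) and SWE (4) are extension phases


def phase_features(knee_angle, phase_labels, fs=FS_HZ):
    """Return {m: np.array([peak_angle, duration_s])} for one gait cycle.

    Assumption: the "peak" is the maximum knee angle in flexion phases and
    the minimum knee angle in extension phases.
    """
    knee_angle = np.asarray(knee_angle, dtype=float)
    phase_labels = np.asarray(phase_labels)
    features = {}
    for m in (1, 2, 3, 4):
        seg = knee_angle[phase_labels == m]
        if seg.size == 0:
            continue  # phase not observed in this cycle
        peak = seg.max() if m in FLEXION_PHASES else seg.min()
        features[m] = np.array([peak, seg.size / fs])
    return features


def tracking_error(prosthetic, intact):
    """Per-phase 2-vector [peak error, duration error], prosthetic minus intact."""
    return {m: prosthetic[m] - intact[m] for m in prosthetic if m in intact}
```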
In the RL tracking controller, let the state be denoted by $s_k$, and the control input (the action network output) by $u_k$ for gait cycle $k$, i.e.,
We consider the stage cost in the quadratic form

$$U(s_k,u_k) = s_k^{\top} R_s s_k + u_k^{\top} R_u u_k \qquad (10)$$
where $R_s \in \mathbb{R}^{2\times 2}$ and $R_u \in \mathbb{R}^{3\times 3}$ are positive definite matrices. Note that our subsequent results also apply to other stage costs, such as a reinforcement signal with finitely many discrete levels or a general bounded semi-definite reinforcement signal.
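The sketch below evaluates the stage cost with the weighting matrices reported later for the simulations. It assumes the standard quadratic form $U(s_k,u_k) = s_k^\top R_s s_k + u_k^\top R_u u_k$ for (10), and it takes $s_k$ to be the two-component tracking error (peak angle, duration), which is consistent with $R_s$ being $2\times 2$. The input values are hypothetical.

```python
import numpy as np

# Weighting matrices reported in the implementation section.
R_S = np.diag([1.0, 1.0])
R_U = np.diag([0.1, 0.1, 0.1])


def stage_cost(s, u, Rs=R_S, Ru=R_U):
    """Quadratic stage cost U(s, u) = s^T Rs s + u^T Ru u."""
    s = np.asarray(s, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(s @ Rs @ s + u @ Ru @ u)


# Hypothetical state (peak-angle and duration errors) and impedance adjustment.
cost = stage_cost([0.05, 0.02], [0.1, 0.0, -0.02])
```

With $R_u = 0.1I$, the action term costs ten times less per unit than the tracking term. This choice favors reducing the tracking error over keeping the impedance adjustments small.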
We treat the tracking problem as one of devising an optimal control law by learning from data observed along the dynamics of the interacting human-robot system. We define the state-action $Q$-function, or total cost-to-go, as
Note that the $Q(s_k,u_k)$ value is a performance measure when action $u_k$ is applied at state $s_k$. This $Q(s_k,u_k)$ formulation implies that we treat the optimal adaptive tracking control of the robotic knee as a discrete-time, infinite-horizon, discounted problem, without knowing an explicit mathematical description of the human-robot interacting dynamics.
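Assume that (11) takes the usual discounted form with a discount factor $0<\gamma<1$ (this text does not name the factor). Splitting off the first term of the sum then gives the one-step Bellman recursion. This is a sketch, not a reproduction of (11) and (12):

```latex
\begin{aligned}
Q(s_k,u_k) &= \sum_{j=k}^{\infty} \gamma^{\,j-k}\, U(s_j,u_j) \\
           &= U(s_k,u_k) + \gamma \sum_{j=k+1}^{\infty} \gamma^{\,j-(k+1)}\, U(s_j,u_j) \\
           &= U(s_k,u_k) + \gamma\, Q(s_{k+1},u_{k+1}).
\end{aligned}
```

Shifting the index back by one gives $Q(s_{k-1},u_{k-1}) = U_{k-1} + \gamma Q(s_k,u_k)$. This is why the critic's training signal involves $U_{k-1}$.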
The $Q$-function in (11) satisfies the Bellman equation
Assumption 1: The state trajectory $Y_k$ of the intact knee is bounded, and the initial robotic knee state $Z_0$ is bounded.
1) Critic Network: Fig. 1(c) depicts the structure of the critic network, which is realized by a universal approximator with one hidden layer. Therefore, the approximated value is
The approximation error of the critic network is
In the following, we use the shorthand notation $U_{k-1}$ for $U(s_{k-1},u_{k-1})$, and similarly for other quantities. The critic weights $W_{c1,k}$ and $W_{c2,k}$ are updated as
According to the gradient descent rule as in [38], $\Delta W_{c1,k}$ and $\Delta W_{c2,k}$ can be written as
where $\phi_{c,k}$ is the output of the hidden layer of the critic network, and $l_{c,k}$ is the learning rate.
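A minimal sketch of the critic just described follows. It assumes the standard dHDP critic error $e_{c,k} = \gamma\hat{Q}_k - (\hat{Q}_{k-1} - U_{k-1})$, minimized as $E_{c,k} = \tfrac{1}{2}e_{c,k}^2$ with $\hat{Q}_{k-1}$ treated as a constant. It uses the 8 tanh hidden neurons, linear output, and learning rate of 0.1 reported in the implementation section. The discount factor value, the input layout $[s_k; u_k]$, and the weight initialization are assumptions.

```python
import numpy as np


class Critic:
    """One-hidden-layer critic: Q_hat(s, u) = Wc2 . tanh(Wc1 @ [s; u]) (a sketch)."""

    def __init__(self, n_in=5, n_hidden=8, lr=0.1, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.Wc1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))  # input -> hidden
        self.Wc2 = rng.uniform(-0.5, 0.5, n_hidden)          # hidden -> linear output
        self.lr, self.gamma = lr, gamma  # gamma = 0.95 is a hypothetical value

    def forward(self, s, u):
        x = np.concatenate([s, u])
        phi = np.tanh(self.Wc1 @ x)  # hidden-layer output, phi_{c,k} in the text
        return float(self.Wc2 @ phi), x, phi

    def update(self, s, u, q_prev, cost_prev):
        """One gradient step on E_c = 0.5 * e_c**2, where
        e_c = gamma * Q_hat_k - (Q_hat_{k-1} - U_{k-1}); q_prev is held fixed."""
        q, x, phi = self.forward(s, u)
        e_c = self.gamma * q - (q_prev - cost_prev)
        # Gradients of Q_hat_k, computed before either weight matrix changes.
        dq_dw2 = phi
        dq_dw1 = np.outer(self.Wc2 * (1.0 - phi**2), x)
        self.Wc2 -= self.lr * e_c * self.gamma * dq_dw2
        self.Wc1 -= self.lr * e_c * self.gamma * dq_dw1
        return e_c
```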
2) Action Network: The output of the action network is the control input
Based on the design principle of dHDP [38], the action network aims to minimize the total cost-to-go $Q(s_k,u_k)$. We define the prediction error of the action network as
Similarly to (16), the action network weights are updated by gradient descent on this prediction error.
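The action network update can be sketched in the same way, under the following assumptions. The network has 6 tanh hidden neurons and a tanh output layer, as reported in the implementation section. The prediction error is $e_{a,k} = \hat{Q}(s_k,u_k) - U_c$, with the desired objective $U_c$ assumed to be 0. A hypothetical output scale `u_max` stands in for the actual input constraint. The critic's gradient $\partial\hat{Q}/\partial u$ is passed in as an argument so that the block stays self-contained.

```python
import numpy as np


class Actor:
    """One-hidden-layer actor: u = u_max * tanh(Wa2 @ tanh(Wa1 @ s)) (a sketch)."""

    def __init__(self, n_state=2, n_hidden=6, n_action=3, lr=0.1, u_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.Wa1 = rng.uniform(-0.5, 0.5, (n_hidden, n_state))
        self.Wa2 = rng.uniform(-0.5, 0.5, (n_action, n_hidden))
        self.lr, self.u_max = lr, u_max

    def forward(self, s):
        phi = np.tanh(self.Wa1 @ s)   # hidden-layer output
        v = np.tanh(self.Wa2 @ phi)   # tanh output keeps |u| < u_max
        return self.u_max * v, phi, v

    def update(self, s, q_value, dq_du, u_c=0.0):
        """One gradient step on E_a = 0.5 * e_a**2 with e_a = Q_hat(s, u) - U_c.

        dq_du is dQ_hat/du, obtained by back-propagating through the critic.
        """
        _, phi, v = self.forward(s)
        e_a = q_value - u_c
        # Chain rule: output pre-activation, then hidden pre-activation.
        delta_out = e_a * np.asarray(dq_du) * self.u_max * (1.0 - v**2)
        delta_hid = (self.Wa2.T @ delta_out) * (1.0 - phi**2)
        self.Wa2 -= self.lr * np.outer(delta_out, phi)
        self.Wa1 -= self.lr * np.outer(delta_hid, s)
        return e_a
```

In a full loop, `dq_du` would come from back-propagating through the critic's input layer. A quadratic stand-in $Q = u^\top u$ with gradient $2u$ is enough to check that the update lowers $\hat{Q}$.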
In this section, we provide a qualitative analysis of the weight convergence of the actor-critic networks, the Bellman (sub)optimality of the control policy, and the practical stability of the human-prosthesis system.
To guarantee that the second and the third terms in the last expression are negative, we need to choose learning rates in the following manner:
Remark 2 (Practical Stability): From Assumption 1, Remarks 1–3, and Theorem 1, all the signals (such as $u_k$, $Z_k$, $Y_k$, and $s_k$) in the human-robot system (7) are bounded. As shown in Theorem 2, the approximated $Q$ value in (13) and the resulting policy (17) achieve (sub)optimality as the time step $k$ increases. This implies that the stage cost in (10) approaches zero in the sense of uniform ultimate boundedness. As such, the tracking error (8) also approaches zero in the sense of uniform ultimate boundedness as the time step $k$ increases. Therefore, the considered human-robot system is practically stable.
Table I is a summary of how our proposed dHDP tracking control is implemented and applied to configure the impedance parameters of the robotic knee. Pertinent information for the implementation steps is provided below.
TABLE I IMPLEMENTATION OF SIMULATION STUDIES
We investigated this tracking control problem using OpenSim, a well-established simulator in the field of biomechanics [41]. The bipedal walking model shown in Fig. 3(a) consists of five rigid segments linked through one-degree-of-freedom pin joints, and the pelvis is linked to the ground by a free joint to allow free movement. The model settings, such as segment lengths, body mass and inertial parameters, followed the lower limb OpenSim model [42]. In this study, the left knee was defined as intact and the right knee as prosthetic, to enable locomotion in a single simulation model. The intact knee was controlled by a fixed set of impedance parameter settings, while the prosthetic knee was controlled by the RL controller, with its impedance parameters updated at each gait cycle.
Fig. 3. OpenSim model. (a) Five-rigid-segment bipedal walking model; (b) the next gait is used to measure the tracking error, as the prosthetic knee needs to copy the intact knee, which is half a gait cycle ahead.
Simulating a gait cycle in OpenSim requires specifying initial model settings, such as the walking speeds $v_x$ and $v_y$ and both knee angles $\theta_L$ and $\theta_R$, which are set to small values near the stance position. To make the simulations more realistic, we added artificial Gaussian noise to the state $s$ and the control $u$ to simulate sensor noise and actuator noise, respectively. The sensor noise magnitude was set at 20% of the performance tolerance bound, and the actuator noise at 1% of the maximum action value.
All simulations were carried out in trials. A trial is a continuous experiment of 500 gait cycles in which RL tracking control is applied to tune the impedance parameters of the robotic knee under a given simulation scenario. We performed two sets of simulations: training trials and testing trials. During training, we performed 30 training trials, each from a randomly initialized controller, i.e., a set of randomly generated, initially feasible impedance parameter settings that allow the simulator to simulate balanced walking without falling. During testing, we randomly selected 10 successful controllers (i.e., 10 sets of impedance parameters after training) and applied those control policies (i.e., the actor network weights) as the initial controller parameter settings for tracking new, untrained trajectory profiles. We then tested each of the 10 controllers (policy network weights) in 30 new trials, each with a new set of randomly selected initial impedance parameters.
A trial is considered successful if the tracking error in (8) reaches an error tolerance bound (Table II, bottom row). Specifically, for each of the 4 phases (Fig. 1(b)), if the tracking error was within the tolerance bound for 8 out of 10 consecutive gait cycles, the tracking process in that phase was considered convergent. If all 4 phases converged within 500 gait cycles, the trial was a success.
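The success criterion above can be sketched as a sliding-window check. Reading "8 out of 10 consecutive gait cycles" as "any 10-cycle window" is an interpretation. The function names and tolerance values are hypothetical, since Table II is not reproduced here.

```python
import numpy as np


def phase_converged(errors, tol, window=10, required=8):
    """True if some `window` consecutive gait cycles contain at least `required`
    cycles whose error components are all within `tol` (sliding-window reading).

    errors: array-like of shape (n_cycles, 2); tol: length-2 tolerance bound.
    """
    within = np.all(np.abs(np.asarray(errors, dtype=float)) <= np.asarray(tol), axis=1)
    return any(within[i:i + window].sum() >= required
               for i in range(len(within) - window + 1))


def trial_success(errors_by_phase, tol_by_phase, max_cycles=500):
    """A trial succeeds if all four FSM phases converge within max_cycles gait cycles."""
    return all(phase_converged(errors_by_phase[m][:max_cycles], tol_by_phase[m])
               for m in (1, 2, 3, 4))
```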
To ensure the subjects' safety (i.e., that they do not stumble or fall), a safety bound was introduced based on the realistic conditions of balanced walking. Specifically, as shown in Table II (top row), the safety bound was set at 1.5 standard deviations above the knee kinematic peak values observed in each phase [43]. If the tracking errors exceed the safety bound, meaning that the prosthetic profile may place the subject in an unsafe region, the impedance parameters of the prosthetic knee are reset to their initial values. Note, however, that the actor and critic network weights are retained for further training until the tracking criteria are met.
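The safety reset can be sketched as below, assuming the bound is compared with the magnitude of each tracking-error component. The function name is hypothetical. The function only returns an impedance setting and leaves the network weights untouched, as described above.

```python
import numpy as np


def enforce_safety_bound(error, safety_bound, impedance, initial_impedance):
    """Return (impedance_to_use, was_reset) for one FSM phase.

    If any tracking-error component exceeds the safety bound in magnitude,
    the phase's impedance is reset to its initial setting. Actor and critic
    weights are deliberately not touched, so learning continues from them.
    """
    error = np.abs(np.asarray(error, dtype=float))
    if np.any(error > np.asarray(safety_bound, dtype=float)):
        return np.array(initial_impedance, dtype=float), True
    return np.array(impedance, dtype=float), False
```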
TABLE II SAFETY BOUND AND TOLERANCE BOUND
In obtaining all results, we set the weighting matrices in the stage cost (10) as $R_s = \mathrm{diag}(1,1)$ and $R_u = \mathrm{diag}(0.1,0.1,0.1)$. For the critic network, we used 8 hidden-layer neurons with a hyperbolic tangent activation function and a linear output layer. For the actor network, we used 6 hidden-layer neurons with a hyperbolic tangent activation function, and also a hyperbolic tangent activation function for the output-layer neurons so that the control inputs are constrained. For both networks, the learning rate was 0.1. The actor-critic networks were updated every gait cycle until the trial succeeded.
To evaluate the efficacy of the proposed dHDP tracking control, we performed a systematic simulation study of human-robot walking performance under RL tracking control. We emulated three walking conditions: level ground walking, walking on different terrains, and walking at different paces.
Scenario 1 (Level Ground Tracking): The intact knee was operated by a fixed set of impedance parameters at all times, while the robotic knee impedance parameters were controlled by the RL controller. Note that fixed impedance parameters still provide realistic control according to (2), as the intact knee was also controlled by an FSM-IC. In both the training and testing stages, the RL controller was required to successfully track the intact knee profile within 500 gait cycles, while the OpenSim model walked at a constant pace on level ground.
Scenario 2 (Walking on Different Terrains): To simulate walking on different terrains, we provided different gait profiles for the intact knee, which correspond to different impedance control parameters. The changing profiles of the intact knee were then the new moving targets for the prosthetic knee to track. Five randomly initialized impedance parameter sets that could enable balanced walking of the human-robot system were generated. The impedance parameters of the intact knee were randomly selected from this pool of 5 different gait profiles every 20 gait cycles. During training, the controller was required to successfully track the intact knee, with its respective knee profiles, 3 consecutive times. During testing, a new pool of 5 sets of impedance parameters was generated and used on the intact knee, and the respective profiles were tracked by the robotic knee. Again, the controller was required to successfully track the intact knee profile 3 consecutive times.
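The Scenario 2 target switching draws one of five intact-knee impedance sets at random every 20 gait cycles, and it can be sketched as a generator. Drawing with replacement, so that the same set may be drawn twice in a row, is an assumption. The names are hypothetical.

```python
import random


def intact_target_schedule(pool, n_cycles, switch_every=20, seed=None):
    """Yield the intact-knee impedance set used at each gait cycle: a set drawn
    at random (with replacement) from `pool` and held for `switch_every` cycles."""
    rng = random.Random(seed)
    current = None
    for k in range(n_cycles):
        if k % switch_every == 0:
            current = rng.choice(pool)
        yield current
```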
Scenario 3 (Changing Pace): We examined RL tracking performance when the pace changes. The simulated pace changes were implemented as the sequence [100% → 112% → 100% → 88%] of the initial pace. A pace change took place once the controller had successfully tracked the intact knee by meeting the convergence criteria. The controller had to complete the full sequence to complete the training stage. In the testing stage, the pace changes were applied in a different order, [100% → 80% → 100% → 120%], which also involves a greater variation than the respective training conditions. As in the training stage, each controller had to finish the full sequence to be counted as a success.
All three scenarios were simulated in both training and testing stages with the same human subject. Table III shows the tracking performance for all three scenarios.
TABLE III SIMULATION RESULTS OF ALL THREE SCENARIOS
In Scenario 1, a 100% success rate was achieved, with an average of 64.6 steps to fully learn to track the intact knee. The RMS tracking error was reduced from 0.0588 to 0.0131 rad in peak angle and from 1.96% to 0.69% in phase duration. In the testing stage, because a trained actor network was used for initialization, the tuning speed improved from 64.6 to 36.3 average steps. The RMS tracking error was similar.
In Scenario 2, a 97% success rate was achieved, with an average of 55.8 steps to successfully track the intact knee each time. The RMS tracking error was reduced from 0.0581 to 0.0124 rad in peak angle and from 1.82% to 0.66% in phase duration. In the testing stage, the trained actor network policy tracked the target profile rapidly: an average of 19.16 tracking steps with a 100% success rate shows the ability to track a changing target effectively. Fig. 4(a) shows an example of typical training and testing trials. The RMS tracking error has a similar outcome in training and testing, as they share the same success criteria.
Scenario 3 focuses on tracking performance as reflected in gait duration. An 80% success rate was achieved, with an average of 52.5 steps to successfully track the intact knee at different paces. The RMS tracking error was reduced from 0.0573 to 0.0103 rad in peak angle and from 1.90% to 0.92% in phase duration. In the testing stage, the trained actor network policy showed a slight improvement in tracking speed, but it greatly improved the success rate, from 80% to 92%. Fig. 4(b) shows an example of typical training and testing trials. The RMS tracking error has a similar outcome in training and testing, as they share the same success criteria.
Fig. 5 shows RMS tracking errors for all trials in all simulation scenarios. A significant reduction in tracking error was observed under all conditions. The difference between training and testing is minor because they use the same randomization in initial impedance parameters and the same convergence criteria.
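The relative reductions implied by the initial and final RMS errors in the Scenarios 1 to 3 paragraphs can be computed directly for a quick comparison across scenarios. No new data are introduced; the dictionary simply restates the numbers quoted in the text.

```python
# Initial and final RMS tracking errors as quoted in the text:
# ((peak angle initial, final) in rad, (phase duration initial, final) in %).
reported = {
    "Scenario 1": ((0.0588, 0.0131), (1.96, 0.69)),
    "Scenario 2": ((0.0581, 0.0124), (1.82, 0.66)),
    "Scenario 3": ((0.0573, 0.0103), (1.90, 0.92)),
}


def relative_reduction(initial, final):
    """Fractional reduction (initial - final) / initial."""
    return (initial - final) / initial


for name, (angle, duration) in reported.items():
    print(f"{name}: peak-angle error reduced by {relative_reduction(*angle):.0%}, "
          f"duration error reduced by {relative_reduction(*duration):.0%}")
```

With these figures, the peak-angle error drops by roughly 78% to 82%, and the phase-duration error drops by roughly 52% to 65%.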
Fig. 4. Typical trials for Scenario 2 (a) and Scenario 3 (b). The left columns show tracking errors during training while the horizontal green dashed lines represent tolerance bounds and the grey vertical dashed lines represent task/pace transitions. The right columns show tracking trajectories during testing.
Fig. 5. Summary of RMS tracking errors of the peak angle (top) and phase duration (bottom) for all three scenarios. The blue bars represent the initial tracking errors while the red bars represent the final tracking errors.
We have introduced a new RL based tracking control scheme for automatically tuning the impedance parameter settings of a robotic knee to track the intact knee kinematics. For the first time, we successfully demonstrated stable and continuous walking in simulations of a human wearing a robotic prosthesis designed to track the intact knee motion.
Mirroring the intact knee motion with a prosthetic knee is an intuitive idea that has been proposed for decades but has not been successfully demonstrated. Controlling the robotic knee to mimic the intact knee joint is a tracking problem in classical control. Even though many control theoretic solutions exist, such as backstepping [44]–[46], observer-based control [47] and nonlinear adaptive/robust control [48], [49], they are inadequate for this problem. These methods require an accurate mathematical description of the system dynamics, which here involve a co-adapting human and robot and are nearly impossible to obtain. Additionally, those control theoretic approaches focus on stabilization (in the Lyapunov sense) of nonlinear dynamic systems without addressing control performance such as Bellman optimality.
Recently, some results have emerged that tackle these issues using data-driven, learning-enabled, nonlinear optimal tracking control designs [50]. Unfortunately, many of the reported results have focused on theoretical analyses, which usually require a reference model for the desired movement trajectory and/or control trajectory. They are thus not practically useful.
In this paper, we have presented a complete tracking control algorithm based on dHDP. Additionally, we have systematically evaluated the performance of the proposed tracking controller. Our simulation results have shown the effectiveness of the tracking controller in different walking tasks that emulate level ground walking, walking on different terrains, and walking at different paces. Based on our previous work, we expect these new results to be verified in human experiments in the future.
Then, by using the cyclic property of the matrix trace, the last term in (52) is bounded by
We obtain (30) by substituting (52), (53) into (50). ■
IEEE/CAA Journal of Automatica Sinica, 2022, No. 1