Low Carbon Economic Dispatch of Integrated Energy System Considering Power Supply Reliability and Integrated Demand Response


Computer Modeling in Engineering & Sciences, 2022, Issue 7

Jian Dong, Haixin Wang, Junyou Yang, Liu Gao, Kang Wang and Xiran Zhou

School of Electrical Engineering, Shenyang University of Technology, Shenyang, 110870, China

ABSTRACT Optimal scheduling of integrated energy systems can improve energy efficiency and the low-carbon economy of system operation. This paper studies an electric-gas-heat integrated energy system that includes a carbon capture system, energy coupling equipment and renewable energy. An energy scheduling strategy based on deep reinforcement learning is proposed to minimize operating cost and carbon emissions and to enhance power supply reliability. First, a low-carbon mathematical model of the combined heat and power unit, carbon capture system and power-to-gas unit (CCP) is established. Subsequently, we establish a low-carbon multi-objective optimization model that considers system operating cost, carbon emission cost, integrated demand response, wind and photovoltaic curtailment, and load shedding costs. Furthermore, considering the intermittency of wind power generation and the flexibility of load demand, the low-carbon economic dispatch problem is modeled as a Markov decision process. The twin delayed deep deterministic policy gradient (TD3) algorithm is used to solve this complex scheduling problem. The effectiveness of the proposed method is verified in simulation case studies. Compared with the SAC, A3C, DDPG and DQN algorithms, TD3 reduces the operating cost by 8.6%, 4.3%, 6.1% and 8.0%, respectively.

KEYWORDS Integrated energy system; twin delayed deep deterministic policy gradient; economic dispatch; power supply reliability; integrated demand response

1 Introduction

In recent years, with the development of the world economy and the increasing depletion of fossil fuels, the problem of insufficient energy supply has become increasingly prominent [1, 2]. On 5 March 2021, China's government work report stated that CO2 emissions would peak by 2030 and that carbon neutrality would be achieved by 2060. As an important carrier for the development of the energy internet, the integrated energy system (IES) can promote the coordination and complementarity of various energy sources [3, 4]. The IES has also contributed significantly to building clean, low-carbon and efficient energy systems. However, as energy coupling deepens, the IES faces enormous challenges from fluctuating wind and PV outputs and uncertain multi-energy demands [5-7].

Operating economy and power supply reliability are two essential factors in the energy management and optimization of the IES [8-10]. Power supply reliability is an important indicator of the stable operation of the power grid, and economic benefit is an important goal in the development of the IES [11].

Regarding the low-carbon operation of IESs, Wang et al. [12] proposed an optimal scheduling model based on the carbon trading mechanism, in which IES operators can purchase or sell carbon quotas in the carbon trading market. Their results show that considering carbon trading can reduce the operating cost of the IES. Zhai et al. [13] proposed an economic dispatch method for low-carbon power systems that considers the uncertainty of electric, thermal and cooling loads. Yang et al. [14] proposed an optimal scheduling model combining microturbine (MT) and power-to-gas (P2G) units, in which typical load scenarios are obtained by scenario generation and reduction techniques to improve wind power consumption and reduce carbon dioxide emissions. Although these studies achieve low-carbon system operation, they do not consider flexible demand-side resources.

Considering integrated demand response (IDR) in the IES can promote renewable energy consumption and reduce carbon emissions. Bahrami et al. [15] established a power system scheduling model that considers demand response (DR) resources and carbon trading, and verified its effectiveness in promoting wind power consumption and reducing carbon emissions. Zeng et al. [16] introduced a split-flow carbon capture power plant and DR into the IES to achieve low-carbon operation.

Although remarkable results have been achieved in the economic dispatch of the IES, the above models are solved by traditional methods, and their optimization performance depends on the prediction accuracy of sources and loads. With the development of artificial intelligence (AI) techniques, reinforcement learning (RL) has attracted increasing attention for the optimal control of power systems [17-20]. An RL model accumulates experience and improves its policy through continuous interaction with the environment. In particular, deep reinforcement learning (DRL), which combines deep neural networks with reinforcement learning, offers better adaptive learning and decision-making ability for nonconvex and nonlinear problems [20-22]. In [23], real-time microgrid (MG) energy management based on deep reinforcement learning was proposed, in which MG energy management is formulated as a Markov decision process (MDP) to minimize the daily operating cost. In [24], the energy management of an IES was formulated as a constrained optimal control problem and solved by the asynchronous advantage actor-critic algorithm.

These studies provide a foundation for applying DRL to the IES. However, most of them consider only the economy and security of the IES, without considering carbon dioxide emissions or other system indicators. In addition, for the collaborative optimization of multiple energy carriers and energy storage, model training may be time-consuming and prone to non-convergence.

This paper proposes a TD3-based framework for the source-load coordinated optimal scheduling of the integrated energy system. First, a combined optimization model of combined heat and power (CHP), power-to-gas (P2G) and a carbon capture system (CCS) is established, which can decouple heat and power output and reduce carbon emissions. Second, the multi-objective optimization problem of the IES is formulated as a Markov decision process: the environment model of the IES is established, and the action space, state space and reward function of the agent are designed. Third, the low-carbon economic dispatch problem is solved by the twin delayed deep deterministic policy gradient (TD3) algorithm, and the convergence and stability of the method are analyzed. Finally, the effectiveness of the method for low-carbon economic dispatch of the IES is validated.

2 IES Model and Problem Description

Low-carbon operation optimization of the IES aims to improve the economic and environmental benefits of the system subject to safe-operation constraints. This paper studies a multi-objective optimization problem that considers economic cost, carbon emissions and operational reliability. The structure of the IES studied in this paper is shown in Fig. 1. The power grid includes combined heat and power (CHP), a carbon capture system (CCS), wind power, photovoltaics, battery storage (BS) and electric load. The gas network comprises the natural gas station, gas storage (GS) and gas load. The thermal supply network mainly consists of thermal storage (HS) and thermal load. The energy conversion equipment mainly includes the gas turbine, P2G and MT.
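To make the structure of such a multi-objective cost concrete, the sketch below scalarizes the cost terms listed in the abstract (operating cost, carbon emission cost, IDR cost, wind and PV curtailment, and load shedding) into one weighted sum per dispatch interval. It is an illustrative assumption, not the paper's objective function: the field names, prices and linear form are hypothetical, and the load-shedding penalty merely stands in for the reliability objective.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PeriodOutcome:
    """Dispatch outcome for one interval (hypothetical fields and units)."""
    operation_cost: float       # fuel, purchased energy and equipment cost ($)
    co2_emitted_t: float        # net CO2 emitted after capture (t)
    idr_cost: float             # compensation paid for demand response ($)
    wind_curtailed_mwh: float
    pv_curtailed_mwh: float
    load_shed_mwh: float


def period_cost(o: PeriodOutcome, carbon_price: float,
                curtail_price: float, shed_price: float) -> float:
    """Weighted-sum scalarization of the cost terms for one interval."""
    carbon_cost = carbon_price * o.co2_emitted_t
    curtail_cost = curtail_price * (o.wind_curtailed_mwh + o.pv_curtailed_mwh)
    shed_cost = shed_price * o.load_shed_mwh   # penalty standing in for reliability
    return o.operation_cost + carbon_cost + o.idr_cost + curtail_cost + shed_cost


def total_cost(outcomes: Iterable[PeriodOutcome], carbon_price: float,
               curtail_price: float, shed_price: float) -> float:
    """Sum of interval costs over a scheduling horizon."""
    return sum(period_cost(o, carbon_price, curtail_price, shed_price)
               for o in outcomes)


# Example with made-up numbers for two intervals.
day = [PeriodOutcome(100.0, 2.0, 5.0, 1.0, 0.0, 0.0),
       PeriodOutcome(80.0, 1.0, 0.0, 0.0, 0.5, 0.2)]
cost = total_cost(day, carbon_price=10.0, curtail_price=20.0, shed_price=100.0)
reward = -cost   # an RL agent would typically maximize the negative cost
```

Putting every objective on a common monetary scale is only one way to combine them; the relative prices act as the weights and directly shift the trade-off between cost, emissions and reliability.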

Figure 1: Structure diagram of the IES

2.1 IES Equipment

(1) CHP-CCS-P2G (CCP) Mathematical Model

CHP units provide electric and thermal power in the IES. However, CHP units have high carbon emissions, which cause severe environmental pollution, and their heat and power outputs are coupled. Therefore, following [25], an optimization model of the CCP is established to reduce carbon emissions and improve the economic benefits of the IES. The principle of the CCP is shown in Fig. 2. P2G converts part of the electric power generated by CHP into natural gas, which strengthens the coupling of the power, thermal and gas systems and reduces the net electric output of CHP. The electric power consumed by P2G and CCS is taken directly from the electric power generated by CHP. To reduce the carbon emissions of CHP and the carbon source cost of P2G, the CO2 is captured by CCS and transmitted to P2G for recycling.
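The CCP energy and carbon flows described above can be illustrated with a one-interval balance. This is a minimal sketch under assumed linear coefficients, not the model of [25]: every coefficient in the hypothetical `CCPParams` class is a placeholder, and captured CO2 beyond what P2G needs is simply not tracked.

```python
from dataclasses import dataclass


@dataclass
class CCPParams:
    """Hypothetical CCP coefficients (placeholders, not the paper's values)."""
    chp_co2_t_per_mwh: float = 0.8      # gross CO2 intensity of CHP electricity
    capture_ratio: float = 0.9          # share of CHP CO2 captured by CCS
    ccs_mwh_per_t: float = 0.27         # electricity used per tonne captured
    p2g_efficiency: float = 0.6         # gas energy out / electricity in
    p2g_co2_t_per_mwh: float = 0.2      # CO2 feedstock per MWh fed to P2G


def ccp_balance(p_chp_mwh: float, p_p2g_mwh: float, prm: CCPParams) -> dict:
    """Energy and carbon balance of the CHP-CCS-P2G chain for one interval."""
    gross_co2 = prm.chp_co2_t_per_mwh * p_chp_mwh
    captured = prm.capture_ratio * gross_co2
    p_ccs = prm.ccs_mwh_per_t * captured
    # CCS and P2G draw power directly from the CHP unit, so only the
    # remainder is delivered to the grid.
    net_power = p_chp_mwh - p_ccs - p_p2g_mwh
    if net_power < 0:
        raise ValueError("CCS and P2G demand exceeds CHP output")
    co2_needed = prm.p2g_co2_t_per_mwh * p_p2g_mwh
    co2_recycled = min(captured, co2_needed)   # captured CO2 reused by P2G
    return {
        "net_power_mwh": net_power,
        "gas_mwh": prm.p2g_efficiency * p_p2g_mwh,
        "net_co2_t": gross_co2 - captured,
        "co2_recycled_t": co2_recycled,
        # CO2 is bought externally only when capture cannot cover P2G demand.
        "co2_purchased_t": co2_needed - co2_recycled,
    }


result = ccp_balance(p_chp_mwh=100.0, p_p2g_mwh=10.0, prm=CCPParams())
```

With these placeholder numbers, CCS covers the whole CO2 feedstock of P2G, so no CO2 has to be purchased; lowering `capture_ratio` far enough makes the purchased share positive.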

Figure 2: The principle of the CCP combined optimization model

2.2 IES Model

2.3 IDR Model

As a flexible resource, IDR helps achieve low-carbon economic operation and improves the operational reliability of the IES. IDR can reduce energy demand during peak load periods by reducing, converting and shifting electric power demand. In this way, renewable energy can be used in place of high-carbon-emission units.
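Of the three IDR mechanisms mentioned, load shifting is the easiest to show in isolation. The sketch below moves a fixed share of peak-hour demand into valley hours while keeping total energy constant; the shift ratio, the hour sets and the even-spreading rule are hypothetical and only illustrate the idea.

```python
def shift_load(load, peak_hours, valley_hours, shift_ratio):
    """Move a share of peak-hour electric demand into valley hours.

    Only the 'shifting' part of IDR is modelled: the moved energy is spread
    evenly over the valley hours, so total daily energy is unchanged. The
    shift ratio and the even-spreading rule are illustrative assumptions.
    """
    if not 0.0 <= shift_ratio <= 1.0:
        raise ValueError("shift_ratio must be in [0, 1]")
    if not valley_hours:
        raise ValueError("at least one valley hour is required")
    new_load = list(load)
    moved = 0.0
    for h in peak_hours:
        delta = shift_ratio * load[h]
        new_load[h] -= delta
        moved += delta
    for h in valley_hours:
        new_load[h] += moved / len(valley_hours)
    return new_load


# Hypothetical hourly profile (MW): flat 50, with an evening peak of 100.
load = [50.0] * 24
for h in (19, 20, 21):
    load[h] = 100.0
shifted = shift_load(load, peak_hours=(19, 20, 21),
                     valley_hours=range(0, 6), shift_ratio=0.1)
```

Here 30 MWh leaves the evening peak and 5 MWh is added to each of the six night hours, so the peak drops from 100 MW to 90 MW with no change in daily energy.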

2.4 Objective Function

2.5 Objective Function

3 Deep Reinforcement Learning Model for Low Carbon Economic Dispatch of the IES

3.1 Problem Transformation

3.2 Problem Solving Based on TD3 Algorithm

In this paper, the multi-objective low-carbon economic dispatch problem of the IES is solved by a DRL method: the problem is transformed into a DRL framework and solved by the TD3 algorithm. TD3 is a DRL algorithm built on the actor-critic framework and DDPG. The principle of the TD3 algorithm is shown in Fig. 3. The actor-critic framework uses two neural networks. The actor generates an action from the state, while the critic takes the state and action as inputs, outputs a Q value, and learns from the reward signal to evaluate the action selected by the actor. By updating the actor's policy at every step, the agent adjusts its parameters in the continuous state space, which yields a single-step update.

In the DDPG algorithm, the actor generates deterministic actions according to the policy function a_t = π_θ(s_t), where θ denotes the parameters of the actor network. The optimization objective can be described by Eqs. (49) and (50) [17].

where Q(s_t, a_t) is the action-value function, which gives the expected return of taking action a_t in state s_t.
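Eqs. (49) and (50) are not preserved in this version. For reference, the standard DDPG actor objective and deterministic policy gradient can be written as follows; this is the textbook form, which may differ in notation from the authors' equations, and ρ (the distribution of visited states) and the critic parameters φ are symbols assumed here.

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{s_t \sim \rho}\left[ Q_\phi\big(s_t, \pi_\theta(s_t)\big) \right], \\
\nabla_\theta J(\theta) &= \mathbb{E}_{s_t \sim \rho}\left[ \nabla_a Q_\phi(s_t, a)\big|_{a=\pi_\theta(s_t)} \, \nabla_\theta \pi_\theta(s_t) \right].
\end{aligned}
```

The chain rule through a = π_θ(s_t) is what lets the critic "provide gradient information for the actor": the actor moves its output in the direction that increases the critic's Q estimate.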

The critic is a Q function used to fit the state-action value. It evaluates the action in the current state and provides gradient information for the actor. The target value is calculated by Eq. (51).

where Q_φ′ is the target Q network and y_t is the target value of the critic network.
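Eq. (51) is likewise missing here. In standard DDPG, the critic target and the mean-squared critic loss take the following form, where r_t is the reward, γ the discount factor and π_θ′ the target actor (symbols assumed for this illustration, not reproduced from the authors' equation).

```latex
\begin{aligned}
y_t &= r_t + \gamma \, Q_{\phi'}\big(s_{t+1}, \pi_{\theta'}(s_{t+1})\big), \\
L(\phi) &= \mathbb{E}\left[ \big( y_t - Q_\phi(s_t, a_t) \big)^2 \right].
\end{aligned}
```

Because y_t bootstraps from a single target critic evaluated at the target actor's own action, any upward error in Q_φ′ feeds straight back into the target, which is the overestimation issue TD3 addresses.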

Figure 3: The principle of the TD3 algorithm

In DDPG, the Q network tends to overfit, which leads to overestimation of the Q value. In this case, the policy network learns from erroneous value estimates, which degrades the final performance. The TD3 algorithm addresses the overestimation problem of DDPG, restricts overfitting of the Q network and reduces bias. Building on the DDPG framework, TD3 makes the following improvements:

(1) Clipped double-Q learning under the actor-critic framework [27]. In the DDPG algorithm, both the target actor network and the target critic network use the “soft update” method, which keeps each target network close to its online network, so it is difficult to separate action selection from policy evaluation. Therefore, in the TD3 algorithm, the target value is obtained by clipped double-Q learning, and the Q value is constrained by two Q networks: the smaller of the two Q values is used to calculate the target Q value. Given the actor network π_θ, the target value shared by the critic networks Q_φ1 and Q_φ2 is computed as shown in Eq. (52).

(2) Target policy smoothing. In a continuous action space, similar actions are usually expected to have similar values [25]. Therefore, the target action of the agent is smoothed by adding random noise to it. The value function is updated by Eqs. (53) and (54).

where ε is the random noise.

(3) Delayed policy updates. The policy network should not be trained on evaluations from a poorly performing Q network [28]. Therefore, in TD3, the actor is updated once after the critic has been updated n times. The target network parameters are updated by Eq. (55).

where τ is the soft update rate.
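The three TD3 modifications can be sketched in plain Python on scalar toy values. This is a minimal illustration, not the authors' implementation: the two target critics are represented only by the numbers `q1_next` and `q2_next` (assumed to be already evaluated at the smoothed action), network weights are flat lists, and the noise settings (σ = 0.2, clip 0.5) and `policy_delay = 2` are common TD3 defaults used as assumptions, since the values of n and τ used in this paper are not reported in this section.

```python
import random
from typing import List, Sequence


def smoothed_target_action(mu_next: Sequence[float], sigma: float,
                           noise_clip: float, a_low: float, a_high: float,
                           rng: random.Random) -> List[float]:
    """Target policy smoothing: target-actor action plus clipped Gaussian noise."""
    out = []
    for a in mu_next:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
        out.append(max(a_low, min(a_high, a + eps)))   # keep within action bounds
    return out


def clipped_double_q_target(reward: float, done: bool, gamma: float,
                            q1_next: float, q2_next: float) -> float:
    """Clipped double-Q target: bootstrap from the smaller target-critic value."""
    return reward + gamma * (0.0 if done else 1.0) * min(q1_next, q2_next)


def soft_update(target: Sequence[float], online: Sequence[float],
                tau: float) -> List[float]:
    """Polyak averaging: phi' <- tau * phi + (1 - tau) * phi'."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


def actor_update_due(critic_step: int, policy_delay: int) -> bool:
    """Delayed policy update: actor and targets change every policy_delay critic steps."""
    return critic_step % policy_delay == 0


rng = random.Random(0)
# q1_next and q2_next stand for the two target critics evaluated at a_tilde.
a_tilde = smoothed_target_action([0.95, -0.2], sigma=0.2, noise_clip=0.5,
                                 a_low=-1.0, a_high=1.0, rng=rng)
y = clipped_double_q_target(reward=1.0, done=False, gamma=0.99,
                            q1_next=10.0, q2_next=9.0)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
schedule = [s for s in range(1, 7) if actor_update_due(s, policy_delay=2)]
```

Taking the minimum of the two critics biases the target downward rather than upward, which is the intended trade-off: a slightly pessimistic critic is less harmful to the actor than an optimistic one.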

4 Case Study

The simulation environment is built with the OpenAI Gym toolkit. The simulated IES consists of the IEEE 39-bus power system, a 6-bus heating system and a 20-bus natural gas system [1]. The natural gas price is 33 C//m. The TD3 algorithm uses neural networks to approximate the actor and critic. The actor network has three hidden layers with 400, 200 and 200 neurons, and the critic network has three hidden layers with 200, 100 and 100 neurons. The simulation parameters of the IES are listed in Table 1 [5-7].
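As a rough sense of model size, the layer widths given above determine the number of trainable parameters once the input and output dimensions are fixed. The state and action dimensions below are hypothetical, since they are not stated in this section, and the critic is assumed to take the concatenated state and action as input, which is a common TD3 design.

```python
from typing import Sequence


def mlp_param_count(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hidden widths from the text; STATE_DIM and ACTION_DIM are hypothetical.
STATE_DIM, ACTION_DIM = 10, 4
actor_layers = [STATE_DIM, 400, 200, 200, ACTION_DIM]
critic_layers = [STATE_DIM + ACTION_DIM, 200, 100, 100, 1]   # input: state and action

actor_params = mlp_param_count(actor_layers)
critic_params = mlp_param_count(critic_layers)
print(actor_params, critic_params)   # -> 125604 33301
```

TD3 keeps two critics plus target copies of every network, so the total stored parameter count is several times the per-network figures above.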

Table 1: Simulation parameters of the IES

Table 1 (Continued)

Parameter | Value | Parameter | Value | Parameter | Value
SOCmin | 0.10 | P_BS,ch^max | 10 | P_BS,disch^max | 10
ηch | 0.90 | ηdisch | 0.85 | γ | 0.95
κ11,i | 0.16 | κ12,i | 0.28 | κ21,i | 0.31
κ22,i | 0.22 | | | |
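The battery parameters recovered in Table 1 (SOCmin, the charge and discharge power limits and the two efficiencies) can drive a simple state-of-charge update. The sketch assumes a common linear SOC model; the capacity and SOC upper bound are hypothetical because they do not appear in the recovered part of the table, and power is expressed in the units of Table 1.

```python
# Limits taken from Table 1.
SOC_MIN = 0.10
P_CH_MAX = 10.0
P_DIS_MAX = 10.0
ETA_CH = 0.90
ETA_DIS = 0.85
# Hypothetical values that do not appear in the recovered part of Table 1.
SOC_MAX = 0.90
CAPACITY = 50.0     # energy capacity, in Table 1 power units x hours


def step_soc(soc: float, p_ch: float, p_dis: float, dt: float = 1.0) -> float:
    """Advance battery state of charge by one interval of length dt hours."""
    if p_ch > 0.0 and p_dis > 0.0:
        raise ValueError("cannot charge and discharge in the same interval")
    if not (0.0 <= p_ch <= P_CH_MAX and 0.0 <= p_dis <= P_DIS_MAX):
        raise ValueError("charge/discharge power outside its limits")
    # Charging losses reduce stored energy; discharging draws extra stored energy.
    new_soc = soc + (ETA_CH * p_ch - p_dis / ETA_DIS) * dt / CAPACITY
    if not SOC_MIN <= new_soc <= SOC_MAX:
        raise ValueError(f"SOC {new_soc:.3f} outside [{SOC_MIN}, {SOC_MAX}]")
    return new_soc


soc_after_charge = step_soc(0.5, p_ch=10.0, p_dis=0.0)      # 0.5 + 0.9 * 10 / 50
soc_after_discharge = step_soc(0.5, p_ch=0.0, p_dis=8.5)    # 0.5 - (8.5 / 0.85) / 50
```

In a DRL environment, an infeasible action would more often be clipped and penalized in the reward than rejected with an exception; raising here just keeps the constraint explicit.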

4.1 TD3 Algorithm Training Process

The data used in this paper were collected in Liaoning Province, China, from 1 November 2020 to 28 February 2021. The training set covers 1 November 2020 to 31 January 2021, and the test set, covering 1 to 28 February 2021, is used to verify the optimization results after training. The training data are shown in Fig. 4.
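The chronological split can be checked with a few lines of date arithmetic. The sketch assumes hourly samples, which is a hypothetical resolution; the day counts themselves follow directly from the dates in the text.

```python
from datetime import date, timedelta


def day_range(start: date, end: date):
    """All calendar days from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


train_days = day_range(date(2020, 11, 1), date(2021, 1, 31))
test_days = day_range(date(2021, 2, 1), date(2021, 2, 28))

SAMPLES_PER_DAY = 24   # hypothetical hourly resolution
n_train = len(train_days) * SAMPLES_PER_DAY
n_test = len(test_days) * SAMPLES_PER_DAY
```

Splitting by date rather than shuffling keeps the test month strictly after the training period, so the evaluation does not leak future information into training.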

Figure 4: Historical sample data of the IES

The training results of the TD3 algorithm are shown in Fig. 5. The reward obtained by the agent in the initial stage of training is relatively low, and the TD3 algorithm converges to a stable solution after about 3000 episodes.

Figure 5: The cumulative reward of the agent

4.2 Scenario Analysis

To verify the effectiveness of the proposed multi-objective low-carbon scheduling method，the scheduling results of five different operating scenarios are compared.

Scenario 1: The optimization objective is the operating cost, i.e., only the economy of system operation is considered.

Scenario 2: The CCP combined optimization model is used.

Scenario 3: Power supply reliability is considered in addition to scenario 2.

Scenario 4: IDR is considered in addition to scenario 2.

Scenario 5: Both IDR and power supply reliability are considered in addition to scenario 2.

The optimal scheduling results in different scenarios are shown in Table 2. Compared with scenario 1, the operating cost and total cost of the system in scenario 2 are reduced by $150 and $1416, respectively. CCS captures the CO2 emitted by CHP and transmits it to P2G, thus saving the cost of transporting and purchasing CO2 for P2G. Compared with scenario 2, the total cost of the system in scenario 3 increases by $996 because load requirements must be met once system reliability is considered. Compared with scenario 2, the peak-period power load demand and the start-up capacity of CHP units in scenario 4 are reduced. Compared with scenario 2, load shedding in scenario 5 is reduced to 0 and the wind curtailment cost is significantly reduced. At the same time, the PV curtailment cost is reduced to 0, and the total cost of the IES is reduced by $2201. Therefore, the proposed method can improve both power supply reliability and economic benefits.

Table 2: The optimal scheduling results in different scenarios

CO2 emissions in different scenarios are shown in Fig. 6. In the peak load period (20:00-22:00), scenario 2 emits less CO2 than scenario 1 because the CO2 required by P2G comes from CCS. Scenario 4 has the lowest CO2 emissions because CHP output is lower during peak load periods. Scenario 3, which considers power supply reliability, increases the output of CHP units and therefore has the highest CO2 emissions. Considering both power supply reliability and IDR, scenario 5 has slightly higher carbon emissions than scenario 4 but lower emissions than scenario 2.

Figure 6: CO2 emissions in different scenarios

Figure 7: The load shedding rate in different scenarios

The load shedding rate in different scenarios is shown in Fig. 7. The load shedding rates in scenarios 1 and 2 are the same, so the CCP combined optimization model does not affect power supply stability: in the CCP model, the electric power consumed by P2G and CCS is taken directly from the electric power generated by CHP, and the integrated energy system neither generates nor consumes additional power. Compared with scenario 1, the operational stability of the IES in scenario 3 is improved by considering power supply reliability. Compared with scenario 1, the load demand in scenario 4 is lower during 7:00-8:00 and 19:00-22:00, and power supply reliability is improved after considering demand response.

The scheduling results of the IES in different scenarios are shown in Fig. 8. In scenario 3, CHP unit output is significantly higher than in the other scenarios because power supply reliability is considered. In scenario 4, CHP output during the peak load period is lower than in the other scenarios, and renewable energy output during the off-peak period is higher than in the other scenarios owing to IDR. In scenario 5, demand response transfers peak load demand to the off-peak (valley) period (0:00-06:00), during which renewable energy output is significantly higher than in the other scenarios. During the peak load period (19:00-21:00), CO2 emissions are reduced while system reliability is maintained. The TD3-based optimization method thus achieves good control performance across the different scenarios.

Figure 8: (Continued)

4.3 Comparison Analysis

The SAC, A3C, DDPG and DQN algorithms are selected as baselines to verify the effectiveness of the proposed low-carbon optimization model. The parameters of SAC, A3C, DDPG and DQN are taken from [29-32].


The reward curves of the different methods are shown in Fig. 9. DDPG and SAC converge to similar reward values. A3C converges faster than the other methods owing to its asynchronous architecture. The proposed TD3-based optimization method has the best overall performance and obtains higher reward values.

Figure 8: The power balances of the IES in different scenarios. (a) The power balances of the IES in scenario 1, (b) the power balances of the IES in scenario 2, (c) the power balances of the IES in scenario 3, (d) the power balances of the IES in scenario 4, (e) the power balances of the IES in scenario 5

The optimal scheduling results of the different algorithms are shown in Table 3. The DQN algorithm has the highest operating cost because it discretizes the agent's actions. The optimization results of the SAC and DDPG algorithms are similar, although the wind power curtailment cost of DDPG is higher than that of SAC. The A3C algorithm uses an asynchronous mechanism, and its operating cost is lower than those of DQN, DDPG and SAC. The TD3 algorithm has the lowest cost and obtains greater carbon emission benefits.
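The effect of action discretization on DQN can be illustrated with a toy example. Assuming a hypothetical CHP set-point range of 0 to 100 MW split into five levels, any continuous set-point can be missed by up to half a grid step (12.5 MW here); finer grids shrink this gap but enlarge the action set that DQN must enumerate.

```python
from typing import List


def action_grid(low: float, high: float, n_levels: int) -> List[float]:
    """Evenly spaced discrete actions, as a DQN agent would need."""
    if n_levels < 2:
        raise ValueError("need at least two levels")
    step = (high - low) / (n_levels - 1)
    return [low + i * step for i in range(n_levels)]


def nearest_action(setpoint: float, grid: List[float]) -> float:
    """Closest available discrete action to a continuous set-point."""
    return min(grid, key=lambda a: abs(a - setpoint))


# Hypothetical CHP set-point range of 0-100 MW with five discrete levels.
grid = action_grid(0.0, 100.0, 5)            # [0.0, 25.0, 50.0, 75.0, 100.0]
chosen = nearest_action(63.0, grid)          # 75.0, i.e. 12 MW off target
worst_gap = max(abs(nearest_action(x / 10, grid) - x / 10) for x in range(1001))
```

With several devices, the joint discrete action set grows multiplicatively with the number of levels per device, which is why continuous-action methods such as TD3 are better suited to dispatch set-points.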

Table 3: The optimal scheduling results of different algorithms

Figure 9: Reward curves in the training process of different algorithms

5 Conclusions

In this paper, a multi-objective optimization method based on TD3 is proposed for the low-carbon scheduling problem of the IES. On the power generation side, we develop the CCP combined optimization model. IDR and power supply reliability are also considered. Moreover, we formulate the multi-objective optimization problem as an MDP and solve it with a TD3-based deep reinforcement learning method. The proposed method achieves good control performance in different scenarios. The results show that the proposed method reduces the operating cost of the system and effectively reduces the CO2 emissions of the IES. Compared with the SAC, A3C, DDPG and DQN algorithms, TD3 reduces the operating cost by 8.6%, 4.3%, 6.1% and 8.0%, respectively.

Funding Statement: This work was supported in part by the Scientific Research Fund of Liaoning Provincial Education Department under Grant LQGD2019005, and in part by the Doctoral Start-up Foundation of Liaoning Province under Grant 2020-BS-141.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

