Guru PANDIAN, Michael PECHT*, Enrico ZIO, Melinda HODKIEWICZ
KEYWORDS Boeing 787; Dispatch reliability; Dreamliner; Lithium-ion battery; Reliability; Reliability growth
Abstract The Boeing 787 Dreamliner, launched in 2011, was presented as a game changer in air travel. With the aim of producing an efficient, mid-size, wide-body plane, Boeing initiated innovations in product and process design, supply chain operation, and risk management. Nevertheless, there were reliability issues from the start, and the plane was grounded by the U.S. Federal Aviation Administration (FAA) in 2013 due to safety problems associated with Li-ion battery fires. This paper chronicles events associated with the aircraft’s initial reliability challenges. The manufacturing, supply chain, and organizational factors that contributed to these problems are assessed based on FAA data. Recommendations and lessons learned are provided for the benefit of engineers and managers who will be engaged in future complex systems development.
Boeing, one of the world’s largest aerospace companies, manufactures products that include commercial aircraft and defense, space, and security systems. Boeing’s commercial aircraft business has been in service for nearly 100 years, and its current fleet includes the 737, 747, 767, 777, and 787 families [1]. The Boeing 787 Dreamliner was introduced to the market in 2011 as a mid-size, dual-aisle, wide-body aircraft. Boeing marketed the 787 as a revolutionary aircraft, with an array of new features that would increase fuel efficiency by 20% and improve passenger comfort [2].
Some of the key design innovations in the 787 included the use of composite materials in the wings and fuselage [3]; Li-ion batteries to power up aircraft systems before the engines have started, to provide backup to critical loads, and to support battery-only braking; and a no-bleed electrical system architecture [4]. It was also the first time that Boeing replaced the traditional pneumatic system with an electrical power-generating system for starting the engine, anti-icing the wings, and maintaining cabin pressure [5]. Boeing incorporated the ability to use two types of engines (General Electric’s GEnx and Rolls-Royce’s Trent 1000) [6]. Collectively, these and other design changes were introduced to lower operating costs, improve fuel efficiency and cruising speeds, and reduce maintenance costs [7].
The path to profitability and realization of these aims was tortuous for Boeing [8]. The grounding of the 787 in 2013 focused the spotlight not just on the Li-ion batteries, but also on other issues that came to light with the Critical Systems Review Team (CSRT) report [9]. Furthermore, the recent fatal incidents involving the 737 MAX have refocused attention on the reliability of Boeing’s aircraft. In particular, the New York Times reported that Boeing has been fostering a culture of pushing products to market faster rather than ensuring product quality, especially in the South Carolina factory where the 787s are manufactured [10]. A study and interviews conducted with current and former employees of Boeing, ranging from floor technicians to quality managers, suggest that the quality of Boeing aircraft has become compromised [10] and that quality issues have spread to defense aircraft as well. According to CNN, the U.S. Air Force has been returning some of the delivered aircraft and has even halted deliveries of aircraft due to the ongoing quality problems [10].
This paper focuses on failure data released in the CSRT report [9], NTSB reports [11-13], and the journals referenced in this paper. Using these reports and the incident reports from the aviation community portals, we have collected data to support a reliability growth analysis. This is a first-of-a-kind study of the reliability of the 787 aircraft and provides technical insights into potential contributing factors.
The paper is organized as follows. Section 2 is a chronology of events before and after the order given by the FAA to ground the 787 fleet and includes a discussion of the review conducted by the FAA and Boeing. The development of the data set to support the reliability growth analysis is described in Section 3. In Section 4, the potential contributors to the reliability issues experienced by Boeing are identified. Section 5 presents the lessons learned.
The first 787 was shipped in the first quarter of 2011, with two to follow in the second quarter. By the end of 2012, 49 aircraft had been delivered, primarily to Japanese airlines [14]. Table 1 shows the number of 787 aircraft delivered in each quarter until 2016.
In July 2012, the Japanese airline ANA grounded five of its 787s due to a potential corrosion risk in some of the engine parts [15]. This was followed by even bigger problems in the form of fires in 787 aircraft. Two airplane fires associated with the Li-ion batteries of the plane forced the grounding of the worldwide fleet on January 16, 2013. The specifics of the battery failure are described in Ref. [16]. In congressional hearings, Boeing and its suppliers admitted that despite a significant engineering effort of 200,000 hours, they could not identify the root cause of the problem. Nevertheless, Boeing made changes, including a revision of the internal battery components to minimize the chances of initiating a short circuit, as well as better insulation of the cells and the addition of a new containment and venting system [15]. On March 12, after less than one month of testing, the FAA accepted Boeing’s redesign.
Table A1 in the Appendix presents a list of the 787’s technical issues reported to the authorities and/or in the press prior to the 2013 grounding of the plane due to the Li-ion battery problem. For each event, the authors have attempted to identify the system and component failure modes. This information suggests that a range of different component failure modes was responsible for the failure events.
Table 1 Boeing 787 deliveries by quarter.
Furthermore, the 787’s problems persisted even after its “relaunch”. Operational problems between September 2013 and January 2016 are shown in Table A2 in the Appendix.
In July 2014, after three months of redesign and requalification of batteries, Boeing conceded in a press release that the reliability of the 787 was below their initial expectations and below that of their earlier 777 model [17]. At the same time, they once again attempted to reassure stakeholders that they and their suppliers had already identified suitable corrective actions and initiated or fully implemented them.
Reliability Growth Analysis (RGA) is used in modeling, designing, and improving repairable systems. It is intended to demonstrate the reliability performance of a new or existing product, component, or system over time. To assess this growth, we examined failure events reported in commercial aircraft journals and the NTSB database (listed in Tables A1 and A2). As the data on the life (time in service) of the components responsible for the events of Tables A1 and A2 are not available, we use the count of events per month (based on reports in Refs. [18,19]) and make the assumption that all defective components are replaced. Based on this, a dataset of the number of events per month has been created and is shown in Table A3 in the Appendix. To determine the total time on test for the aircraft fleet, the following additional assumptions were made:
(1) Aircraft hours are based on the number of deliveries by Boeing, as reported on its official orders and deliveries information page. Aircraft delivered in one quarter are assumed not to go into service until the following quarter.
(2) For failure events, it is assumed that a failure in the quarter occurs at the end of that quarter.
(3) The operational period of the aircraft is assumed to be 50% (half of the number of hours in 90 calendar days), based on the reports in Refs. [20,21].
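The bookkeeping implied by assumptions (1) to (3) can be sketched as follows. This is a minimal illustration, not the authors' calculation: the delivery counts are hypothetical placeholders rather than Table 1 values, and the function name is introduced here only for the example.

```python
# Illustrative sketch of fleet-hours accumulation under assumptions (1)-(3).
# The delivery counts used below are hypothetical, not taken from Table 1.

# Assumption (3): aircraft operate 50% of the hours in a 90-day quarter.
HOURS_PER_QUARTER = 0.5 * 24 * 90


def cumulative_fleet_hours(deliveries_per_quarter):
    """Return cumulative fleet flight hours at the end of each quarter.

    Assumption (1): aircraft delivered in a quarter enter service in the
    following quarter, so they contribute no hours in their delivery quarter.
    """
    in_service = 0       # aircraft flying during the current quarter
    total_hours = 0.0
    cumulative = []
    for delivered in deliveries_per_quarter:
        total_hours += in_service * HOURS_PER_QUARTER
        cumulative.append(total_hours)
        in_service += delivered  # these aircraft start flying next quarter
    return cumulative


hypothetical_deliveries = [1, 2, 5, 6]
fleet_hours = cumulative_fleet_hours(hypothetical_deliveries)
print(fleet_hours)
```

Under assumption (2), the failure count of each quarter would then be paired with the cumulative hours at the end of that quarter.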
Fig. 1 is a time-cumulative event plot of event data from Tables A1 and A2 and the other databases mentioned above. We note that the slope of the plot decreases with increasing time. This is indicative of increasing reliability. The Duane plot in Fig. 2 shows the trend of cumulative Mean Time Between Failures (MTBF) over the flight hours. It can be seen that there is a sharp inflection point at around 115,000-160,000 hours, which corresponds to the period in the first quarter of 2013. The approximately straight line after the inflection point suggests that the data are consistent with an NHPP (Non-Homogeneous Poisson Process) power law model, which allows us to model the reliability growth using the Crow-AMSAA model [22].
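The straight-line reading of a Duane plot can be illustrated with a short sketch. It assumes that cumulative MTBF is computed as cumulative flight hours divided by cumulative failures, and that under a power law with expected failures λT^β the log-log slope equals 1 - β. The data are synthetic and noise-free, generated only to show the relationship, and the function names are hypothetical.

```python
import math


def duane_points(cum_hours, cum_failures):
    """Pair each cumulative flight time T with cumulative MTBF = T / N(T)."""
    return [(t, t / n) for t, n in zip(cum_hours, cum_failures) if t > 0 and n > 0]


def loglog_slope(points):
    """Least-squares slope of log(cumulative MTBF) against log(T)."""
    xs = [math.log(t) for t, _ in points]
    ys = [math.log(m) for _, m in points]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free data generated from lambda = 0.01 and beta = 0.7.
synthetic_hours = [1e4, 5e4, 1e5, 2e5, 4e5]
synthetic_failures = [0.01 * t ** 0.7 for t in synthetic_hours]

slope = loglog_slope(duane_points(synthetic_hours, synthetic_failures))
beta_from_slope = 1.0 - slope
print(round(slope, 6), round(beta_from_slope, 6))
```

A positive slope corresponds to β < 1, that is, to a cumulative MTBF that grows with accumulated flight hours. With real event data the points scatter around the line, so a graphical estimate of this kind is usually treated as a check on the fitted model rather than a replacement for it.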
The Crow-AMSAA model is generally used to assess reliability growth during development testing. One of the assumptions of the Crow-AMSAA model is that design changes are applied when failures are found, and thus the failure data is also indicative of an updated design configuration. This is not exactly true in practice, but there are retrofit campaigns that are completed on the entire fleet in order to improve dispatch reliability. These retrofits are changes to the design of the faulty component, as well as updates made in the practices of manufacturers, airline operators, airports, and regulators.
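Mathematically, the Crow-AMSAA model is a Non-Homogeneous Poisson Process (NHPP), which gives the probability of occurrence of n failures within time T as [22]
$$P[N(T) = n] = \frac{(\lambda T^{\beta})^{n} \, e^{-\lambda T^{\beta}}}{n!}, \quad n = 0, 1, 2, \ldots$$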
where N(T) is the random variable ‘number of failures that have occurred up to time T’, and λ and β are parameters to be estimated from the available failure event data. The Maximum Likelihood Estimation (MLE) technique is a classical way to estimate the parameters [22] and has also been used here. The data were grouped, so the number of failures in each quarter has been used to conduct the analysis. The results of the analysis are shown in Table 2.
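A grouped-data maximum likelihood fit of this kind can be sketched as below. The sketch assumes the power-law mean function E[N(T)] = λT^β, intervals that start at T = 0, and the standard grouped-data likelihood equation for β, solved here by bisection. It is not the authors' implementation; the interval end times and counts are hypothetical, and the function names are introduced for illustration.

```python
import math


def _score(beta, scaled_ends, counts):
    """Derivative of the grouped log-likelihood with respect to beta.

    Interval end times are scaled by the final time, which removes the
    ln(T_k) term from the usual form of the likelihood equation.
    """
    total = 0.0
    prev = 0.0
    for s, n in zip(scaled_ends, counts):
        upper = s ** beta
        lower = prev ** beta if prev > 0 else 0.0
        upper_log = upper * math.log(s)
        lower_log = lower * math.log(prev) if prev > 0 else 0.0
        total += n * (upper_log - lower_log) / (upper - lower)
        prev = s
    return total


def fit_crow_amsaa_grouped(interval_ends, counts, lo=1e-3, hi=10.0, tol=1e-10):
    """Return (lambda_hat, beta_hat) for failure counts grouped by interval."""
    t_end = interval_ends[-1]
    scaled = [t / t_end for t in interval_ends]
    if _score(lo, scaled, counts) <= 0 or _score(hi, scaled, counts) >= 0:
        raise ValueError("beta is not bracketed by [lo, hi]")
    # The score is positive below the root and negative above it.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _score(mid, scaled, counts) > 0:
            lo = mid
        else:
            hi = mid
    beta_hat = 0.5 * (lo + hi)
    lambda_hat = sum(counts) / t_end ** beta_hat
    return lambda_hat, beta_hat


# Hypothetical grouped data: cumulative hours at interval ends, failures per interval.
example_ends = [1000.0, 2000.0, 3000.0, 4000.0]
example_counts = [9, 6, 5, 4]
lambda_hat, beta_hat = fit_crow_amsaa_grouped(example_ends, example_counts)
print(lambda_hat, beta_hat)
```

Scaling the times by the final cumulative time keeps the powers well conditioned when fleet hours run into the hundreds of thousands. Bisection was chosen over Newton's method because it needs no derivative and cannot diverge once the root is bracketed.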
Fig. 1 Number of flight time hours vs. failure events of the 787 aircraft.
Fig. 2 Duane plot of flight time vs. cumulative mean time between failures (MTBF).
The value of β of 0.7 (<1) in Table 2 suggests a decrease in failure rate over time and is indicative of reliability growth. Examining Fig. 2, it can be seen that the reliability of the 787 aircraft (as measured by the MTBF) was deteriorating until the grounding and started improving after the aircraft returned to service (fourth data point). Other parameter estimates such as DMTBF (Demonstrated or instantaneous Mean Time Between Failures) and DFI (Demonstrated or instantaneous Failure Intensity) are also reported. Their positive values follow from the corrective actions taken by Boeing during the aircraft grounding period, as well as from a somewhat natural reduction of those problems that typically emerge during the initial stages of aircraft operation.
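For reference, the instantaneous quantities named above have closed forms under the power-law NHPP. The block below is a sketch assuming the expected number of failures is E[N(T)] = λT^β; the symbol ρ(T) for the failure intensity and the subscript c for the cumulative MTBF are notation introduced here for illustration.

```latex
\begin{align}
  \rho(T) &= \frac{d}{dT}\,\lambda T^{\beta} = \lambda \beta T^{\beta - 1}
    && \text{(DFI, instantaneous failure intensity)} \\
  \mathrm{DMTBF}(T) &= \frac{1}{\rho(T)} = \frac{1}{\lambda \beta T^{\beta - 1}}
    && \text{(instantaneous MTBF)} \\
  \mathrm{MTBF}_{c}(T) &= \frac{T}{\lambda T^{\beta}} = \frac{T^{1-\beta}}{\lambda}
    && \text{(cumulative MTBF)}
\end{align}
```

It follows that DMTBF(T) = MTBF_c(T)/β, so for β < 1 the instantaneous MTBF exceeds the cumulative MTBF and ρ(T) decreases as flight hours accumulate.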
Another metric used in aviation to identify component-level reliability is the Mean Time Between Unscheduled Removals (MTBUR), which is related to those maintenance activities that were carried out on an aircraft but were not part of the scheduled maintenance.
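A commonly used definition of this metric, stated here as an assumption rather than as the authors' exact expression, divides the accumulated unit flight hours by the number of unscheduled removals. The symbols H (fleet flight hours over the period), q (units of the component installed per aircraft), and R (unscheduled removals of that component over the same period) are introduced for illustration.

```latex
\mathrm{MTBUR} = \frac{H \cdot q}{R}
```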
However, we were not able to find any data on the maintenance activities or the components that were removed/replaced. In principle, this data should be made available to the public by the airlines and the FAA.
Li-ion Batteries (LIBs) were used to power the auxiliary power units and other selected electrical/electronic equipment during ground and flight operations to a larger extent in the 787 than in Boeing’s earlier aircraft. Boeing was required to perform a safety assessment for its LIBs as per the FAA’s Special Conditions 25-359-SC, “Boeing Model 787-8 Airplane; Lithium-Ion Battery Installation”. Although Boeing did pass all the requirements set by the FAA, there were shortcomings in the criteria set for failure and in the guidance on assumptions that manufacturers could use in their testing. These assumptions were not necessarily supported by engineering rationale and led Boeing to pass the qualification tests. For example, there was an assumption that internal short circuiting in a cell would only cause that cell to vent and would not lead to thermal runaway [11]. The Japan Airlines battery incident showed that Boeing did not put in place mitigation strategies to avoid or contain the consequences of this assumption, were it to prove wrong in practice.
Table 2 RGA of 787 post-service re-entry period.
A confounding factor was that the FAA did not consider thermal runaway to be a potential consequence of a cell short circuit. Hence, FAA certification engineers did not require thermal runaway testing as part of compliance demonstration. This contributed to a lack of clarity in guidance to certification engineers on translating specific worst-case scenarios to compliance deliverables, such as which test procedure to follow and which test reports to provide in the certification plan. In addition, there were manufacturing defects and an absence of thermal management systems. There were also inconsistencies found in the Electric Power System (EPS) safety assessment provided by Boeing with respect to compliance with the FAA Advisory Circular (AC) 25.1309, “System Design and Analysis” [11].
Eventually, Boeing redesigned the battery system and had it approved by the FAA. The FAA issued a new airworthiness directive to install the redesigned batteries on all 787 airplanes to be returned to service.
From the various documents and trends, it can be argued that Boeing did not adopt an effective Reliability Program Plan (RPP), where best practice tasks are implemented to produce reliable products [23]. Boeing opted to widen its supplier base and reduce costs by including manufacturers who were new to the aircraft development industry. The events that led to delays during manufacturing and failures during operation are a testament to Boeing’s flawed practices.
The following sub-sections describe Boeing’s practices in planning and managing the development cycle and supply chain, and the challenges with information sharing with a tiered, globally dispersed supplier base, with developing a proper diagnostics and prognostics approach, with testing of new technologies, and with oversight of a complex product development process.
These factors are identified as potential causes of the operational problems. Furthermore, these deficiencies are seldom independent of each other and can have a compounding effect on product reliability.
Boeing intended to reduce to four years the development period of the 787 (its predecessor 777 was developed in six years) and, at the same time, reduce the development costs from $10 billion to $6 billion [24]. To do so, Boeing decided to adopt a new supply chain and product development structure. This resulted in a new supply chain structure of approximately 50 tier-1 strategic partners, and many more tier 2, 3, and 4 suppliers, which they would have little or no say over. On top of this, 30% of the supply chain was outsourced to manufacturers outside the USA [24].
The supply chain structure was responsible for the 2.3 million parts required to build and assemble the aircraft [25]. The tier-1 partners, such as Alenia Aeronautica (Italy), Messier-Dowty (France), Rolls-Royce (Britain), and Mitsubishi Heavy Industries (Japan), served as integrators responsible for assembling entire subsystems, each having its own specific supply chain [26].
The time and cost of production were intended to be reduced by delegating the design, development, and component manufacturer selection process to sub-system suppliers [7]. The tier-1 partners would be responsible for delivery of complete sections of the aircraft to Boeing, who would then perform the final assembly [7].
The rationale behind this business strategy was that the best process skills were increasingly being found outside Boeing factories in the USA, according to Mike Bair, then vice president of the 787 program [6]. This created new supplier bases which were either new to Boeing or new to the aircraft industry as a whole. This included the Lithium-Ion Battery (LIB) manufacturer GS Yuasa, which was selected by Thales Avionics to supply batteries for powering auxiliary devices. As will be discussed later, the inexperience of GS Yuasa in dealing with aircraft products led to inappropriate specification of batteries based on the reliability data from other industrial applications.
This new supply chain structure was a departure from traditional practice, in which the manufacturer was responsible for the assembly of the major subsystems. This tiered system is a complex structure of interacting technical and organizational artifacts. The new and more complex supply chain led to intricacies in assembling many components from different suppliers into a large subsystem that was manufactured by a different supplier. For example, Boeing contrived a modular design for the 787 to enable engine interchangeability between Rolls-Royce and GE engines on the same aircraft. In practice, however, the interchange took 15 days instead of the intended 24 h, because of technical incongruities due to multiple supply chain participants [9]. Similarly, several “shimming” issues were found when trying to assemble parts from different suppliers, due to the lack of conformity to tolerances and of understanding of design requirements [9].
Since the supply chain was spread across the globe, there were challenges in synchronizing changes to the design requirements down through the supply chain and production information back up through the supply tiers [9]. Boeing tried addressing this challenge by implementing a web-based tool called “Exostar”, which allowed the suppliers to enter their relevant information such as design and production requirements and production status of the components. Contrary to the intended effect, this data sharing process did not improve visibility across the supply chain, due to inaccuracies, delays, and misinterpretation of data from the tool. The lack of familiarity with aerospace manufacturing standards and cultural differences in terms of workmanship among suppliers from various locations contributed to this inefficiency in data sharing [7,9].
For example, the FAA found discrepancies in the dissemination of requirements for the primary electrical power panel from Boeing to United Technologies Aerospace Systems (UTAS), then from UTAS to sub-tier supplier Equipment et Construction Electrique (ECE), and from ECE to its printed circuit board component supplier. The FAA review team also found deficiencies in the process of passing requirements down the levels of suppliers, leading to 1) weak design, which then manifested as part malfunctions once the parts entered service, 2) variability in manufacturing, and 3) anomalous behavior of parts.
The bottom-up information flow was similarly hindered, as seen in the instance where Vought, a tier-1 supplier, entered into a contract with Advanced Integration Technology (AIT) as a tier-2 supplier to aid in integrating systems. AIT was assigned the responsibility of communicating with other tier-2 and tier-3 suppliers on behalf of Vought [7]. But due to cultural and geographical differences, the suppliers did not always communicate the proper information. These differences led to delays in supplying parts, which were not visible to Boeing and kept Boeing from responding to delays in a timely manner and from understanding requirements changes.
Data collection for system Health and Usage Monitoring Systems (HUMS), and Prognostics and systems Health Management (PHM), provides the opportunity to assess the state of operation of the airplane and its components, and predict the reliability and safety [23]. However, this was not well executed in the Boeing 787 aircraft. For example, the FAA, Boeing, and Japan’s Transport Ministry conducted a thorough analysis of the root cause of failure of the lithium-ion batteries in the 787 [12]. However, they were unable to identify the root cause of the thermal runaway event. Many issues, such as production quality problems of contamination, electrolyte evaporation, and over-voltage loads, were hypothesized but were not proven to be conclusive [27]. The Flight Data Recorder (FDR) collected 363 different measurements before and after the battery fire incident, of which only two, the DC feed load current and the APU battery DC bus voltage, were directly related to the faulty batteries [27]. The FDRs were not designed to collect individual cell data from the Battery Management System (BMS), which could have given insight into the specific battery that caused the thermal runaway.
The FAA review team observed that both existing and new technologies incorporated in the 787 aircraft were not tested for the specific 787 application. The success of these technologies, either in other applications or in previous Boeing aircraft, was assumed to carry over to the 787 as well [9]. LIBs, which have become one of the major concerns for 787 reliability, were adopted from another industrial application, and there were no failures reported in that application. Based on the data from this industrial application, GS Yuasa assumed a Poisson distribution for the LIB failure time and estimated a failure rate of less than 1 failure in 10 million flight hours [28]. However, by the time the 787 was grounded in 2013, the failure rate of the LIBs was 3 in 250,000 hours. The estimate of less than 1 in 10 million hours was based on a 60% confidence interval, while a 90% confidence interval or higher is usually suggested for critical reliability applications such as those of avionics [28].
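The gap between the predicted and observed rates, and the effect of the confidence level, can be made concrete with a short calculation. The sketch below assumes a constant failure rate and zero observed failures, so that a one-sided upper bound on the rate after t failure-free hours is -ln(1 - C)/t at confidence C. How GS Yuasa actually derived its estimate is not stated in the text, so this is an illustration of the general principle only, and the function name is hypothetical.

```python
import math


def hours_needed_zero_failures(target_rate, confidence):
    """Failure-free hours needed to claim rate <= target_rate at a confidence level.

    Assumes a constant failure rate, so P(no failure in t hours) = exp(-rate * t).
    """
    return -math.log(1.0 - confidence) / target_rate


claimed_rate = 1 / 10_000_000   # 1 failure per 10 million flight hours (upper limit of the estimate)
observed_rate = 3 / 250_000     # 3 failures in 250,000 hours, as cited in the text

ratio = observed_rate / claimed_rate
hours_60 = hours_needed_zero_failures(claimed_rate, 0.60)
hours_90 = hours_needed_zero_failures(claimed_rate, 0.90)

print(f"observed / claimed rate: {ratio:.0f}")
print(f"failure-free hours at 60% confidence: {hours_60:.3e}")
print(f"failure-free hours at 90% confidence: {hours_90:.3e}")
```

Under these assumptions the observed rate is about 120 times the claimed upper limit, and supporting the same claim at 90% confidence would require roughly 2.5 times as many failure-free hours as at 60% confidence.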
The level of DO-160 testing required was established at the time Boeing submitted the application to design, test, and build the 787 to the FAA, which would have been around 2003 or 2004. Guidance on how to test LIBs was issued in AC 20-184 in October 2015. This could have led to a situation where technology outpaced the regulations.
While the new technologies were given slack in testing, the quality of the processes that had previously been considered “stable” was inspected by the same technician who carried out the process [29]. One of the former Department of Transportation inspectors stated that in many cases these self-inspections were not actually conducted and were simply passed by the workers who executed the process. Flawed practices of this kind have led to many mistakes on the production line, according to Boeing workers [29].
Finally, the reliability assumptions for the electronics and the testing of the electronics, including the battery, are of grave concern, in part because Boeing has traditionally assumed a constant failure rate and used the outdated Military Handbook 217 for its reliability and safety calculations. This handbook was last updated in 1997 and was considered inaccurate and unacceptable for use by the military by a National Academy of Sciences study [30,31], and for the aviation industry as well [31]. The handbook-based method uses field failure data from unrelated applications to determine a point reliability value for the aircraft without considering its specific, complex use conditions.
The 787 aircraft is a complex system with about 2.3 million parts supplied and assembled from manufacturers around the globe [24]. The CSRT noted [9] that when an issue was reported during the service of the aircraft, the suppliers removed the parts they deemed to be defective, but often found there was no fault. This could be due to the intermittent nature of electronic systems [32-34]. In fact, it has been noted that cases of no-fault-found on airplanes can be as high as 80%, and Boeing often replaces electronic Line-Replaceable Units (LRUs) with LRUs that were flagged as failed but were no-fault-found once they were removed. This practice is problematic considering the wear-out and intermittent failure nature of electronics.
In Boeing’s 787 development model, the integration of subassemblies and final assembly was critical for the hardware and software from different suppliers to fit together and operate properly. This required a balance between providing autonomy to the suppliers to meet the design requirements and keeping close oversight of the supplier processes. However, Boeing did not opt for on-site supplier support, which led to the absence of bi-directional technical communication to keep the quality of the parts and sub-assemblies in check [35]. For example, Mitsubishi Heavy Industries stated that Boeing did not adopt Mitsubishi’s early testing and diagnosis principle [35], which in turn led to design flaws being carried over to the next tiers of assemblies and eventually to aircraft operation in the field.
Evaluating the reliability of a complex system made of multiple components, like an aircraft, is very difficult, especially during the development stages. As a matter of fact, many factors contribute to the difficulty of evaluating reliability during product development: tight scheduling for contracted deliveries, requirements on testing and validation, pressures for cost reduction, multiple tiers of suppliers of the many parts constituting the system, challenges with accurate and timely data sharing, innovative technologies requiring specific testing procedures, and others.
In this paper, operational problems with the Boeing 787 aircraft have been analyzed to identify different manufacturing and organizational factors that have impacted the reliability performance of such a complex system in operation. Reliability metrics, such as cumulative Mean Time Between Failures (MTBF) and cumulative number of failure events, have been estimated from publicly available data. A reliability growth analysis has also been performed to study the impact of corrective actions carried out by Boeing on the performance of the 787 aircraft.
Undoubtedly, there were enormous challenges inherent in the development of a new product like the 787. With increased reliability as one of the goals of the 787 project, Boeing invested significantly in changes to its engineering and business structure. However, the problems that then occurred in the aircraft’s operation have emphasized the need for strengthening the focus on quality and for developing a reliability-centric approach to supplier selection, training, and production management.
In this regard, some practical guidelines follow. Suppliers should consider IEEE 1332-2012, JA1000-201205, and IEEE 1624 in the development stages. The IEEE 1332-2012 document provides a standard set of reliability program objectives for use between customers and producers, or within product development teams, to express reliability program requirements early in the development of electronic products. SAE adopted the IEEE standard and released it as JA1000, which is followed by various industry sectors. OEMs should take necessary steps to validate the ability of the suppliers to meet the reliability requirements. IEEE 1624-2008, Standard for Organizational Reliability Capability, provides guidelines for assessing, in a systematic manner, the effectiveness of an organization’s reliability practices in ensuring or exceeding product reliability requirements. Avoiding misinterpretations and having detailed information on inputs and assumptions for predicting the reliability of hardware is essential in understanding the risks associated with using the prediction results for future product integration and ensuring overall system reliability.
Manufacturers can ensure consistent prediction and reporting of reliability of hardware across product development teams by following established standard procedures such as IEEE 1413-2010. IEEE 1413 aids in providing sufficient information on inputs, assumptions, and uncertainties in the estimated reliability. Further, aerospace standards such as AS9100, based on ISO 9001:2015, are dedicated to ensuring product quality and process management for aircraft parts manufacturers.
Finally, while the grounding of the 787 in 2013 focused the spotlight on Li-ion batteries and on the complexity of the supply chain, there were also concerns pertaining to how the airplane could be certified within three months without knowing the root causes of failure. This is even more relevant today, in light of the recent concerns with the 737 MAX and the role of Boeing and the FAA in understanding and evaluating reliability and safety issues.
Acknowledgement
We would like to thank Ms. Rhonda Walthall, Associate Director, Aftermarket Digital Strategies, at UTC Aerospace Systems, for her invaluable comments that helped improve the quality of this paper.
Appendix A
Table A1 Events associated with the Boeing 787 Dreamliner, November 2011-January 2013.
Table A2 Events associated with the Boeing 787 Dreamliner, April 2013-August 2015.
Table A3 Failure times calculation (based on problem reports from Ref. [54]).
Chinese Journal of Aeronautics, 2020, Issue 7