The PEGASUS Method

Description of the PEGASUS-Method

The following chapter describes the different elements of the PEGASUS method. To this end, the methods, tools, and processes within the different elements are explained.

0. Description of test object (Use Case)
    The Highway Chauffeur

Among the project partners, the Highway Chauffeur was picked as the exemplary test object. The system was designed to be an SAE Level 3 conditional automation system. The system design was chosen to be, on the one hand, simple enough that all major partners could implement, or had already implemented, a comparable system and, on the other hand, complex enough that a wide variety of scenarios, including critical scenarios and automation risks, could be derived from it. The following Figure 1 shows the system's capabilities (dark blue) and limitations (light blue).

The following Table 1 summarizes the capabilities of the automated driving function on the level of path guidance and stabilization.

The system limitations mainly stem from the deliberately simple, exemplary design and therefore led to the following restrictions for the ODD (see Table 2).

Extended Application Scenario

One of the goals of the PEGASUS project was to ensure that the tools and methods can be adapted to different domains. To avoid spending time and effort on defining a completely new system, we analyzed the Highway Chauffeur's key capabilities and defined five exemplary functional scenarios which differ from the original system on different levels. Describing the new system in scenarios not only helped us to efficiently describe the desired functionality and performance but also allowed us to systematically generate executable and therefore testable concrete scenarios, which could then be used in simulation without additional effort.

We ended up picking functional scenarios that extend the ODD to highway and rural roads and chose five specific situations our SUT could encounter there. Table 3 shows a description of those scenarios.

1. Knowledge

As a basis for defining requirements or deducing test cases, the operating scenarios of the item to be developed have to be specified. For this purpose, various sources of knowledge, like standards, regulations, or expert knowledge, can be utilized. Well-established codes and standards like SAE J3016 (Society of Automotive Engineers, 2014) on nomenclature, as well as safety-related documents like ISO 26262 (International Organization for Standardization, 2009), are the basis of any development of highly automated driving systems. Furthermore, the respective legal requirements, like country-specific road law and international conventions as well as related regulations, need to be followed. Finally, there exist non-binding documents from government institutions or related committees with specific recommendations. One prominent example is the document published by the ethics committee of the German transportation ministry (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019). In addition, expert knowledge on the specific use case and its requirements should be collected via a structured expert peer review.

Guidelines

Within the scope of PEGASUS, the German guideline for the construction of freeways (RAA) (Forschungsgesellschaft für Straßen- und Verkehrswesen, 2008) was used as a source of knowledge for specifying the operational design domain of the Autobahn-Chauffeur. The guideline provides a set of rules for the process of constructing freeways and defines design patterns.

In the first step of constructing a freeway, a design class is chosen based on criteria like the freeway type (for example, urban freeway or transregional freeway). This design class specifies the standard cross section, the limit and standard values of the design elements, the elementary shapes of junctions and the distances between junctions, as well as speed limits.


Cross sections
The cross section is chosen based on the design class and the expected traffic volume. It defines the arrangement and dimensions of freeway components, as shown in Figure 1. Each cross section may contain different numbers and types of components like traffic lanes, a shoulder, or a central reservation.

Alignment
Each freeway consists of multiple segments. Besides the standard cross section, each segment implements a layout and an elevation profile, as seen in Figure 2.

The layout describes the lateral alignment of the freeway. For this purpose, the RAA defines multiple design patterns, like straight, circular arc, or clothoid, and specifies parameter ranges to be adhered to.

To describe the vertical alignment of the freeway, the RAA defines multiple patterns for the elevation profile, like plane, crest curve, or sag curve. In addition, guidelines on how to calculate the parameters for these patterns are formulated.

 

Junctions
To connect multiple roads, the RAA defines various patterns for junctions, like cloverleaf or windmill. Which pattern is used depends on the traffic volume, the constructional effort, the installation space, and the elevation profile.

These design patterns specified in the RAA can be used to define the operational design domain of the Highway Chauffeur. In addition, these patterns build the basis for a knowledge-based scenario generation with the help of an ontology.

Codes & Standards

As a necessary prerequisite, generally accepted industry standards are to be followed, as well as national and international regulations. For the development of HAD systems, ISO 26262 (International Organization for Standardization, 2009) and ISO/PAS 21448 (International Organization for Standardization, 2019) are essential.

ISO 26262
Functional safety processes in the automotive industry are described in ISO 26262, an adaptation of the IEC 61508 functional safety standard with a focus on automotive electric/electronic (E/E) systems. The ISO 26262 document aims to provide process steps to assure that the developed E/E system is functionally safe, meaning the "absence of unreasonable risk […] due to hazards […] caused by malfunctioning behavior […] of E/E systems" (International Organization for Standardization, 2009). The most recent update of this standard was released in December 2018 as the ISO 26262:2018 iteration.

After defining the functional specification of the system, a hazard analysis and risk assessment is performed. The hazard and risk analysis determines the "Automotive Safety Integrity Level" (ASIL), which is based on a matrix with Severity (ranging from S1, light injuries, to S3, life-threatening injuries), Exposure (E1, very low probability, to E4, high probability, i.e. >10% of the operating time), and Controllability (C1, simply controllable, i.e. by >99% of drivers, to C3, difficult to control, i.e. by <90% of drivers) as input values. Thereby, an ASIL represents a risk-based classification of a safety goal.
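As an illustration of this classification, the following minimal Python sketch encodes the ASIL determination table of ISO 26262-3 (values as published in the 2011 edition; the classes S0, E0, and C0, which lead to no ASIL, are omitted). The table values are standard; the helper names are ours.

```python
# Minimal sketch of ASIL determination per ISO 26262-3 (2011 edition).
# Rows are (Severity, Exposure); columns are Controllability C1..C3.
# "QM" (quality management) means no ASIL is assigned.
ASIL_TABLE = {
    ("S1", "E1"): ("QM", "QM", "QM"), ("S1", "E2"): ("QM", "QM", "QM"),
    ("S1", "E3"): ("QM", "QM", "A"),  ("S1", "E4"): ("QM", "A", "B"),
    ("S2", "E1"): ("QM", "QM", "QM"), ("S2", "E2"): ("QM", "QM", "A"),
    ("S2", "E3"): ("QM", "A", "B"),   ("S2", "E4"): ("A", "B", "C"),
    ("S3", "E1"): ("QM", "QM", "A"),  ("S3", "E2"): ("QM", "A", "B"),
    ("S3", "E3"): ("A", "B", "C"),    ("S3", "E4"): ("B", "C", "D"),
}

def determine_asil(severity: str, exposure: str, controllability: str) -> str:
    """Look up the ASIL for a hazardous event classified by S, E, and C."""
    row = ASIL_TABLE[(severity, exposure)]
    return row[int(controllability[1]) - 1]  # C1..C3 index the row

# Example: a life-threatening hazard (S3) with high exposure (E4) that is
# difficult to control (C3) yields the highest integrity level, ASIL D.
assert determine_asil("S3", "E4", "C3") == "D"
```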

Based on these results, the system is designed and a safety concept is developed. The hardware and software development then takes place in further parallel runs of the V-model. On the right branch of the V, integration tests for the developed components are performed and a validation of the safety goals takes place, as well as a functional safety assessment of the complete vehicle against the functional specification. Since the V-model itself as well as ISO 26262 and IEC 61508 are well established, no further details are provided within the scope of this document. Instead, the interested reader is referred to the full documents. It is recommended to always follow ISO 26262 (International Organization for Standardization, 2009) as the basic procedure.

 

SOTIF
Regarding the PEGASUS analysis, for highly automated driving the principles of the safety of the intended normative behavior of the automated driving system also need to be addressed, reaching beyond E/E system malfunctions. For this purpose, activities are currently ongoing within standardization bodies to address the so-called "safety of the intended functionality". Originally, these activities focused on driver assistance and partially automated systems, but they have now been extended to higher levels of automation as well (SAE level >= 3). As stated by ISO, "the absence of unreasonable risk due to hazards resulting from functional insufficiencies of the intended functionality or by reasonably foreseeable misuse by persons is referred to as the Safety of the Intended Functionality (SOTIF)". In order to cover these issues, guidance on the applicable design, verification, and validation is needed. A document was published by ISO in January 2019 as the publicly available specification ISO/PAS 21448:2019 (International Organization for Standardization, 2019).

The split between ISO 26262 and ISO/PAS 21448 is shown in Figure 3. Regarding aspects of the cyber security of road vehicles, activities are also currently ongoing: ISO/SAE 21434 reached the status of a committee draft in September 2018.

Recommendation

A prominent example of a government recommendation regarding automated driving is the document by the ethics committee of the German transportation ministry, released on June 20, 2017 (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019). This committee was headed by Udo di Fabio, a former justice of Germany's Federal Constitutional Court. The Ethics Commission on Automated and Connected Driving was convened as an interdisciplinary panel of experts by the German Federal Minister of Transport and Digital Infrastructure in 2016, with the goal "to develop the necessary ethical guidelines for automated and connected driving". The results of the five working groups ("Situations involving unavoidable harm", "Data availability, data security, data-driven economy", "Conditions of human-machine interaction", "Consideration of the ethical context beyond road traffic", "Scope of responsibility for software and infrastructure") were presented in a publicly available report in 2017 (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019). Among other things, this report contains 20 ethical rules for automated and connected vehicular traffic. Of particular importance for the PEGASUS project is the second rule: "The protection of individuals takes precedence over all other utilitarian considerations. The objective is to reduce the level of harm until it is completely prevented. The licensing of automated systems is not justifiable unless it promises to produce at least a diminution in harm compared with human driving, in other words a positive balance of risks." Within the framework of the PEGASUS safety argumentation, this particular rule is set as the primary safety goal to be met.

2. Data

Within PEGASUS, a scenario-based approach is chosen to assess the Highway Chauffeur function. Those scenarios can be derived in two ways: either systematically (see P4) or data-driven (this paragraph). For the data-driven way, the project aimed at using available, not strictly confidential data to develop the methodology and derive scenario characteristics. Therefore, within D2, relevant data is collected from multiple sources such as naturalistic driving studies (NDS), field operational tests (FOT), simulation data from user studies, accident data (GIDAS), and recorded data from test drives. This is the baseline for all adjacent metrics and tools to search for relevant scenarios, to derive the characteristic scenario parameters and statistics, and to eventually assess the driving function.

NDS / FOT / Test Drives

The PEGASUS approach of scenario-based combined testing requires insights from real-world traffic in order to understand which situations are relevant for highly automated driving (HAD) and how frequently they occur. Thus, the field operational test (FOT) and naturalistic driving study (NDS) data analysis within PEGASUS derives traffic characteristics from real driving data. NDS and FOT are field studies within real traffic in which drivers are observed over a longer period of everyday driving. The data gathered in this way is characterized by a high external validity. Furthermore, the data enables comparing the performance of an automated driving function in a test scenario with human driving performance in similar situations in real traffic. To a certain extent, evaluation criteria for this comparison can be derived from it.

Driving Simulator Data

Driving simulator data allow analyzing the underlying causal relationships of accidents and systematically approaching and assessing the threshold of human driving performance.

Thus, the application of the driving simulator supports closing the gap between the assessment and description of human driving performance in critical and non-critical scenarios up to the point of accidents. Furthermore, the driving simulator can be used to assess human driving performance in scenarios that are available neither in accident databases nor in NDS/FOT data due to their low probability of occurrence.

Within PEGASUS, the threshold of human driving performance was assessed using driving simulator studies, e.g. by repeatedly producing critical scenarios with varying stimulus intensities. The varying stimuli were operationalized by different criticalities, e.g. the TTC (time to collision) of a merging vehicle in front. The threshold of human driving performance was then described as the probability of having an accident depending on the different levels of criticality. Additionally, further performance parameters, e.g. the deceleration or reaction behavior, as well as the accident severity were analyzed. Using the simulator data, the controllability by the human driver was described by means of a logistic regression and can thus be directly compared with the performance of the automated driving function.

In a first simulator study, 52 participants drove on the left lane of a two-lane highway at a constant speed of 130 km/h. A car driving in a column of vehicles merged in front of the participant 63 times at a velocity of 80 km/h from the right to the left lane. In doing so, criticality was systematically varied by varying the time-to-collision (TTC) (0.5, 0.7, 0.9, 1.1, 1.3, 1.5 s). During the experiment, the TTC was calculated when the merging vehicle crossed the midline (see Figure 1, aspired TTC). However, the reported TTC of the following results is based on the actual beginning of the merging maneuver in order to achieve a realistic estimation of the criticality. (In case the merging vehicle was not yet visible, the earliest point of visibility was used for calculating the TTC.)
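For reference, the TTC used as the criticality measure here is the remaining gap divided by the closing speed; a minimal sketch (function and variable names are ours):

```python
def time_to_collision(gap_m: float, v_ego_ms: float, v_lead_ms: float) -> float:
    """TTC in seconds; only defined while the ego vehicle is closing in."""
    closing_speed = v_ego_ms - v_lead_ms
    if closing_speed <= 0.0:
        return float("inf")  # not on a collision course
    return gap_m / closing_speed

# With 130 km/h ego speed and an 80 km/h merging vehicle, a TTC of 1.3 s
# corresponds to a gap of (130 - 80) / 3.6 * 1.3 ≈ 18 m at the merge onset.
print(time_to_collision(18.0, 130 / 3.6, 80 / 3.6))  # ≈ 1.30
```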

It should be noted that the assessment of driving performance in this experiment focused on the stabilization of the vehicle. This driving task, which is largely skill-based, i.e. based on stimulus-response automatisms, was selected due to its direct influence on the criticality of the driving situation (see Preuk & Schießl, 2017: Menschliche Leistungsfähigkeit als Gütekriterium für die Zulassung automatisierter Fahrzeuge: Methode zur Ermittlung der Grenzen menschlicher Leistungsfähigkeit. 9. VDI-Tagung Der Fahrer im 21. Jahrhundert, VDI-Berichte 2311, pp. 15-24).

Within this study, accidents were hardly avoidable in critical situations with a TTC < 1.3 s; on the other hand, accidents occurred very seldom with a TTC > 1.7 s. Accordingly, the threshold of human driving performance lies in between.

As described above, the controllability by the human driver was described by means of a logistic regression and can thus be directly compared with the performance of the automated driving function. For this purpose, the accident probability was equated with the controllability and predicted on the basis of the measured TTCs. Four different logistic regressions were calculated in order to indicate the performance of different driver groups (see Figure 2):

  • the whole sample without outliers (orange, n = 48),
  • the mean 50% of the sample (middle blue, n = 26),
  • the best 10% of the sample (light blue, n = 7), and
  • the worst 10% of the sample (dark blue, n = 4).

The threshold for human driving performance was defined as an accident probability of 50%, which corresponds to a TTC of 1.38 s for the whole sample within this study.
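The regression step can be sketched as follows; the data below is a synthetic placeholder, not the study data, and the 50% threshold is recovered from the fitted coefficients:

```python
# Hedged sketch of the logistic-regression approach described above: accident
# occurrence (1/0) is modeled as a function of TTC, and the performance
# threshold is read off at a predicted accident probability of 50 %.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
ttc = np.tile([0.5, 0.7, 0.9, 1.1, 1.3, 1.5], 8)        # synthetic trials
accident = (ttc + rng.normal(0.0, 0.3, ttc.size) < 1.1).astype(int)

model = LogisticRegression().fit(ttc.reshape(-1, 1), accident)

# P(accident) = 0.5 where the linear predictor crosses zero:
ttc_50 = -model.intercept_[0] / model.coef_[0, 0]
print(f"TTC at 50 % accident probability: {ttc_50:.2f} s")
```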

GIDAS and PCM data

The German In-Depth Accident Study (GIDAS) is a database of real-world crashes which are investigated by the MHH (Medizinische Hochschule Hannover (MHH), 2019) and the VUFO (Verkehrsunfallforschung an der TU Dresden GmbH, 2019). The project has been funded by the Bundesanstalt für Straßenwesen (BASt) and the Forschungsvereinigung Automobiltechnik e. V. (FAT) since 1999. For a collision to be documented in GIDAS, it must adhere to sampling criteria, such that the data can be assumed to be a representative sub-sample of the German crash scene. The criterion requires that at least one person is at least slightly injured. GIDAS contains information about the involved participants (for example vehicle, cyclist, pedestrian, etc.) and the individuals, their sustained injuries, and the infrastructure. Each crash is reconstructed and the findings are also stored in the database.

The Pre-Crash-Matrix (PCM) is a branch of the GIDAS data. It is not representative of the German crash scene, but rather provides detailed information regarding the pre-crash phase of two participants. The database contains detailed information regarding the reconstructed trajectories during the five seconds prior to the initial impact.

The GIDAS database with the PCM extension collects in-depth data from real-world crashes in the regions of Hannover and Dresden. Each year, up to 2000 crashes are investigated, including the documentation of over 3000 crash-relevant properties. This information includes:

  • Conditions of the environment
  • The road design with traffic regulations
  • Vehicle deformations
  • Impact sites of occupants and other road users
  • Technical characteristics such as vehicle type and technical equipment
  • Crash information and characteristic values, e.g. collision and driving speeds, ∆v and EES
  • Crash causes and details how the crash occurs
  • Information of the individuals such as weight, height, age
  • Injury patterns, preclinical and clinical care

The PCM data extends the existing data with detailed information on the vehicle dynamics during the pre-crash phase of five seconds before the first impact of the two participants which crashed first. The criteria for a case to have a PCM are (a minimal eligibility check is sketched after the list):

  • At least one passenger car is involved
  • No participant skids
  • No participant drives in reverse
  • No participant has a trailer
  • No participant has a technical defect.
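A minimal illustration of applying these criteria as a filter; the record layout and field names below are hypothetical, not the actual GIDAS/PCM schema:

```python
# Illustrative PCM eligibility check for a crash case; field names are
# hypothetical placeholders, not the GIDAS database schema.
def pcm_eligible(case: dict) -> bool:
    participants = case["participants"]
    return (
        any(p["type"] == "passenger_car" for p in participants)
        and not any(p["skidding"] for p in participants)
        and not any(p["reversing"] for p in participants)
        and not any(p["has_trailer"] for p in participants)
        and not any(p["technical_defect"] for p in participants)
    )

example = {"participants": [
    {"type": "passenger_car", "skidding": False, "reversing": False,
     "has_trailer": False, "technical_defect": False},
    {"type": "truck", "skidding": False, "reversing": False,
     "has_trailer": True, "technical_defect": False},
]}
print(pcm_eligible(example))  # False: one participant has a trailer
```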

In the regarded version of GIDAS (2016-12-31), 29,514 crashes are coded and 8,651 of them have a PCM. Across all GIDAS cases, there are 24,359 collisions with car involvement, and in these cases 47,983 participants (33,920 cars) are coded. In all cases with car involvement, 64,263 persons are involved and 5,303 of them sustained at least one AIS 2 (or higher) injury. More information about the injury scale AIS can be found in (AAAM, 1998) and (AAAM, 2005).

3. Requirements Analysis

In general, various methodologies exist for assessing the tolerable risk of new technologies. CEN/CENELEC outlines the concepts of risk assessment in EN 50126 and refers to the established methods GAMAB ("at least as good as"), ALARP ("as low as reasonably practicable"), and Minimum Endogenous Mortality (MEM). The RAMS approach is established for the specification and development of safety-relevant railway systems. [SOURCE]

The approach includes the system definition as well as the implementation of risk analyses, hazard determination, and safety approvals. Details and derivations are provided in the following paragraphs.

Method to define acceptance criteria

The common expectation for HAD vehicles is that their introduction will lead to increased road safety and a reduction in traffic fatalities. However, the quantitative safety requirements are still under discussion. This clause analyses the risk acceptance according to several acceptance principles and applies the safety level of today's traffic to derive references for acceptable risks. The focus is on macroscopic safety requirements, meaning accident rates per mileage rather than the behaviour in individual driving situations. It is concluded that the acceptable risk varies with the focus group involved and with the market share of automated vehicles. Increased safety of conventional driving in the future could lead to higher requirements as well. It is also pointed out that it is not guaranteed that the given acceptable risk levels are also accepted by the customer, because other factors besides the accident statistics are relevant. An argumentation for a proof of safety is given; however, as none of these risk levels can be proven before introduction, a monitoring of vehicles in the field is suggested. Different introduction and risk management strategies are briefly described. Finally, the risk during the introduction of new generations of airplanes is discussed.


Detailed description of the content (methods and goals):
The description of the content is an executive summary of a paper by Philipp Junietz, Udo Steininger, and Hermann Winner (see (Philipp Junietz, 2019)).

The focus of the following description is on the results and their embedding into the overall approach, especially on the synchronization with the work packages on defining a safety level for the HAD-F and deriving requirements for an HAD-F.

Quantitative risk assessment

The usual quantitative risk definition "risk equals frequency times severity" is illustrated in Figure 1. Considering the occurrence of unintended events (frequency) and the extent of damage (severity), we find two typical areas. In the green area, the system is in a safe state; the corresponding risk is accepted. In the red area, the system is in an unsafe state; the corresponding risk is not accepted. The borderline between these areas is probably not sharp; there can be a kind of transition area. Although this simple definition is very useful for many questions in technology and the insurance industry, it neglects aspects like aversion against high severity, lack of controllability, and personal benefit that are relevant for risk perception and acceptance by individuals and society.

Fritzsche discussed risk acceptance in relation to the voluntary nature of exposure in 1986. The results are summarized in Figure 2. The numbers are still valid today. The reason might be that risk perception studies reached their peak in the 1970s with the spread of nuclear power plants. Later studies delivered no relevant changes in risk perception.

For voluntary activities, Fritzsche found that the willingness to accept risks is nearly unlimited, depending on the experienced personal benefit. We can see this in the example of high-risk sports or other leisure activities, e.g. free climbing, motorcycling, etc. Job-related activities are important for a deeper understanding of the subject. On the one hand, acceptance is relatively well investigated in this field, and there is a common understanding of an accepted individual mortality risk in the order of 10⁻⁵ per person and year, for example by professional associations and insurance companies. On the other hand, job-related risks are useful to bridge the gap between voluntary and involuntary risks.

Figure 2: Risk acceptance versus voluntary nature of risk exposure [see (Philipp Junietz, 2019); based on Steininger, U., and L. Wech: Wie sicher ist sicher genug? Sicherheit und Risiko zwischen Wunsch und Wirklichkeit. VDI-Berichte, No. 2204, 2013, and Fritzsche, A. F.: Wie sicher leben wir? Verlag TÜV Rheinland, 1986]

Fritzsche found that for involuntary risks, e.g. the death of passengers in a train or airplane crash, the acceptance level is an order of magnitude lower than for job-related risks. Moreover, acceptance decreases by another order of magnitude if the risk is caused by major technology, e.g. the chemical industry or nuclear power generation. Besides the fact that the experienced personal benefit of those technologies is low (at least from a subjective point of view), the low degree of self-determination, or rather controllability, by individuals plays an important role for the low acceptance level, as does the potentially high number of mortalities (severity). Nevertheless, Figure 2 shows that it is generally possible to deal with risk, risk perception, and acceptance in a quantitative manner.

To implement safety requirements based on risk acceptance, several principles have been developed in different application areas. Because of the relationship between railway and road traffic, it is useful to refer to the CENELEC safety standard EN 50126. The development of this standard was started in the 1990s; safety requirements based on quantitative risk analysis were implemented, and ALARP, MEM, and GAMAB were introduced as principles for risk acceptance.

The "as low as reasonably practicable" (ALARP) principle tries to assess what is technically feasible, considering economic sense and social acceptance. Between the two regions of generally unaccepted and broadly accepted risk, there is a tolerance range where risk is undertaken only if a benefit is desired and where each risk must be made as low as reasonably practicable. The two key levels lie around road death statistics (about 10⁻⁴ per person and year) and the chance of being struck by lightning (about 10⁻⁷ per person and year). If something is more dangerous than driving a car, the risk is unacceptable. If something is less dangerous than being struck by lightning, then we do not expect anyone to do anything about it. In the range between these two figures, cost-benefit studies are appropriate to reduce the risk to as low as reasonably practicable [EN 50126].

Minimum endogenous mortality (MEM) is based upon age- and gender-specific mortality rates. Although the absolute values of the mortality rates change with the birth cohort, they show a typical development over age as well as a significant minimum at an age of about 10 years. The related mortality at an age of 10 years is defined as the "minimum endogenous mortality". The MEM principle demands that a new system does not significantly contribute to the existing minimum endogenous mortality. EN 50126 specifies that the individual risk due to a certain technical system must not exceed 1/20th of the minimum endogenous mortality, taking into account that people are normally exposed to the risk of several technical systems. This means that the accepted individual risk of a certain technical system should be about 2.5·10⁻⁶ per person and year. [Statistisches Bundesamt: Kohortensterbetafeln für Deutschland, Methoden- und Ergebnisbericht zu den Modellrechnungen für Sterbetafeln der Geburtsjahrgänge 1871 – 2017, 5126206-17900-4, 23.06.2017]

Globalement au moins aussi bon (GAMAB) – English: generally at least as good as – requires, unlike MEM, the existence of a reference system with currently accepted residual risks. According to GAMAB, residual risks caused by a new system must not exceed those of the reference system. In other words: a new system must offer a level of risk at least as good as the one offered by any equivalent existing system. This makes it necessary to identify the risk of the equivalent existing system. Looking for the acceptable risk of an HAD-F in a certain application area according to GAMAB, we have to identify the current risk of the equivalent existing system in the same application area. To derive acceptance requirements for a controlled-access highway pilot, we analyse the current risk on German controlled-access highways during manual driving. In this way, we get an average rate of fatal accidents of 1.52·10⁻⁹ per km [Schöner, H.-P.: Challenges and Approaches for Testing of Highly Automated Vehicles, Paris, 04.12.2014]. This can be transferred into an accident rate of 6·10⁻⁶ per person per year with the help of known driving profiles [VDA 702 Situationskatalog E-Parameter nach ISO 26262-3. VDA-Empfehlungen, VERBAND DER AUTOMOBILINDUSTRIE E. V. (VDA), 2015. https://www.vda.de/de/services/Publikationen/situationskatalog-e-parameter-nach-iso-26262-3.html].
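This conversion can be reproduced with the average annual mileage of about 4000 km per person on controlled-access highways that is assumed later in this chapter:

$$ f_t = f_s \cdot d = 1.52 \cdot 10^{-9}\ \mathrm{km}^{-1} \cdot 4000\ \frac{\mathrm{km}}{\mathrm{a}} \approx 6 \cdot 10^{-6}\ \mathrm{a}^{-1} $$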

In Figure 3, the different approaches for the mortality risk are summarized. On the one hand, it shows that the application of different risk acceptance principles delivers comparable and consistent results. On the other hand, it demonstrates that we have to deal with a relatively broad range of applicable acceptance criteria.

Figure 3: Implementation of several risk acceptance principles for a Highway-Pilot [see (Philipp Junietz, 2019); based on Steininger, U., and L. Wech: Wie sicher ist sicher genug? Sicherheit und Risiko zwischen Wunsch und Wirklichkeit. VDI-Berichte, No. 2204, 2013, and Fritzsche, A. F.: Wie sicher leben wir? Verlag TÜV Rheinland, 1986]

In a first step, comparison with other technologies – especially other traffic systems and technologies that deliver a high personal benefit – seems to be useful. Afterwards, different focus groups have to be distinguished, for example users of highly automated driving systems and other traffic participants, taking into account the impact of voluntary exposure. Finally, a decrease in total mortality risk is expected in the future, following the trend in the last decades and centuries. Therefore, risk acceptance might change over time.

Introduction of New Technologies in Aviation
In aviation, passengers are exposed to a technical system without having personal control. Although severe accidents happen, aviation safety is accepted by most of the population. Due to the long travelling distances and the fact that accidents mostly happen during take-off and landing, accident rates are typically given per flight and not per travel distance. Accidents and critical situations are strictly reported and collected in databases, so the data is even more profound than for road traffic. Depending on the number of flights per year, we can observe an annual risk that is similar to driving a car on a highway. One fatal accident happens about once per ten million flights [Airbus: Commercial Aviation Accidents 1958-2016. A Statistical Analysis, 2017. Accessed August 1, 2017]. With a typical exposure of two flights per year, the risk of a fatal accident would be lower than the risk of involuntary exposure f_inv and about one order of magnitude lower than driving on a highway. However, with 20 flights per year, one would be exposed to a risk of the same order of magnitude. Therefore, the levels of risk are in fact comparable when only driving on controlled-access highways is considered. However, users typically drive on all types of roads. The risk of car traffic is at least one order of magnitude higher in total, so the superior reputation of air traffic is justified.

Aviation has become increasingly automated over the past decades (although today's systems are still comparable to SAE level 2 because they are supervised by the crew). The detailed collection of data in aviation allows an analysis per generation of airplanes, as summarized by Airbus (see [Airbus: Commercial Aviation Accidents 1958-2016. A Statistical Analysis, 2017. Accessed August 1, 2017]). As depicted in Figure 4, with every introduction of a new generation, the fatal accident rate of the new generation was higher than the state of the art. Due to the low number of new airplanes at introduction, this trend cannot be observed in the total accident rate. Nevertheless, the introduction was clearly beneficial to society in total, because after an introduction phase of five to ten years, the new generation had the lowest accident rate of all.

Judging from this data, new generations of airplanes are not tested in a way that statistically proves the new system is superior to the former. In fact, this is impossible, because the knowledge about the new system's behavior is incomplete and only field experience can reduce the unknowns. Similar to HAD, statistical testing is neither economically feasible nor necessary, because the strict supervision of air traffic allows efficient improvement in case of critical situations or accidents. However, the highest automation in commercial air traffic is still comparable to level 2, so human error is still a factor. Nevertheless, the leap in accident rate occurred with the introduction of technology, be it because of flaws in human-machine interaction or in the technology itself. One could argue that it is unethical to release a system that is not tested in the best way possible. But first, it is impossible to completely test a system operating in an uncontrolled environment, because there might be situations that the tester was not aware of. These unknown unknowns cannot be tested. Second, a stricter approval process would prevent technical progress, because economically oriented development would become impossible. It seems possible, if not likely, that the accident rate of automated vehicles will behave in a similar way. We should be aware of that possibility and focus on the improvement of the system in case of a detected critical situation or accident. The delayed introduction of HAD could in fact risk the lives of many people, because the system is believed to improve safety over time.

With a combined testing strategy of simulation, proving ground tests, and real traffic tests, it is still unlikely to complete a logical proof of safety because every validation test has certain underlying assumptions. To deal with this uncertain safety performance, accidents, unexpected critical situations, and near misses must be monitored similar to air traffic, in order to find flaws in the system (including infrastructure and human interaction) with the chance to improve them. This is discussed in detail later.

Figure 4: Fatal accidents with different generations of airplanes in commercial traffic. Dotted line means less than one million flights a year. First generation: Early commercial jets, Second generation: More integrated Auto Flight System, Third generation: Glass cockpit and Flight-Management-System, Fourth generation: Fly-By-Wire with flight envelope protection. [Airbus. Commercial Aviation Accidents 1958-2016. A Statistical Analysis, 2017. Accessed August 1, 2017]
4. Systematic Identification of Scenarios

Knowledge-based approaches to scenario generation complement data-based approaches. As their mutual strengths and weaknesses – coverage of rare events versus a statistically adequate representation of common situations – are complementary, data-based and knowledge-based approaches can support each other and counterbalance each other's disadvantages.

Several approaches for the systematic knowledge-based identification of scenarios were followed in PEGASUS. One approach is the description of "normal" scenarios on highways using ontologies. To extend and complete this approach, to identify possibly critical scenarios early in the development process, and to guarantee a sufficient test-case coverage, methods to systematically identify automation risks were developed for and applied to the Highway Chauffeur. Furthermore, an expert-based approach for the identification of scenarios on layer 4 of the 6-layer model was developed, which is suitable for parameterization with measurement data.

Application of ontologies for scenario generation

As shown in Figure 1, the process for the knowledge-based scenario generation using an ontology implemented in PEGASUS is realized in two steps.

Figure 1: Ontology-based process for scenario creation based on [2]. Rectangles represent working products, rounded corners represent process steps.

In the first step, linguistically described knowledge about road traffic patterns on German highways is identified, conceptualized, and formalized. To formalize the knowledge, an ontology in the Web Ontology Language (OWL) is implemented. In it, the knowledge is represented through hierarchic classes as well as semantic relations and restrictions between these classes. The knowledge modeled in the knowledge base is structured according to a 6-layer model (see Figure 2). Based on prior work by Schuldt [3], the model has been adapted to a representation in an ontology and to an automated creation of functional scenarios. On the first and second layers, the road network is described according to the German guideline for the construction of motorways. Conceptually, the third layer describes temporary manipulations of layers one and two (like road construction sites); however, within the scope of PEGASUS, this layer has not been implemented. On the fourth layer, the interactions of traffic participants are represented through maneuvers. On the fifth layer, weather conditions are modeled. The sixth layer describes digital information, such as V2X communication and digital data. Similar to the third layer, the sixth layer has not been detailed within PEGASUS.

Figure 2: 6-layer-model for structuring scenarios according to (Bock, et al., 2018) [2] based on [3] .

In the second step, scenarios are automatically and systematically deduced by varying the classes and instances defined in the ontology. For the variation, the possible combinations and restrictions specified in the knowledge base are considered. To this end, scenarios are created stepwise.

In the beginning, possible sceneries, including relations (like arrangement relations) of the respective elements, are created. The elements to be considered for variation can be specified by the user through properties. Afterwards, for every scenery, all combinations of traffic participants are generated based on a defined position grid, and to each traffic participant all possible maneuvers are assigned. At this point, multiple combinations of sceneries and traffic participants, including their arrangement and all possible maneuvers, have been generated.

In the next step, an individual maneuver to be executed is chosen for every traffic participant in order to build a start scene. Afterwards, for every start scene, an end scene is deduced based on constraints (like relative velocities) arising from the individual maneuvers. Furthermore, the driving paths are checked for collisions and, if collisions occur, the respective end scene is dismissed. The combination of a start scene and an end scene defines a scenario. As shown in Figure 3, each scenario is exported as an abstracted HTML-based visualization and as a scenario graph for a detailed linguistic description. In addition, the export of the scenario, consisting of the start scene and end scene, to the technical formats OpenDRIVE and OpenSCENARIO allows the execution of the scenario in simulation.
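The stepwise variation can be pictured with the following simplified Python sketch; the element lists and the two helper functions are placeholders, since the real process derives elements, constraints, and collision checks from the OWL knowledge base:

```python
# Simplified sketch of the stepwise scenario variation described above.
from itertools import product

sceneries = ["two-lane straight", "three-lane curve"]      # layers 1-2
positions = ["front", "front-left", "rear-left"]           # position grid
maneuvers = ["keep lane", "change lane left", "brake"]     # layer 4

def derive_end_scene(start_scene):
    # placeholder: propagate maneuver constraints (e.g. relative velocities)
    return start_scene

def paths_collide(start_scene, end_scene):
    # placeholder: check the driving paths for collisions
    return False

scenarios = []
for scenery, position, maneuver in product(sceneries, positions, maneuvers):
    start_scene = (scenery, position, maneuver)
    end_scene = derive_end_scene(start_scene)
    if not paths_collide(start_scene, end_scene):   # dismiss colliding ones
        scenarios.append((start_scene, end_scene))  # one scenario

print(len(scenarios))  # 18 combinations in this toy example
```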

Figure 3: Left: HTML-based visualization as abstracted overview; Right: scenario graph as detailed linguistic description

References
[1] A. Reschka, „Fertigkeiten- und Fähigkeitengraphen als Grundlage des sicheren Betriebs von automatisierten Fahrzeugen im öffentlichen Straßenverkehr in städtischer Umgebung“, dissertation, TU Braunschweig, Braunschweig, 2017.
[2] G. Bagschik, T. Menzel, und M. Maurer, „Ontology based Scene Creation for the Development of Automated Vehicles“, in 2018 IEEE Intelligent Vehicle Symposium, Changshu, 2018.
[3] F. Schuldt, „Ein Beitrag für den methodischen Test von automatisierten Fahrfunktionen mit Hilfe von virtuellen Umgebungen“, dissertation, TU Braunschweig, Braunschweig, 2017.

Identification of automation risks

The methods to identify automation risks make use of well-established techniques for hazard and risk assessment as well as a thorough safety analysis. They serve the purpose of identifying the causes of risks at an early stage, especially those arising from the introduction of highly automated driving functions (HAD-F). Within the scope of the analysis, possibly hazardous scenarios are identified and described in the form of parameter ranges of logical scenarios. This enables targeted testing later in the test process and ensures sufficient test-case coverage of potentially critical scenarios.

For each class of automation risks, a different method was developed in order to address the specific challenges of this class. However, the methods to identify automation risks of classes 1 and 2 are similar in their structural nature.

For the automation risks of class 1, an iterative method based on traditional safety analysis methods like HAZOP and fault tree analysis (cf. (Ericson, 2005)) has been developed.

The safety analysis methods were adapted as necessary to be able to address the specific challenges emerging from the introduction of HAD-F. Class 1 automation risks are those arising from external influences on the automation. They correspond to situations in which the HAD-F has difficulty coping with the environment or the behavior of other traffic participants. These include the non-detection or misclassification of objects in the environment, the erroneous detection of objects that do not exist in reality, and the misprediction of the future development of a situation. These problems do not necessarily stem from a fault in the E/E systems but may be inherent to the system (e.g. limitations in perception due to a specific sensor set-up) and are in general only visible in a certain scenario or a small range of scenarios.

The first step to identify these scenarios is to model the driving function. Taking a behavioral perspective, this entails modeling the inputs, computations, and outputs of the different components/computational units. This step will in general be iteratively refined during later stages. Next, the possible hazards stemming from deviations from the intended functionality or from issuing actions in the wrong context are identified. Identification is pursued using a keyword-based approach on the vehicle behavior level covering different basic scenarios. The resulting table is used for a second keyword-based approach, in which the reasons inside the system that may lead to the previously identified deviations are observed and recorded in another table (see Figure 5).

The next step is to identify possible reasons for hazardous behavior (with a focus on environmental triggers) with the help of a modified fault tree analysis. Here, we follow the identified outputs of each function from step 1 and identify possible reasons for unintended output. These reasons are classified into:

  • Random hardware faults (covered by ISO 26262 (International Organization for Standardization, 2009)): e.g., hardware fault in the ECU for calculating trajectories
  • Design faults in HW or SW (deviations of the implementation from the specification): e.g., fault in the algorithm that calculates the trajectories
  • Specification fault: e.g., trajectories of other traffic participants are predicted without a dynamical model of these
  • Propagated faults (the input the component is receiving is already faulty): e.g., an inappropriate maneuver was chosen by an upstream planner

On the one hand, certain environmental conditions may be necessary for the faults to actually become visible or relevant (e.g., only if another traffic participant is present can the HAD-F mispredict its trajectory). On the other hand, these environmental conditions may be the reason for an unintended function of the component (e.g., a small metal object may be responsible for a misclassification by a RADAR sensor). In order to identify and model these, the fault tree analysis was extended by a further logical gate (a variation of the inhibit gate), which makes it possible to specify environmental conditions that are necessary for fault propagation.
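A minimal data-structure sketch of this extended gate; the class and field names are illustrative, not the actual fault tree tooling:

```python
# Sketch of the extended fault-tree gate: a variation of the inhibit gate
# whose fault only propagates if a given environmental condition holds.
from dataclasses import dataclass

@dataclass
class EnvironmentInhibitGate:
    cause: str           # fault reason, e.g. a sensor misclassification
    env_condition: str   # environmental condition required for propagation

    def propagates(self, environment: set) -> bool:
        """The fault becomes visible only if the condition is present."""
        return self.env_condition in environment

gate = EnvironmentInhibitGate(
    cause="RADAR misclassification",
    env_condition="small metal object on the road",
)
print(gate.propagates({"small metal object on the road"}))  # True
print(gate.propagates({"clear road"}))                      # False
```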

In the last step, the environmental triggers for hazardous behavior are deduced and translated into the necessary specification language for scenarios.

Automation risks of class 2 correspond to scenarios in which the driving behavior of the HAD-F deviates from the expected behavior of human drivers and subsequently triggers a critical situation. When identifying these scenarios systematically, it is assumed that there is no malfunction within the driving function in these situations. The error or risk thus arises from the interaction between different systems rather than from local causes. A typical case here is the interaction between the HAD-F system and the human driver. A risk analysis method that is particularly well suited for identifying risks arising from the interaction of several complex individual systems is the Systems-Theoretic Process Analysis (STPA) (Leveson, 2012). To facilitate the application of the STPA method, it was adapted to the problem at hand and the analysis was carried out on our use case. In the following, a short summary of the approach is given.

The first step is to define the fundamentals. This includes the control structure diagram with the control actions, the scenarios, and the expectations of the human driver in these scenarios. When defining the control structure diagram, the process under consideration and the control actions occurring in this process are described. In the given use case, the considered process consists of the HAD-F, the human driver, and the interaction between them. Within this process, the HAD-F can perform the following tactical control actions: 1) keep speed and keep lane, 2) keep speed and change lane, 3) change speed and keep lane, 4) change speed and change lane, 5) abort lane change, 6) emergency braking, and 7) emergency stop. In order to determine the scenarios, a classification based on the six-layer model is used, and the most important scenarios are selected based on expert judgement so that their number is reduced to a manageable size. Finally, the behavior of the HAD-F as expected by the human driver in the identified scenarios is determined.

In the second step, the actual STPA method is performed. This includes determining unsafe control actions (UCAs), hazard identification, and causal analysis. When determining the UCAs, the impact of improper control actions is examined, especially regarding what happens if control actions are

a)    not provided,
b)    provided,
c)    provided at the wrong time, or
d)    stopped too soon or applied for too long.

For example, an otherwise reasonable control action becomes an unsafe control action if the human driver does not expect this control action to be provided in this situation. Subsequently, the hazard identification systematically analyses the consequences of the UCAs in the scenarios under consideration. In this step, the automation risks of class 2 are identified. The final causal factor analysis provides information on the triggers of the identified risk.
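The combinatorial core of this step can be illustrated as follows; the seven control actions are taken from the text above, the guide phrases are the four cases a)-d), and the assessment against driver expectations is left as a placeholder:

```python
# Illustrative enumeration of candidate unsafe control actions (UCAs).
from itertools import product

control_actions = [
    "keep speed and keep lane", "keep speed and change lane",
    "change speed and keep lane", "change speed and change lane",
    "abort lane change", "emergency braking", "emergency stop",
]
guide_phrases = [
    "not provided", "provided",
    "provided at the wrong time", "stopped too soon or applied too long",
]

# Every (action, phrase) pair is a UCA candidate that is then assessed
# against the driver's expectation in each considered scenario.
candidates = list(product(control_actions, guide_phrases))
print(len(candidates))  # 28 candidates per scenario
```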

For the automation risks of class 3, according to the definition of this class, the possible difficulties between the automation and the driver were identified (e.g. mode confusion or misuse). However, these do not result in scenarios, but in requirements on the HMI concept.

For the automation risks of classes 1 and 2, the resulting environmental triggers are translated into a JSON file and can be used in the PEGASUS database to mark potentially critical combinations of parameter areas.
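The structure of such a trigger file might look as follows; this schema is hypothetical, for illustration only, and does not reproduce the actual PEGASUS database format:

```python
# Hypothetical example of a JSON trigger entry marking a potentially
# critical parameter area of a logical scenario (schema is illustrative).
import json

trigger = {
    "automation_risk_class": 1,
    "logical_scenario": "cut-in from the right lane",
    "critical_parameter_ranges": {
        "ttc_s": {"min": 0.5, "max": 1.3},
        "relative_velocity_kmh": {"min": 30, "max": 60},
    },
}
print(json.dumps(trigger, indent=2))
```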

Definition of logical scenarios on layer 4

In order to draw concrete scenarios from field data, it is necessary to elaborate a framework for deriving a scenario catalogue. Herein, it should be possible to store all relevant elements of a scenario with regard to layer 4 of the 6-layer model. Since, for safety assurance, the focus is on safety-relevant situations, the underlying systematics focus on the avoidance of collisions between a subject vehicle (SV) and another object. For the use case Highway Chauffeur in PEGASUS, the object with which such a collision would occur is typically another vehicle.

The central underlying assumption of the framework is that a safety-relevant situation can be characterized by the need for a reaction by the SV to avoid a collision with an object. This object is referred to as the challenging object or the challenger. The challenging object does not have to be the accident perpetrator, but it is the object with which the SV would collide if no collision avoidance action were executed. Based on this assumption, a limited number of safety-relevant logical scenarios can be defined.

A first criterion for defining these logical scenarios is the area of the SV with which the challenging object would collide. A distinction between front, rear, and side impacts can be made. The second criterion is the initial position of the challenging object. Depending on the impact plane, different initial positions of the challenger can be distinguished by whether the outline of the challenger overlaps with the outline of the SV in longitudinal or lateral direction. For front impacts, one position can be identified which overlaps laterally with the SV. Secondly, a position in front of the SV can be identified which does not overlap laterally; a front impact from this initial position would require an additional lateral movement by the challenger compared to a challenger coming from a position with lateral overlap. A third option is a challenger coming from an initial position which does not overlap in lateral direction and either overlaps in longitudinal direction or requires a transition through a state of longitudinal overlap. This requires the challenger to initially have a larger velocity than the SV but then decelerate to a slower longitudinal velocity, combined with a lateral movement. The different positions and the paths leading to the different types of impact are pictured in Figure 6.

Positions for rear impacts can be defined analogously to the positions for a front impact. For side impacts, one position can be identified in which the challenger overlaps with the SV in longitudinal direction, i.e. is located directly beside it. Furthermore, a distinction is made between positions which do not overlap in longitudinal direction and lie in front of or behind the SV. The decisions for deriving the resulting scenarios are depicted in Figure 7; the associated paths leading to a collision are also pictured in Figure 6.

A nomenclature for the safety-relevant logical scenarios was chosen as shown in Table 1.
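As a toy illustration of the two criteria (impact area and initial overlap), the following sketch returns descriptive labels; the actual scenario names are the ones defined in Table 1:

```python
# Toy decision logic mirroring the classification criteria described above.
# The returned labels are descriptive placeholders, not the Table 1 names.
def classify_challenger(impact_area: str, lateral_overlap: bool,
                        longitudinal_overlap: bool) -> str:
    if impact_area in ("front", "rear"):
        if lateral_overlap:
            return f"{impact_area} impact, challenger with lateral overlap"
        if longitudinal_overlap:
            return f"{impact_area} impact, challenger via longitudinal overlap"
        return f"{impact_area} impact, challenger without initial overlap"
    # side impacts are further split by the challenger being beside,
    # in front of, or behind the SV (see Figure 7)
    return "side impact"

print(classify_challenger("front", lateral_overlap=True,
                          longitudinal_overlap=False))
```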

Apart from the challenger defining the safety-relevant logical scenarios, further objects might be relevant in a scenario. In order to represent only objects that are relevant for the safety-relevant scenarios, different roles have been identified for the additional objects in a scenario. These additional objects may either increase the challenge for the SV to handle the situation safely or be important in the sequence of events of a scenario, so that the SV may react to an emerging situation although a collision with a challenging object is not yet imminent.

Action Restrictions
The first group of objects that increase the challenge for the SV are action constraints. These objects limit the space for collision avoidance maneuvers of the subject vehicle. An action constraint alone would not require a collision avoidance action by the SV, but in combination with a challenger, an inadequate avoidance maneuver may result in a collision with the action constraint. Initially, action constraints can be identified which are realized by a single object. This object may be located in front of, behind, or to the side of the SV. In contrast to the initial positions of the challenging vehicle, no distinction is made between positions 2, 3, and 4 in Figure 6. Instead, the positions to the side of the object are treated as a continuous space.

Further action constraints can be identified which are realized by multiple objects. One configuration is the complete blocking of one side of the ego vehicle. Such a blocking may result from a queue of vehicles with headways so small that they practically do not allow any collision avoidance maneuver to this side for the SV. Alternatively, this blocking of the side may include a gap which is large enough for the SV to move into while requiring a certain longitudinal trajectory. In order to keep the number of dimensions of each scenario as small as possible, it should be possible to describe the position and size of this gap as parameters of the scenario instead of describing trajectories for all vehicles contributing to the existence of this gap.

The described action constraints can be summed up as follows:

  • Object in front of the SV
  • Object behind the SV
  • Object to the side of the SV
  • Complete blocking of one side of the SV
  • Blocking to the side of the SV with a gap sufficiently large for a collision avoidance maneuver

 

Dynamic Occlusions
Another role of additional objects in a scenario that increases the challenge for the SV to handle the situation safely is the dynamic occlusion. Similar to action restrictions, a dynamic occlusion itself does not require an avoidance maneuver by the SV. Objects are represented as dynamic occlusions if they obstruct the view of the challenging object from the ego vehicle's point of view. This allows storing scenarios that are often referred to as "cut-outs" (Figure 8): a vehicle in front of the SV makes a lane change to a neighboring lane because a vehicle further in front is decelerating. This decelerating vehicle then generates the safety-relevant logical scenario A for the SV. If the occluding object were not present, the SV could have perceived the challenging object at an earlier stage and already initiated a collision avoidance maneuver.

The presence of a dynamic occlusion does not necessarily imply that the SV does not perceive the challenging object. Instead, it describes the kinematic behavior of the object serving as the obstruction. Depending on the sensor setup, it may still be possible for the SV to perceive the challenger although it is completely obstructed in a planar representation, since e.g. a radar sensor may sense the obstructed challenger via reflections from the ground. When realizing the scenarios in a simulation or on a proving ground, the role of the dynamic occlusion will be executed by a vehicle so that, depending on the vehicle type, possible sensory advantages for the SV can still be represented.
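One possible way to operationalize the occlusion role is a planar line-of-sight test, sketched below (pure 2-D approximation; the orientation test ignores collinear corner cases):

```python
# Sketch of a planar line-of-sight test for the dynamic-occlusion role: an
# object occludes the challenger if the straight line from the ego vehicle
# to the challenger crosses the occluder's footprint polygon.
def _ccw(a, b, c):
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    """Standard orientation-based test; ignores collinear corner cases."""
    return (_ccw(p1, q1, q2) != _ccw(p2, q1, q2)
            and _ccw(p1, p2, q1) != _ccw(p1, p2, q2))

def occluded(ego, challenger, occluder_corners):
    """True if any edge of the occluder footprint blocks the line of sight."""
    edges = zip(occluder_corners, occluder_corners[1:] + occluder_corners[:1])
    return any(segments_intersect(ego, challenger, a, b) for a, b in edges)

# Ego at the origin, challenger 60 m ahead, occluder vehicle in between:
box = [(25.0, -1.0), (30.0, -1.0), (30.0, 1.0), (25.0, 1.0)]
print(occluded((0.0, 0.0), (60.0, 0.0), box))  # True
```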

Multiple Challengers
It is also possible that two objects require collision avoidance actions by the SV. For instance, the SV might be challenged at the same time by a challenger from the side and by a rear-end challenger. Yet, it is unlikely that the hypothetical collisions would happen at exactly the same time. Thus, it is necessary to store the temporal relation between the two challengers. Scenarios in which multiple challenging objects are present are extremely rare in driving data, so it is not yet possible to study the underlying mechanisms. Applying the metrics for identifying a single challenger to multiple objects within a limited time window is implemented within the database mechanics. Hence, it is possible to store candidates of scenarios involving multiple challengers, which can be subject to further analysis.

 

Causal Chains
Apart from objects that increase the challenge for the SV to handle the scenario safely, certain objects may induce a tactical behavior of the SV which may lead to an earlier reaction to the challenging object. Comparable to a human driver, the SV may look further ahead than just one vehicle and sense relevant behavior. An example is a vehicle two vehicles ahead performing a harsh braking maneuver, which forces the vehicle in front of the SV into a harsh braking maneuver and thus makes it the challenger from the SV's point of view. These causal chains can be incorporated into the logical scenarios by treating the challenging vehicle as a vehicle that initially was challenged by another vehicle itself and applying the systematics of the safety-relevant logical scenarios.

 

5. Preprocessing / Reconstruction

coming soon

6. Process Guidelines + Metrics for HAD Assessment

In general, an automated vehicle poses a risk to different focus groups. The first two groups are the users of the HAD vehicle and the potentially involved accident partners, who can be any individual traffic participants (non-users). As previously discussed, the different views on accepted risk of the two groups result from the benefits the groups receive. The third group is society. Different from the first two groups, for society not the fate of an individual is relevant but the total accident numbers. Hereinafter, we only discuss the occurrence of fatalities (index d), so instead of risk we can give quantitative requirements for the occurrence rate or frequency of fatal accidents.

Quantitative requirements for different types of usage are given in Figure 1. It is concluded that the accepted frequency of a person’s death per year is $f_\mathrm{inv} = 10^{-6}\,k_d/a$ for involuntary exposure and $f_\mathrm{prof} = 10^{-5}\,k_d/a$ for professional exposure, while the risk for voluntary exposure is theoretically unlimited, with typical acceptance rates of up to $f_\mathrm{vol} = 10^{-2}\,k_d/a$. In general, the accepted risk varies with the benefit for the user or focus group. Most of these considerations stem from the discussions on the safety of nuclear power plants in the 1970s and 1980s. The assumptions are still valid in general, but should be adapted to today’s level of safety. We suggest a factor of one fourth, similar to the development of MEM from the 1970s to today. (Reduction by a change in accident rate would be another approach with a similar outcome.) In the following, the lower risk today compared to the numbers above is indicated by an index with the year and country of the underlying statistic.

Users
The fatal risk for users is assumed equivalent to the risk of a fatal accident of an HAD vehicle (neglecting the higher damage with more than one user at the same time). As depicted in Figure 1, the type of exposure is relevant for accepting risks. In most use cases, HAD-F are used voluntarily; they must be actively bought and activated. Professional use is also plausible, but use for the job is not expected during the first introduction phase. Involuntary use is excluded in typical use cases. This consideration suggests following the risk acceptance rate of $f_{\mathrm{prof},2016,\mathrm{GER}} = 1.4\cdot10^{-9}\,k_d/\mathrm{km}$. Similar rates are also present in the US ($2.5\cdot10^{-9}\,k_d/\mathrm{km}$) [FATALITY RATE PER 100 MILLION ANNUAL VMT - 2013, U.S. Department of Transportation Federal Highway Administration. https://www.fhwa.dot.gov/policyinformation/statistics/2013/pdf/fi30.pdf. Accessed July 17, 2018]. For other countries, data about the mileage on different road types is not always available. The accident rate on all roads’ combined mileage is of a similar order of magnitude for most developed countries [Oguchi, T. Achieving safe road traffic—the experience in Japan. IATSS research, Vol. 39, No. 2, 2016, pp. 110–116] [Comparison of 2013 VMT Fatality Rates in U.S. States and in High-Income Countries, U.S. Department of Transportation NHTSA, October 2016. https://crashstats.nhtsa.dot.gov/Api/Public/Publication/812340. Accessed July 17, 2018].

However, the substitution of conventional driving also suggests comparing the risk of today’s driving with the suggested rate $f_\mathrm{prof}$. In the following, both considerations will be examined and compared. As a reference for today’s driving risk, driving on the Autobahn in Germany is taken. This has several advantages. First, driving on controlled-access highways is one of the safest, if not the safest, ways of travelling in a car, especially when taking the accident rate per mileage as reference. Second, it is likely that the first HAV will drive on controlled-access highways. Third, accidents on highways are well documented: even minor accidents often involve the police because of the traffic disturbance, and the travelled distance can be estimated from the measured traffic density. Assuming an average travel distance d, the time-based frequency (index t) can be converted to a distance-based frequency (index s) and vice versa. In this example, 4000 km/a are assumed as an average travel distance according to VDA (see footnote 4) and an average velocity of 100 km/h.
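
As a minimal worked illustration of this conversion (the relation is implied by the text; the numeric intermediate is our own arithmetic, not equation (1)):

$$ f_t = f_s \cdot d \quad\Longleftrightarrow\quad f_s = \frac{f_t}{d} $$

For example, $f_{\mathrm{prof},2016,\mathrm{GER}} = 1.4\cdot10^{-9}\,k_d/\mathrm{km}$ together with $d = 4000\,\mathrm{km}/a$ gives $1.4\cdot10^{-9} \times 4000 = 5.6\cdot10^{-6}\,k_d/a$.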

In the following equation (1), the GAMAB principle and the MEM principle are combined. We consider this the upper limit of the tolerable frequency, because a new technology is introduced that comes with new risk. Additional risk is acceptable because the user experiences a benefit from the new technology. Note that we only consider the risk for the user in this section. This is not applicable to non-users or society, which will be discussed in the following sections.

Interestingly, the order of magnitude according to equation (1) corresponds to the accepted frequency for professional exposure. This strengthens the hypothesis that both estimations result in acceptable values for users of automated vehicles. However, a higher risk could be accepted by the user (similar to motorbikes or extreme sports), but the user should be aware of this potentially increased risk.

Non-users
For all other traffic participants, HAD has no personal benefit (besides the decreased total risk for all traffic participants, assuming that HAD is generally safer than the average driver). However, non-users could have a lower risk acceptance threshold because they are skeptical about the new technology or might even have (subjective) disadvantages, e.g. due to slow vehicles on the road. Particularly critical is the risk of new types of accidents, because non-users would blame HAD for those accidents despite a potential reduction of the total number.

New risks could be caused, for example, by performance limitations (according to ISO/PAS 21448), systematic software failures, or cyber-attacks. The total new risk of the technology for an individual non-user should be below $f_{\mathrm{inv},2016,\mathrm{GER}} = 2.5\cdot10^{-7}\,k_d/a$ (= ¼ of the value for involuntary exposure). So how can the individual risk for a non-user be calculated? As long as there are not many HAD vehicles on the market, the exposure is very low and the probability that an individual traffic participant is involved in an HAD accident is low. The risk is therefore multiplied by the field share µ: the risk for non-users is diluted by the exposure to vehicles equipped with HAD.

According to equation (2), the accepted risk for a single HAD vehicle decreases with an increasing number of HAD vehicles. This is intuitively obvious because the exposure multiplies with the number of potential single threats. Comparing the risk level with equation (1) results in a number of $1.625\cdot10^{6}$ HAD vehicles in Germany until the risk acceptance of the other traffic participants becomes dominant. This deliberately neglects that the non-user also benefits if the system is safer than the human driver it replaces. This will only be acknowledged by non-users if there is an undeniable difference in the accident statistics. Otherwise, the (subjective) disadvantage of the new technology stays dominant.

Society
For society, the fate of individuals is of lesser importance. Benefits and costs of HAD are measured by the total number of accidents and whether it is reduced over time. In general, a decreasing trend of the accident rate throughout the years can be observed in Germany [Statistisches Bundesamt (Destatis). Verkehrsunfälle - Fachserie 8 Reihe 7 - 2015, 2015] and the US [Fatality Analysis Reporting System, U.S. Department of Transportation NHTSA, 2017. https://www-fars.nhtsa.dot.gov/Main/index.aspx. Accessed July 18, 2018]. However, this trend has been diminishing over the last five years for accidents with injuries, which even showed a slight (but insignificant) increase during these years. For fatal accidents, this slowdown of the decrease is observable as well, but less pronounced. One could suspect that there is a natural limit with the current road network, traffic density, and state-of-the-art vehicles.

When introducing HAD, there will still be a non-zero risk of severe accidents, and therefore it is likely that HAD will be involved in severe or even fatal accidents. So, what are the requirements of society if individual accidents do not influence the total number significantly? What is the upper total accident rate limit accepted by society?

The overall target is to reduce the number of accidents over time with the introduction of new technology. If we follow the argumentation of Wachenfeld [Wachenfeld, W. H. K., Dissertation, Technische Universität Darmstadt, 2017] and Kalra [Kalra, N., and D. G. Groves. The Enemy of Good. Estimating the Cost of Waiting for Nearly Perfect Automated Vehicles, 2017], we should allow a certain risk to bring HAD to the market and to gain further knowledge. At the same time, it is not acceptable for society as a whole that the total risk increases in a noticeable way. However, there is no way to check how accident numbers would have evolved without the technology once it has entered the market. In the last decade, the decrease of fatal accidents and accidents with injuries diminished while the annual travel distance increased. Hence, it seems justified to use recent numbers as reference. When using the accident rate for fatal accidents $f_{s,d}$, an exponential regression is a better fit than a linear regression. Interestingly, this is also the case for accidents in aviation (comp. footnote 5). The standard deviation of the exponential regression for all years since 2010 results in:

Multiplying the standard deviation by the average annual mileage in 2016 results in 23 fatal accidents per year. It must be pointed out that the type of regression and the number of years influence the result. It is also possible to use the double or triple standard deviation as a measure; however, the results will be of a similar order of magnitude. In the following, the result from equation (3) will be used.

The requirement of society should be that the risk from HAD is significantly lower than the described exponential trend observed in the latest data, so HAD should be at least one standard deviation $\sigma_{7y,\mathrm{exp}}$ better than the predicted performance of conventional driving. However, society should give HAD time to reach this high safety reference. Similar to air traffic, it is necessary to monitor the performance to allow improvement in functions, infrastructure, and user experience. In the following formula, it is suggested to allow an additional risk of one standard deviation at the beginning of the introduction and to demand that the risk be three standard deviations lower than the extrapolation when full market share is reached. Therefore, the acceptable risk depends not only on the development of the risk in conventional traffic over the years, but also on the market share µ of HAD.

In the following, a field share µ is assumed that develops similarly to the field share of other driving functions such as electronic stability control (comp. (5)). Full field share is assumed to be reached after T = 30 years and is described by the sine function $\mu(t) = \tfrac{1}{2}\left(1+\sin(\pi t/T)\right),\ 0\le t\le T$, shown in Figure 1. Other parameters are also time-dependent, because the actual safety on the roads is expected to change over time even without HAV.
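
A minimal sketch of one possible reading of this schedule: the market-share ramp follows the sine shape stated above (phase-shifted in the code so that it actually runs from 0 at introduction to 1 after T years), and the tolerance band is interpolated linearly in µ between +1σ at introduction and −3σ at full share. Both the phase shift and the linear interpolation are our assumptions for illustration, not the project’s formulas.

```python
import math

T = 30.0  # years until full field share (value taken from the text)

def field_share(t: float) -> float:
    """Sine-shaped market-share ramp; shifted by half a period so that
    mu(0) = 0 and mu(T) = 1 (the phase shift is our assumption)."""
    t = min(max(t, 0.0), T)
    return 0.5 * (1.0 + math.sin(math.pi * (t / T - 0.5)))

def acceptable_rate(f_trend: float, sigma: float, t: float) -> float:
    """Tolerated fatal-accident rate: +1 sigma above the extrapolated trend
    at introduction, -3 sigma below it at full market share (linear in mu)."""
    mu = field_share(t)
    return f_trend + (1.0 - 4.0 * mu) * sigma
```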


Summary of Safety Requirements
In the previous sections, safety requirements for three different stakeholders were deduced. For society, the required risk depends on the market share of HAD. The authors suggest allowing an increase in total risk by one standard deviation of the predicted accident rate, so HAD can be introduced while the knowledge about its safety level is not yet complete. In addition to society’s requirements, non-users (as part of society) have increased requirements for the new risks that come with automation. For users, we suggest constant risk requirements, although they might become stricter as traffic safety improves over the years. However, the user’s requirements are only dominant in the early introduction phase (comp. Figure 1), when the market share is relatively low. From a market share of about 10% onwards, the requirements of society (and non-users) are dominant. If the market share reaches 100%, there are no non-users remaining.

In Table 1, the requirements are summarized. Note that we currently only give values for Germany, because statistics about the mileage on controlled-access highways are available there. Since the accident rate on the whole road network is of the same order of magnitude for developed countries (see above), we do not expect significant changes in the safety requirements.

Proof of Safety and Probation in the Field
Although we have found individual and social risk acceptance criteria for HAD, it is neither possible

  • to derive safety requirements or test criteria for certain vehicles from risk acceptance criteria nor
  • to integrate a set of scenarios to approve that individual and social risks of HAD are accepted.

The first point therefore calls for a Proof of Safety, the second for a Probation in the Field.

Proof of Safety
Social consensus regarding acceptable risk is regulated by liability laws, e.g. the German ProdSG §5(2), according to which a product that conforms to standards or other relevant technical specifications is presumed to comply with product safety requirements, as far as these are covered by the standards or other relevant technical specifications. Development according to ISO 26262 ensures the “absence of unreasonable risk” (i.e. “risk, judged to be unacceptable in a certain context according to valid societal moral concepts”). Another relevant technical specification is, e.g., ISO/PAS 21448 for SOTIF, represented by automation risks in PEGASUS. Conformity with standards or other relevant technical specifications is, however, only a necessary condition.

A sufficient condition is given by the rules of the Ethics Committee [Ethik-Kommission Automatisiertes und Vernetztes Fahren, BMVI, Juni 2017], which state:

  • HAD is reasonable if it promises to reduce damage in the sense of a positive balance of risk compared to human performance.
  • If there is a fundamentally positive balance of risk, technically unavoidable residual risks do not preclude an introduction.

A fundamentally positive balance of risk compared to human performance can be argued following experts’ expectations of a general benefit of vehicle automation for traffic safety through

  • avoidance of accidents and
  • reduction of consequences [Rechtsfolgen zunehmender Fahrzeugautomatisierung, Berichte der Bundesanstalt für Straßenwesen, Heft F 83, 2012].

Moreover, the test concept developed in PEGASUS exemplarily ensures that the systems achieve at least human driving performance.
Besides the general benefit for traffic safety, HAD will introduce automation risks in individual cases because of performance limitations and inadequate interaction with drivers and other traffic participants. Those automation risks are identified and, as far as possible, quantified in PEGASUS to ensure that HAD systems cause only technically unavoidable residual risks.
However, product development according to standards and specifications does not replace product tests within the framework of verification and validation. An evaluation of different test strategies is given by Junietz et al. [Junietz, P., W. Wachenfeld, K. Klonecki, and H. Winner. Evaluation of Different Approaches to Address Safety Validation of Automated Driving. Accepted Paper. In 2018 Intelligent Transportation Systems Conference (ITSC), 2018].
A crucial limitation of all a-priori considerations is that a valid statistical proof that the systems actually meet the expectations cannot be provided before they are launched on the market. This results in requirements for Probation in the Field as well as in measures to be derived therefrom for the continuous improvement of the systems.

Limited Introduction and Probation in the Field
Despite all considerations, normative specifications, and product tests, there will be uncertainty about the future performance of HAD at the time of the initial introduction to the market. This uncertainty can either be accepted, if all stakeholders agree that the residual risk is sufficiently small, or controlled by reducing the number of sold vehicles in the introduction phase (see footnote 15). The market share µ is thereby controlled and the exposure for society and non-users reduced. Only the user has to accept the uncertainty if he wants to use HAD. However, the performance of HAD should be observed and the statistics publicly discussed, both to build trust among customers and society and to identify critical situations and improve the system with updates.

Current measures of field observation have limitations regarding the consideration of special aspects of HAD, the timely detection of rare events, and consistency over the complete product lifecycle. Potential improvements are, for example:

  • Further development of accident models
  • Definition of indicators for deficiencies of HAD systems
  • Increased density of accident analyses
  • Field studies with HAD systems across manufacturers
  • Surveys of customers who own vehicles with HAD systems
  • Effective testing of HAD systems during Periodical Technical Inspection.

It is necessary to implement these improvements before market introduction and to install an independent organization for the execution of the proof of probation in the field. For efficient product monitoring and continuous optimization of HAD, event data recording (EDR) is necessary. Therefore, standards have to be developed and implemented, as well as procedures for handling the data.

With the amendment of the German Road Traffic Act (StVG) in 2017, the legislature has created the framework for the introduction of HAD. A proof of Probation in the Field is therefore also an essential prerequisite for the desired and required further development of the legal framework for the introduction of higher levels of automation.

7. Data in PEGASUS-format

7. Data in PEGASUS-format

For the PEGASUS project, a feasible way to gather data from different sources and different partners needed to be developed. As pointed out in step 2, the sources range from simulated data over FOT and NDS data to data from development vehicles. In order to get this data into a single database, a common data format needed to be established. The main goal is to ensure consistent handling of data throughout the different processing steps. To achieve this, different requirements were checked, and a definition based on this information was generated.

First, a structure for all the different data needed to be developed. Since the processing of data is done in Matlab, a Matlab structure was chosen. However, Matlab is not available to every partner in the project. Therefore, HDF5 files can also be used, but they need to have the same internal structure as the Matlab files.

The main structure inside the data files is designed to have a channel for every signal that is saved in it. Inside this channel, a Time as well as a Value vector corresponding to each other need to be provided for every signal. The time vector should have a (mostly) fixed frequency throughout the entire measurement, but the frequency can vary between different signals. Values which do not change within an entire measurement can be stored only once with a timestamp of 0. With this structure, all required data can be stored. All mandatory, but also further useful, signals are defined. Additional signals can be enclosed, so that special data, which may be available in different test vehicles, can be integrated into the format.
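
A minimal sketch of what this channel layout could look like in an HDF5 file, written with Python and h5py; the signal names, sampling rate, and values are illustrative assumptions, not the official format definition:

```python
import h5py
import numpy as np

# One group ("channel") per signal; each channel holds corresponding
# Time and Value vectors, as required by the format.
with h5py.File("measurement.h5", "w") as f:
    # Ego longitudinal velocity, sampled at a fixed 100 Hz (name is assumed).
    ch = f.create_group("id0_VelocityX")
    ch.create_dataset("Time", data=np.arange(0.0, 60.0, 0.01))
    ch.create_dataset("Value", data=np.full(6000, 33.3))

    # A value that never changes is stored once with a timestamp of 0.
    const = f.create_group("id0_VehicleLength")
    const.create_dataset("Time", data=np.array([0.0]))
    const.create_dataset("Value", data=np.array([4.8]))
```
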
Different requirements were assessed during this process step. The main requirements were given by the different data sources as well as by the further processing of the data. The main goal was identified as being able to represent every situation that may occur to the highway chauffeur function in the project. Since the highway chauffeur is bound to highway scenarios, the data format only needs to be capable of representing these scenarios. Since most of the data is captured from vehicles, an ego perspective is used in the developed format.

Generally two things need to be represented by the data format:

  • Objects in the current scenario
  • Environment of the current scenario


Used coordinate systems
To ensure a consistent representation, two coordinate systems are used in the format: a global coordinate system (North, East, Altitude) to describe the overall movement of the ego vehicle, and a local coordinate system (x, y, z) bound to the rear axle of the ego vehicle. The directions of, as well as the connection between, these two coordinate systems are shown in Figure 1.

Most of the data is located in the vehicle coordinate system, since the main sources of data are recordings made in different vehicles. Nevertheless, a global coordinate system is necessary to be able to represent the overall trajectories of the ego vehicle and all surrounding objects.


Object representation
The object representation has a maximum of 65 object slots. This includes the ego vehicle and up to 64 surrounding vehicles. To distinguish the data between these object slots, every signal is replicated for every object slot. This is done by including the current slot number in the name of every signal, so the signal names become idX_SignalName with X ranging from 0 to 64. If there are fewer objects, not all of these slots need to be used. For every object, signals for typical translation and rotation values as well as other properties can be stored. All these values are stored relative to the ego vehicle, since this is the common format for measurements taken from a test vehicle.

To be capable of storing more than 65 vehicles with this format, a global object ID is stored as one signal for every object slot. With this global object ID, every slot is able to store different objects over time, since single objects are likely to get out of range after some time. Every object within the measurement data thus gets a unique global object ID. For the further data processing steps, these global object IDs, and thus the corresponding objects, need to stay within a single slot of the data format. Since the number of used slots needs to stay constant within a single measurement, currently unused object slots are indicated by a global object ID of -1.

For the ego vehicle, additional signals are necessary to give a full representation of the current scenario. To be able to recalculate the entire trajectories, values for the Northing and Easting of the ego vehicle as well as its current heading angle need to be provided. These values are not needed for the surrounding objects; their complete trajectories can be derived from the ego vehicle’s position.
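
A minimal sketch of this reconstruction, assuming x points forward, y to the left, and the heading angle is measured from North towards East; the actual axis and sign conventions are defined in Figure 1, so treat this purely as an illustration:

```python
import math

def object_global_position(ego_north: float, ego_east: float, ego_heading: float,
                           rel_x: float, rel_y: float) -> tuple[float, float]:
    """Rotate an object's position relative to the ego rear axle into the
    global North/East frame and add the ego position (conventions assumed)."""
    north = ego_north + rel_x * math.cos(ego_heading) + rel_y * math.sin(ego_heading)
    east = ego_east + rel_x * math.sin(ego_heading) - rel_y * math.cos(ego_heading)
    return north, east
```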


Environment representation
As already mentioned, the environment in the PEGASUS project is restricted to highway scenarios. Therefore, a simple approach to describing the current situation can be used in the data format. The environment is described by the lane markings and the roadside infrastructure. All of these are described by a third-degree polynomial using four signals each (distance, heading, curvature, and curvature derivative). With the help of these values, the position of each feature along the road can be calculated. Figure 2 and Figure 3 show the general concept of the lane markings as well as the definition of heading and distance. Just like distance and heading in Figure 3, the curvature and curvature derivative also need to be given at the rear axle of the ego vehicle.

Equations 1 to 3 show how to compute the position, heading, and curvature in the ego vehicle’s coordinate system:
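
The equation bodies are not reproduced here; a plausible reconstruction from the four coefficients – distance $d$, heading $\psi$, curvature $c$, and curvature derivative $\dot{c}$, all given at the rear axle – is the common cubic road model:

$$ y(x) = d + \psi\,x + \frac{c}{2}\,x^2 + \frac{\dot{c}}{6}\,x^3 $$
$$ \psi(x) = \psi + c\,x + \frac{\dot{c}}{2}\,x^2 $$
$$ c(x) = c + \dot{c}\,x $$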

Additional data, like the lane marking color, the lane marking type, or the type of roadside infrastructure (barrier, guard rail, grass, …), can be given as an additional signal.


Necessary, recommended, and optional signals
Based on the requirements of the database process, some signals need to be available in every measurement. These signals are shown in Figure 4. For a basic representation of the environment, at least the left and right lane markings need to be available; these are bound to the ego vehicle. Furthermore, the positions and speeds, as well as the global object IDs, of all objects are mandatory for every measurement. For the recalculation of the global trajectories of all objects, the absolute position of the ego vehicle is also needed in every measurement.

If this data is available in a measurement, all further process steps can be carried out. As already mentioned, many more signals are defined within the format. These can be used for further extensions of the database processes or other specific calculations. The remaining signals are divided into recommended and optional signals. The recommended signals should be provided if possible; the optional ones may only be necessary for special applications.

In the following, a list of mandatory PEGASUS JSON signals is shown. These signals must be available in the test data. The convention is that the index for the EGO is 0 and is set between 1 and 64 for surrounding objects.
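
The official signal list is not reproduced here. Purely as an illustration of the naming convention described above, a record could be structured as follows; all signal names and values are hypothetical:

```python
# Hypothetical sketch of the channel naming convention: index 0 = EGO,
# indices 1..64 = surrounding objects. Signal names are illustrative only.
measurement = {
    "id0_PosNorth":       {"Time": [0.0, 0.01], "Value": [5712340.2, 5712340.5]},
    "id0_HeadingAngle":   {"Time": [0.0, 0.01], "Value": [0.12, 0.12]},
    "id1_GlobalObjectId": {"Time": [0.0, 0.01], "Value": [17, 17]},      # -1 = slot unused
    "id1_PosX":           {"Time": [0.0, 0.01], "Value": [42.3, 42.1]},  # relative to ego
}
```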

8. Application of Metrics + Mapping to Logical Scenarios

8. Application of Metrics + Mapping to Logical Scenarios

Mapping to Logical Scenarios

For the mapping from measurement data to logical scenarios, different signals and data are needed. For this purpose, a processing framework is set up. It takes the recorded data as input and outputs the mapped scenarios together with key performance indicators (KPIs) that describe the logical scenario.
In a first step, the data is checked for errors and, if possible, these errors are handled and corrected automatically. If not, the data conversion step must be repeated. Since there are many different KPIs that may be desired for the analysis of the different logical scenarios, the processing framework is flexible concerning the computed measures. In order to achieve this, it analyzes the given scripts and functions needed to derive the measures and builds a computation tree from the requirements of the distinct scripts. It analyzes the input and output of each script and orders them in such a way that all needed signals have always been calculated beforehand.
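
A minimal sketch of such a dependency ordering, assuming every script declares the signals it consumes and produces (the declaration style and script names are assumptions; the project framework itself is Matlab-based):

```python
from graphlib import TopologicalSorter

# Each script declares its input and output signals (names illustrative).
scripts = {
    "derive_ttc":         {"inputs": {"distance", "rel_speed"}, "outputs": {"ttc"}},
    "lane_assignment":    {"inputs": {"lane_markings", "obj_pos"}, "outputs": {"obj_lane"}},
    "cut_in_probability": {"inputs": {"ttc", "obj_lane"}, "outputs": {"p_cut_in"}},
}

# A script depends on every script that produces one of its inputs.
producers = {out: name for name, s in scripts.items() for out in s["outputs"]}
dependencies = {
    name: {producers[sig] for sig in s["inputs"] if sig in producers}
    for name, s in scripts.items()
}

# Execution order in which all needed signals exist before they are used,
# e.g. ['derive_ttc', 'lane_assignment', 'cut_in_probability'].
print(list(TopologicalSorter(dependencies).static_order()))
```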

The functions and scripts are put into four distinct groups: derived signals, scenario probabilities, scenario extraction, and indicators. Derived signals are the first step of computation. Here, signals such as the time-to-collision, the time headway, or the assignment of objects to particular lanes are calculated. Scenario probability functions return a probability for each time step of the recording, indicating whether, and with which probability, that time step is part of a scenario. Functions for scenario extraction then take these probabilities and extract scenarios, thereby not only taking into account the single probabilities but also accounting for sensory deficiencies and processing hiccups. The last group of functions, the indicators, extracts KPIs for each of the extracted scenarios. Depending on the type of scenario, these indicators can be more or less extensive.
After all these functions have been processed, the data is aggregated for further use. In the current state of the processing framework, it is passed on to the database for further aggregation across trips and recordings. For this purpose, the extent of a scenario and its KPIs are transferred to a JavaScript Object Notation (JSON) file. This can then be read in by the database.

This entire framework is executed for one sole purpose: the extraction of scenarios and the KPIs describing those scenarios. Due to its modular nature, the framework can always be extended to extract more scenarios or calculate more KPIs. If the naming conventions of the used signals are kept and the needed signals are available, the framework will easily integrate any new function.

The aforementioned scenarios are recognized in the functions of the scenario probability group. From the data, the probabilities are derived using a four-stage process. In the first stage, comfort zones are set up around the ego vehicle. These zones are shown in Figure 1. The zone shown by the green square is limited by the ego lane markings to the left and the right; the limitation to the front is handled dynamically by the time headway (THW). For the second comfort zone (red in Figure 1), the dimensions are set statically. A common selection here is a few meters before and behind the car and 1 m beside the car. Bear in mind that other objects are handled by their center, i.e. if the distance from the side of the ego vehicle to the object’s center is 1 m, the object is already very close.

Figure 1: The two defined comfort zones around the ego vehicle. The first is laid along the ego lane (green), the second is in proximity around the ego vehicle (red)

As noted above, the framework has a modular approach. Therefore, the assignment of objects to the lanes is given as a derived signal. This means that when the mapping to the logical scenarios is performed, the function already knows whether or not objects are in the ego lane at a distinct point in time. The same applies to the infringement of the proximity (red) zone. The second stage of the recognition is therefore computing all entrances into the ego lane across the complete trip. This is then complemented with the information on the exact position of the infringing object, thereby taking into account the THW thresholds used to define the relevance of the logical scenarios. Entrances into the (green) comfort zone are only considered relevant if they occur below a certain threshold.

Once a candidate for entering the comfort zone is found, the third stage is the mapping to the actual logical scenario. For this purpose, the time before and after the object enters the comfort zone is investigated. Using the object’s dynamic information, the framework can deduce which logical scenario this entrance belongs to.

The fourth and last stage is the reduction of the parameters needed for the description of the scenario. For brevity in the description of the scenarios, and thereby also for memory efficiency, the complete trajectory is not stored. Instead, only parameters allowing the precise reconstruction of the trajectory are computed. For entrances from the front and rear, this is rather simple, since a straight movement relative to the road can be assumed. For entrances that include a lane change, a simpler function describing the lane change is fitted. This way, even more complex scenarios can be described using only a small subset of the parameters in the raw recording data.
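
A minimal sketch of such a parameter reduction for a lane-change entrance: fit a low-degree polynomial to the recorded lateral offset and keep only its coefficients. The use of numpy’s least-squares fit is our illustration; the degree matches the fourth-degree lane-change polynomials mentioned later in this document:

```python
import numpy as np

def fit_lane_change(t: np.ndarray, y: np.ndarray, degree: int = 4) -> np.ndarray:
    """Reduce a recorded lateral trajectory to a handful of coefficients."""
    return np.polyfit(t, y, degree)

# Usage: a 4 s lane change of 3.5 m, sampled at 50 Hz (illustrative data).
t = np.linspace(0.0, 4.0, 200)
y = 3.5 * (t / 4.0) ** 2 * (3.0 - 2.0 * (t / 4.0))  # smooth-step-like offset
coeffs = fit_lane_change(t, y)                       # 5 numbers instead of 200
y_reconstructed = np.polyval(coeffs, t)
```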

After a scenario has been fully recognized and its parameters have been extracted, the probability of the scenario is stored for further usage in the extraction process and for the computation of the KPIs.

Application of Metrics

In the following, the purpose and conception of the criticality metric are described as one example of an application of metrics. More detail can be found in the publication by Junietz et al. [1]. When deducing scenarios from real-world data, a filter to find relevant scenarios is required. Most driving, especially on motorways, is monotonous and does not require a special test case [2]. The metric should detect all scenarios with increased driving requirements in order to derive test cases that are challenging. It can be applied to data from NDS/FOT and from HAD test drives to detect relevant scenarios. When the metric is applied online during driving, real-time computation is necessary. If this requirement is not met, a metric that filters out all definitely irrelevant scenarios can be used in a first assessment step in order to reduce the number of scenarios that need further computational effort. The Worst-Time-to-Collision [2] is a metric designed to work as such a first filter and could be used here.


Requirements for a Criticality Metric

Besides the absence of accidents, metrics that describe the criticality of the test scenario can be used to assess safety (compare section 4.3.2). Criticality is hereby defined via the driving requirements that are present in the scenario. The driving requirements are highly dependent on the scenario the HAD-F is exposed to; hence, a general requirement cannot be deduced. The occurrence of scenarios with high requirements can only be compared statistically in real-world driving. If two different HAD-F perform differently in the same artificial test (simulation/test field), there is not necessarily a difference in safety, as a more conservative driving strategy does not necessarily mean increased safety if both HAD-F drive accident-free. An exception would be if the driving capabilities of the HAD-F are known (e.g. a reaction time until a full brake due to computation time and brake system delay). In this case, it is a requirement that the driving capabilities are sufficient to perform accident-free, and the driving strategy has to be adapted accordingly.

As all kinds of accidents shall be prevented, the accident severity is not considered here, so it is not a risk metric. Ideally, the metric should describe the probability of an accident in a specific situation for AD or a human driver. As different drivers and different AD systems also have different driving performance and therefore different collision probabilities, the metric shall describe the driving requirements in a situation independently of the available driving performance [3]. The driving requirements consist of the following entities:

  1. Necessary acceleration in lateral and longitudinal direction
  2. Margin for corrections of course angle (side distance)
  3. Margin for correction of speed (front distance)

These entities depend on each other. A higher margin for corrections can be bought with a higher deceleration early on. Hence, the three entities should be normalized and combined into a joint value.


Comparison with State of the Art

Various metrics exist for the identification of scenarios, trajectory planning, and the assessment of criticality in general. They can be classified into a-posteriori metrics, which are often used to analyze NDS [4, 5], and metrics with deterministic [6–9] or probabilistic trajectory prediction [9–13]. An overview and a classification of metrics with trajectory prediction can be found in [10] and [14]. However, those metrics do not combine all of the mentioned entities. In [15, 16], Monte Carlo sampling is used to address combined maneuvers. In [17], the reachable and available free space is assessed to derive criticality. However, these approaches do not address the difficulty of driving on those trajectories. A combination of different criteria is often used in trajectory planning and optimization [18, 19]. Defining an optimization function to be minimized offers the opportunity to combine different entities and weighting factors. Typically, the resulting trajectory is of interest together with the required control values in case model predictive control (MPC) is used. In PEGASUS, we use a similar approach with an optimization function designed to describe criticality as the driving requirements mentioned above. The minimized value of the optimization function is the criticality of the situation. It is important to note the difference between this approach and most use cases for trajectory optimization: the metric has no direct feedthrough to the trajectory control. Its only purpose is to optimize possible future trajectories in order to calculate the criticality. This can be done either online during a test drive or a posteriori from recorded data.


Computation

To obtain the desired criticality metric, we define a state-space model with state vector $x = [x_w\ y_w\ v_v\ \psi_c]^T$ and input vector $u = [a_x\ a_y]^T$. The problem depends on the current ego velocity in natural coordinates $v_v$, the course angle $\psi_c$, and the accelerations $a_x, a_y$ in natural coordinates. Additionally, the position in world coordinates $(x_w, y_w)$ is required because the positions of the objects are not updated into vehicle coordinates once the prediction horizon is initialized. For most MPC applications, a linear single-track model is used. This has the advantage that the actuator inputs can be optimized directly. However, in the application of the criticality metric, the accelerations are of interest. As we do not use the model for actual vehicle control, additional deviations from a simplified model (e.g. due to neglecting the tire cornering stiffness) can be tolerated. Instead, a point-mass model is used that is extended by a course angle, which is required to transform the vehicle motion into world coordinates. The non-linear vehicle model is described by:
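
The model itself is not reproduced in this document; a plausible form consistent with the state and input vectors above is the point-mass model with course angle:

$$ \dot{x}_w = v_v \cos\psi_c, \qquad \dot{y}_w = v_v \sin\psi_c, \qquad \dot{v}_v = a_x, \qquad \dot{\psi}_c = \frac{a_y}{v_v} $$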

Assuming small course angle changes and small changes in velocity allows a linearization of the model. Initializing the model at course angle zero and using a constant velocity $v_\mathrm{old}$, which is updated only at the beginning of the prediction horizon (comp. [18]), the equations simplify to:
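
Again as a plausible reconstruction (small $\psi_c$, velocity frozen at $v_\mathrm{old}$):

$$ \dot{x}_w = v_v, \qquad \dot{y}_w = v_\mathrm{old}\,\psi_c, \qquad \dot{v}_v = a_x, \qquad \dot{\psi}_c = \frac{a_y}{v_\mathrm{old}} $$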

Resulting in the system matrices:
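
For the linearized equations above, written as $\dot{x} = Ax + Bu$, the matrices would read:

$$ A = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & v_\mathrm{old} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1/v_\mathrm{old} \end{pmatrix} $$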

The goal of the MPC is to optimize the trajectory with respect to the criticality within a given prediction horizon N. The first two influence factors are the accelerations, which are minimized. The margin for corrections of the course angle depends on the lateral distance and the velocity. A higher distance gives more space to correct the course angle. A lower velocity has two effects: a disturbance (e.g. side wind) has a smaller influence on the deviation of the position (because the velocity is integrated), and the course angle can be corrected with less effort in $a_y$. As we want to minimize a term that includes the maximization of distances, we use the position with the maximum lateral distance to all obstacles as reference $r_y$ and a maximum deviation $d_y$, resulting in the following term:
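
A plausible form of this normalized penalty (the exact expression is not reproduced here) is the quadratic term:

$$ R_y = \left(\frac{y_w - r_y}{d_y}\right)^2 $$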

The margin in the longitudinal direction cannot be used in such a way, because only the front distance is relevant. As reference $r_x$ we use the recommended following distance in Germany ($0.5 \cdot v/(\mathrm{km/h})$ in meters). We use the piecewise linear function:
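
One plausible shape for such a piecewise linear penalty – zero beyond the reference distance, growing linearly as the front distance $\Delta x$ falls below $r_x$ (the symbol $\Delta x$ is ours) – would be:

$$ R_x = \max\!\left(0,\; \frac{r_x - \Delta x}{r_x}\right) $$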

With the minimization of an objective function alone, a collision with other objects is not excluded. Hence, the outputs are bounded by constraints that are updated at the start of the prediction horizon assuming constant velocity and course angle. Again, we neglect objects coming from behind and represent the left, right, and front boundaries by the constraints $c_l$, $c_r$, and $c_f$, respectively. Additionally, the inputs must be bounded, because the dynamics of a vehicle are limited by the available friction coefficient. According to Kamm’s circle:
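
Kamm’s circle bounds the combined accelerations by the available friction; with friction coefficient $\mu_f$ and gravitational acceleration $g$ (symbol names ours):

$$ a_x^2 + a_y^2 \le \left(\mu_f\, g\right)^2 $$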

However, to allow efficient solving of the optimization problem, Kamm’s circle is approximated with 12 linear constraints, with parameters taken from [18]. Modeling the objective function as the sum of the defined elements with constant weighting factors gives the following optimization problem:
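
A plausible overall shape of this problem, with weighting factors $w_{a_x}, w_{a_y}, w_y, w_x$ (symbol names ours) over the horizon $N$:

$$ \min_{u_0,\dots,u_{N-1}} \sum_{k=0}^{N-1} \left( w_{a_x} a_{x,k}^2 + w_{a_y} a_{y,k}^2 + w_y R_{y,k} + w_x R_{x,k} \right) $$

subject to the linearized dynamics, the boundary constraints $c_l$, $c_r$, $c_f$, and the linearized Kamm’s circle constraints.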

The weighting factors are chosen such that the four elements of the optimization function receive equal weight. However, because the ego vehicle is approaching and the elements of the objective function are summed over subsequent steps, $R_x$ typically results in values 10 to 20 times higher than the other elements. Hence, we choose a weighting factor of 1/10 for it, while all other factors remain 1. The final criticality is the maximum acceleration during the prediction divided by the maximum acceleration possible.

Developed Workflow
The computational effort of the proposed metric is high compared to state-of-the-art metrics. Hence, it is suggested to preselect scenarios that might be relevant or, in other words, to filter out scenarios that are definitely not relevant. We suggest using the metric Worst-Time-to-Collision (WTTC) [2], as it addresses all kinds of scenarios. It calculates the time until a collision occurs if both vehicles perform the maneuver that leads to the fastest possible collision (Figure 2).
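
As a conceptual sketch of this worst-case idea (point masses whose reachable ranges grow with speed and maximum acceleration; the exact WTTC definition in [2] uses proper reachable sets, so this is only an approximation for illustration):

```python
import math

def worst_case_time(p1, v1, p2, v2, a_max=9.81, dt=0.01, t_max=10.0):
    """First time at which the worst-case reachable ranges of two point
    masses (speeds v1, v2, both accelerating with a_max towards each
    other) can bridge their current gap. Conceptual sketch only."""
    gap = math.dist(p1, p2)
    t = dt
    while t <= t_max:
        reach = (v1 + v2) * t + a_max * t * t  # v*t + 0.5*a*t^2, for both vehicles
        if reach >= gap:
            return t
        t += dt
    return math.inf

print(worst_case_time((0.0, 0.0), 30.0, (60.0, 3.5), 25.0))  # roughly 0.9 s
```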

As a threshold, a WTTC value of at least one second is recommended, based on a calibration with real-world scenarios [3] (Figure 3). Other metrics would also be possible in scenarios where they can be applied (e.g. TTC, TTB, or required acceleration for front-to-rear collisions). The identified thresholds are similar to those found in the SHRP2 database [20].

If sufficient data is available, machine learning could be used to classify scenarios for a first evaluation. During training, false negatives should be prevented by using a cost matrix punishing those events. For classification, an ordinal scale is suggested, classifying the scenarios into uncritical, slightly critical, and critical. For a cut-in scenario, the metrics WTTC, Time Headway, Time-to-Steer, and Collision Index (CI = v²/TTC) were favored using methods of feature selection. With more data available, a trained classification model could replace the suggested filter by WTTC alone.


References
[1]    P. Junietz, F. Bonakdar, B. Klamann, and H. Winner, “Criticality Metric for the Safety Validation of Automated Driving using Model Predictive Trajectory Optimization,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, pp. 60–65.
[2]    W. Wachenfeld, P. Junietz, R. Wenzel, and H. Winner, “The worst-time-to-collision metric for situation identification,” in 2016 IEEE Intelligent Vehicles Symposium (IV), 2016, pp. 729–734.
[3]    P. Junietz, J. Schneider, and H. Winner, “Metrik zur Bewertung der Kritikalität von Verkehrssituationen und -szenarien,” in 11. Workshop Fahrerassistenzsysteme, Walting, 2017.
[4]    M. Benmimoun, Automatisierte Klassifikation von Fahrsituationen: fka Forschungsgesellschaft Kraftfahrwesen mbH, 2015.
[5]    T. A. Dingus, R. J. Hanowski, and S. G. Klauer, “Estimating crash risk,” Ergonomics in Design: The Quarterly of Human Factors Applications, vol. 19, no. 4, pp. 8–12, 2011.
[6]    C. Rodemerk, S. Habenicht, A. Weitzel, H. Winner, and T. Schmitt, “Development of a general criticality criterion for the risk estimation of driving situations and its application to a maneuver-based lane change assistance system,” in Intelligent Vehicles Symposium (IV), 2012 IEEE, 2012, pp. 264–269.
[7]    H. Winner, S. Geyer, and M. Sefati, “Maße für den Sicherheitsgewinn von Fahrerassistenzsystemen,” in Maßstäbe des sicheren Fahrens. 6. Darmstädter Kolloquium Mensch + Fahrzeug, 2013.
[8]    R. K. Satzoda and M. M. Trivedi, “Safe maneuverability zones & metrics for data reduction in naturalistic driving studies,” in Intelligent Vehicles Symposium (IV), 2016 IEEE, 2016, pp. 1015–1021.
[9]    F. Damerow and J. Eggert, Eds., Predictive risk maps. Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on, 2014.
[10]    M. Schreier, Bayesian environment representation, prediction, and criticality assessment for driver assistance systems. Dissertation, Technische Universität Darmstadt, 2016. Düsseldorf: VDI Verlag GmbH, 2016.
[11]    J. Eggert and T. Puphal, “Continuous Risk Measures for ADAS and AD,” in FAST-zero '17, 2017.
[12]    J. Eggert, “Risk estimation for driving support and behavior planning in intelligent vehicles,” at-Automatisierungstechnik, vol. 66, no. 2, pp. 119–131, 2018.
[13]    D. Althoff, J. Kuffner, D. Wollherr, and M. Buss, “Safety assessment of robot trajectories for navigation in uncertain and dynamic environments,” Auton Robot, vol. 32, no. 3, pp. 285–302, http://dx.doi.org/10.1007/s10514-011-9257-9, 2012.
[14]    S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion prediction and risk assessment for intelligent vehicles,” (English), Robomech J, vol. 1, no. 1, pp. 1–14, http://dx.doi.org/10.1186/s40648-014-0001-z, 2014.
[15]    A. Broadhurst, S. Baker, and T. Kanade, “Monte carlo road safety reasoning,” in Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, 2005, pp. 319–324.
[16]    A. Eidehall and L. Petersson, “Statistical Threat Assessment for General Road Scenes Using Monte Carlo Sampling,” Intelligent Transportation Systems, IEEE Transactions on, vol. 9, no. 1, pp. 137–147, 2008.
[17]    C. Schmidt, Fahrstrategien zur Unfallvermeidung im Straßenverkehr für Einzel- und Mehrobjektszenarien. KIT, Diss.--Karlsruher Institut für Technologie, 2013. Karlsruhe, Baden: KIT Scientific Publishing, 2014.

9. Logical Scenarios & Parameter Space

9. Logical Scenarios & Parameter Space

Logical scenarios that were previously defined are stored in this data container together with their variants extracted from measurement data. Logical scenarios are an abstract description of a traffic scenario. They are characterized by the fact that not all parameters, such as distances or speeds, have a concrete value. Instead, some values are left open and only parameter ranges are specified. Which combinations of parameters occur, and how often, can be extracted from measurement data. Several components are required to be able to save these logical scenarios together with the variants found in measurement data (concrete scenarios). First, parameters that uniquely describe the concrete characteristics of the scenarios are derived from the modeling of the logical scenarios. These parameters are then determined automatically in the extraction step for the scenarios found. In order to be able to use the modelled scenarios in simulations or on the test track, templates are created in OpenSCENARIO and OpenDRIVE. These templates are created manually for each scenario and stored in the database.

After scenarios have been found and extracted from measurement data in the previous process steps, e.g. from test runs, they are stored in this data container. However, scenarios must not only be extracted; it must also be possible to play them back in the simulation and on the test track. On the one hand, it is important that a scenario found in measurement data, including the static environment, can be algorithmically correctly extracted and modelled. On the other hand, it is at least as important that these scenarios can be described technically, so that the events can be reproduced precisely and the description format can be used both in the simulation and on the test track.

Within this project, the two description formats OpenDRIVE and OpenSCENARIO are used to fulfill this task. Both formats are open standards and are based on the XML description language. They address the need for an open standard to describe the static and dynamic environment. Thus, within the database, a single combination of standards can be used to aggregate data from different sources, such as FOTs, NDSs, or simulations, and simultaneously use them on the test track or in simulations from different vendors.

OpenDRIVE was first released in 2005 and makes it possible to describe a static environment with a focus on roads. Individual road elements, such as straight lines and curves, can be parameterized and combined with other elements to create entire routes. OpenSCENARIO was first introduced in 2014 and was further developed during the PEGASUS project. Unlike OpenDRIVE, OpenSCENARIO is used to describe the temporally dynamic elements of a scenario. For example, the position of a traffic light is defined in OpenDRIVE, while its switching times are defined in OpenSCENARIO. However, the focus is on the vehicles themselves. For example, vehicles are defined in catalogs and positioned on a route that is referenced in OpenDRIVE. The behavior of the vehicles in the scenario can be described by a story. Vehicle actions can be triggered when conditions occur. The actions include, for example, a lateral movement to change lanes on different levels of abstraction: a vehicle can be given just a new destination lane, the transverse movement can be described by geometric primitives, or it can be given precisely as a trajectory in the form of a list of positions.

Although OpenDRIVE and OpenSCENARIO are technically powerful enough to define scenarios, there is still room for improvement. For example, the definition of routes in OpenDRIVE is often time-consuming, especially for more complex routes. For this reason, DLR introduced the SimpleRoad description language. It makes it easier to generate routes by providing a shorter and easier-to-understand description. Files in the SimpleRoad format can be translated into OpenDRIVE by a provided converter.

Even when describing the dynamic parts of a scenario in OpenSCENARIO, compromises must be made in the current version with regard to parameterization. This becomes clear with the example of the description of a lane change trajectory. Within the scenario concept, lane changes are described by fourth-degree polynomials. However, since the current version of the OpenSCENARIO standard does not support these as geometric primitives, such a trajectory must be defined as a list of positions. This makes it possible to use the polynomial for lane changes, but the trajectory cannot be described by a few parameters in OpenSCENARIO itself. Supported geometric shapes such as straight lines, on the other hand, can easily be modified by parameters, which is an important requirement for stochastic variation. The solution to this problem is to allow the introduction of non-standard-compliant elements into OpenSCENARIO. For these, however, an appropriate script must be provided, which is executed by an introduced Transpiler and translates the new definition into standards-compliant OpenSCENARIO. Thus, new maneuvers and other elements can easily be integrated and distributed to partners for evaluation before eventually being integrated into the standard.
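
A minimal sketch of what such a transpiler step could do for the lane-change case: expand a fourth-degree polynomial into the list of positions that standard OpenSCENARIO expects. The boundary conditions (start and end with zero lateral speed, zero initial lateral acceleration) are assumptions for illustration:

```python
import numpy as np

def lane_change_positions(duration: float, offset: float, n: int = 50):
    """Sample a fourth-degree lane-change polynomial y(t) into (t, y) pairs,
    given only two scenario parameters: duration T and lateral offset."""
    T = duration
    # Coefficients of y(t) = c4*t^4 + c3*t^3 + c2*t^2 + c1*t + c0, from:
    # y(0)=0, y'(0)=0, y''(0)=0, y(T)=offset, y'(T)=0 (assumed conditions).
    A = np.array([
        [0.0,      0.0,     0.0,   0.0, 1.0],   # y(0)   = 0
        [0.0,      0.0,     0.0,   1.0, 0.0],   # y'(0)  = 0
        [0.0,      0.0,     2.0,   0.0, 0.0],   # y''(0) = 0
        [T**4,     T**3,    T**2,  T,   1.0],   # y(T)   = offset
        [4*T**3,   3*T**2,  2*T,   1.0, 0.0],   # y'(T)  = 0
    ])
    b = np.array([0.0, 0.0, 0.0, offset, 0.0])
    coeffs = np.linalg.solve(A, b)
    ts = np.linspace(0.0, T, n)
    return [(float(t), float(np.polyval(coeffs, t))) for t in ts]

positions = lane_change_positions(duration=4.0, offset=3.5)
```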

These data formats (OpenDRIVE and OpenSCENARIO) and extensions (SimpleRoad and the Transpiler) are used to define the logical scenarios. As already mentioned, not all parameters are assigned a concrete value. Since elements such as distances and speeds are not fixed in a logical scenario, placeholder variables are specified at these points in the files. Before the scenarios can be used, for example, in a simulation, these variables must be replaced by concrete values, so that a concrete scenario is created from the logical scenario.

Method to define acceptance criteria

As an example, consider the single cut-in from the right, which is shown in Figure 30. For the dynamic elements of this logical scenario, a total of 8 parameters is required; these are listed in Table 1. If a value is defined for each parameter, the constellation of the vehicles shown in this scenario can be uniquely described. The space of all possible parameter combinations is the parameter space. It contains all possible variations of this one scenario. After, for example, a right-hand cut-in has been detected in the data, the values of all parameters are determined from the trajectories and saved. After this has been carried out for a large number of scenarios, analyses can be carried out in the next process steps, for example on the frequency of certain characteristics, in order to use the findings for the simulation.

10. Integration of Pass / Fail Criteria

10. Integration of Pass / Fail Criteria

Pass/fail criteria are implemented as fixed thresholds. These thresholds are expressed in a function-style format. For this purpose, a unified language was defined, offering basic operators and functions. As long as the threshold or inequality is fulfilled for all time steps during a test, the test is assumed to be passed. The format is based on Matlab functions to simplify the implementation effort needed to integrate pass/fail criteria into existing testing toolchains.
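
A minimal sketch of such a criterion as a predicate over a recorded time series, written in Python for illustration although the project format is based on Matlab functions; the signal name and threshold are assumptions:

```python
import numpy as np

def passed(ttc: np.ndarray, ttc_min: float = 1.0) -> bool:
    """Fixed-threshold pass/fail criterion: the test is passed only if the
    inequality holds for all time steps of the test run."""
    return bool(np.all(ttc >= ttc_min))

# Usage on an illustrative time-to-collision trace.
print(passed(np.array([3.2, 2.8, 1.9, 1.4])))  # True: never below 1.0 s
```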

11. Space of logical Test Cases

coming soon

12. Application of Test Concept incl. Variation Method

12. Application of Test Concept incl. Variation Method

Test Concept

The PEGASUS Testing Method is located on the upper right side of the general V-model (see figure 1). It is important to note what is, and what is not, in the scope of the Testing Method.

In the scope of the PEGASUS Testing Method are:

  • Safeguarding the AD-Function regarding risk of collision
  • Sensor performance as an input for system performance

Not in the scope are other topics relevant for highly automated driving functions that are covered by OEM or supplier tests or by regulations, for example:

  • Testing according to ISO 26262
  • Direct safeguarding of the sensor performance
  • Safeguarding
    • interaction with driver (HMI, …)
    • compliance with traffic rules

The goal of the test concept is to develop a method to generate evidence regarding the safeguarding of a level 3 system (Autobahn-Chauffeur, max. 130 km/h). The test instances used are shown in figure 2.

Input for the test concept is a set of logical test cases with the corresponding parameter space, including the exposure of parameters and pass/fail criteria, as well as special concrete scenarios (for example from certification, accident data, or the automation risk analysis, if testable in the described context (see above)).

The expected output is evaluated scenarios and the probability for collision scenarios (with crash severity). Starting from the space of logical test cases (= logical scenarios + parameter space + pass/fail criteria + exposure of parameters), test cases get allocated to test validation levels.

Starting from the space of test cases, the test cases are assigned to the PEGASUS test instances.
Hereby, “all” logical scenarios get tested in the simulation to achieve high efficiency. Based on the automated/stochastic variation of the logical scenarios’ parameters, concrete test cases are created. These concrete test cases are evaluated against the pass/fail criteria.

Interesting or critical cases (i.e. pass criteria not fulfilled or only narrowly fulfilled) are tested or validated in real cars on a proving ground. In addition, manually selected concrete test cases can be evaluated on the test ground (e.g. tests as per ECE R79 or rating tests).

Within field tests, it is not possible to test specific test cases from the space of logical test cases. Instead, the behavior of driving features is tested in real traffic, whereby challenges for the HAD function can be created via guidelines on route, time, or weather. The major target is to find “surprises” (e.g. new scenarios, new parameters) and to create new measurement data for further analyses in Replay2Sim.

How does the test case allocation take place, and how does the test procedure work?

Test case allocation:

  • Simulation:
    • All logical test cases regarding scenario-based testing, with a high number of scenarios but low relevance regarding real sensor performance
  • Proving ground:
    • Pre-selected tests, e.g. certification tests
    • Tests with a high relevance regarding driving dynamics and real sensor performance
    • Rare events which can hardly be seen in field tests
  • Field tests: Tests with a high relevance regarding real system performance under a high variation of surrounding conditions

The derivation of test cases for the simulation is done by stochastic parameter variation and test automation, respectively; for the proving ground, it is based on manual selection or the identification of relevant scenarios within the simulation.

Test results:

  • Concrete scenarios for simulation, proving ground, and field test, evaluated based on the pass/fail criteria
  • Probability for accident scenarios

Test end criteria regarding simulation (suggestion):

  • Create a transfer function between the scenario parameters (input) and the test result (output, e.g. accident yes/no, distance between ego vehicle and relevant target):
    Target value for the quality of the transfer function (e.g. R_Qd-value) ≥ 80%*
  • Calculate the standard deviation σ of the computed probability for accident scenarios:
    Target value for σ ≤ 20%* (*according to the state of the art)

Starting from the space of logical test cases, the test cases and test levels are assigned as described above: “all” logical scenarios are tested in the simulation; concrete test cases are created by automated/stochastic variation of the logical scenarios’ parameters and evaluated against the pass/fail criteria (examples of these criteria are shown in figure 5). Critical cases (i.e. pass criteria not fulfilled or only narrowly fulfilled) are evaluated in real cars on a proving ground, complemented by manually selected concrete test cases (e.g. accident scenarios, rating or certification tests). Within field tests, specific test cases from the space of logical test cases cannot be tested; instead, the behavior of driving features is tested in real traffic.

The major target is to find “surprises” (i.e. new scenarios, new parameters). These surprises may be provoked by different guidelines on route (e.g. tunnel) or time (e.g. low sun) and can be further analyzed in Replay2Sim.

Test of the HAD function in real-world traffic (long-term testing)

  • Identification of specific individual situations within the framework of event-based simulation
  • Identification of faults/impairments of the system caused by environmental influences that currently cannot be simulated directly via models, as no suitable physical models are yet available
  • Performing special “pass” assessments (e.g. risk in passing or following other vehicles)

Variation Method

The processing step of the stochastic variation plays a central role within the concept of a scenario-based assessment of an automated driving function. Given a scenario data model with all relevant parameter spaces as well as additional a-priori knowledge about these spaces from a central database, the variation method has the main task of finding all relevant critical parameter subsets within each logical scenario space. According to the PEGASUS test concept, this variation is mainly applied within the simulation, since it is the only test instance capable of handling a large number of test runs at relatively low cost. The stochastic variation of parameters within all relevant logical scenario spaces offers the most complete exploration of the extremely large scenario space representing the reality faced by the automated driving function. The variation and simulation should deliver a description of all relevant critical scenario subspaces. This description, or so-called characterization, of the identified critical subspaces, including knowledge about their occurrence probability, is then passed to the risk assessment, which subsequently accumulates these results into a total risk measure. This risk measure is the basis of the PEGASUS safety argument.

Given a logical test case (e.g. OpenSCENARIO, OpenDRIVE, parameter space description, and metrics for safety assessment of simulation results) the goals of the parameter exploration can be defined as one of the following:

  1. Find one or more worst cases of the safety metrics in the defined parameter space. This can be formulated as an optimization problem.
  2. Characterize the subspace of parameter values where a metric reports a safety violation.
  3. Quantify (e.g. approximate) the probability of a safety violation in the defined parameter space, if possible with a confidence measure for the probability value.

A combination of algorithms following goals (1) and (2) can also be used to provide evidence that no safety violations occur inside the most probable subspace of parameter values – provided the search inside this subspace does not deliver any scenario violating the safety limits.

More formally, the input of a parameter exploration algorithm can be defined as follows:

  • N parameters {P1, …, PN} with mixed discrete and continuous value domains and with defined min-max bounds.
  • One or more safety metrics {M1, M2, … }, such as time-to-collision (TTC), distance-to-collision (DTC), and others.
  • Optionally the parameter description can include constraints among parameter values, such as p1 < p2.
  • The parameter description can also include prior probability distributions and correlations among the parameter values – a required input for all exploration methods that aim to assess the integrated probability of safety violations. A minimal data-structure sketch of this input follows.
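Under these definitions, the exploration input can be captured in a small data structure. The following sketch is illustrative only; the class and field names are assumptions, not a PEGASUS API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Parameter:
    name: str
    low: float                   # defined min bound
    high: float                  # defined max bound
    discrete: bool = False       # mixed discrete/continuous value domains
    prior: Optional[Callable[[float], float]] = None  # optional prior density

@dataclass
class ExplorationInput:
    parameters: List[Parameter]
    metrics: Dict[str, Callable]                                # e.g. {"TTC": ttc_fn}
    constraints: List[Callable] = field(default_factory=list)  # e.g. p1 < p2

space = ExplorationInput(
    parameters=[Parameter("cut_in_distance", 5.0, 50.0),
                Parameter("ego_speed", 80.0, 130.0)],
    metrics={"TTC": lambda sim_result: sim_result["min_ttc"]},
    constraints=[lambda p: p["cut_in_distance"] < p["ego_speed"]],
)
```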

In order for the results of the simulation to be useful for a safety argumentation, the following properties of the simulation models, the input data, and the logical scenarios must be ensured:

  • The simulation model must be realistic, i.e. it must deliver results close to reality.
  • The parameters influencing the function/safety must be complete (e.g. no important parameter/effect must be missing), moreover, the min-max limits must be correct.
  • The metrics must deliver a realistic safety assessment.
  • In case the prior probability distributions are used in the argumentation, the distributions must be derived from data having sufficient statistical relevance.

Test coverage

Coverage measures attempt to define completeness criteria for a test. For the assessment of automated driving functions at system level, no accepted completeness measures exist to date. Because a complete characterization of the parameter space is impossible due to its complexity, it is reasonable to assume that, even if one or more coverage measures are defined and recommended for the safety assessment of automated driving vehicles in the future, they will be used as indicators of incompleteness rather than as indicators of completeness.

Metrics used in the conducted parameter exploration experiments

Within PEGASUS, we considered a set of simple metrics to identify the criticality of a scenario. These metrics all represent some form of distance measure, e.g. the distance itself, time headway, time-to-collision, or time-to-react. Two example metrics, which were used in the experiments and in the illustrations of the subsequent sections, are:

  • time-to-collision (TTC) – no assessment of the severity level
  • time-to-collision-collision-speed (TTC_VCOL) – uses the relative collision velocity as severity assessment in case a collision occurs (see the sketch after this definition). The values are defined to be equal to:
    • TTC, when TTC > 0
    • −abs(relative collision velocity), when TTC = 0.
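A direct transcription of this definition into code is straightforward; the function below is a sketch of the metric exactly as defined above:

```python
def ttc_vcol(ttc: float, relative_collision_speed: float) -> float:
    """TTC_VCOL: the TTC while no collision occurs (TTC > 0), the negative
    absolute relative collision speed once a collision occurs (TTC = 0)."""
    if ttc > 0.0:
        return ttc
    return -abs(relative_collision_speed)

print(ttc_vcol(1.8, 0.0))    # no collision: the metric is the TTC itself
print(ttc_vcol(0.0, 12.5))   # collision at 12.5 m/s closing speed -> -12.5
```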

Figure 1 below compares the response surfaces of the two metrics in an experiment.

As the figure illustrates, a metric that can distinguish among severity levels provides more information and structure for the search space and is therefore probably preferable when searching for worst cases, for instance.
Definition and computational challenges related to the metrics:

  • How to define metrics that give a realistic measure of the safety violations, including the severity of the incidents? Moreover, the computation of the metric models should not be computationally too expensive (for instance, a FEM crash simulation would probably not be acceptable considering the large number of scenario simulations needed).
  • How to define smooth metrics which allow more efficient optimization algorithms?
  • How to define metrics that do not expose several spurious local minima or large “plateau areas”?
  • How to scale several metrics in order to obtain comparable values, in case several metrics are to be used in a logical scenario?
  • One should also be aware that sampling effects and numerical errors in the simulators can make the identification of a zero-crossing (for instance for TTC) more difficult.

In case more than one safety metric has to be used in the analysis of a logical scenario, the outputs must ideally be comparable among the metrics. Since all metrics refer to the safety of a scenario, scaling the metric outputs, for instance to [-1, 1], is a possible solution – for instance, as a combination of a measure of the distance to an incident and a measure of the incident's severity (consider that, in a Failure Mode, Effects and Criticality Analysis (FMECA), the severity of an event is assessed using 10 severity levels). A sketch of such a scaling is given below.
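The following sketch illustrates one possible scaling of this kind, assuming a TTC-based distance measure and FMECA-style severity levels; the cut-off values are arbitrary:

```python
def scaled_metric(ttc: float, severity_level: int, ttc_max: float = 5.0) -> float:
    """Map a combined metric to [-1, 1]: positive values encode the distance
    to an incident, negative values the incident's severity (10 levels)."""
    if ttc > 0.0:                       # no incident: distance to an incident
        return min(ttc, ttc_max) / ttc_max
    return -severity_level / 10.0       # incident: severity in 10 levels

print(scaled_metric(2.5, 0))   # 0.5  -> comfortable margin
print(scaled_metric(0.0, 7))   # -0.7 -> severe incident
```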

Finding worst cases / optimization algorithms

A large number of optimization algorithms exist that can be used to search for safety violations using the metrics, or for worst cases of the scenario severity. The output of an optimization algorithm usually cannot be used to characterize the defined parameter space – unless no safety violation is found and an argument about the coverage of the searched space can be built. The following properties of the response function, i.e. here the safety metric(s), can pose challenges to different classes of optimization algorithms; multiple restarts with random starting points can often compensate these weaknesses to a certain degree. These properties should therefore be inspected and their potential effect assessed in the evaluation results:

  • Plateau subspaces: these are subspaces where the metrics have a constant value and do not deliver any orientation towards the safety violations or the worst cases
  • Multiple (sub-optimal) local minima: for instance, if the TTC is used as a metric, it is possible that, depending on its implementation, each time the ego vehicle passes another car/traffic object, a local minimum is reached – even when the cars are driving in parallel lanes and there is no high risk of collision. The search around local minima can consume a lot of resources.
  • Abrupt changes of the safety metrics or singularities: such “islands with high severity” might be hard to detect.

Several optimization algorithms have been implemented and analyzed. Two of them are:

  • Local Search (LS) and
  • KD-OPT gradient optimization (KD-OPT).

Local Search

This is a variant of the classical “simulated annealing” optimization algorithm. Starting from a given seed, the algorithm searches neighboring points and follows the current best/worst case found so far. Mixed discrete and continuous parameters as well as (simple) constraints among the parameter values can be supported.

The algorithm is specified by the starting point (seed), the relative distance within which neighboring points are generated, and the total number of points (concrete scenarios) to be generated. The generated points are attracted by local or global minimum points. The choice of the configuration parameters can influence the outcome of the algorithm (i.e. which local/global minimum point is found). The higher the dimensionality of the problem (number of parameters) and the larger the distance between the starting seed and a local/global minimum, the more simulation points are needed for convergence to a solution point. A greedy sketch of this search is given below.
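The following is a minimal greedy sketch of this search (without the annealing schedule of the full algorithm); the metric, bounds, and seed are toy stand-ins for the simulation and safety metric:

```python
import random

def local_search(metric, seed, bounds, rel_dist=0.1, budget=200):
    """Sample neighbors within a relative distance of the current point and
    follow the worst case (lowest metric value) found so far."""
    best_x, best_y = list(seed), metric(seed)
    for _ in range(budget):
        cand = [min(hi, max(lo, x + random.uniform(-1.0, 1.0) * rel_dist * (hi - lo)))
                for x, (lo, hi) in zip(best_x, bounds)]
        y = metric(cand)
        if y < best_y:
            best_x, best_y = cand, y
    return best_x, best_y

# Toy usage: two parameters, a TTC-like response with its minimum at (12, 115)
bounds = [(5.0, 50.0), (80.0, 130.0)]
metric = lambda p: abs(p[0] - 12.0) / 10.0 + abs(p[1] - 115.0) / 40.0
print(local_search(metric, seed=[25.0, 100.0], bounds=bounds))
```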

KD-Opt gradient optimization on sub-spaces

This algorithm iteratively searches for worst cases along individual axes until a convergence point is found (in general, the worst cases are searched iteratively, on axes, planes, and subspaces up to a maximal dimensionality K, until a convergence point is found). This often leads to a more direct search for a local or global minimum, even for larger numbers of parameters. Mixed discrete and continuous parameters as well as (simple) constraints among the parameter values can be supported.
Even in cases where several local minimum points exist, the algorithm can still “jump over” the local minimum points and find a global minimum – although, depending on the shape of the response function, finding a global minimum cannot be guaranteed. Plateau regions of the response function can often be traversed without dramatic problems. When the worst case is part of a plateau region (for instance a large subspace where TTC = 0), this algorithm delivers one (arbitrary) point from the region as its result. The choice of the starting point (seed) can influence the worst case found. The 1-D core of the algorithm is sketched below.
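The 1-D core of this idea (K = 1, i.e. axis-wise search only) can be sketched as follows; the real algorithm additionally searches planes and K-dimensional subspaces:

```python
def axiswise_search(metric, seed, bounds, samples=21, max_iter=20):
    """Scan each parameter axis in turn for the worst case and iterate
    until the point no longer moves (convergence point)."""
    x = list(seed)
    for _ in range(max_iter):
        moved = False
        for i, (lo, hi) in enumerate(bounds):
            grid = [lo + k * (hi - lo) / (samples - 1) for k in range(samples)]
            best = min(grid, key=lambda v: metric(x[:i] + [v] + x[i + 1:]))
            if metric(x[:i] + [best] + x[i + 1:]) < metric(x):
                x[i], moved = best, True
        if not moved:
            break
    return x, metric(x)
```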

Characterization algorithms

The goal of the characterization algorithms is to deliver information about the response of the severity metric(s) in the whole parameter space associated with a logical scenario. In general, this is more valuable than finding just one worst-case point, as it is closer to a safety assessment of the complete parameter space. At the same time, solving characterization problems is computationally more expensive than solving optimization problems. Therefore, when the dimensionality increases, compromises with respect to the coverage of the parameter space have to be made. For this, the following characterization algorithms can be used:

  • Fixed sampling with full or with partial cross products (FS)
  • Adaptive sampling with full or partial cross products (AS)
  • Random generation (RND)
  • Multi-stage mix of strategies

Fixed sampling with full cross products

This is a classical cross product among selected parameter values. Mixed discrete and continuous parameters as well as (simple) constraints among the parameter values can be supported.

The complexity of the algorithm is exponential in the number of parameters. More precisely, assuming N parameters with S sampling points per parameter, the total number of parameter combinations in the cross product is S^N. The exponential increase of the number of value combinations limits the usability of the algorithm to small N (and S). The table above depicts the time required to explore, by full cross products, scenarios that require 10 s of simulation each, for various values of N and S.
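The growth is easy to reproduce; the sketch below generates a full cross product and tabulates run counts under the assumption of 10 s of simulation per concrete scenario:

```python
from itertools import product

def full_cross_product(axes):
    return list(product(*axes))                  # S**N combinations

axes = [[5, 15, 25, 35, 45], [80, 92, 105, 117, 130]]   # S = 5, N = 2
print(len(full_cross_product(axes)))                    # 5**2 = 25

S = 5
for N in (2, 4, 8, 12):
    runs = S**N
    print(f"N = {N:2d}: {runs:>12,d} runs, {runs * 10 / 3600:,.1f} h at 10 s/run")
```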

The algorithm is easy to implement and the results are easy to comprehend; they can be depicted with a variety of visualization diagrams, including projections onto intuitive 3D plots, as shown in Figure 4 below.

While the algorithm provides an intuitive characterization of the space for small N (and a sufficiently large S), it should be mentioned that the worst case (e.g. as found by optimization algorithms) is usually not included in the sampled grid. To compensate for this, one can of course use the algorithm in combination with optimization algorithms executed before or after the fixed sampling.

Fixed sampling with partial cross products (size K)

This is a variant of the cross-product algorithm where the size of the cross products is limited to at most K parameters. Mixed discrete and continuous parameters as well as (simple) constraints among the parameter values can be supported.

For N parameters there exist C(N,K) combinations of K parameters, for each of which a size-K cross product can be built. The number of generated value combinations is exponential only in K, not in N; for S sampling points per parameter it is C(N,K)·S^K.
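A quick count makes the difference concrete (values for N = 10, K = 2, S = 5 as an example):

```python
from itertools import combinations
from math import comb

N, K, S = 10, 2, 5
print(comb(N, K) * S**K)                       # 45 * 25 = 1125 combinations
print(S**N)                                    # vs. 9765625 for the full cross product
print(len(list(combinations(range(N), K))))    # the C(10, 2) = 45 parameter pairs
```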

For small K one can increase N. But is it “safe” to reduce the cross-product size from N to K? Apart from some cases in which subsets of parameters do not interact with each other, the reduction of the cross-product size is not guaranteed to be “safe”. The figure below shows the reduction of the sampled space when K is reduced from 3 to 2 dimensions: most of the volume is not sampled. If the difference between N and K grows larger, the tested subspace effectively vanishes inside the parameter space. Compare the figure below with the diagrams generated for the full cross product in the previous section – note that here N−K = 1.
On the other hand, we should mention the following positive properties of the FS-KD algorithms:

  • can be used even for larger N values
  • coverage measures for K parameter combinations can be demonstrated
  • if some parameters have a higher influence on the severity metrics then they can be placed in a cross-product set, while the rest of parameters can be analyzed serially around selected points from the more significant subspace (e.g. around worst points or border points resulted from the analysis of more significant subspace)
  • intuitive 3D surface plots of the response function can easily be built. This makes it easier to understand the nature of the 2D/3D parameter interactions and to develop a parameter exploration strategy in an iterative manner.

The influence of the seed point around which the 2D/3D/KD projections are built is worth discussing. The figure below displays the results of FS-KD (with K = 2) when the seed is (a) in the middle of each parameter domain versus (b) at a point with a bad severity value resulting from a previous search with an optimization algorithm. As can be seen, the projections of the worst-case severity are closer to those obtained with full cross products when the projections are built around a worst-case point. Therefore, if the goal is to focus the characterization on severity violations, it appears reasonable to first search for a worst case and then apply the FS-KD search using that worst case as seed.

Figure 6. Influence of the starting seed for FS-KD. Top: the starting seed is the domain middle point. Bottom: the starting seed is a previously found worst case.

Adaptive sampling with full/partial cross products

Adaptive sampling aims to better identify the local minima (local worst cases) and the borders between accepted and non-accepted severity metric values. The algorithm starts with a fixed sampling (full or partial cross products) and then increases the sampling rate near (a) severity border crossings and/or (b) local minima / worst cases. Mixed discrete and continuous parameters as well as (simple) constraints among the parameter values can be supported.

The figure below depicts a response surface with 4 local min/max points (left surface plot), the severity response with a relatively high fixed sampling rate (13 points / axis, middle heatmap) and the result of adaptive sampling with a rather small fixed sampling rate (5 points / axis), super-sampled around severity borders and worst cases.

 

Figure 7. Outputs for adaptive sampling. Left: response surface. Middle: heatmap with higher fixed sampling. Right: heatmap with low fixed sampling and super-sampling around border crossings and the local worst cases.

The complexity of the algorithm is higher than that of the fixed sampling FS/FS-KD, i.e. above C(N,K)·S^K. Because the borders of the regions with severity violations can be very long if explored at high resolution in a multi-dimensional space, the algorithm is only useful when applied to rather small 2D or 3D parameter subspaces – most usefully around a previously found interesting point, such as a worst case found using optimization algorithms.
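For the border refinement, a minimal 1-D sketch of the super-sampling step looks as follows (metric values below the threshold count as severity violations; grid and values are toy data):

```python
def refine_border(grid, values, threshold=0.0):
    """Return midpoints between neighboring samples whose metric values lie
    on opposite sides of the severity border (threshold)."""
    samples = list(zip(grid, values))
    new_points = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        if (y0 - threshold) * (y1 - threshold) < 0.0:   # border crossed
            new_points.append(0.5 * (x0 + x1))          # super-sample here
    return new_points

# Coarse 1-D sampling of a metric, then refinement near the zero-crossings
print(refine_border([0.0, 1.0, 2.0, 3.0, 4.0], [1.5, 0.8, -0.4, -1.0, 0.3]))
# -> [1.5, 3.5]
```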

Random generation

Random generation is a simple, basic sampling method that can often provide a good complement to other, more goal-oriented methods. Although it is simple, it can often compensate some of the weaknesses that more sophisticated heuristic algorithms exhibit. Mixed discrete and continuous parameters as well as (simple) constraints among the parameter values can be supported.

Random generation provides a reasonable degree of coverage (approximately uniformly distributed sampling points) if the number of generated value combinations is not too small. For instance, in cases where FS is not feasible and FS-KD does not provide enough coverage (due to a large difference between N and K), it is reasonable to switch to RND, or at least to add a number of randomly generated scenarios to the sampled set.

The figures below depict the outcome of random generation on a parameter space with 5 dimensions. The results can be compared with those depicted in the previous sections for FS and FS-KD.

Multi-stage mix of strategies

Without prior knowledge about the response surfaces of the severity metrics for a specific logical scenario, it is not easy to specify an algorithm adequate for characterizing the parameter space. The best general-purpose solution that we identified after experimenting with a few scenarios is to specify a mix of all strategies: optimization with several random restarts, random generation, and fixed sampling with partial cross products built around interesting points, such as previously found worst cases.

 

The figure above depicts the mix of (1) random generation and (2) optimization with local search, followed by (3) fixed sampling with 3D partial cross products. The last stage was also added in order to be able to plot the severity response surfaces around the worst case found after the first two stages. Note that here TTC_VCOL was used as severity metric – a metric that uses not only the estimated time left until a collision occurs, but also the relative collision speed in case a collision occurs.

Characterization with quantification of probability of safety violations

Assessing the probability of safety violations for a logical scenario space with a given confidence measure is a challenging goal of the stochastic parameter exploration. In order to approach such a goal, several preconditions about the input data must be ensured:

  • All relevant parameters that influence the experiment are known, including their prior probabilities, correlations, constraints and other dependencies.
  • The simulation model (ego car, street, traffic, environment) delivers values that are close to reality.
  • The severity metrics deliver realistic evaluations of the severity of a scenario.

The assessment of the probability remains challenging for more than just a few parameters, due to the high number of samples required for a probability estimation with a reasonable confidence, even when all of the above assumptions are met.

Theoretically a full cross product parameter exploration, with a sufficiently fine sampling resolution, followed by an integration of the probabilities around the severity violations can lead to a probability estimate. The method is, however, impractical for more than 3-4 parameters.

In principle, Monte-Carlo simulations (or derivatives that optimize the number of samples, such as importance sampling) can be used to estimate the probability of severity violations. However, the number of simulations required for obtaining a probability estimate with a reasonable confidence is very large, so these methods can only be applied to scenarios with a very small number of parameters.
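A plain Monte-Carlo sketch, with a normal-approximation confidence interval, illustrates both the method and its cost; the violation condition below is a stand-in for "simulate and evaluate the metric":

```python
import numpy as np

rng = np.random.default_rng(1)

def violates(params):               # stand-in for simulation + metric evaluation
    return params[0] < 7.0 and params[1] > 125.0

n = 100_000                         # rare events need very large sample counts
samples = np.column_stack([rng.uniform(5, 50, n), rng.uniform(80, 130, n)])
hits = np.array([violates(p) for p in samples])
p_hat = hits.mean()
std_err = np.sqrt(p_hat * (1.0 - p_hat) / n)
print(f"p = {p_hat:.4f} +/- {1.96 * std_err:.4f} (95% CI)")
```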

A particularly fortunate case, in which an estimate of the probability can be made – or at least an argument for the safety inside a parameter space can be given – is when, after using a combination of optimization and characterization methods (with arguably sufficient coverage), no concrete scenario violating the safety conditions can be found. This is not a very likely case for the complete parameter space of a logical scenario, but the method can be used to argue that no safety violations occur inside the most probable subspace of the parameter space.

Other approaches that have the potential to lead to reasonable estimates with a (significantly) smaller number of simulations, for instance using surrogate models, machine learning, and DoE methods, should be investigated in the future.

Recommendations for a simulation-supported safety argumentation of logical scenarios

Generally accepted and sufficient coverage goals for the safety assessment of logical scenarios do not exist to date. Increasing the coverage of the analyzed parameter space towards the feasibility limits is an option that should be taken into consideration.

Even if a completeness guarantee cannot be given, simulation results can contribute to a safety argumentation. Two pragmatic analysis approaches that appear reasonable to us to date are:

1. Absence of safety violations in the most probable parameter subspace:

  • Absence of safety violations in the most probable subspace is an argumentation step that appears to be within the reach of current technology. This goal could be approached using a combination of optimization and characterization algorithms.
  • In the absence of a more complete and reliable assessment of the cumulated probability of the safety violations, at least the next goal should be attempted as well.

2. Parameter space characterization using a combination of algorithms:

  • Random generation – for a “fair” coverage of the parameter space.
  • Optimization algorithms with multiple random restarts – for a more goal-directed identification of safety violations and worst cases in differing regions of the parameter space.
  • Fixed sampling with K-size cross products – for K-dimension coverage of value combinations around a previously chosen seed, as well as for more intuitive visualizations of the response function with 2D and 3D projections, as these might be essential for the assessment conducted in the next step (they could help identify areas with unexpected parameter interactions, for instance).
  • Manual assessment of the results by experts. An intuitive and interactive visualization and analysis of the results on sub-spaces is likely to be necessary for conducting this activity.
13. Test Cases

13. Test Cases

Test cases in the PEGASUS context are concrete scenarios with pass/fail criteria which can be executed either in simulation or on proving grounds. The test automation generates one new concrete scenario in each (testing) step. This concrete scenario is available to the simulation or proving ground module for further processing. The data formats are OpenDRIVE („.xodr“) and OpenSCENARIO („.xosc“).

14. Test HAD-F: Simulation, Proving Ground, Real World Drive

14. Test HAD-F: Simulation, Proving Ground, Real World Drive

Simulation

The simulation is a test platform used for the assessment of the functional implementation of the HAD-function. It plays a central role within the PEGASUS method because it can cover a large number of test scenarios and makes it possible to vary the relevant scenario parameters freely. In contrast to the proving ground or real-world drives, simulation models are used to interact with the implemented HAD-function. In each simulation run, the HAD-function is assessed for a defined concrete scenario and concrete road network by means of given pass/fail criteria (metrics).

One central test platform used to assess the functional implementation of a HAD-Function in a large number of scenarios and variety of scenario parameters is the simulation. The essential idea behind this test platform is that input signals of the HAD-Function are provided by means of simulation models and output signals of the HAD-Function are then used to dynamically update the simulation.

In contrast to simulation, other test platforms such as the proving ground and the real-world drive use a real vehicle to test the HAD-function. There, the function typically runs on a real ECU integrated in the vehicle. All input signals of the HAD-function are provided by the vehicle itself and not by simulation models. Typical input signals are, e.g., vehicle dynamics data (velocity, steering angle, etc.) or environmental data (e.g. detected objects, lane information, etc.). The latter are provided by environmental sensors based e.g. on radar, camera, or LIDAR technology, but may also be provided by downstream data processing such as a sensor data fusion. Output signals of the HAD-function, e.g. steering commands, may then be used to control the real vehicle. Testing a HAD-function using a real vehicle therefore has the advantage of being closer to reality than simulation test instances. The drawback of real driving tests is that they cannot cover a large test space, such as all possible scenarios on a highway.

The simulation is able to cover the large scenario test space and the necessary variation of scenario parameters more efficiently. Simulation models are used to provide input signals for the HAD-function. Furthermore, the simulation models may be dynamically updated by signals provided by the HAD-function. In general, the closeness to reality of the simulation is determined by the accuracy of the models used in each situation; the required closeness to reality should therefore be considered when choosing the simulation models. However, the verification as well as the validation of the simulation, i.e. the demonstration that the simulation meets the required accuracy, is an important and challenging task. Without such validation, the simulation results offer only limited knowledge.

In the following we take a closer look at the interaction between the simulation models and the HAD-function. Figure 1 gives a schematic overview of a typical setup of a simulation environment considered in the project.

Figure 1: Closed-Loop simulation of HAD-function. The simulation may be executed on a MiL-, SiL-, or HiL-platform.

A common approach is to use a simulation tool in which all models, such as the vehicle model, the environment simulation, etc. run.

To configure the simulation it is necessary to define the road network and the scenario. Among other things, the desired behavior of the traffic surroundings as well as of the subject vehicle (velocity profile, lane association, etc.) is defined. An important decision within the project was to define the road network and the scenario in the vendor-independent and open file formats OpenDRIVE and OpenSCENARIO, respectively. Their prototypical support by the simulation tools of the project partners IPG, Vires, and dSPACE was implemented or improved within the scope of the project, with the goal of establishing an internationally accepted open standard.

Based on the simulated environment, which in the first place is the simulated ground truth, the simulated sensor signals for the environment sensors can be derived. This is done by means of sensor models for the integrated sensors, e.g. radar, lidar, and camera. The main intention of the sensor models is to simulate the behavior of real sensors, such as the field of view or the error characteristics. Topics related to sensor models, including the standards used, such as the Functional Mock-up Interface (FMI) and the Open Simulation Interface (OSI), are explained in detail later.

Based on all input signals, the HAD-function then computes the internal and output signals, e.g. signals for vehicle control, which are used to dynamically update the simulation including the vehicle model. This so-called closed-loop simulation may be performed on different platforms, such as MiL, SiL, or HiL platforms.
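The structure of such a closed loop can be sketched in a few lines; all components below are trivial stand-ins (a point-mass vehicle, a pass-through sensor, an ACC-like policy), chosen only to show the data flow of Figure 1:

```python
class Environment:
    def __init__(self):
        self.state = {"lead_distance": 50.0}      # simulated ground truth
    def observe(self):
        return dict(self.state)
    def update(self, ego_motion):
        # lead vehicle advances 0.2 m per step while the ego advances delta_s
        self.state["lead_distance"] += 0.2 - ego_motion["delta_s"]

class VehicleModel:
    def __init__(self, v=30.0):
        self.v = v
    def step(self, control, dt):
        self.v += control["acceleration"] * dt    # simple point-mass dynamics
        return {"delta_s": self.v * dt}

def sensor_model(ground_truth):                   # ideal pass-through sensor
    return ground_truth

def had_function(sensor_data):                    # trivial ACC-like policy
    return {"acceleration": 0.5 if sensor_data["lead_distance"] > 40.0 else -1.0}

env, vehicle = Environment(), VehicleModel()
for _ in range(1000):                             # closed loop at dt = 10 ms
    control = had_function(sensor_model(env.observe()))
    env.update(vehicle.step(control, dt=0.01))
print(env.state)
```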

Figure 2 shows an overview of how the closed-loop simulation described above is integrated in the overall simulation environment.

Figure 2: Schematic Overview of simulation environment. For more details about the simulation, see Figure 1.

Inputs to the simulation environment are a concrete road network and a concrete scenario. Concrete means that its parameterization is already set beforehand by the stochastic variation / the test automation tool to a set of concrete values (see P12 for details about the stochastic variation).

Within the PEGASUS method, the concrete scenario defined in the OpenSCENARIO file format may contain extensions that are not yet standard-conformant, e.g. a non-standard representation of the trajectory of a cutting-in vehicle. This PEGASUS-specific OpenSCENARIO file is then transferred into an actual standard-conformant OpenSCENARIO by means of the so-called Transpiler and an appropriate translation script. The non-standard content of the PEGASUS-specific OpenSCENARIO is being addressed by the Association for Standardization of Automation and Measuring Systems (ASAM) within its standardization process for the next generation of OpenSCENARIO. This approach allows format extensions to be evaluated without modifications of the simulation tool used. It may also be required that the standard-conformant OpenSCENARIO file contains specific, practically non-parameterizable elements, i.e. elements for which it is complicated or practically infeasible to define parameters, e.g. a trajectory definition consisting of a time series of time–position pairs. The consequence is that these elements cannot be varied in the stochastic variation. Nevertheless, it may be required that the concrete OpenSCENARIO contains these elements, e.g. to allow testing the concrete scenario on another test platform such as the proving ground.

As an example, let us consider the trajectory of a cutting-in vehicle. For the stochastic variation, the trajectory must be practically parameterizable via, ideally, human-interpretable parameters such as the duration of the cut-in maneuver or the parameters of a mathematical representation of the cut-in trajectory. In contrast, for testing the same concrete scenario on the proving ground it may be best if the cut-in trajectory is represented in a standard-conformant OpenSCENARIO via a time series of time–position pairs of the cutting-in vehicle. The Transpiler can then be used to transfer the practically parameterizable OpenSCENARIO into an OpenSCENARIO in which the same cut-in trajectory is represented via such pairs. The latter can then be used for both the closed-loop simulation and the proving ground.
Besides the scenario description, the parameters of the road network, such as curve radius or lane width, may also be varied by the stochastic variation. This results in a concrete road network, i.e. the road network with a fixed parameter set, constituting an input to the simulation environment. However, varying a road network defined in the OpenDRIVE file format is rather challenging. Therefore, the file format Simplified Road is used instead. The road network defined in a concrete Simplified Road file is then converted into OpenDRIVE using the OpenDRIVE-Generator.

The generated concrete standard conform OpenSCENARIO and the generated concrete OpenDRIVE are then used to perform the simulation with the HAD-function. While simulating, data of certain signals, such as velocity of the subject vehicle and the challenger, are recorded using the PEGASUS format as described in data container D9. Based on the recorded data the test is then evaluated using defined pass / fail criteria (metrics). This may include the evaluation of certain metrics. For the stochastic variation, typically a certain (criticality) metric, e.g. distance, time-to-collision (TTC) or time-to-react (TTR), will be evaluated to obtain a measure for the criticality. The development and definition of evaluation criteria and metrics are not covered within this chapter (see P8 and D6). They are input to this processing step. The result of the evaluated criteria then forms an output of this processing step which is then used for further assessment.

Figure 3 schematically illustrates how the simulation environment described above is integrated in the overall software architecture, including the relevant project partners. Note that some components of the architecture are not considered within this chapter, but in other processing steps. Furthermore, the project partners relevant for the simulation environment are shown.

Figure 3: Schematic overview of the overall software architecture including the simulation environment described in this process step.

Transpiler

OpenSCENARIO and OpenDRIVE are two file format specifications that help to accurately define environments and scenarios relevant for the safety validation of automated driving. However, both data formats in their current versions have several limitations, which make it cumbersome to model complex scenarios. Also, changes and extensions of the data format have to go through a lengthy standardization process and have to be implemented in the (simulation) tools, both of which inhibit quick development of the standards. Therefore, an OpenSCENARIO Transpiler was introduced to tackle both problems. The main idea behind the Transpiler is to allow the addition of custom elements to the OpenSCENARIO format, which are not part of the standard but can be translated to standardized OpenSCENARIO format elements. The translation is done by translation scripts, which are executed by the Transpiler. By combining the translation scripts with documentation, new ideas and extensions can easily be shared with project partners and thus evaluated before integrating them into the standard. The Transpiler is executed after creating a concrete scenario from a logical scenario by specifying the parameters, and before handing it to e.g. the simulation environment or any other tool that needs standard-conformant OpenDRIVE and OpenSCENARIO files.

Both the OpenDRIVE and the OpenSCENARIO file format standards solve multiple problems regarding the definition of static environments and scenarios during the development and safety validation of automated driving. The OpenSCENARIO format, for example, is a technical description of scenarios. It is based on the XML description language and is developed by a consortium of more than 25 partners in parallel to and in coordination with the PEGASUS project. In this description format, the elements of a scenario, such as the start positions or the maneuvers of road users, are clearly defined by predefined parameters. In addition, this format allows the elements to be linked into a story by conditions, triggers, and sequences.
The development of this standard has many advantages and solves essential problems at the same time. The definition of the format makes it easier for tool manufacturers to adapt their software to the requirements of the safety validation of automated vehicles. At the same time, interoperability between tools on the same or different levels, such as the simulation and the test track, is ensured. Only a standardized file format makes it possible for a scenario, once specified, to be used in several tools.

However, OpenSCENARIO is still under development, and it became clear in the PEGASUS project that changes and enhancements are necessary to create scenarios for the safety validation of automated vehicles. Since it makes sense to share and discuss these enhancements with the project partners before they become part of the standard, the Transpiler was created. With the help of this tool, OpenSCENARIO can be extended extensively without tool manufacturers having to adapt their software to these changes. The trick is to provide an appropriate script for each extension, which translates the extension into standard-conformant OpenSCENARIO. An example is the introduction of new maneuvers. To represent lane changes, the lateral movement should be described as a fifth-degree polynomial, which OpenSCENARIO in its current version does not support as a geometric form. Arbitrary trajectories can be described as a series of positions in an OpenSCENARIO file; however, this representation cannot be parameterized, which is a necessary feature for the stochastic variation in the testing process. Therefore, the standard description format was extended by a parameterizable element for lane changes described by a fifth-degree polynomial, and a suitable script was implemented that translates the new element into a series of positions. This allows the new maneuver to be used for validation before the element is included in the standard and implemented in the corresponding tools. A sketch of such a translation is given below.
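A translation script of this kind is conceptually simple; the sketch below expands a hypothetical polynomial lane-change element into the series of (time, lateral position) pairs that standard OpenSCENARIO can represent, using the usual quintic profile with zero lateral speed and acceleration at both ends (parameter names are assumptions):

```python
def lane_change_positions(duration, lateral_offset, dt=0.1):
    """Expand a fifth-degree-polynomial lane change into (time, y) pairs."""
    n = int(duration / dt)
    points = []
    for k in range(n + 1):
        s = k * dt / duration
        # quintic with zero lateral speed/acceleration at start and end
        y = lateral_offset * (10 * s**3 - 15 * s**4 + 6 * s**5)
        points.append((round(k * dt, 3), round(y, 3)))
    return points

print(lane_change_positions(duration=4.0, lateral_offset=3.5)[:3])   # start of maneuver
print(lane_change_positions(duration=4.0, lateral_offset=3.5)[-1])   # (4.0, 3.5)
```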

Figure 4: Simplified visualization of the Action Restriction and translation into standard OpenSCENARIO

The Transpiler can also be used to easily add complex but nevertheless parameterizable elements to a scenario that cause changes in several places of the file. One example is the action restriction, which e.g. describes vehicles blocking a lane (see Figure 4). An action restriction can consist of one or more vehicles, which have to be added to the scenario and equipped with a start position and a behavior. In the current version of OpenSCENARIO, these restrictions would not be parameterizable and would at best be cumbersome to implement manually. With the Transpiler, on the other hand, a new custom element with a number of parameters can be added, which is converted fully automatically into the corresponding standardized elements. The Transpiler is open-source and free software provided by the PEGASUS project.

OpenDrive-Generator

OpenDRIVE is an open, vendor-independent mature XML-based file format, which contains all features to model real road networks and is supported by commonly used simulation software. Therefore, OpenDRIVE is a natural choice to model virtual tracks for test cases. In combination with OpenSCENARIO for describing dynamic behavior of objects and traffic participants (an open and XML-based file format as well), these two formats build a solid foundation to model environments for virtual test cases.

Direct modelling of different test track variants in OpenDRIVE is rather challenging. A simple change in a track to create variants usually leads to several other changes in the same OpenDRIVE file, caused by the dependencies of the OpenDRIVE data model. Therefore, a domain-specific language (DSL) is used to efficiently model various variants of highways and to create adequately realistic tracks for describing test cases. This DSL is called Simplified Road and is specially designed to simplify the prototyping of road tracks.

The OpenDRIVE-Generator processes Simplified Road files and their parametrization to generate OpenDRIVE files. These are finally used for the simulation and testing activities.

Interfaces

The Simplified Road format is specified by an XML schema. Figure 5 visualizes an extract of that schema file. Some node details are collapsed to limit the shown information to an example. In particular, the children of “course” are the same as those of “lane”, “elevation”, and “lateralProfile”; furthermore, the children of “center” are the same as those of “left” and “right”.

Figure 5: Partial visualization of the Simplified Road XML schema.[Noyer, Ulf; Richter, Andreas and Scholz, Michael (2018) Generation of Highway Sections for Automated Test Case Creation. In: Proceedings of the DSC 2018 Europe VR, pages 171-174. Driving Simulation Conference Europe 2018 VR, 05.-07 Sep. 2018, Antibes, France. ISBN 978-2-85782-734-4 ISSN 0769-0266]

The root element of each Simplified Road document is the “track” element. It can contain a “metaInformation” element, which can have attributes describing the document with author, version, and a textual description. The track element further contains an unbounded sequence of section elements, which represent parts/sections of the described street. Each section can have children of the types “course”, “road”, “elevation”, “lateralProfile”, “signals”, or “objects” to specify details of the specific section. The course element specifies the curvature of the street segment using one of the mathematical functions “constant”, “linear”, “arc”, “clothoid”, or “cubic”.

The element “road” defines how many lanes the road section has, including their width and further attributes for “lane” and “road marking”. Lanes and road markings are specified separately for the “left” side, the “center”, and the “right” side of the road. Similar to the “course”, the “elevation” and the “lateralProfile” are defined with mathematical functions per road section. Further details of these elements are hidden in Figure 5.

If the element “signals” is present, it contains street signs to be located along the road. The element “objects” contains more generic “vegetation” like trees and bushes or “infrastructure” along the road. Infrastructure can contain elements of the type guide post, railing, noise barrier and guard rail.

With these elements it is currently possible to define one long street without junctions. Given the focus on test scenarios, this limitation is perfectly acceptable for the PEGASUS project scope. In the future it should also be possible to define highway entries and exits based on guidelines, together with a possibility of modification. From the perspective of the test scenario, only the road currently under test needs to be provided.
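For illustration, a hypothetical document following this schema could be assembled as below; the element names follow the description above, while the attribute names are assumptions:

```python
import xml.etree.ElementTree as ET

track = ET.Element("track")
ET.SubElement(track, "metaInformation", author="PEGASUS", version="1.0")
section = ET.SubElement(track, "section")
course = ET.SubElement(section, "course")
ET.SubElement(course, "arc", curvature="0.001")      # gentle curve
road = ET.SubElement(section, "road")
right = ET.SubElement(road, "right")
ET.SubElement(right, "lane", width="3.75")           # one driving lane

print(ET.tostring(track, encoding="unicode"))
```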

There are several simplifications compared to OpenDRIVE, as fewer XML elements are needed to specify the same content. Furthermore, attributes have meaningful default values as often as possible. Besides guide posts, further infrastructure elements are currently supported, such as noise barriers, railings, and guard rails. In some tests, Simplified Road needed only between 7% and 21% of the lines required in OpenDRIVE to describe a specific road [Noyer, Ulf; Richter, Andreas and Scholz, Michael (2018) Generation of Highway Sections for Automated Test Case Creation. In: Proceedings of the DSC 2018 Europe VR, pages 171-174. Driving Simulation Conference Europe 2018 VR, 05.-07 Sep. 2018, Antibes, France. ISBN 978-2-85782-734-4 ISSN 0769-0266].
Input: Parameter file

Parameter files define variables that are substituted in Simplified Road definitions. Based on these variables, different variants of road tracks can be managed easily. Using a parameter file is optional.

Output: OpenDRIVE
The output file generated by the OpenDRIVE-Generator is the transformation of the Simplified Road input file with the applied parameters of the parameter file. It contains the defined road track in the OpenDRIVE format. As OpenDRIVE is a commonly used industry standard, it can be further processed by all major simulation tools and environments.

Sensor models

Scope

Goals & Objectives
Here, the activities focused on developing suitable radar, lidar, and camera sensor models for integration and usage within existing commercial (e.g. IPG CarMaker, Vires VTD, dSPACE ASM) and proprietary OEM simulation tool chains for driving function assessment. To minimize the development effort for the sensor models and to maximize the reusability of development and validation efforts across the different sensor models, the definition of a standardized interface and packaging method was another main goal.

 

Work Plan
With respect to sensor modeling, the main activities basically followed these steps:

  • Requirements Elicitation
  • General Simulation Architecture Definition
  • Interface Definition
  • Model Classification and Selection
  • Model Implementation
  • Model Validation
  • Model Tool Integration (commercial and OEM proprietary environments)

Requirements Elicitation
Based on available expert knowledge, the primary sensor phenomena of radar, lidar, and camera were identified and prioritized with respect to the expected driving function to be supplied with simulated sensor data. Furthermore, interface requirements were derived from the abovementioned effect lists and, at this early stage, first aligned with commercial and proprietary simulation tool chains. This, together with the already stated overall demand for standardization, led to certain change requests on the tool vendor side, which yielded the Open Simulation Interface (OSI) standard. OSI is an open-source project hosted on GitHub and maintained by the PEGASUS members. The definitions are aligned with the upcoming ISO 23150 standard describing a generic sensor output interface containing object data and detections. By generally requiring target-platform independence, no or only limited system constraints of the target environments should limit model usage across different platforms.

 

Design / Architecture
One of the major design decisions to be made was the selection of the appropriate type of sensor simulation model. Basically, two kinds of models were considered:

  • Phenomenological models
  • (Quasi-) physical models

The first is based on statistical behavior with object data as input and output in most cases and therefore operates on a higher level of abstraction. The latter approach targets a more physically grounded implementation, which usually requires a so-called raw-data interface (e.g. radar detections, lidar point clouds, raw images) and therefore has more specific demands on the data provided by the simulation environment. Each kind has its strengths and weaknesses in terms of e.g. performance, accuracy, and implementation effort. For the project scope, the phenomenological type was selected as the most appropriate choice for the desired automated highway pilot functions.

A phenomenological model receives ideal objects from the environment simulation and modifies them, taking into account sensor-specific phenomena. The output of the sensor model contains detected objects (including false positives, i.e. ghosts) which have sensor-specific properties (e.g. deviations in dimension, position, and velocity, and individual existence probabilities).
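Conceptually, such a model can be sketched in a few lines; the field names, noise levels, and ghost rate below are assumptions for illustration, not the PEGASUS or OSI interface:

```python
import random

def phenomenological_sensor(ground_truth_objects, fov_range=150.0):
    """Turn ideal objects into detections with sensor-specific deviations,
    existence probabilities, and occasional false positives (ghosts)."""
    detections = []
    for obj in ground_truth_objects:
        if obj["distance"] > fov_range:              # outside the field of view
            continue
        detections.append({
            "distance": obj["distance"] + random.gauss(0.0, 0.3),
            "velocity": obj["velocity"] + random.gauss(0.0, 0.1),
            "existence_probability": max(0.0, 1.0 - obj["distance"] / fov_range),
        })
    if random.random() < 0.01:                       # rare ghost object
        detections.append({"distance": random.uniform(0.0, fov_range),
                           "velocity": 0.0, "existence_probability": 0.2})
    return detections

print(phenomenological_sensor([{"distance": 60.0, "velocity": 25.0}]))
```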
Another important decision, covering standardization and exchangeability of sensor models, was made by defining the Open Simulation Interface (OSI), which utilizes the Google Protocol Buffers (‘Protobuf’) protocol, on the one hand, and by selecting the Functional Mock-up Interface (FMI) standard on the other; in conjunction, these led to the Open Simulation Model Packaging (OSMP) approach.

The selection of simulation environments, which are dedicated to act as the master simulation controller, was basically done during project setup by choosing the following project partners and their commercial tool environments:

  • IPG (CarMaker)
  • Vires (VTD - Virtual Test Drive)
  • dSPACE  (ASM) [Associated Project Partner]

Besides these, OEM project partner proprietary environments were considered for compatibility with the abovementioned simulation model design. In summary, the design architecture is shown in Figure 7.

Implementation
Sensor model implementation was realized according to the following workshare:

  • Bosch:
    • Radar Model
    • Camera Model
  • TU Darmstadt:
    • Lidar Model
  • A.D.C.:
    • Camera Model

The parameterization of these models was mainly derived from real sensors (e.g. range, field of view, dynamic and statistical behavior attributes). By providing FMI model packaging templates based on OSMP, the implementation effort could be lowered and compliance with the selected standards was simplified.


Validation

In order to answer the question of how the sensor model validation (in terms of ‘Is the correct model implemented?’) should be designed, different strategies were discussed. In general, the validation process must consider a comparison with real sensor data, i.e. reference data. Of course, gathering reference data for the sake of comprehensive sensor model validation must not lead to higher efforts than real-world testing. On the other hand, the comparison of real-world and synthetic data will inevitably suffer from imperfect reference data taken during real measurements and used for generating synthetic data for validation via replay-to-sim. Neither a data acquisition tool for real sensors nor any simulation environment will capture or provide all environmental attributes of reality. Since a deviation between the two is implicit, the question is: what deviation is tolerable, and what kinds of attributes are relevant and therefore impact the model quality?
The overall goal of sensor model validation in the PEGASUS context is the ability to use a validated sensor model provided by a supplier in the validation of the automated driving functions at the OEM. Since there is no guaranteed commonality in the simulation environments between supplier and OEM, the validation needs to remain valid if the sensor model is used in a different simulation environment than the one it was validated against. To enable this, we used a two-stage validation approach:

  • Validation test suite for ground truth source as sensor model input:
    Based on several representative sample scenarios, test cases are generated that compare the actual ground truth presented by the environment simulation to reference values to ensure that the ground truth lies within acceptable limits (based on the actual requirements of the sensor models in question).
  • Validation test suite for sensor models:
    Based on representative KPIs and the representative sample scenarios, test cases are generated that compare the sensor output to reference bounds, ensuring that the sensor model gives results which are characteristic and lie within the boundaries of the assured KPI ranges (a minimal bound check is sketched below).
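The second stage amounts to a bound check of KPI values against assured reference ranges; the sketch below is minimal, and the KPI names and bounds are examples only:

```python
def validate_sensor_model(kpis, bounds):
    """Return (passed, failures) for KPI values checked against [lo, hi] bounds."""
    failures = {name: value for name, value in kpis.items()
                if not bounds[name][0] <= value <= bounds[name][1]}
    return not failures, failures

ok, failures = validate_sensor_model(
    kpis={"range_error_rms_m": 0.35, "false_positive_rate": 0.008},
    bounds={"range_error_rms_m": (0.0, 0.5), "false_positive_rate": (0.0, 0.01)},
)
print(ok, failures)   # True {}
```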

Based on this two-stage approach, the sensor model can be considered valid if it passes the sensor model validation test suite and if it is connected to an environment simulation that passes the ground truth validation test suite.
It is important to state that the validity of the sensor model is confined to the validation scenarios; for all other scenarios, no prediction of the sensor model's accuracy is permissible.

A concept and a first implementation of the validation test suite for the sensor models, covering the abovementioned aspects, was developed and applied to the sensor models (see Figure 8).

Initiated and Ongoing Tasks
Building up the PEGASUS tool chain was supported on the one hand by pre-existing artefacts such as the OpenDRIVE and OpenSCENARIO standards, and on the other hand by experience on the tool and sensor supplier side. In the end, the abovementioned OpenX standards plus the Open Simulation Interface were published by transferring ownership to ASAM e.V., where further development by its members will take place and broaden usage at the international level. ASAM is investigating a potential liaison with ISO/TC 22/SC 33/WG 9, which intends to reference and refine the OpenX standards, bringing them to an official international level.

Proving Ground Tests

Scenario-based testing on proving grounds is one essential component of assessing highly automated driving functions. It serves to investigate the real behavior of these functions under defined and thus manageable real environment conditions. It generates supporting points used to verify simulation results. In order to carry out reproducible tests it is necessary to use state of the art tools. The demands on the measurement technology and on the communication between the test participants are very high. In the PEGASUS project a complete tool chain has been developed to meet the requirements for proving grounds.

The tool chain essentially consists of traffic simulation vehicles that automatically (e.g. with the help of driving robots) execute concrete scenarios in order to test the performance of the vehicle under test. In order to observe and coordinate the test sequence, a control station is part of the tool chain. To minimize dangers and achieve a high level of controllability, safety drivers are included in this tool chain. The safety for all directly and indirectly involved test participants plays an important role in this description.

The following figures and paragraphs outline the technical implementation of the PEGASUS test tool chain for proving grounds. The tool chain consists of a static part, containing the control station and the stationary RTK correction data transmitter, as well as a dynamic part, consisting of the vehicle under test (VUT), traffic simulation vehicles (TSV), and/or movable soft crash targets (mSCT) involved in the experiment (see Figure 9). Non-critical scenarios are usually performed with TSVs. If a scenario is classified as critical, traversable elements can be used: the movable soft crash targets (mSCTs). These can be used if the hazard and risk assessment allows it and if there are no people at the test site or on the test track. At any time during the experiment, all dynamic entities (VUT, TSVs, mSCTs) must know the exact positions of all other participants. For this reason, robust, state-of-the-art radio communication (usually a WiFi mesh) is required. In order to achieve the highest possible positioning accuracy, all location-variant test participants must be equipped with an inertial measurement unit (IMU), which receives RTK correction data from a differential Global Positioning System (D-GPS) base station via radio transmission.

The communication of the static and dynamic components can be described in three levels.

  • Green: A time-critical communication level between the vehicles involved in the test that is dependent on short latency times.
  • Blue: The monitoring level between the vehicles involved in the test and the control station.
  • Red: The RTK correction data level, carrying the correction data for the inertial measurement technology used in the dynamic elements of the tool chain.

Traffic Simulation Vehicle

The traffic simulation vehicles (TSV) (see Figure 10) play an important role in the scenario-based testing approach. Thanks to the exact control and positioning of the TSV in the scenario, the VUT can always be tested in the same, reproducible situation. Only if the test is precise and repeatable can the pass and fail criteria be safely evaluated for a reliable assessment. In addition to the IMU and WiFi units, the TSVs are equipped with a programmable longitudinal and lateral controller. This is necessary to ensure that repeated test cases yield comparable results. The vehicle control system used to control the internal actuators for longitudinal and lateral guidance can be configured by means of an interface. As implemented in PEGASUS, the vehicle guidance can be realized by controlling the internal actuators or, conventionally, by means of driving robots for steering, brake, and accelerator pedal.

The safety concept requires that a safety driver be physically present in the vehicle to intervene in case of a fault. The driver constantly holds down a dead man's switch located directly on the steering wheel during test case execution. When the driver releases the switch, the automated vehicle guidance is switched off immediately.

A tablet or laptop is used to constantly inform the safety driver about the current test status. To obtain the current position of the TSV, a high-precision IMU (localization technology) or a comparable technology must be installed in the vehicle. The IMU determines the position of the TSV at any time using GPS data and D-GPS correction data. An external WiFi (mesh) antenna is used for communication with the other test participants and the control station. As an alternative to the D-GPS radio transmission, the correction data may also be received via other channels. In order to ensure a permanent power supply of the measurement technology and the real-time computer, an independent energy provision, separated from the vehicle's electrical system, must be ensured. Furthermore, all vehicles involved in the test and the control station must be equipped with powerful synchronization software and hardware (real-time computer).

Vehicle Under Test

The vehicle under test (VUT) (see Figure 11) occupies a special position as a dynamic, location-variant object that can also apply a non-deterministic driving strategy. Like the TSV, the VUT must be equipped with software and hardware for synchronization, since in certain scenarios it can reach indeterminable states that in turn trigger defined TSV maneuvers.
Since the VUT is only observed, the interface for controlling the internal actuators is omitted here. In order to carry out various scenarios, however, it is necessary that the exact position of the VUT is available to the TSVs. This is done, analogously to the TSVs, by means of an IMU, which determines the position of the VUT at any time using D-GPS and dual GPS antennas. To transmit the information to the other test participants, a WiFi (mesh) is used. To synchronize all test vehicles with each other at low latency, the setup requires a powerful computer with corresponding software. The VUT must also be powered by a reliable power supply. A tablet/laptop is used by the safety driver to obtain information. The safety driver can always interrupt the test in critical situations and dissolve the test network by using the emergency stop.

Master Control Station

The master control station (MCS) is the central unit for monitoring the performance of proving ground tests (see Figure 12). It is used by the test manager to coordinate all test participants and to check the correct test execution. For this purpose, the MCS is connected to the dynamic parts of the tool chain via the supervision communication level.
The control station can be set up stationary in a building or, as implemented in the PEGASUS tool chain, mobile in a vehicle. The location of the control station on the test site must be selected by the test supervisor in such a way that a good overview of the scenario execution area is ensured and access as well as exit routes to this area can be supervised. The supervisor obtains all information from the vehicles in real time via the monitoring software. Furthermore, the software offers the functionality to interrupt the test by means of a software emergency stop. If the location of the control station does not give the test manager the required overview of the test site, video surveillance can be used instead.
In addition to the monitoring activity, the control station can also evaluate data and carry out simulations using a high-performance computer.

Methodology Pre-Tests and Main-Test

After receiving the test case, a high-level description of the scenario to be driven is created with the help of expert knowledge. Using suitable software, this description is then converted into an automatically executable scenario. The safety drivers, who are part of the safety concept, carry out the first preliminary tests at reduced speed. After a few iterations and adjustments of the automation parameters, the main test is performed several times. The generated measurement data are used to evaluate the VUT with regard to the pass/fail criteria, and the measured data are made available to the PEGASUS database. The first part of the tool chain (Figure 13) covers the test preparation up to the main test with the generated data.

The expert knowledge includes know-how about test execution, test parameters, setting details for the measurement technology and, above all, the safety concept.

The following rules must be strictly adhered to in order to guarantee safety during trial operation:

  • In all tests where longitudinal and/or lateral guidance of the test vehicles is performed by driving robots or controllers, safety drivers are present in all vehicles as the fallback level.
  • Drivers must be specially trained and hold a recognized test driving license. On the day of the tests, all drivers must be briefed separately about the risks of the respective scenarios and agree on reaction strategies in case a scenario is terminated. The risk assessment carried out for each scenario serves as an aid.
  • If the risk assessment indicates an increased risk for safety drivers and/or other involved or uninvolved persons, the experimenter must reject the tests. The test supervisor's approval will also be refused or withdrawn if, during the implementation on the test site, further primary concerns (feedback from the safety drivers) or secondary concerns (unsuitable weather, friction coefficients, co-users of the test site, unsuitability for the test, etc.) are raised.
  • Before each test day, the test leader has to check whether all safety drivers are in possession of a valid driving license of the required vehicle class and can present corresponding proof (driving license). If a test driver cannot present a driving license due to a driving ban, but can prove possession of the driving license by means of other recognized documents, this can be accepted for tests on the test site (in agreement with the test site operator).
  • At all times during the test, audio communication between the safety drivers in the TSVs, the VUT and the test supervisor in the control station must be ensured.
  • All scenarios are simulated in advance of the physical test case execution in order to allow a better assessment of potential collision risks. All maneuvers are also tested in advance at low speeds (< 30 km/h). The speed is then increased incrementally; the speed increments for each scenario are determined by the safety drivers in a joint dialogue at their discretion.
  • Even if a scenario has already been tested previously, a run is performed at low speed again at the beginning of a new test day. This is also at the discretion of the safety drivers.
  • Before a new scenario is tested, the data entered in the scenario description must be checked. The corresponding user interfaces of the control station environment or of the vehicles have to provide the possibility to perform the required checks. In particular, attention has to be paid to the physical limits of all vehicles (required vs. possible longitudinal and lateral accelerations etc.). If no implausible values are found, the scenario can be performed at low speeds.
  • Before starting the tests, the hardware and software must be checked for correct function on each test day. Only if all components have been tested, secure data communication between the vehicles has been established, and the safety drivers have checked the communication between each other can the tests be started.
  • All vehicles incl. control station must be equipped with a first aid kit according to DIN 13157.
  • All drivers involved in the test, or at least 3 safety drivers and one control station operator, must attend a first aid course every two years (2 days or 16 training units).
  • The checklists must be completed on each test day (Table 1, Table 2, Table 3, and Table 4).
  • The safety drivers have further hardware and software functions at their disposal to increase safety, which are described in more detail in the next paragraph:
    • Virtual fences
    • Collision warning
    • Prior simulation of the scenario
    • Approaching target speeds
    • Risk assessment of the scenario
    • Evaluation of the test site incl. cooperative testing

Robot-controlled vehicles have both hardware and software safety features. The primary responsibility for the safety of the vehicle and the surrounding vehicles/pedestrians lies with the safety driver. The built-in tools for additional safety are:

  • Hardware (VUT and TSV):
    • Dead man's switch - releasing the activation switch positioned close to the steering wheel gives the safety drivers in the TSV full control and maximum safety. In the VUT, which is merely observed, an emergency stop is always within reach to abort the test and disengage the entire system.
    • Emergency stop button, also accessible from the front passenger seat.
    • Spring-loaded safety brake, which is held by an electromechanical clutch, only for driverless testing (not used in the PEGASUS tool chain).
  • Software (VUT and TSV):
    • Steering angle, speed and acceleration can be restricted for path tracking.
    • Test abort if the control deviation of the individual axes (throttle, brake, steering) exceeds a set value.
    • The accuracy of the IMU is monitored. If it falls below a threshold or if the connection is lost, the test is aborted.
    • Path sequence error limits can be set. If the error reaches the set value, the test is aborted.
    • Error limits for speed control can be set. If the error reaches the set value, the test is aborted.
    • Virtual guide rails can be used to change the error limits for path guidance and speed control within defined segments of the path.
    • Abort parameters ensure that a predefined reaction occurs when a test is aborted (e.g. braking, information to all vehicles).
    • Critical sections can be configured to allow defined control of steering, brake and accelerator pedal during a test abort.
  • Communication and monitoring software (VUT, TSV and control station):
    • Tests are performed with fully synchronized vehicles. They adjust their current speed and path based on the error of the reference vehicle (VUT or TSV).
    • Tests can be aborted from any vehicle by pressing the emergency stop button.
    • Each vehicle knows the position of the other vehicles.
    • Relative parameters to other vehicles can be used to trigger a test abort.
    • The setup data for synchronized vehicles are checked to ensure that they all use the same coordinate system and the same clock (GPS time).
    • With each test protocol, a code can be defined to ensure that all vehicles are running the same scenario.
    • The positions of all vehicles on the test site are displayed to the service user.
    • The control station performs a collision check, which can stop vehicles if necessary.
    • The control station can stop all vehicles.
    • The control station sends a real-time watchdog signal to each vehicle. If sending stops or the signal is lost, the vehicle stops.
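
To make the watchdog mechanism from the list above more tangible, the following minimal sketch shows how a vehicle-side check might trigger the abort reaction when the control station's watchdog signal is lost. The names, the timeout value and the abort reaction are illustrative assumptions; the actual PEGASUS implementation runs on dedicated real-time hardware.

```python
import time

HEARTBEAT_TIMEOUT_S = 0.2  # hypothetical limit; the real value depends on the tool chain

class VehicleWatchdog:
    """Stops the vehicle if the control station's watchdog signal is lost."""

    def __init__(self, trigger_abort):
        self._trigger_abort = trigger_abort   # callback executing the abort reaction
        self._last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        # Called whenever a watchdog message from the control station arrives.
        self._last_heartbeat = time.monotonic()

    def check(self):
        # Called cyclically by the real-time task.
        if time.monotonic() - self._last_heartbeat > HEARTBEAT_TIMEOUT_S:
            self._trigger_abort("watchdog timeout: connection to control station lost")

def example_abort(reason):
    print(f"TEST ABORT: {reason}")  # in reality: brake, notify all vehicles

watchdog = VehicleWatchdog(example_abort)
watchdog.on_heartbeat()
time.sleep(0.3)   # simulate a lost connection
watchdog.check()  # triggers the abort reaction
```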

Once the hazards of the concrete scenario have been assessed and the risk has been minimized, the actual main experiment is performed using the planned driving scenario with the correct boundary conditions according to the specification (OpenSCENARIO). In addition to applying the safety concept (see above) and performing the scenario training for the safety drivers, some further technical aspects have to be taken into account by the test performers. The main test must be checked for validity immediately after the end of the test. At least three valid repetitions are required for the main test to be considered completed. A test is valid if the required tolerances from the OpenSCENARIO file are met. If these are not described precisely enough or cannot be implemented technically and physically (due to measurement uncertainties and error propagation), the reference values in Table 5 can be used. These reference values were determined empirically in practical use of the test tool chain; robust operation of the tool chain is possible if they are adhered to.

Table 5: Valid range of physical quantities in the main experiment

In order to evaluate the validity, it is necessary to record the corresponding quantities. In addition, channels for later evaluation of the VUT and other variables must be recorded as informative channels (Table 6). It is recommended that all vehicle data of the participating vehicles are recorded on the same time axis (e.g. GPS time).

Table 6: Measurement channels to be recorded in the main experiment
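
As an illustration of this validity check, a minimal sketch is given below; the tolerance bands and channel names are hypothetical stand-ins for the actual values of Table 5 and Table 6.

```python
# Minimal validity check for main-test repetitions (a sketch; the tolerance
# bands and channel names are hypothetical, not the values of Table 5/6).
TOLERANCES = {
    "speed_error_kph": 2.0,    # max. admissible deviation from the target speed
    "lateral_offset_m": 0.15,  # max. admissible path deviation
    "trigger_delay_s": 0.05,   # max. admissible synchronization delay
}
MIN_VALID_REPETITIONS = 3

def repetition_is_valid(measured: dict) -> bool:
    """A repetition is valid if every recorded quantity stays within tolerance."""
    return all(abs(measured[name]) <= limit for name, limit in TOLERANCES.items())

def main_test_completed(repetitions: list) -> bool:
    valid = [r for r in repetitions if repetition_is_valid(r)]
    return len(valid) >= MIN_VALID_REPETITIONS

runs = [
    {"speed_error_kph": 1.2, "lateral_offset_m": 0.10, "trigger_delay_s": 0.02},
    {"speed_error_kph": 2.8, "lateral_offset_m": 0.05, "trigger_delay_s": 0.01},  # invalid
    {"speed_error_kph": 0.9, "lateral_offset_m": 0.12, "trigger_delay_s": 0.03},
    {"speed_error_kph": 1.5, "lateral_offset_m": 0.08, "trigger_delay_s": 0.04},
]
print(main_test_completed(runs))  # True: three of the four repetitions are valid
```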

After the test has been carried out, the data are processed further. The data generated by the data recorder is converted to the PEGASUS file format and evaluated with regard to pass/fail criteria. Figure 14 provides an overview of the process.

After the necessary activities have been carried out, the results are manually transferred to the PEGASUS database.

Real World Test Drive

The field test is a test platform for testing the HAD function in a real traffic environment under varying environmental conditions, but without concrete test cases.

Test of the HAD function in real-world traffic (long-term testing)

  • Identification of specific individual situations within the framework of event-based simulation (unknown behavior, new scenarios and/or new parameters, pass criteria violated)
  • Identification of faults/impairments of the system caused by environmental influences that cannot currently be simulated directly via models, as no suitable physical models are yet available
  • Performing special “pass” assessments (e.g. risk in passing or following other vehicles)

The field test takes place on public roads using existing infrastructure (e.g. test field A9, geodetically surveyed routes within the PEGASUS project). Where necessary, guidelines for the route and time have to be considered in order to create challenging situations (e.g. dense traffic, tunnels or adverse weather).

The input is: global guidance of conditions (e.g. a route, weather conditions, time of day, week or month, special environments such as tunnels, special curvatures, ...), pass criteria, and the original vehicle as the system under test.

The output is: the evaluated real-world test drive and measurement data as input for the database (return flow to the database). This means: measurement data in OEM format and measurement data in PEGASUS format as input for the database; the analysis of the data is performed on the database. Expected results are, for example: evaluated test drive data, new scenarios and/or new parameters for existing scenarios, and input for Replay2Sim.

15. Test Data

15. Test Data

Test data container measurements generally contain the mandatory PEGASUS JSON signals. Simulation test data and proving ground test data have their own data-description formats depending on the applied measuring technology, the simulation architecture or tools, as well as on the test object's bus architecture. In order to standardize the test data treatment, a general test data format was defined in the PEGASUS project. This test data format must have the following hard characteristics:

  • Be able to fully describe the tested scenario
  • Metrics can be applied to the test data

It should also have the following soft characteristic:

  • Be easily usable as input for the PEGASUS method, especially for real-world test drives

The hard characteristics fully match the requirements of the PEGASUS scenario database. Therefore, it was decided to use the PEGASUS JSON format description, with the benefit that database scripts can easily be applied to the test data sets. The only difference between this container and the container of step 7 is that here the ego-describing signals are provided by a vehicle with an activated driving function (closed-loop). Within the data container of step 7, the ego signals may come from a vehicle without an automated driving function, e.g. open-loop or manually driven.

In order to compare test data sets from different test instances, a converter needs to be implemented which transforms the measured signals and their format into the PEGASUS JSON format. This converter can be used in simulation, proving ground and real-world tests as long as the input signals are chosen carefully. It also ensures the comparability of test results between the different test instances and thus allows the results to be verified against each other.
The test data from simulation, proving grounds, and real-world drives can also be used as input for a second iteration of the PEGASUS method. For this purpose, a converter should be implemented that converts the result data to the common input format of the database.
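
A minimal sketch of such a converter is given below. The OEM channel names and the PEGASUS signal names are illustrative assumptions; the actual mandatory signal set is defined by the PEGASUS JSON specification.

```python
import json

# Hypothetical mapping from OEM channel names to PEGASUS JSON signal names;
# the actual mandatory signal list is defined by the PEGASUS project.
CHANNEL_MAP = {
    "EGO_VX": "ego.velocity_x",
    "EGO_AX": "ego.acceleration_x",
    "OBJ1_DX": "object_1.distance_x",
}

def to_pegasus_json(oem_recording: dict, source: str) -> str:
    """Convert an OEM measurement dict {channel: [samples]} to PEGASUS JSON.

    `source` flags the originating test instance (simulation, proving ground,
    real world) so that the database can treat the data set accordingly.
    """
    signals = {}
    for oem_name, samples in oem_recording.items():
        pegasus_name = CHANNEL_MAP.get(oem_name)
        if pegasus_name is None:
            continue  # channel is not part of the mandatory signal set
        signals[pegasus_name] = samples
    return json.dumps({"source": source, "signals": signals}, indent=2)

recording = {"EGO_VX": [33.1, 33.0], "EGO_AX": [0.1, -0.2], "UNMAPPED": [0]}
print(to_pegasus_json(recording, source="proving_ground"))
```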

Due to the standardized interface, the test data can easily be uploaded into the PEGASUS database, including a flag signaling the originating test instance. This flag must be set when uploading simulation or proving ground data in order not to distort the calculated distributions: calculated database parameter distributions must reflect reality, whereas simulation searches for critical scenes and proving ground tests examine critical scenes, so their data would artificially increase the probability of critical situations. In contrast, real-world test drives should always be uploaded to the scenario database to improve the calculated parameter and scenario distributions.
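
A minimal sketch of this upload rule, reusing the hypothetical `source` flag from the converter sketch above:

```python
# Sketch: only real-world drives contribute to the parameter distributions;
# simulation and proving-ground data sets are flagged and excluded.
def distribution_samples(datasets: list) -> list:
    return [d for d in datasets if d["source"] == "real_world"]

uploads = [{"source": "simulation"}, {"source": "real_world"}, {"source": "proving_ground"}]
print(len(distribution_samples(uploads)))  # 1
```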

16. Evaluation and Classification

16. Evaluation and Classification

The “test evaluation” is the last processing step in the sequence that defines the Testing Method. Within this processing step, the test data is categorized and evaluated for different purposes. The results are summarized and, depending on the purpose, possibly passed back to a test instance for another iterative assessment step. If the results fulfill certain criteria, the iterative assessment process terminates and the results are passed to the next processing step, the overall risk assessment.

The iterative assessment process has multiple main purposes (Figure 1):

  • iterative assessment within the simulation in order to find critical scenario subspaces
  • controlling PASS/FAIL criteria
    • identifying the concrete scenarios (FAIL criteria met) for which a (manual) assessment has to be made (performance as good as a human in the concrete scenario or not)
    • identifying the concrete scenarios that met the PASS criteria
  • passing of concrete scenario test results to another test instance for cross-verification
  • controlling the “test end criterion”

First, there is an evaluation of the single test results of each concrete scenario test, originating either from the proving ground or from the simulation. Within the iterative assessment process of the simulation (see step 12), there exist groups of test results within a logical scenario space that are evaluated with regard to the goal of finding critical scenario subspaces. If such a subspace cannot be described sufficiently, the results are passed back to the test automation, including the stochastic variation, for further assessment.
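
The following sketch illustrates the idea of this loop; the criticality metric, the parameter range and the threshold are placeholders, not the PEGASUS test automation.

```python
import random

def criticality(params: dict) -> float:
    # Placeholder for a simulation run plus criticality metric:
    # scenarios with an initial TTC near 1.5 s are treated as critical here.
    return 1.0 - abs(params["ttc_init_s"] - 1.5)

def find_critical_subspace(iterations: int = 100, threshold: float = 0.8) -> list:
    critical = []
    for _ in range(iterations):
        params = {"ttc_init_s": random.uniform(0.5, 5.0)}  # stochastic variation
        if criticality(params) >= threshold:
            critical.append(params)
    # If the subspace is not yet described sufficiently, the results would be
    # passed back to the test automation for further variation (not shown).
    return critical

print(len(find_critical_subspace()), "critical samples found")
```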

Another purpose of passing back test results is cross-verification, e.g. if a concrete scenario test result of the simulation must be verified in a real but clinical testing environment.

For the test end criterion of a logical (or a concrete) scenario, see step 12 in the chapter Test concept. The test end criterion for a set of logical scenarios is met when the test end criterion of each logical scenario is satisfied.

17. Test Results

17. Test Results

The goal of the Testing Method (see Test concept) is the identification of safety-critical parameters or parameter spaces in the ODD of the L3 system. The test results therefore vary depending on the input of the Testing Method: a concrete scenario (special scenario), a logical scenario, or a set of logical scenarios.

For each concrete scenario:

  • Assessment "passed" or "failed", calculated by evaluating the test results with the corresponding criticality metric, to the concreted scenario.

For each logical scenario:

  • Assessments "passed" or "failed", calculated by evaluating the test results with the corresponding criticality metric, to each analysed concreted scenario (generated by the test automation procedure).

or

  • Description of the parameter subspace where a criticality metric reports a safety violation, including an approximation of its extension and probability.

 

18. Risk Assessment

18. Risk Assessment

The PEGASUS Behavioral Safety Assessment (BSA) focuses on the assessment of the HAD-F in individual test cases. For each individual test case, different metrics are applied to confirm the HAD-F's compliance with pre-defined behavioral criteria. In the specific PEGASUS context these pre-defined criteria are (a) keeping appropriate safety distances, (b) not causing collisions and (c), if possible, mitigating collisions, in accordance with the test concept described in step 12. During the BSA it is evaluated, for each of these criteria, whether the HAD-F complied with it or not. Based on the result for each criterion, a method is proposed to decide whether a single test case is passed or failed.

Furthermore, the BSA discusses the information needed to extrapolate the result of an individual test case to its semi-concrete scenario, its logical scenario, and the overall operational design domain in order to derive more expressive results.

Multi-Stage Behavioral Safety Assessment
The first part of the BSA focuses on the assessment of individual test cases. It is assumed that the test cases from the different focus areas (e.g. crash analysis, automation risks, FOT data, and simulation) are provided by D18 in a unified format, as described above. Figure 1 shows the different stages of the BSA. The first stage assesses whether the HAD-F complies with the required safety distances based on metrics like time-to-collision or the RSS model (Shalev-Shwartz, et al., 2017).

If a safety distance is violated, stage 1 is failed; otherwise it is passed. Note that at this point it is not distinguished whether the HAD-F or another traffic participant causes the violation. Stage 1 may be obsolete depending on the test concept, which is highlighted by the dashed line.
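
For illustration, the following sketch computes a simple time-to-collision and the minimum safe longitudinal gap of the RSS model for a one-dimensional following situation. The parameter values are examples, not calibrated PEGASUS settings.

```python
def time_to_collision(gap_m: float, v_rear: float, v_front: float) -> float:
    """Simple TTC: time until the rear vehicle reaches the front vehicle at
    constant speeds; infinite if the gap is opening."""
    closing = v_rear - v_front
    return gap_m / closing if closing > 0 else float("inf")

def rss_min_gap(v_rear: float, v_front: float, rho: float = 1.0,
                a_accel_max: float = 3.0, a_brake_min: float = 4.0,
                a_brake_max: float = 8.0) -> float:
    """Minimum safe longitudinal gap per the RSS model (Shalev-Shwartz et al.,
    2017): rear vehicle may accelerate for the response time rho, then brakes
    gently, while the front vehicle brakes at maximum deceleration."""
    v_resp = v_rear + rho * a_accel_max  # rear speed after the response time
    d = (v_rear * rho
         + 0.5 * a_accel_max * rho ** 2
         + v_resp ** 2 / (2 * a_brake_min)
         - v_front ** 2 / (2 * a_brake_max))
    return max(d, 0.0)

# Stage 1 check for one sample: a 30 m gap at 30 m/s behind a 25 m/s lead vehicle
gap, v_r, v_f = 30.0, 30.0, 25.0
print("TTC:", time_to_collision(gap, v_r, v_f))  # 6.0 s
print("safe:", gap >= rss_min_gap(v_r, v_f))     # False: RSS distance violated
```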

While stage 2 performs the actual collision check, stage 3 performs the causality assessment. Stage 3 is of great importance, since up to this point it cannot be determined whether the HAD-F or someone else caused possible fails in stage 1 or stage 2. Therefore, it is first evaluated whether the situation would have been controllable for the HAD-F. In PEGASUS this controllability evaluation is limited to assessing the physical limits of driving for collision avoidance. If the HAD-F could not have controlled the situation leading to the collision, an additional examination is performed to determine whether the HAD-F caused the collision.

While stage 3 is important in the PEGASUS project, it may be difficult to assess causality automatically for all relevant scenarios. While it may be possible to estimate the driving limits with a certain uncertainty, estimating who caused the accident may be challenging, since generic rules for all possible scenarios would have to be derived. However, the RSS model (Shalev-Shwartz, et al., 2017) already contains a causality model which covers a significant number of situations, and more progress in this field can be expected in the future. Until then, the judgment of an expert group could be used.

Finally, stage 4 determines whether the HAD-F appropriately mitigated the collision (e.g. by applying an appropriate braking force). Here it is irrelevant whether the HAD-F caused the collision or not, because the PEGASUS project assumes that the HAD-F should try to mitigate a collision in any case, if physically possible.

After evaluating all stages, it is determined whether the test case is passed or failed. The approach taken in PEGASUS is depicted in Figure 2. The first row is an overall fail because the HAD-F does not keep the safety distance, causes a collision and does not mitigate appropriately. The second row shows a case where the HAD-F does not comply with the safety distances but has no collision. In this case, the overall result may still be a FAIL, since even without a collision the HAD-F's behavior is critical. The result in the third row differs depending on whether the HAD-F caused the collision (stage 3: 0) or not (stage 3: 1), which underlines the importance of the causality stage.
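
The following sketch encodes one possible reading of this decision logic; it is a hedged interpretation of Figure 2, not the exact PEGASUS decision table.

```python
from dataclasses import dataclass

@dataclass
class StageResults:
    distance_kept: bool       # stage 1: required safety distances kept
    no_collision: bool        # stage 2: no collision occurred
    not_caused_by_hadf: bool  # stage 3: HAD-F did not cause the violation/collision
    mitigated: bool           # stage 4: collision appropriately mitigated

def overall_verdict(s: StageResults) -> str:
    """One possible reading of Figure 2 (an assumption, not the exact table)."""
    if s.distance_kept and s.no_collision:
        return "PASS"
    if not s.no_collision:
        # A collision occurred: causality (stage 3) decides; mitigation is
        # expected in any case, since the HAD-F should mitigate if possible.
        if s.not_caused_by_hadf and s.mitigated:
            return "PASS"
        return "FAIL"
    # Safety distance violated but no collision: the behavior is still critical.
    return "FAIL"

print(overall_verdict(StageResults(False, False, False, False)))  # row 1: FAIL
print(overall_verdict(StageResults(False, True, True, True)))     # row 2: FAIL
```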

Extrapolation of individual test-case results
A central question of the behavioral safety assessment is which conclusions can be drawn from the individual test result of Figure 2. The most fundamental conclusion is whether a test case is passed or not and, if not, for what reason. This result can, for example, be used to compare two HAD-F releases, if the same arbitrarily selected test cases are used during the BSA.
More expressive conclusions can be drawn if additional information on the context of the test case is available, as shown in Table 1.

Table 1: Example for scaling the result of a single test case to its logical scenario and to an average yearly driving distance. Bottom row: basic result. Other rows: specification of the additional inputs required to derive more expressive results

One possibility is to select the test cases by performing an equidistant sampling of the parameter space of a logical scenario (see Table 1, second row from the bottom). In this case, the importance π_i^R of a test case is defined by the total number of selected test cases N^R as π_i^R = 1/N^R. The result is the ratio by which the logical scenario under test is solved. Further, Table 1 shows possibilities to draw conclusions on the HAD-F behavior over an average yearly driving distance. However, while such extrapolations of the individual test case results are theoretically possible, the additionally required input parameters may only be available after extensive real-world testing.
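
A minimal numerical illustration of this extrapolation; the number of test cases and their results are made up.

```python
# Equidistant sampling of a logical scenario's parameter space (a sketch).
N_R = 5                                     # number of selected test cases
importance = 1.0 / N_R                      # pi_i^R = 1 / N^R for every test case
results = [True, True, False, True, True]   # pass/fail per concrete scenario

# The solved ratio is the importance-weighted sum over the passed test cases.
solved_ratio = sum(importance for passed in results if passed)
print(f"logical scenario solved to {solved_ratio:.0%}")  # 80%
```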

19. Contribution to Safety Statement

19. Contribution to Safety Statement

This step contains the results of the behavioral safety assessment as described in step 18, as well as information that allows identifying its logical scenario. The content of this data container is the behavioral safety assessment of step 18 and a data file which includes the ID of the test case and the ID of its logical scenario, the results of the multiple stages of step 18 and, if available, the extrapolated results as described in Table 1 of step 18.

 

20. Contribution to Safety Argumentation

20. Contribution to Safety Argumentation

The PEGASUS Safety Argumentation is to be understood as a conceptual framework to support the safeguarding and approval of higher levels of automation through structure, formalization, coherence, integrity and relevance. It is structured by introducing five layers. Established formalizations are used wherever possible in order to describe each layer's elements. Those elements are linked across the layers in order to form a coherent argumentation. It is also suggested to evaluate each element's integrity in order to establish a reliable Safety Argumentation, and assessing each element's relevance is proposed as a useful measure as well. This framework is a proposal for the integration of research on technology acceptance, of existing guidelines or standards to be taken into account when bringing a HAD-F to market, and of the logical structure that builds a safety case. The central assumption of the PEGASUS Safety Argumentation is: if a chain of arguments, created taking into account the proposed framework, stands up to a critical examination, this will support the safeguarding and approval of higher levels of automation.

5 Questions, 5 Paradigms, 5 Layers

It is assumed that a safety argumentation for HAD-F needs to address more or less the following five questions:

  1. How to ensure the necessary compliance with all relevant standards and directives needed for homologation?
  2. Which elements must a safety argumentation contain in order to be sufficient?
  3. How to support a safety argumentation with evidence?
  4. How to monitor a HAD-F over its entire life cycle?
  5. What do people expect from highly automated mobility in the larger context?

The first question is addressed in large parts by established quality management processes. A product which complies with relevant guidelines or standards shall be presumed to comply with product safety requirements insofar as they are covered by those guidelines or standards.

The question of which design principles should be taken into account during the development of a HAD-F is currently still answered differently. There is room for standardization here to develop a unified view of which design principles are relevant and how they are defined. Provided that those design principles have a positive effect on the safety of a HAD-F, applying state-of-the-art design principles seems to be a key prerequisite for achieving sufficient conditions for the safeguarding and approval of higher levels of automation. This can be seen as a feasible first step in operationalizing general objectives as recommended, for example, by the German ethics commission: if a generally positive risk balance in comparison to human driving performance can be assumed, technically unavoidable residual risks do not stand in the way of a market launch (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019).

Coming up with answers to question number three is one of the main drivers of the PEGASUS project, as described in the previous sections of this document. The methods and tools developed within PEGASUS all aim to provide evidence in a safety argumentation. For safeguarding a HAD-F it is crucial to link the above-mentioned design principles and the derived safety goals with the results achieved by applying actions, methods, and tools like those developed by the members of the PEGASUS project. Only if there is a logical connection between the design principles and the results obtained by applying actions, methods and tools can these results be considered as evidence in the sense of a safety argumentation.

Question number four refers to the systematic monitoring of the market launch and throughout the product life cycle. It is assumed that the product needs to be monitored during its life cycle with particular attention to deficits and unexpected behavior. As it is not possible to provide a statistically valid proof of a positive risk balance for a HAD-F before market launch, particular attention should be paid here (Junietz, 2019). At this point there is a need for further clarification regarding a uniform approach.

The fifth question seeks to embed a safety argumentation in the broader context of technology acceptance. The assumption is that a safe product is necessary for the acceptance of a particular technology, but it is very likely that this is not the whole story. Answering this question goes beyond safeguarding a HAD-F; rather, it is about understanding people's needs in relation to HAD-F and drawing the right conclusions for the development of highly automated mobility.
As a conceptual framework, the PEGASUS safety argumentation addresses the previously mentioned questions methodically: in a nutshell, it proposes five layers upon which elements are located. Layer 0 embeds a safety argumentation in the broader context of technology acceptance. Within layer 1, top-level safety goals are defined which need to be achieved. Layer 2 links elements across the layers to build a safety case. Within layer 3, actions, methods and tools are developed and implemented in order to achieve results. Those results become evidence (layer 4) when they can be traced back to a safety goal within layer 1.

Therefore, the overarching goal is to create a relevant, coherent and thereby verifiable safety argumentation with integrity in a structured and formalized manner. The activities here are guided by the five paradigms mentioned: structuring, formalization, coherence, integrity and relevance.

Structuring is meant here as organizing the PEGASUS safety argumentation into five layers. This structure attempts to bring more clarity into the continuing discussions regarding the safeguarding and approval of higher levels of automation. A differentiation must be made between two kinds of layers: layers outside the focus of PEGASUS, which can be placed within the proposed framework in order to put PEGASUS into a wider context (layer 0), and layers whose elements directly contribute to answering the question of how to verify safety and reliability (layers 1-4).

Formalization within the PEGASUS safety argumentation means making the individual elements of a layer explicit by means of a defined, standardized and ideally established notation. The choice of the notation is dependent on the layer and the contained elements to be formalized.

Coherence is defined as the linking of individual elements within one of the postulated five layers of the PEGASUS safety argumentation as well as between layers. Only once elements are reasonably linked across layers can one show how safety and reliability can be verified and how an argumentation can be made for socially accepted, highly automated mobility in the larger context.

Integrity in the sense of the PEGASUS safety argumentation is intended as a quality criterion for the individual elements of the five layers. It is oriented around the quality criteria of empirical research. Different levels of integrity are conceivable in principle for the different layers and their elements.

Relevance may also be seen as a quality criterion but, unlike integrity, may not be applicable to all five layers. Relevance is especially linked to the elements of layer 2. It is intended to show the maturity level of each element. For example, a peer-reviewed and published method, which is applied for the purpose of generating evidence, is ranked higher on an ordinal scale than an unpublished method. Integrity does not necessarily have to correlate with relevance; however, it can be expected that an element with high relevance also has a high integrity.

Layer 0 – Acceptance Model

Layer 0 embeds PEGASUS in a larger context. The specifics are not the focus of PEGASUS. The key element of this layer is a scientific model describing the dependence of individual or social acceptance of HAD-F on several factors. A key premise here is that individual or social acceptance cannot be explained by a single cause. This premise is in line with established models of individual technology acceptance such as the Technology Acceptance Model (TAM) (Davis, 1985), (Davis, 1989), the Theory of Planned Behavior (TPB) (Ajzen, 1991), and the Unified Theory of Acceptance and Use of Technology (UTAUT) (Venkatesh, et al., 2003). Depending on the model, different factors are postulated. Rahman et al. examined the utility of the models mentioned for describing the acceptance of Advanced Driver Assistance Systems (ADAS) and give an overview of studies which build upon these models (Rahman, et al., 2017). They come to the conclusion that in principle all three models can be used in the context of the acceptance of ADAS. However, further work is necessary to adapt the theories and models to the domain, to develop them further and to explain a greater proportion of the variance (cf. (Madigan, et al., 2017), (Haboucha, et al., 2017)).

Layer 1 – Top Level Safety Goals

Currently there are various suggestions for addressing this topic: OEMs, suppliers, associations, research institutes as well as national and international regulators and legislators all have their own perspectives on which points need to be taken into account in order to fulfill the promise of enhanced road safety by introducing HAD-F. Depending on the author, these are called design principles, (priority) safety design elements, guidelines or (ethical) rules, for example. They all share the pursuit of automated vehicle technology safety. Merging these approaches to establish a unified and complete view seems reasonable; to the authors' knowledge, this is the subject of ongoing discussions. Within the scope of the PEGASUS safety argumentation, 13 top-level safety goals are used exemplarily: one directly derived from the report of the German Ethics Commission on Automated and Connected Driving (G_01) and the 12 priority design elements proposed by NHTSA (NHTSA, 2017) (G_02 – G_13).

It is argued here that the fulfillment of G_02 to G_13 positively influences the achievement of the more general requirement associated with G_01. For the scope of PEGASUS, G_05 “Process and procedure for testing and validation of HAD-F functionality within the prescribed ODD is documented and considered” is of particular interest, as it is in line with the overall project goals.

Layer 2 – Logical Structure

The logical structure of the safety argument as defined by the PEGASUS safety argumentation is all about deriving the steps necessary to establish a verification of safety and reliability from higher-level goals. The logical structure of the argumentation is made explicit in order to answer the following question: why is a certain result that has been achieved relevant for the verification of safety and reliability? In principle, various formalizations are applicable for this: examples are ISO 15026-2 (Anon., n.d.) and the Goal Structuring Notation (GSN) (Kelly & Weaver, 2004), (Origin Consulting (York) Limited on behalf of the Contributors, 2011), which are also used in the example given later. Both approaches formulate action-guiding goals, which to a certain extent serve as a measure of success in the argumentation chain. By means of a standardized graphical notation, safety goals are defined as elements of the second layer, plus strategies and solutions for achieving these safety goals. Further information regarding justifications, assumptions and context can also be attached to these elements.
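
To illustrate how such a goal structure could be represented in software, the following sketch models GSN-style goals and strategies as simple data structures. The IDs and texts follow the example given later; the representation itself is an assumption, not a PEGASUS artifact.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    id: str
    text: str
    supported_by: list = field(default_factory=list)  # strategies or solutions

@dataclass
class Strategy:
    id: str
    text: str
    supported_by: list = field(default_factory=list)  # sub-goals

# Illustrative fragment of the goal structure used in the example below.
g01 = Goal("G_01", "Generally positive risk balance in comparison to human "
                   "driving performance")
s01 = Strategy("S_01", "Argue over the priority design elements G_02 - G_13")
g05 = Goal("G_05", "Process and procedure for testing and validation of HAD-F "
                   "functionality within the prescribed ODD is documented and "
                   "considered")
g01.supported_by.append(s01)
s01.supported_by.append(g05)
print(g01.supported_by[0].supported_by[0].id)  # G_05 traces back to G_01
```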

It is proposed to add a relevancy ranking to the individual elements of the logical structure of the safety argument as mentioned earlier.

Layer 3 – Methods and Tools

The leading question for layer 3 of the PEGASUS safety argumentation is: how can one prove that the layer-2-specific safety goals have been achieved? To do so, one first needs results. As part of the overall PEGASUS method, a continuous and flexible tool chain has been described in the previous chapters of this document. All these actions, methods and tools can be linked to specific safety goals, as depicted later in the example.

Layer 4 – Evidence

Results are generated by applying the actions, methods and tools located in layer 3. The challenge is then to embed the results and evaluate their contribution to achieving a specific safety goal. Only through this step can a result become evidence as intended by the PEGASUS safety argumentation. The leading question for layer 4 is therefore: can a result be considered as evidence for achieving a specific safety goal? The key difference between the terms “result” and “evidence” in the sense of the PEGASUS safety argumentation is therefore whether they can be assigned to specific safety goals.

The methods and tools described in layer 3 can be as widely differing as the specific safety goals to be achieved. This can lead to heterogeneous results and formats in which evidence can reside. A uniform formalization therefore cannot reasonably be proposed for all results in layer 4.

At this point we arrive at a key point of the PEGASUS safety argumentation: the critical testing of the argumentation chains. If the argumentation chains connecting the individual elements of the layers can withstand critical evaluation, an approval recommendation can be made.

Example

Following the above introduction of the PEGASUS safety argumentation, a brief demonstration of how it could be applied now follows. The example is given in graphic form and draws on the proposed formalizations. Note that this example does not claim to show a complete or sufficiently detailed argumentation chain; instead, it is intended to illustrate how the layers described in theory could be applied in practice.

Layer 0 shows a model based on UTAUT, slightly altered and depicted without moderating effects for the sake of simplicity. The ellipses used to visualize the latent variables should not be confused with GSN elements (the formalization here draws on Structural Equation Modelling (SEM), an approach from multivariate statistics). Relevance could in principle be rated B for research on technology acceptance, as there is a large body of research on this topic. Nevertheless, the particular model depicted here lacks validity (integrity), mainly for two reasons: it was altered by the author, and it is designed for individual technology acceptance, not social acceptance.

In layer 1, the top-level safety goals G_01 – G_13 are addressed. Strategy S_01 reflects the argument that G_02 – G_13 support the achievement of the overarching goal G_01. This example focuses on G_05.

Layer 2 mainly addresses the operationalization of G_05. S_02 aims to reflect the fourth leading question presented earlier: how to monitor a HAD-F over its entire life cycle? G_14 (early research and development), G_15 (product development) and G_16 (after market) aim to address processes and procedures for testing and validation of HAD-F functionality within the prescribed ODD. They differ in the phase of the product life cycle they address; this expresses the need for different processes and procedures depending on the life cycle phase. This example focuses on G_15: process and procedure for testing and validation of HAD-F functionality within the prescribed ODD is documented and considered in order to bring the HAD-F to market.

Strategy S_03 expresses the overall project goal of PEGASUS to come up with a solution for safeguarding HAD-F for the market. At this point it becomes clear where PEGASUS can support the safeguarding and approval of higher levels of automation.

The overall PEGASUS method is reflected by G_17 – G_20. These goals reflect the four main clusters of the PEGASUS method, namely Data Processing (G_17), Requirements Definition & Conversion for Database (G_18), Scenario Compilation and Database (G_19) and Assessment of Highly Automated Driving Function (incl. Human) (G_20). As stated earlier, methods developed within these clusters can be traced back to the fulfillment of goal G_05 and therefore constitute evidence for this particular safety goal.

The example focuses on G_20, in particular the composition of a safety statement as reflected by G_21, which was explained in detail in one of the previous sections. S_05 expresses the basic assumption of that approach, namely decomposing the safety statement into four individual ratings (stages). Integrity could be rated high for layer 2, as the proposed approach offers traceability. Relevance could be rated C-B.

Applying this particular method as described in step 18 is located in layer 3. As the proposed method is not yet fully implemented as an algorithm, it lacks reproducibility and reliability, which impacts integrity. Relevance could be rated C-B depending on the stage. Results obtained by applying this method can count as evidence, as they are linked to the overall safety goal, and are located in layer 4.