Chapter 4: Software Quality Metrics Overview
Software metrics can be classified into three categories: product metrics, process metrics, and project metrics. Product metrics describe the characteristics of the product such as size, complexity, design features, performance, and quality level. Process metrics can be used to improve software development and maintenance. Examples include the effectiveness of defect removal during development, the pattern of testing defect arrival, and the response time of the fix process. Project metrics describe the project characteristics and execution. Examples include the number of software developers, the staffing pattern over the life cycle of the software, cost, schedule, and productivity. Some metrics belong to multiple categories. For example, the in-process quality metrics of a project are both process metrics and project metrics.
Software quality metrics are a subset of software metrics that focus on the quality aspects of the product, process, and project. In general, software quality metrics are more closely associated with process and product metrics than with project metrics. Nonetheless, the project parameters such as the number of developers and their skill levels, the schedule, the size, and the organization structure certainly affect the quality of the product. Software quality metrics can be divided further into end-product quality metrics and in-process quality metrics. The essence of software quality engineering is to investigate the relationships among in-process metrics, project characteristics, and end-product quality, and, based on the findings, to engineer improvements in both process and product quality. Moreover, we should view quality from the entire software life-cycle perspective and, in this regard, we should include metrics that measure the quality level of the maintenance process as another category of software quality metrics.
In this chapter we discuss several metrics in each of three groups of software quality metrics: product quality, in-process quality, and maintenance quality. In the last sections we also describe the key metrics used by several major software developers and discuss software metrics data collection.
4.1 Product Quality Metrics
As discussed in Chapter 1, the de facto definition of software quality consists of two
levels: intrinsic product quality and customer satisfaction. The metrics we discuss
here cover both levels:
- Mean time to failure
- Defect density
- Customer problems
- Customer satisfaction
Intrinsic product quality is usually measured by the number of “bugs” (functional defects) in the software or by how long the software can run before encountering a “crash.” In operational definitions, the two metrics are defect density (rate) and mean time to failure (MTTF). The MTTF metric is most often used with safety-critical systems such as air traffic control systems, avionics, and weapons. For instance, the U.S. government mandates that its air traffic control system cannot be unavailable for more than three seconds per year. In civilian airliners, the probability of certain catastrophic failures must be no worse than 10⁻⁹ per hour (Littlewood and Strigini, 1992). The defect density metric, in contrast, is used in many commercial software systems.
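To make the operational contrast concrete, here is a minimal sketch of how the two metrics are computed. All numbers are hypothetical: the failure log, defect count, and product size are invented for illustration.

```python
# Hypothetical failure log: cumulative run hours at which each failure occurred.
failure_times = [120.0, 340.5, 910.0, 2200.0]

# MTTF: mean operating time between successive failures.
intervals = [t2 - t1 for t1, t2 in zip([0.0] + failure_times, failure_times)]
mttf = sum(intervals) / len(intervals)

# Defect density: defects found per thousand lines of code (KLOC).
defects_found = 23
size_kloc = 41.7
defect_density = defects_found / size_kloc

print(f"MTTF: {mttf:.1f} hours")
print(f"Defect density: {defect_density:.2f} defects/KLOC")
```

Note the difference in what must be collected: MTTF needs the time of every failure, whereas defect density needs only a defect count and a size measure, which foreshadows the data-collection issues discussed below.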
The two metrics are correlated but are different enough to merit close attention. First, one measures the time between failures, the other measures the defects relative to the software size (lines of code, function points, etc.). Second, although it is difficult to separate defects and failures in actual measurements and data tracking, failures and defects (or faults) have different meanings. According to the IEEE/American National Standards Institute (ANSI) standard (982.2):
An error is a human mistake that results in incorrect software.
The resulting fault is an accidental condition that causes a unit of the system to
fail to function as required.
A defect is an anomaly in a product.
A failure occurs when a functional unit of a software-related system can no
longer perform its required function or cannot perform it within specified
limits.
From these definitions, the difference between a fault and a defect is unclear. For
practical purposes, there is no difference between the two terms. Indeed, in many
development organizations the two terms are used synonymously. In this book we
also use the two terms interchangeably.
Simply put, when an error occurs during the development process, a fault or a
defect is injected in the software. In operational mode, failures are caused by faults or
defects, or failures are materializations of faults. Sometimes a fault causes more than
one failure situation and, on the other hand, some faults do not materialize until the
software has been executed for a long time with some particular scenarios.
Therefore, defect and failure do not have a one-to-one correspondence.
Third, the defects that cause higher failure rates are usually discovered and removed early. The probability of failure associated with a latent defect is called its size, or “bug size.” For special-purpose software systems such as the air traffic control systems or the space shuttle control systems, the operations profile and scenarios are better defined and, therefore, the time to failure metric is appropriate. For general-purpose computer systems or commercial-use software, for which there is no typical user profile of the software, the MTTF metric is more difficult to implement and may not be representative of all customers.
Fourth, gathering data about time between failures is very expensive. It requires recording the occurrence time of each software failure. It is sometimes quite difficult to record the time for all the failures observed during testing or operation. To be useful, time between failures data also requires a high degree of accuracy. This is perhaps the reason the MTTF metric is not widely used by commercial developers.
Finally, the defect rate metric (or the volume of defects) has another appeal to
commercial software development organizations. The defect rate of a product or the
expected number of defects over a certain time period is important for cost and
resource estimates of the maintenance phase of the software life cycle.
Regardless of their differences and similarities, MTTF and defect density are the
two key metrics for intrinsic product quality. Accordingly, there are two main types
of software reliability growth models—the time between failures models and the
defect count (defect rate) models. We discuss the two types of models and provide
several examples of each type in Chapter 8.
4.1.1 The Defect Density Metric
Although seemingly straightforward, comparing the defect rates of software products involves many issues. In this section we try to articulate the major points. To
define a rate, we first have to operationalize the numerator and the denominator, and
specify the time frame. As discussed in Chapter 3, the general concept of defect rate
is the number of defects over the opportunities for error (OFE) during a specific time
frame. We have just discussed the definitions of software defect and failure. Because
failures are defects materialized, we can use the number of unique causes of observed failures to approximate the number of defects in the software. The denominator is the size of the software, usually expressed in thousand lines of code (KLOC) or in the number of function points. In terms of time frames, various operational definitions are used for the life of product (LOP), ranging from one year to many years after the software product’s release to the general market. In our experience with operating systems, usually more than 95% of the defects are found within four years of the software’s release. For application software, most defects are normally found within two years of its release.
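As a sketch of this operational definition, the fragment below approximates the numerator by the unique causes of observed failures and restricts the count to a fixed time frame after release. The defect IDs, dates, product size, and the 365-day year are all simplifying assumptions for illustration.

```python
from datetime import date

# Hypothetical field reports: (id of the underlying defect, report date).
reports = [
    ("D1", date(2001, 3, 1)),
    ("D2", date(2001, 9, 15)),
    ("D1", date(2002, 2, 2)),   # duplicate report of the same defect
    ("D3", date(2004, 6, 30)),
    ("D4", date(2007, 1, 10)),  # outside the four-year window
]
release = date(2000, 12, 1)
window_days = 4 * 365           # time frame: four years after release

# Numerator: unique causes of observed failures within the time frame.
unique_defects = {cause for cause, when in reports
                  if (when - release).days <= window_days}

# Denominator: size of the software in KLOC.
size_kloc = 120.0
defect_rate = len(unique_defects) / size_kloc
print(f"{defect_rate:.3f} defects per KLOC in the first four years")
```

Deduplicating by underlying cause matters: counting the two reports of D1 separately would inflate the rate, because defect and failure do not have a one-to-one correspondence.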
Lines of Code
The lines of code (LOC) metric is anything but simple. The major problem comes from the ambiguity of the operational definition, the actual counting. In the early days of Assembler programming, in which one physical line was the same as one instruction, the LOC definition was clear. With the availability of high-level languages the one-to-one correspondence broke down. Differences between physical lines and instruction statements (or logical lines of code) and differences among languages contribute to the huge variations in counting LOCs. Even within the same language, the methods and algorithms used by different counting tools can cause significant differences in the final counts. Jones (1986) describes several variations:
- Count only executable lines.
- Count executable lines plus data definitions.
- Count executable lines, data definitions, and comments.
- Count executable lines, data definitions, comments, and job control language.
- Count lines as physical lines on an input screen.
- Count lines as terminated by logical delimiters.
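These variations can be made concrete with a toy counter. The C fragment and the semicolon heuristic for logical statements below are illustrative only, not a production counting tool.

```python
# Toy C fragment: a comment line, a blank line, and two statements on one physical line.
source = """\
/* running totals */
int total = 0;

total += x; count += 1;
"""

lines = source.splitlines()
physical = [ln for ln in lines if ln.strip()]        # non-blank physical lines
non_comment = [ln for ln in physical
               if not ln.strip().startswith("/*")]   # exclude comment lines
logical = sum(ln.count(";") for ln in non_comment)   # crude logical-statement count

print(len(physical), len(non_comment), logical)
```

Depending on the convention chosen, the same fragment yields three physical lines, two non-comment lines, or three logical statements — which is exactly why published LOC figures computed under different conventions are hard to compare.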
To illustrate the variations in LOC count practices, let us look at a few examples
by authors of software metrics. In Boehm’s well-known book Software Engineering
Economics (1981), the LOC counting method counts lines as physical lines and
includes executable lines, data definitions, and comments. In Software Engineering
Metrics and Models by Conte et al. (1986), LOC is defined as follows:
A line of code is any line of program text that is not a comment or blank line,
regardless of the number of statements or fragments of statements on the line.
This specifically includes all lines containing program headers, declarations,
and executable and non-executable statements. (p. 35)
Thus their method is to count physical lines including prologues and data definitions (declarations) but not comments. In Programming Productivity by Jones (1986), the source instruction (or logical lines of code) method is used. The method used by IBM Rochester is also to count source instructions including executable lines and data definitions but excluding comments and program prologues.
The resultant differences in program size between counting physical lines and
counting instruction statements are difficult to assess. It is not even known which
method will result in a larger number. In some languages such as BASIC, PASCAL,
and C, several instruction statements can be entered on one physical line. On the
other hand, instruction statements and data declarations might span several physical
lines, especially when the programming style aims for easy maintenance, which is
not necessarily done by the original code owner. Languages that have a fixed column
format such as FORTRAN may have the physical-lines-to-source-instructions ratio
closest to one. According to Jones (1992), the difference between counts of physical
lines and counts including instruction statements can be as large as 500%; and the
average difference is about 200%, with logical statements outnumbering physical
lines. In contrast, for COBOL the difference is about 200% in the opposite direction,
with physical lines outnumbering instruction statements.
There are strengths and weaknesses of physical LOC and logical LOC (Jones,
2000). In general, logical statements are a somewhat more rational choice for quality
data. When any data on size of program products and their quality are presented, the
method for LOC counting should be described. At a minimum, in any publication of quality data that involves LOC, the author should state whether the LOC counting method is based on physical LOC or logical LOC.
Furthermore, as discussed in Chapter 3, some companies may use the straight LOC count (whatever LOC counting method is used) as the denominator for calculating defect rate, whereas others may use the normalized count (normalized to Assembler-equivalent LOC based on some conversion ratios) for the denominator. Therefore, industrywide standards should include the conversion ratios from high-level language to Assembler. So far, very little research on this topic has been published. The conversion ratios published by Jones (1986) are the most well known in the industry. As more and more high-level languages become available for software development, more research will be needed in this area.
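A sketch of the normalization idea follows. The conversion ratios here are invented for illustration; they are not Jones’s published figures.

```python
# Hypothetical conversion ratios: one source line in each language is taken to be
# equivalent to this many lines of Assembler (illustrative values only).
RATIOS = {"assembler": 1.0, "c": 2.5, "cobol": 3.0}

def normalized_defect_rate(defects: int, kloc: float, language: str) -> float:
    """Defect rate per Assembler-equivalent KLOC."""
    equivalent_kloc = kloc * RATIOS[language]
    return defects / equivalent_kloc

# The same 30 defects in 10 KLOC of C:
straight_rate = 30 / 10.0                            # 3.0 per straight KLOC
normalized = normalized_defect_rate(30, 10.0, "c")   # 30 / 25.0 = 1.2 per AE-KLOC
print(straight_rate, normalized)
```

The same product thus shows very different defect rates under the two denominators, which is why mixing straight and normalized counts across organizations invalidates comparisons.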
When straight LOC count data is used, size and defect rate comparisons across
languages are often invalid. Extreme caution should be exercised when comparing
the defect rates of two products if the operational definitions (counting) of LOC,
defects, and time frame are not identical. Indeed, we do not recommend such comparisons. We recommend comparison against one’s own history for the sake of
measuring improvement over time.
Note: The LOC discussions in this section are in the context of defect rate calculation. For productivity studies, the problems with using LOC are more severe. A
basic problem is that the amount of LOC in a software program is negatively correlated with design efficiency. The purpose of software is to provide certain functionality for solving specific problems or performing certain tasks.