Chapter 4: Software Quality Metrics Overview
Software metrics can be classified into three categories: product metrics, process metrics, and project metrics. Product metrics describe the characteristics of the product such as size, complexity, design features, performance, and quality level. Process metrics can be used to improve software development and maintenance. Examples include the effectiveness of defect removal during development, the pattern of testing defect arrival, and the response time of the fix process. Project metrics describe the project characteristics and execution. Examples include the number of software developers, the staffing pattern over the life cycle of the software, cost, schedule, and productivity. Some metrics belong to multiple categories. For example, the in-process quality metrics of a project are both process metrics and project metrics.
Software quality metrics are a subset of software metrics that focus on the quality aspects of the product, process, and project. In general, software quality metrics are more closely associated with process and product metrics than with project metrics. Nonetheless, the project parameters such as the number of developers and their skill levels, the schedule, the size, and the organization structure certainly affect the quality of the product. Software quality metrics can be divided further into end-product quality metrics and in-process quality metrics. The essence of software quality engineering is to investigate the relationships among in-process metrics, project characteristics, and end-product quality, and, based on the findings, to engineer improvements in both process and product quality. Moreover, we should view quality from the entire software life-cycle perspective and, in this regard, we should include metrics that measure the quality level of the maintenance process as another category of software quality metrics.
In this chapter we discuss several metrics in each of three groups of software quality metrics: product quality, in-process quality, and maintenance quality. In the last sections we also describe the key metrics used by several major software developers and discuss software metrics data collection.
4.1 Product Quality Metrics
As discussed in Chapter 1, the de facto definition of software quality consists of two
levels: intrinsic product quality and customer satisfaction. The metrics we discuss
here cover both levels:
- Mean time to failure
- Defect density
- Customer problems
- Customer satisfaction
Intrinsic product quality is usually measured by the number of “bugs” (functional defects) in the software or by how long the software can run before encountering a “crash.” In operational definitions, the two metrics are defect density (rate) and mean time to failure (MTTF). The MTTF metric is most often used with safety-critical systems such as air traffic control systems, avionics, and weapons. For instance, the U.S. government mandates that its air traffic control system cannot be unavailable for more than three seconds per year. In civilian airliners, the probability of certain catastrophic failures must be no worse than 10⁻⁹ per hour (Littlewood and Strigini, 1992). The defect density metric, in contrast, is used in many commercial software systems.
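To make the operational contrast concrete, here is a minimal sketch of how the two metrics are computed. All numbers are hypothetical: the failure log, defect count, and product size are invented for illustration.

```python
# Hypothetical failure log: cumulative run hours at which each failure occurred.
failure_times = [120.0, 340.5, 910.0, 2200.0]

# MTTF: mean operating time between successive failures.
intervals = [t2 - t1 for t1, t2 in zip([0.0] + failure_times, failure_times)]
mttf = sum(intervals) / len(intervals)

# Defect density: defects found per thousand lines of code (KLOC).
defects_found = 23
size_kloc = 41.7
defect_density = defects_found / size_kloc

print(f"MTTF: {mttf:.1f} hours")
print(f"Defect density: {defect_density:.2f} defects/KLOC")
```

Note the difference in what must be collected: MTTF needs the time of every failure, whereas defect density needs only a defect count and a size measure, which foreshadows the data-collection issues discussed below.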
The two metrics are correlated but are different enough to merit close attention. First, one measures the time between failures, the other measures the defects relative to the software size (lines of code, function points, etc.). Second, although it is difficult to separate defects and failures in actual measurements and data tracking, failures and defects (or faults) have different meanings. According to the IEEE/American National Standards Institute (ANSI) standard (982.2):
An error is a human mistake that results in incorrect software.
The resulting fault is an accidental condition that causes a unit of the system to
fail to function as required.
A defect is an anomaly in a product.
A failure occurs when a functional unit of a software-related system can no
longer perform its required function or cannot perform it within specified
limits.
From these definitions, the difference between a fault and a defect is unclear. For
practical purposes, there is no difference between the two terms. Indeed, in many
development organizations the two terms are used synonymously. In this book we
also use the two terms interchangeably.
Simply put, when an error occurs during the development process, a fault or a
defect is injected in the software. In operational mode, failures are caused by faults or
defects, or failures are materializations of faults. Sometimes a fault causes more than
one failure situation and, on the other hand, some faults do not materialize until the
software has been executed for a long time with some particular scenarios.
Therefore, defect and failure do not have a one-to-one correspondence.
Third, the defects that cause higher failure rates are usually discovered and removed early. The probability of failure associated with a latent defect is called its size, or “bug size.” For special-purpose software systems such as the air traffic control systems or the space shuttle control systems, the operations profile and scenarios are better defined and, therefore, the time to failure metric is appropriate. For general-purpose computer systems or commercial-use software, for which there is no typical user profile of the software, the MTTF metric is more difficult to implement and may not be representative of all customers.
Fourth, gathering data about time between failures is very expensive. It requires recording the occurrence time of each software failure. It is sometimes quite difficult to record the time for all the failures observed during testing or operation. To be useful, time between failures data also requires a high degree of accuracy. This is perhaps the reason the MTTF metric is not widely used by commercial developers.
Finally, the defect rate metric (or the volume of defects) has another appeal to
commercial software development organizations. The defect rate of a product or the
expected number of defects over a certain time period is important for cost and
resource estimates of the maintenance phase of the software life cycle.
Regardless of their differences and similarities, MTTF and defect density are the
two key metrics for intrinsic product quality. Accordingly, there are two main types
of software reliability growth models—the time between failures models and the
defect count (defect rate) models. We discuss the two types of models and provide
several examples of each type in Chapter 8.
4.1.1 The Defect Density Metric
Although seemingly straightforward, comparing the defect rates of software products involves many issues. In this section we try to articulate the major points. To
define a rate, we first have to operationalize the numerator and the denominator, and
specify the time frame. As discussed in Chapter 3, the general concept of defect rate
is the number of defects over the opportunities for error (OFE) during a specific time
frame. We have just discussed the definitions of software defect and failure. Because
failures are defects materialized, we can use the number of unique causes of observed failures to approximate the number of defects in the software. The denominator is the size of the software, usually expressed in thousand lines of code (KLOC) or in the number of function points. In terms of time frames, various operational definitions are used for the life of product (LOP), ranging from one year to many years after the software product’s release to the general market. In our experience with operating systems, usually more than 95% of the defects are found within four years of the software’s release. For application software, most defects are normally found within two years of its release.
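As a sketch of this operational definition, the fragment below approximates the numerator by the unique causes of observed failures and restricts the count to a fixed time frame after release. The defect IDs, dates, product size, and the 365-day year are all simplifying assumptions for illustration.

```python
from datetime import date

# Hypothetical field reports: (id of the underlying defect, report date).
reports = [
    ("D1", date(2001, 3, 1)),
    ("D2", date(2001, 9, 15)),
    ("D1", date(2002, 2, 2)),   # duplicate report of the same defect
    ("D3", date(2004, 6, 30)),
    ("D4", date(2007, 1, 10)),  # outside the four-year window
]
release = date(2000, 12, 1)
window_days = 4 * 365           # time frame: four years after release

# Numerator: unique causes of observed failures within the time frame.
unique_defects = {cause for cause, when in reports
                  if (when - release).days <= window_days}

# Denominator: size of the software in KLOC.
size_kloc = 120.0
defect_rate = len(unique_defects) / size_kloc
print(f"{defect_rate:.3f} defects per KLOC in the first four years")
```

Deduplicating by underlying cause matters: counting the two reports of D1 separately would inflate the rate, because defect and failure do not have a one-to-one correspondence.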
Lines of Code
The lines of code (LOC) metric is anything but simple. The major problem comes from the ambiguity of the operational definition, the actual counting. In the early days of Assembler programming, in which one physical line was the same as one instruction, the LOC definition was clear. With the availability of high-level languages the one-to-one correspondence broke down. Differences between physical lines and instruction statements (or logical lines of code) and differences among languages contribute to the huge variations in counting LOCs. Even within the same language, the methods and algorithms used by different counting tools can cause significant differences in the final counts. Jones (1986) describes several variations:
- Count only executable lines.
- Count executable lines plus data definitions.
- Count executable lines, data definitions, and comments.
- Count executable lines, data definitions, comments, and job control language.
- Count lines as physical lines on an input screen.
- Count lines as terminated by logical delimiters.
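These variations can be made concrete with a toy counter. The C fragment and the semicolon heuristic for logical statements below are illustrative only, not a production counting tool.

```python
# Toy C fragment: a comment line, a blank line, and two statements on one physical line.
source = """\
/* running totals */
int total = 0;

total += x; count += 1;
"""

lines = source.splitlines()
physical = [ln for ln in lines if ln.strip()]        # non-blank physical lines
non_comment = [ln for ln in physical
               if not ln.strip().startswith("/*")]   # exclude comment lines
logical = sum(ln.count(";") for ln in non_comment)   # crude logical-statement count

print(len(physical), len(non_comment), logical)
```

Depending on the convention chosen, the same fragment yields three physical lines, two non-comment lines, or three logical statements — which is exactly why published LOC figures computed under different conventions are hard to compare.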
To illustrate the variations in LOC count practices, let us look at a few examples
by authors of software metrics. In Boehm’s well-known book Software Engineering
Economics (1981), the LOC counting method counts lines as physical lines and
includes executable lines, data definitions, and comments. In Software Engineering
Metrics and Models by Conte et al. (1986), LOC is defined as follows:
A line of code is any line of program text that is not a comment or blank line,
regardless of the number of statements or fragments of statements on the line.
This specifically includes all lines containing program headers, declarations,
and executable and non-executable statements. (p. 35)
Thus their method is to count physical lines including prologues and data definitions (declarations) but not comments. In Programming Productivity by Jones (1986), the source instruction (or logical lines of code) method is used. The method used by IBM Rochester is also to count source instructions including executable lines and data definitions but excluding comments and program prologues.
The resultant differences in program size between counting physical lines and
counting instruction statements are difficult to assess. It is not even known which
method will result in a larger number. In some languages such as BASIC, PASCAL,
and C, several instruction statements can be entered on one physical line. On the
other hand, instruction statements and data declarations might span several physical
lines, especially when the programming style aims for easy maintenance, which is
not necessarily done by the original code owner. Languages that have a fixed column
format such as FORTRAN may have the physical-lines-to-source-instructions ratio
closest to one. According to Jones (1992), the difference between counts of physical
lines and counts including instruction statements can be as large as 500%; and the
average difference is about 200%, with logical statements outnumbering physical
lines. In contrast, for COBOL the difference is about 200% in the opposite direction,
with physical lines outnumbering instruction statements.
There are strengths and weaknesses of physical LOC and logical LOC (Jones,
2000). In general, logical statements are a somewhat more rational choice for quality
data. When any data on size of program products and their quality are presented, the
method for LOC counting should be described. At a minimum, in any publication of quality data that involves LOC, the author should state whether the LOC counting method is based on physical LOC or logical LOC.
Furthermore, as discussed in Chapter 3, some companies may use the straight LOC count (whatever LOC counting method is used) as the denominator for calculating defect rate, whereas others may use the normalized count (normalized to Assembler-equivalent LOC based on some conversion ratios) for the denominator. Therefore, industrywide standards should include the conversion ratios from high-level language to Assembler. So far, very little research on this topic has been published. The conversion ratios published by Jones (1986) are the most well known in the industry. As more and more high-level languages become available for software development, more research will be needed in this area.
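A sketch of the normalization idea follows. The conversion ratios here are invented for illustration; they are not Jones’s published figures.

```python
# Hypothetical conversion ratios: one source line in each language is taken to be
# equivalent to this many lines of Assembler (illustrative values only).
RATIOS = {"assembler": 1.0, "c": 2.5, "cobol": 3.0}

def normalized_defect_rate(defects: int, kloc: float, language: str) -> float:
    """Defect rate per Assembler-equivalent KLOC."""
    equivalent_kloc = kloc * RATIOS[language]
    return defects / equivalent_kloc

# The same 30 defects in 10 KLOC of C:
straight_rate = 30 / 10.0                            # 3.0 per straight KLOC
normalized = normalized_defect_rate(30, 10.0, "c")   # 30 / 25.0 = 1.2 per AE-KLOC
print(straight_rate, normalized)
```

The same product thus shows very different defect rates under the two denominators, which is why mixing straight and normalized counts across organizations invalidates comparisons.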
When straight LOC count data is used, size and defect rate comparisons across
languages are often invalid. Extreme caution should be exercised when comparing
the defect rates of two products if the operational definitions (counting) of LOC,
defects, and time frame are not identical. Indeed, we do not recommend such comparisons. We recommend comparison against one’s own history for the sake of
measuring improvement over time.
Note: The LOC discussions in this section are in the context of defect rate calculation. For productivity studies, the problems with using LOC are more severe. A
basic problem is that the amount of LOC in a software program is negatively correlated with design efficiency. The purpose of software is to provide certain functionality for solving specific problems or performing certain tasks.