Metrics consist of measures of the length of the program, for example the number of executable statements, and of its complexity, for example McCabe's cyclomatic complexity.
The tool comprises three distinct parts which are as follows:
Lexical analysis is run for all metrics except knots. The parse tree analysis is run only for the calculation of knots.
Metric                                         Abbreviation
Number of Executable Statements                STMTS
Number of lines of code (excluding comments)   LINES
Number of Comments                             CMNTS
Number of Blank Lines (incl. Blank Comments)   BLANK
The Ratio of Comments to Statements            CM/ST
Number of unique operators (Halstead)          ETA_1
Number of unique operands (Halstead)           ETA_2
Number of operators (Halstead)                 N1
Number of operands (Halstead)                  N2
The vocabulary of the program (Halstead)       VOCAB
Program Length                                 LENGTH
Estimate of Program Difficulty                 DIFF'TY
Measure of the Program Volume (Halstead)       VOL
Measure of the Complexity Level (Halstead)     LEVEL
Effort Required to Produce Program (Halstead)  EFFORT
Estimate of the Development Time (Halstead)    DT
Estimated Number of Bugs (Halstead)            BUGS
The McCabe Cyclomatic Number                   McCabe
Number of Decisions                            DEC'Ns
Number of Binary Decisions                     BINARY
Estimate of the Absolute Complexity            A.C.
Halstead's measures, which are described in his book Elements of Software Science [2], were not based on any particular language, and in recent years the measures have been calculated for various languages to see if they correlate. Because of this, no general agreement has yet been reached on which tokens are operators and which are operands. This has led to many different interpretations of his measures, so it should be noted that the nag_metrics tool uses the rules laid out below.
Operators are defined as:
Operands are defined as:
As can be seen from the above definitions, the declarative section of a program is completely ignored: because it is not executable, it is considered not to add to the complexity of the algorithm. Thus tokens such as PROGRAM, INTEGER, COMMON, INTRINSIC and DIMENSION are ignored by the tool. The declaration of an operand (as in "INTEGER I,X") is not counted as a use of that operand.
Tokens such as REWIND or BACKSPACE are not included as operators or operands because they add little to the overall complexity of an algorithm.
The total number of decisions is defined as the total number of computed and assigned GOTOs plus the number of DO loops plus the number of IF statements plus the number of ELSEIF statements.
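As a rough illustration of the counting rule above, the following Python sketch sums per-construct counts for one program unit. The function name and the dictionary keys are hypothetical, chosen for this example only; they are not part of nag_metrics.

```python
def total_decisions(counts):
    """Sum the constructs that the rule above treats as decisions.

    `counts` maps a construct name (hypothetical keys) to how often it
    occurs in the program unit; missing keys count as zero.
    """
    kinds = ("computed_goto", "assigned_goto", "do_loop", "if", "elseif")
    return sum(counts.get(kind, 0) for kind in kinds)


example = {"computed_goto": 1, "assigned_goto": 0,
           "do_loop": 3, "if": 4, "elseif": 2}
print(total_decisions(example))  # 1 + 0 + 3 + 4 + 2 = 10
```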
The number of statements is defined as the number of executable statements, and the number of comments as the number of lines with a "C" or a "c" in the first column. The number of blank lines is defined as the number of totally blank lines (including lines containing only white space) plus the number of blank comment lines.
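The comment and blank-line rules can be sketched in Python as follows. This is only an illustration, not the tool's implementation: the function name is hypothetical, and it assumes that a blank comment line (a "C" or "c" followed by nothing but white space) is counted as a blank line and not also as a comment.

```python
def classify_lines(source):
    """Return (comments, blanks) for fixed-form Fortran 77 source text."""
    comments = 0
    blanks = 0
    for line in source.splitlines():
        if line.strip() == "":
            blanks += 1          # totally blank, or white space only
        elif line[0] in "Cc":
            if line[1:].strip() == "":
                blanks += 1      # blank comment line (assumption: not a comment)
            else:
                comments += 1    # "C" or "c" in the first column
    return comments, blanks


sample = (
    "C     Sum the first N integers\n"
    "C\n"
    "\n"
    "      DO 10 I = 1, N\n"
    "   10 S = S + I\n"
    "c done\n"
    "   \n"
)
print(classify_lines(sample))  # (2, 3)
```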
A very basic form of measuring the complexity of the code is to count the number of executable statements in the code. This measure may be used to identify large monolithic program units.
An alternative measure of the length of the program is the number of statement lines. This includes declarations but excludes comments and blank lines.
A good piece of software should be well documented throughout, therefore the better the program the more comments the code should contain.
As with the number of comments this measure may be used as a measure of the clarity and readability of the code in a piece of software. Good software should be well presented to ease reading and debugging.
The comment measure means little on its own but the ratio of comments to statements gives a good measure of the readability and clarity of the code.
To describe Halstead's measures we first need some definitions
The measure of a program's volume is calculated by Halstead in the following way:
V = N * log2(n)
where n = n1 + n2 is the program's vocabulary and N = N1 + N2 is the length of the program.
The logarithm is to base 2.
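As a worked example of the volume formula, the sketch below takes the four basic counts listed in the table of metrics (n1 unique operators, n2 unique operands, N1 operators, N2 operands). The function name and the sample counts are made up for illustration.

```python
import math


def halstead_volume(n1, n2, N1, N2):
    """Volume V = N * log2(n), with n = n1 + n2 and N = N1 + N2."""
    vocabulary = n1 + n2
    length = N1 + N2
    return length * math.log2(vocabulary)


# Sample counts: vocabulary 8, length 16, so V = 16 * log2(8) = 48.
print(halstead_volume(n1=4, n2=4, N1=10, N2=6))  # 48.0
```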
The measure of a program's complexity as defined by Halstead is given by
L = 2 * n2/(n1*N2)
This can give an overall complexity measure of the program.
Halstead defined the effort required to produce a program to be
E = V/L, that is, volume divided by complexity level.
Development time (in arbitrary time units) can be calculated as follows:
T = E/S (where S is Stroud's number, taken to be 18)
Halstead defined the estimated number of bugs to be
B = E^(2/3)/3000
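The formulas for level, effort, development time and bugs chain together, so they can be evaluated in one pass from the four basic counts. The following is a minimal sketch under that reading; the function and key names are hypothetical, and it assumes n1 and N2 are non-zero so that the level is defined.

```python
import math

STROUD_NUMBER = 18  # S in T = E/S


def halstead_derived(n1, n2, N1, N2):
    """Evaluate V, L, E, T and B from the basic Halstead counts."""
    volume = (N1 + N2) * math.log2(n1 + n2)   # V = N * log2(n)
    level = 2 * n2 / (n1 * N2)                # L = 2 * n2/(n1*N2)
    effort = volume / level                   # E = V/L
    time = effort / STROUD_NUMBER             # T = E/S
    bugs = effort ** (2 / 3) / 3000           # B = E^(2/3)/3000
    return {"volume": volume, "level": level, "effort": effort,
            "time": time, "bugs": bugs}


# Sample counts: V = 48 and L = 1/3, so E is about 144 and T about 8.
for name, value in halstead_derived(n1=4, n2=4, N1=10, N2=6).items():
    print(f"{name:7s} {value:.4f}")
```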
McCabe described a way of measuring a program's complexity by its logical structure. It is probably still one of the most widely used and popular of all the metrics available today. Its simplicity is due to the way that it is calculated, which can be done in one of two ways:
v(G) = e - n + 2
where v(G) is the cyclomatic complexity of the flow graph G for the program in question, e is the number of edges in G and n is the number of nodes; or
v(G) = DE + 1
where DE is the number of binary decisions made throughout the program.
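The two calculations agree on a simple flow graph, as the sketch below shows. The graph, node names and function names are invented for illustration; the sketch assumes a single connected flow graph with one entry and one exit, in which every decision node has exactly two outgoing edges.

```python
def cyclomatic_from_graph(graph):
    """v(G) = e - n + 2 for a flow graph given as {node: [successors]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    edges = sum(len(targets) for targets in graph.values())
    return edges - len(nodes) + 2


def cyclomatic_from_decisions(binary_decisions):
    """v(G) = DE + 1, where DE is the number of binary decisions."""
    return binary_decisions + 1


# Hypothetical flow graph: an IF/ELSE followed by a loop test.
flow = {
    "entry": ["if"],
    "if": ["then", "else"],
    "then": ["loop"],
    "else": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}

# Binary decisions are the nodes with two successors ("if" and "loop").
decisions = sum(1 for targets in flow.values() if len(targets) == 2)

print(cyclomatic_from_graph(flow))           # 8 edges - 7 nodes + 2 = 3
print(cyclomatic_from_decisions(decisions))  # 2 decisions + 1 = 3
```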
In the TOTALS row of the table (if the -global option is specified) the total McCabe number is divided by the number of program units to give the average, rounded to the nearest integer.
A binary decision is defined as being an arithmetic IF in which two labels are
identical, a logical IF, a block IF with no following ELSEIF and an optional following
ELSE clause, a computed or assigned GOTO with only two possible outcomes or a DO loop.
The absolute complexity of a program is defined to be the number of binary decisions divided by the number of statements.
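The definition of a binary decision and the absolute complexity ratio can be sketched together. The record layout (a dictionary with "kind", "labels" and "elseif_count" keys) and the function names are hypothetical. The sketch also makes two assumptions: "two labels identical" and "only two possible outcomes" are both read as exactly two distinct target labels, and the statement count supplied is the number of executable statements.

```python
def is_binary(decision):
    """Apply the binary-decision rules to one decision record."""
    kind = decision["kind"]
    if kind in ("logical_if", "do_loop"):
        return True
    if kind == "block_if":
        # Binary only when no ELSEIF follows; an ELSE clause is optional.
        return decision.get("elseif_count", 0) == 0
    if kind in ("arithmetic_if", "computed_goto", "assigned_goto"):
        # Assumption: binary means exactly two distinct target labels.
        return len(set(decision["labels"])) == 2
    return False


def absolute_complexity(decisions, statements):
    """Number of binary decisions divided by the number of statements."""
    binary = sum(1 for d in decisions if is_binary(d))
    return binary / statements


unit = [
    {"kind": "arithmetic_if", "labels": (10, 10, 20)},
    {"kind": "logical_if"},
    {"kind": "block_if", "elseif_count": 2},
    {"kind": "do_loop"},
]
print(absolute_complexity(unit, statements=12))  # 3 / 12 = 0.25
```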
Today there are many ways of measuring the complexity of a program; the measures included in this tool are typical of those which have been developed over the past few years.
Software complexity measurement is a way of identifying, classifying and measuring features and characteristics of a piece of code which, when considered, may lead to changes that ultimately produce better code and lower the lifetime cost of the product.
The first major paper on the subject was published in 1976 by McCabe [1], with his description of a measure for the cyclomatic complexity of a program, a very simple measure to calculate.
Shortly afterwards, in 1977, a book published by Halstead [2] described new measures which analysed the structure of software in more detail and allowed development time and costs to be calculated. The measures described in the book are still the basis for the software metrics we have today.
Since Halstead published his book, many others have developed his work and created new ways of measuring the complexity of software; however, it was not until the mid-1980s that software metrics came into use in the computing industry. This can be attributed to software houses finding that maintenance of existing software was taking up, in many cases, over 70% of their budget. It was clear that if the complexity of programs could be dramatically reduced, the cost of maintenance could be cut. Software metrics are an easy way of assessing the complexity of software as development takes place.
This tool also includes the measure of "knots" in software, which has its basis in a paper published in 1979 [3]. If arrowed lines are drawn alongside a piece of code to show every jump in the flow of control, a knot is a point where two such lines cross each other. The number of knots is proportional to the complexity of the control flow.
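The crossing test can be sketched in Python as follows. This is an illustration of the definition, not the tool's parse tree analysis: each jump is represented as a hypothetical (from_line, to_line) pair, two jumps are taken to cross when their line ranges overlap without one containing the other, and jumps that share an end point are assumed not to form a knot.

```python
from itertools import combinations


def count_knots(jumps):
    """Count crossings among jumps given as (from_line, to_line) pairs."""
    # Direction does not matter for crossing, only the range covered.
    spans = [(min(a, b), max(a, b)) for a, b in jumps]
    knots = 0
    for (lo1, hi1), (lo2, hi2) in combinations(spans, 2):
        # Ranges interleave: one starts inside the other and ends outside it.
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            knots += 1
    return knots


# Jump 2->8 crosses 5->12; jump 6->7 is nested inside both, so no knot.
print(count_knots([(2, 8), (5, 12), (6, 7)]))            # 1
# A backward jump 10->3 crosses both 2->8 and 5->12.
print(count_knots([(2, 8), (5, 12), (6, 7), (10, 3)]))   # 3
```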
Many other papers have been written on the subject of metrics, and they cannot all be recorded here, but it is worth noting that a bibliography published in Software Engineering Notes, Volume 12 [4], gives a very clear and concise list of the papers associated with this subject.
[1] - McCabe, T.J. A Complexity Measure, IEEE Transactions on Software Engineering, Vol. SE-2, 308-320 (1976).
[2] - Halstead, M.H. Elements of Software Science, Elsevier, New York (1977).
[3] - Woodward, M.R. et al. A Measure of Control Flow Complexity in Program Text, IEEE Transactions on Software Engineering, Vol. SE-5, 45-48 (1979).
[4] - Waguespack, L.J. and Badlani, S. Software Complexity Assessment: An Introduction and Annotated Bibliography, Software Engineering Notes, Vol. 12, no. 4, pp. 52-70 (1987).
nag_metrics recognises the ANSI standard Fortran 77 intrinsic functions, the US Military Standard intrinsic functions, and the double complex intrinsics defined in nag_Fortran77.
nag_metrics may produce inaccurate results if the total program is not supplied. No warning will be given.
nag_metrics works on a garbage in, garbage out basis if the parsing stage is not executed. It is advised that the program is first run through nag_pfort before executing the metrics tool, to ensure that the code is legitimate Fortran 77.
Copyright, Numerical Algorithms Group, Oxford, 1991-2001