INAUGURAL LECTURE 1988
University of Sheffield.
CAN WE BE SAVED FROM COMPUTERS ?
Professor W M L Holcombe BSc, MSc, PhD, MBCS, CEng.
This title provides a theme which enables me to explore a number of important issues facing the computer industry and society in general.
§1 Systems, systems everywhere!
Computers are appearing in all sorts of places and in the modern world it seems to be impossible for us to survive without them. Everyone has contact with them, whether it is simply as the recipient of computer printouts such as electricity bills and wage notifications, or in a more direct relationship as a computer user. The advance of the almost ubiquitous PC into business, industry, education and even the home, has brought with it a great increase in the potential applications of the technology in nearly all walks of life. We are becoming ever more demanding and ambitious in the ways in which we use computers and the basic assumption that they must necessarily be a 'good thing' is widespread. In fact it is now hard to imagine how we have survived without them in many areas for so long.
However, we often embrace technology without really thinking about the wider implications, and this can lead us into fairly disastrous situations. The philosophy of the total system approach, as eloquently promoted by my predecessor Doug Lewin [Lewin 1982], is of fundamental importance but it is notoriously perplexing to put into practice, especially in an era of more and more technical specialism which makes it increasingly difficult for professionals from different backgrounds to communicate effectively.
The question 'Is the use of a computer to carry out this task a good idea?' is often answered from too narrow a perspective and the consequences of the wrong answer can be serious indeed.
§2 The individual and the machine.
In this section we will examine some of the relationships that exist between individuals and the computers that impinge in an explicit way on their lives.
A decade or so ago the press would occasionally report the case of some poor pensioner who had received an electricity bill of a million pounds or so. In very few cases was it found that the entire street's lighting was connected to his meter; usually it was attributed to some computer error or other. The nature of the error was rarely made public but it could have been an error made by the person entering data into the machine, a hardware failure of some sort or, more mysteriously, a software error. I use the phrase 'more mysteriously' advisedly for it was often as much a mystery to the programmers who were in charge of the development and maintenance of the software as it was to the general public.
Although these spectacular errors are now less newsworthy they still happen - recall the embarrassment, a few months ago, of a major bank that had to admit that many millions of pounds had been mistakenly paid into a number of people's accounts by its computer system.
The unfortunate poll tax payers of St. Albans are faced with a poll tax next year which will be £11 more than it should be because of a mistake in the software being used to estimate the number of poll tax payers in the city. The software over-estimated the number of payers by 3,000 and so the city collected less than it should have this year and must make it up next year.
One might enquire as to how many similar happenings have been successfully 'hushed up'?
The problems of large data processing systems and their reliability have been the subject of much thought for a number of years. Most large organisations have developed their systems over a number of decades and the amount of effort that they represent can be costed into many millions of pounds. They are mostly written in Cobol, a language developed for this type of application. By the very nature of the language it is often difficult to understand what the programs actually do in a precise way. It is likely that the overall behaviour of the program under 'normal' operating conditions is well understood; what is more problematical is the behaviour of the system under unusual but not impossible conditions. Much of the early code is not documented very well, the original developers have long since gone, and a mystery surrounds many of the detailed features of the code. By the very nature of the dynamics of business and organisations the original requirements of the system will also have changed. There will therefore have been a need to change the software and this will involve replacing parts of the program with new parts, adding extra bits and generally messing around in a rather haphazard way. En route the programmers may well have tried to correct some of the obvious errors in the original code but will have also, unwittingly, introduced new ones. It has been estimated that for every error corrected in a piece of software between 2 and 3 new ones are introduced in the process. Since the programmers will have left little evidence of what they have done, apart from their new code, anyone coming along later will not be in a position to understand the system fully.
Things are so bad that the problems of the maintenance of large systems are likely to be one of the dominant obsessions of the future. In fact, one commentator has estimated that by the year 2000 one quarter of all the school leavers in the USA will have to become Cobol maintainers if that country's businesses are to be able to operate adequately. This may be an exaggeration; let us hope so, because the prospects of doing this type of work are not likely to prove attractive to many!
Before we leave the world of data processing we might like to consider the problem of time, or more precisely, of date. Many software systems in use in banking and other businesses involve a continuous representation of time. At midnight every day the date is moved on by one. Unfortunately in many systems the programmers have left us with a less than adequate mechanism for this. A few years ago another major bank had a problem caused by the existence of leap years. Two banks attempted to interconnect their automatic bank machine systems; however, one bank's programmers were 'aware' of leap years whereas the other's were not. When the next leap year arrived the whole system collapsed at midnight on the 28th of February.
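The leap-year disagreement can be made concrete with a small sketch. It is an illustration only, written in Python, and is not a reconstruction of either bank's software; the function names and the choice of 1988 as the leap year are assumptions. One routine applies the full Gregorian rule, the other assumes that February always has 28 days, and the two disagree about the day that follows the 28th of February.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int, leap_aware: bool) -> int:
    """Length of a month; a leap-unaware system always gives February 28 days."""
    lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    if month == 2 and leap_aware and is_leap_year(year):
        return 29
    return lengths[month - 1]


def next_day(year: int, month: int, day: int, leap_aware: bool = True):
    """Advance the date by one day, as a midnight roll-over would."""
    if day < days_in_month(year, month, leap_aware):
        return (year, month, day + 1)
    if month < 12:
        return (year, month + 1, 1)
    return (year + 1, 1, 1)


# Hypothetical pair of interconnected systems at midnight on 28 February 1988.
aware = next_day(1988, 2, 28, leap_aware=True)      # (1988, 2, 29)
unaware = next_day(1988, 2, 28, leap_aware=False)   # (1988, 3, 1)
print(aware, unaware, aware == unaware)
```

Once the two systems hold different dates, any transaction that one stamps and the other validates may be rejected, which is one plausible route to the kind of collapse described.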
A similar problem may arise at the turn of the century. Many systems represent the year as two digits only and it is highly likely that the system does not realise that 99 is followed by 00. This means that at the end of 1999 very unpredictable things may happen, especially as nowadays many different systems need to talk to each other across the world. If they cannot agree on the date they may not 'know' what to do. We can expect a certain amount of chaos to ensue. My advice is to go along to your bank or building society on December 31st 1999 and withdraw all your money - presumably in ECUs, either hard or soft - and stuff it under your mattress. Of course, if everyone takes my advice it may result in the collapse of the capitalist system. Should we then blame computers for that?
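A minimal sketch of the two-digit year problem follows. It is illustrative Python with hypothetical function names, not taken from any real banking system. The 'windowed' repair, with its pivot of 50, is an assumption added here to show one possible remedy and is not something proposed in the lecture.

```python
def next_year_two_digit(yy: int) -> int:
    """Roll a two-digit year forward; 99 wraps round to 00."""
    return (yy + 1) % 100


def years_elapsed_naive(start_yy: int, end_yy: int) -> int:
    """Plain subtraction, as a system unaware of the century would do."""
    return end_yy - start_yy


def years_elapsed_windowed(start_yy: int, end_yy: int, pivot: int = 50) -> int:
    """Treat years below the (assumed) pivot as 20xx and the rest as 19xx."""
    def expand(yy: int) -> int:
        return 2000 + yy if yy < pivot else 1900 + yy
    return expand(end_yy) - expand(start_yy)


opened = 85                       # hypothetical account opened in 1985
now = next_year_two_digit(99)     # the year after 99 is stored as 0
print(years_elapsed_naive(opened, now))      # -85
print(years_elapsed_windowed(opened, now))   # 15
```

The naive calculation reports that the account has existed for minus eighty-five years, and any interest or expiry computation built on it inherits the error.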
So far we have discussed fairly routine types of systems, those that basically just 'crunch numbers' or 'shuffle data around' albeit on a large scale. They do not seem to involve much in the way of intelligence. Of course decisions are often made on the basis of the data that these systems produce but their activity is less geared to automated decision making than the carrying out of lower level activities. The increasing role of more 'expert' systems is, however, affecting the 'man in the street' in a number of, possibly, unfortunate ways.
Recall the recent great share crash of 1987. Computerised share dealing was blamed for the spiralling collapse of share prices and the subsequent bankruptcies. Here the programmers had devised rules that told the computers whether to buy or sell, and the mechanistic application of these rules turned a minor perturbation into a catastrophe. Many people learnt lessons from this experience but equally many either didn't or have since forgotten. So-called expert systems are now to be found in many sensitive areas where the results of the machine's deliberations can affect people's lives and livelihoods in fundamental ways. Is the technology sufficiently well understood to give us some reassurance that everything is well? Even if we could be sure about the software and the hardware behaving properly we are still treading in very unknown territory when it comes to the elucidation of the appropriate knowledge - the facts and the rules - upon which expert systems depend. The problem of maintaining and updating these knowledge bases is a serious one and there are many concerns about the possibility of introducing hidden conflicts during the course of this maintenance. What if faulty advice from the expert system is used to try and help people recover safely from some industrial accident?
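The worry about hidden conflicts in a knowledge base can be sketched as follows. The rules, attribute names and checking method are all hypothetical and are chosen only to illustrate the idea; real expert system shells represent and check rules in their own ways. Two rules are flagged when their conditions can hold at the same time but they recommend different values for the same thing.

```python
from itertools import combinations

# Each rule is (name, conditions, conclusion). All entries are invented;
# R2 stands for a rule added later during maintenance.
RULES = [
    ("R1", {"pressure": "high"}, ("valve", "open")),
    ("R2", {"temperature": "high"}, ("valve", "closed")),
    ("R3", {"pressure": "low"}, ("valve", "closed")),
]


def compatible(c1: dict, c2: dict) -> bool:
    """Two condition sets can hold together if no shared attribute differs."""
    return all(c2[k] == v for k, v in c1.items() if k in c2)


def find_conflicts(rules):
    """Return pairs of rules that can fire together yet disagree."""
    conflicts = []
    for (n1, c1, (a1, v1)), (n2, c2, (a2, v2)) in combinations(rules, 2):
        if a1 == a2 and v1 != v2 and compatible(c1, c2):
            conflicts.append((n1, n2))
    return conflicts


print(find_conflicts(RULES))  # [('R1', 'R2')]
```

R1 and R3 never clash because pressure cannot be both high and low, but nothing prevents high pressure and high temperature occurring together, at which point R1 and R2 give opposite advice about the valve.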
§3 The organisation and the machine.
If the benefits of machines to the average citizen are less than explicit the benefits to organisations are much more apparent. Any company that fails to embrace IT in an enthusiastic manner must contemplate rapid extinction. Or is this a gross simplification ? We must look more carefully at the benefits that computers - which lie at the heart of IT - may bring, and take great pains to ensure that enthusiasm for technology, marketing hype and unrealistic expectations don't destroy what might be achieved if more sensible counsel prevailed. It is often very difficult to quantify precisely the benefits that computers bring to organisations and to establish whether the costs involved are worthwhile.
The major problems arising from the introduction of computerised operations within companies seem to concern the purchase of unsuitable systems, the failure to organise suitable training programmes, the lack of support for the process from important sectors of the company - especially senior management - and the failure to adapt and evolve the company culture and procedures in the light of the new technology.
Failure to purchase suitable systems can be blamed on a number of factors: lack of understanding by the purchaser of what the systems can and cannot do, failure on the part of the supplier to deliver an appropriate high quality system, and the failure to recognise the important role software and liveware play in the system. On the last point, there are many places where the main consideration is the hardware, and lack of foresight concerning the sort of software that is available has resulted in systems being installed that cannot carry out the intended task because no suitable software exists for them. Often the hardware costs consume the major part of the budget and little is left for software purchases and training costs.
The record of software houses in delivering suitable systems is also a matter for serious concern. The construction of a bespoke software system is a lengthy and costly process. In some surveys of particular market sectors over a representative period the number of cases where no system was delivered ran as high as 48%, with a further 47% of systems delivered but unusable, and a tiny minority (3%) of systems capable of being used as ordered with some modification and only 2% used as delivered. [Jackson]
On the other hand it is notoriously difficult for software houses to ascertain the client's requirements and even if they are clear and agreed at the start of the process they may change substantially with the passage of time before delivery. Since these systems may be massively complex and involve many thousands of man-years of effort we need to be able to control the design process much more effectively and to make as much use of existing components as possible. The recent developments in formal methods and object orientated design are partly driven by these considerations.
Systems analysis, among other things, has developed as a mechanism for analysing an organisation's activities and identifying the role or roles that computer systems may play in furthering the objectives of the organisation. Mistakes made here can prove to be very costly. Last week a regional health authority admitted purchasing a large IBM mainframe computer system nearly 2 years ago for £3,000,000. The system is still lying in a basement unused and not even unpacked. It has lost two thirds of its value. The original aim of the machine was to provide a central resource and infrastructure for the associated district health authorities. Unfortunately the consultants the districts employed came to a radically different solution to the region's and clearly not enough people talked to each other at a critical time to prevent this shocking waste of money. The tale is, unfortunately, not an isolated one.
Even if suitable systems are identified, installed and are working properly the success of the investment cannot be guaranteed unless attitudes within the company adapt to the new circumstances. There is a dilemma here. Should the suppliers ensure that the system is tailor made to fit the purchasing organisation, by ensuring that the systems reproduce the way the company operates and cause the least disturbance to the existing management processes, or should the company be expected to adapt to the 'correct' way of doing things as perceived by the system's designer? The former way is prohibitively expensive initially and the latter can be expensive in the long run if the systems cannot be used effectively. The best solution is probably to recognise that IT will change the way an organisation functions and that it can enable things if positive attitudes are struck. The designers have a responsibility for addressing all aspects of the client's needs and the organisational culture, and the clients must accept that adaptation of their organisation in a planned and sensible way will take full advantage of their investment in both machines and people.
Can organisations be saved from computers? Here the answer is - it is up to the organisations. If they try to impose their culture onto the system or allow the system to dominate their culture then there will be problems. If, however, the organisation adapts and evolves as a whole to the computer and its solutions then the future will probably be rosy.
§4 "There is one safe thing for the vanquished; not to hope for safety".
Commercial data processing systems are one thing but there are more and more examples of computers being used to control industrial and other dynamic processes.
Some examples, a few of which have hit the headlines lately, are:
'fly by wire' airliners, e.g. the European A320 Airbus;
nuclear power stations, e.g. Sizewell B;
car management systems;
automatic washing machines;
chemical process plants;
patient monitoring systems.
In all these there is a microprocessor or microprocessors controlling mechanical and electrical devices integrated into the particular system. Increasingly these systems involve quite complex software and in many cases this presents us with difficult design problems.
A number of incidents have been reported which indicate that design faults in both the hardware and software systems are responsible for the overall system acting in an unsafe way.
Over the last couple of years about 200 people have been killed or seriously maimed by industrial robots going out of control in Europe.
Computer control systems have been partly implicated in nuclear accidents.
Software used in car control systems has been seriously flawed.
Aircraft control systems have failed causing crashes.
Domestic appliances have suffered from software errors which have resulted in accidents, for example washing machines blowing up.
Satellites have gone out of control because of minute errors in the control software.
The list could go on and although it is hard to establish the number of people who have been killed because of computer design errors of one type or another it is clearly a serious situation. There seems to be no limit to the applications of computers in this area and what is worrying is that there are no methods available that can engineer safe and reliable systems to the standards required.
Let us briefly contrast the situation in, say, software engineering with that in more traditional forms of engineering. In the latter the relationship between failures and common design parameters is more or less continuous for much of the time. If we gradually alter some important design feature, for example, the thickness of a strut in some structure, the behaviour of the component will also gradually change up to some fairly well understood threshold area where catastrophes, such as fracture, become likely. If you change a piece of software by a small amount, however, the behaviour can be affected drastically. The replacing of a comma by a full stop in a satellite control program caused a devastating failure and total loss of the mission in one example. This non-linearity of software under perturbations lies at the heart of the problem of building safe computer systems and identifying potential hazards. The existing techniques are not sufficient and so we must examine carefully what we are doing in this arena.
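The discontinuity can be illustrated with a deliberately small analogy. This is not the satellite program referred to above, whose text is not reproduced here; it is a hypothetical Python sketch in which the text describing a loop's bounds is changed by a single character. With a comma the text denotes a pair of bounds and the loop body runs ten times; with a full stop it denotes the single number 1.10 and the loop never runs at all.

```python
import ast


def iterations(loop_bounds: str) -> int:
    """Return how many times a loop body would run for the given bounds text.

    The text is parsed as a Python literal. A pair such as "1,10" is read
    as inclusive (start, stop) bounds; anything else is not a pair of
    bounds, so nothing is iterated.
    """
    parsed = ast.literal_eval(loop_bounds)
    if isinstance(parsed, tuple) and len(parsed) == 2:
        start, stop = parsed
        return len(range(start, stop + 1))
    return 0


print(iterations("1,10"))   # 10
print(iterations("1.10"))   # 0
```

No intermediate behaviour exists between the two cases: there is no character 'between' a comma and a full stop that would make the loop run five times, which is the contrast with the gradually thinning strut.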
The arena is called safety-critical computing and has recently caught the attention of a number of people including the government and some sections of the media.
The European A320 Airbus is a good example of what is going on. In order to challenge the market lead of Boeing a decision was taken to go for a technological leap forward and to design the airliner so that its flight was completely under the control of a computer system. The pilot has a much reduced role and the major decisions are taken by the control software. This then enabled the designers to use much thinner wings and a lighter fuselage because the computers could control large wing flaps and control surfaces which ensure that the plane is aerodynamically stable. This would have great benefits in fuel economy and operating costs, thus securing a market advantage. To overcome the reliability problems associated with so much responsibility being placed on a computer system the usual fault-tolerant approach was taken whereby a number of computer systems, in this case five, operate in parallel and a voting arrangement ensures that even if two computers produce wrong answers the majority will succeed in behaving correctly. The computers are also much more gentle with the airliner than human pilots are and cause much less stress to the structure. This fact has enabled the designers to economise further and to lighten the structure to an extent that if the pilot could take over in the event of an emergency the plane would fall apart anyway. You will be aware that the A320 has been involved in two fatal crashes recently. The reasons for the latest, in India, have not yet been made public. The French accident was blamed on pilot error although the investigating authority is not known to have any expertise in the analysis of software reliability and many questions remain unanswered. Perhaps the commercial implications of serious design flaws in the computer system destroying public confidence in the plane are too great.
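The voting arrangement mentioned above can be sketched in a few lines. This is a generic majority voter written for illustration, with hypothetical channel outputs; it makes no claim about how the A320's own systems are built. With five channels, any value returned by at least three of them is accepted, so up to two faulty channels can be outvoted.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of channels, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


# Two faulty channels out of five are outvoted by the other three.
print(majority_vote([12, 12, 12, 7, 99]))  # 12
# Three disagreeing channels leave no majority, so no answer is accepted.
print(majority_vote([12, 12, 7, 7, 99]))   # None
```

The voter only protects against faults that differ from channel to channel. If three channels produce the same wrong value, that value wins the vote.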
The situation in the Sizewell B project is another indication of technology out of control. Here computers will be in complete control with very little opportunity for manual intervention in the event of a disaster. Although the regulatory framework is quite extensive we are still very much in uncharted waters with these systems. The 'Star Wars' project is another completely insane example: no one can guarantee that this will be a safe system overall and opportunities for thorough testing of the system are clearly limited.
Faced with this problem the software engineering community is split into three main camps. One group, the optimists, say that there is no problem: we can already design safe systems with relatively unsophisticated techniques, so what is all the fuss? Their faith rests principally on the use of fault tolerant design techniques. Where there are five computer systems in parallel and a voting process, as in the A320, there will often be five separate design teams writing the software based on the same requirements document. The theory is that they won't all make the same mistakes. The trouble is that some studies have shown that they will. Most software engineers have been trained in the same way, used the same textbooks, tried out the same examples and there is no guarantee that they won't continue to think in a similar way to their colleagues. The increasing use of standard structured methodologies more or less ensures that this will become more certain.
The next group, the evangelists, preach that there is a serious problem but that their particular religion - design method, language, tool etc. - will solve all our problems. These people will be trying to sell you their ideas, software, courses etc. for real money. The most influential, at present, are the formalists, which gives me a problem since I am regarded as being one of them! I certainly believe that mathematical methods are essential - where I differ is in the conclusion that the currently available ones are either practical or sufficient. One of the current dreams of the establishment is as follows. We will formally specify the system and then transform the specification into a program written in a safe subset of Ada. The program will then be known to be correct. The Ada program will be compiled by a compiler which is written in a rigorously defined assembler type language called Vista. The compiler will also have been proved correct. The system will then be run on a Viper chip which has also been formally proved to be a correct design. The trouble with all this is that the verification process is very difficult and costly at present, and the chips cannot be tested, either at fabrication or during operation, for anything other than very low level stuck-at faults. The original specification may have been based on a poorly understood environment in which the system is operating. Some of the specification languages and reasoning logics are also inadequate. For example the language RAISE, which has been developed at great cost, cannot handle time very well and it is usually time that makes safety critical systems critical. It would also be helpful if the language could deal with fuzzy concepts in a principled way. I will return to formal methods later.
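The weakness of relying on separate teams can be shown with a small, entirely hypothetical example. Suppose the requirement is that an alarm is raised when a reading is at or above a limit, and that three of the five teams independently make the same boundary slip by writing 'greater than' where 'greater than or equal to' was required. The threshold, the versions and the nature of the slip are all assumptions made for the sake of illustration.

```python
LIMIT = 100  # hypothetical alarm threshold


def spec(reading: int) -> bool:
    """The requirement: raise the alarm at or above the limit."""
    return reading >= LIMIT


# Five separately written versions; three share the same boundary slip.
versions = [
    lambda r: r >= LIMIT,
    lambda r: r >= LIMIT,
    lambda r: r > LIMIT,
    lambda r: r > LIMIT,
    lambda r: r > LIMIT,
]


def voted(reading: int) -> bool:
    """Majority decision of the five versions."""
    votes = [version(reading) for version in versions]
    return votes.count(True) > len(votes) // 2


for reading in (99, 100, 101):
    print(reading, voted(reading), spec(reading))
```

At a reading of exactly 100 the voted answer is 'no alarm' although the requirement demands one. The voting arrangement has faithfully selected the majority opinion, and the majority is wrong.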
The final group are the realists who argue against all this madness until we know more about what we are doing. This message is the least popular one amongst the decision takers in industry and government but their view is receiving more support as time goes on and more failures occur.
Another important point about these systems is that they are essentially parallel and the use of parallel processing, such as transputer based systems, is natural. However, currently such developments are banned by the MOD despite the fact that the technology has been designed to be able to handle parallelism naturally and is reasonably well founded on theoretical grounds. The use of a single processor, even if formally verified, may not be as safe when looking at the system overall. Work in the Department and at the National Transputer Support Centre in Sheffield is currently addressing some of these issues.
Analysis of the failures of some safety-critical systems has been done [Bellamy & Gyers] and although this cannot give a complete picture it can provide some indications of potential trouble spots. One point to emphasise is that accidents are often caused by combinations of errors in different parts of the system. 18% of incidents involved computer hardware failures and 77% of incidents involved software or operator errors. Of the latter 47% were caused by the operator failing to follow correct procedure and 59% derived from inadequate or incorrect information being supplied to the operator by the system.
The problems posed by designing the interfaces of safety critical systems must be solved if we are to have confidence in them. This involves understanding the environment of the system, the user and the physics of silicon. We are still far from this ideal.
§5 "How use doth breed a habit in man".
How many people have sat in front of a computer in a state of confusion, apprehension and even terror! I certainly have. Perhaps if the system involved was a word processor it might not have serious consequences, maybe a few files get lost and in the worst case the poor frustrated user might get violent with the terminal.
Suppose, however, that the terminal was controlling some safety-critical system such as a nuclear power station or chemical works? As we saw from the available evidence relating to safety related incidents the most error prone aspect of the system is often the user and the interface between the user and the system.
What are the basic problems in the design of the user interface? In most systems the processes involved in the sequences of interactions between users and systems are only dimly understood. Although some design principles are emerging from the wealth of research in Human-Computer Interaction (HCI) we are still a long way from being able to reason about the likely dynamics of these interactions and the consequences for the integrity of the system.
The main problems can be listed as:
the user does not know what to do next to achieve a particular goal;
the user does not understand why the system has behaved in a particular way;
the system has failed to provide the user with appropriate information.
Although much progress has been made on the ergonomics of interfaces, the shape of input devices such as keyboards and mice, the colours of screen displays and to some extent the design of these such as windows, forms, menus etc, the real problems seem to lie in identifying possible conceptual models the user has of the system and trying to design the interface to support 'correct' conceptual models. This involves the provision of appropriate information, not too much and not too little, that will reinforce a correct view of the system in such a way that the user knows what to do to achieve a goal and what the system will do in response. The problem with this is, of course, that users are individuals and we cannot assume that a unique correct model of the system exists. Different people have different goals with a given system, their goals may change over time and their understanding of the system will develop as they interact with it. How can the system designer handle all these problems? We have much to learn in this area.
Looking at user interfaces in general, what progress has been made? A number of improvements have occurred over the years and interfaces are widely regarded as being much more 'user friendly' than before. In the early days of automated banking some machines gave you your money before your card. Many people used to walk away leaving their cards behind. Their principal objective had been to get money and once this had been achieved they went. A simple rearrangement of the machine's actions overcame this problem. Other problems with user interfaces required a more substantial technical solution and so we have moved into the era of windows, icons and mice and there is much activity amongst manufacturers in designing operating system interfaces and environments with these ingredients. Commercially the Mac has been seen to be the leader in interface design and other 'more primitive' systems, such as the basic PC, are regarded by many as providing an inferior facility. This situation is beginning to change as PCs become more window based but we must take care that the intuitive appeal of these window systems does not influence a proper scientific evaluation of their value. Research into different types of interface usually centres around the user and the ease with which the user can handle the system. Techniques such as input sampling, video recording as well as interviews and questionnaires can tell us a lot about the way in which users carried out their interactions and achieved their goals. We can attempt to construct an approximation to the user's conceptual model from this information and this is valuable. But suppose we look beyond the immediate concerns with the comfort of the user to try and analyse the quality of the user's final processing product. Does the ease of use of a system automatically guarantee that the final product of the processing is better? Research in this area is much more scarce.
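The cash machine example earlier in this section lends itself to a tiny model. The sketch below is hypothetical: it assumes, as the anecdote suggests, that the user walks away as soon as the goal item has been handed over and ignores anything the machine does afterwards. Reordering the machine's actions is then enough to change the outcome.

```python
def run_session(actions, goal="cash"):
    """Return the items the user holds on walking away from the machine.

    Assumes the user leaves as soon as the goal item has been handed over.
    """
    collected = []
    for item in actions:
        collected.append(item)
        if item == goal:
            break
    return collected


cash_first = run_session(["cash", "card"])   # the card stays in the machine
card_first = run_session(["card", "cash"])   # both items are collected
print(cash_first, card_first)
```

The design choice is to place the step the user actually cares about last, so that every other step has been completed by the time the user's goal is met.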
Recently workers at an American University carried out an, admittedly fairly informal, analysis of the quality of student assignments carried out using PCs and Macs. By analysing the sophistication of sentence construction and other measures of linguistic quality they came to the conclusion that the PC users generally produced better quality work than the Mac users. If this is an accurate picture it raises some interesting questions. It may be that the poorer student was attracted to the 'easier' system or it may be that the emphasis in the research on usability needs to be readdressed to an analysis of the relationship of the final product to the style of interface. We should not jump to conclusions in these matters.
While on the subject of the quality of student project reports it is important to emphasise that the content of the report must always take precedence over the presentation. With the advent of powerful desktop publishing systems it is easy to be seduced into spending many hours perfecting the presentation, fiddling with the fonts, generating fancy graphics and so on. So much so that the final product, although beautiful, has no substantial content.
This is an example of the diversionary power of computers. In many cases they can help us produce a better product more efficiently, but we must remain in charge of the machine and not vice versa.
An area of considerable interest to the Department is speech recognition devices. If perfected these will open many new opportunities for the application of computers since the mechanism of talking directly to the machine without having to use keyboard or mouse will liberate the technology from the 'desk top'.
Can we be saved from the computer as users? To some extent we are getting better at designing friendly systems but there is still a long way to go and so the answer here is - maybe.
§6 Desperate diseases require desperate remedies.
What has recent research been able to contribute to the solutions of these problems? A lot of effort and funding has produced some significant advances in our understanding of the design of computer systems and many new ideas have been introduced into the research community. Some companies have started to apply them to real industrial problems although the take up has not been as good as one might expect. Let us look at one major area where British researchers have made an outstanding contribution. This is the area known as formal methods. These are mathematically based methods, usually involving the application of mathematical logic and abstract algebra. The main aim is to construct abstract mathematical models of systems and devise formal techniques for reasoning about them. This involves the definition of symbolic theories and the rules which enable us to prove properties, or theorems, about the models. In the most highly developed forms these methods allow us to state precisely the specification of the system and to transform this specification systematically into an implementation - a program or a chip design - with a rigorous mathematical proof that these transformations preserve the properties of the specification. This is the dream of some in the formal methods community. Once a specification has been constructed and analysed we can then, almost automatically and with total confidence, build a totally correct working system. This is particularly attractive to the safety-critical systems industry. In fact the Ministry of Defence has issued mandatory standards for the design of safety-critical software that require the extensive use of formal methods [MOD 1989]. The problem with the practical use of these methods is that they are very hard to use for large systems. They must be supported by software tools such as automatic theorem provers which can automatically discover all the relevant properties of the system in a rigorous and logical way.
Such creatures are still crude and of little practical use. The other main criticism is that the software engineer is essentially being asked to be a research pure mathematician. In most new systems the abstract mathematical structures that you need to examine have never been studied before. You may be able to use some previous work that relates loosely to the system in question but in general you are being asked to construct your own new theory and prove fundamental theorems about it. The number of people who can do this sort of work correctly, and there is no point 'proving' incorrect theorems, is strictly limited. How can we solve all our problems in such a way? We are teaching this material to our first year undergraduates reasonably successfully (many other departments are amazed at our success), but to expect them to use the current techniques in anger on real systems in industry is perhaps expecting too much. Part of the problem is the notational opacity of the methods. They carry with them a high cognitive overload and the accompanying danger of error. We are exploring ways of using more graphical methods for describing these formal systems and some examples will be given at the end.
One major criticism of the attitude of formal methods enthusiasts is their complacency in the face of the mistakes and misunderstandings which may arise in the course of the formal transformation process. We are interested in integrating rigorous testing throughout the design process. Many say that this is only necessary to evaluate the specification against the client's requirements. However there have been examples of famous researchers describing formally derived algorithms in the literature which were accepted by the research community for many years and which have recently been found to have subtle but fatal flaws when subjected to fairly simple tests. The philosophy of subjecting each formal model or theory to the force of scientific testing has always been at the heart of any scientific enterprise and we are convinced that the role of testing is important and must be recognised more.
What of other design fashions - object-orientated design for example? It is too early to say whether it is any more valuable than what we have already. The scientific evaluation of design methods in software engineering is very difficult. Bear in mind that the problem of scale is the principal source of difficulty. Some software systems are the most complex artifacts ever built by man. How can we carry out sensible experiments that can compare and contrast different approaches to design? There are two main approaches, the historical and the experimental. In the former we use detailed databases constructed during real projects, try to extract appropriate information about how the project was managed and the quality of the final product, and attempt to draw some conclusions. One of the main sources for this work is a US Navy database of major projects which goes back several decades. Unfortunately most of the methods used are not particularly relevant now and the systems were restricted to being Fortran codes. An alternative approach is to use several teams each building the same system but using different methods. This invariably means teams of University students; the projects are small and the environment not entirely realistic. Nevertheless some interesting conclusions have been drawn from these experiments by taking great care over the interpretation of the results and trying to allow for distorting factors. One simple project I ran in the Department established a clear superiority for the use of the language Modula-2 over the more traditional teaching language of Pascal, but the most interesting results compared the structured design approach with a new concept introduced by IBM Federal Systems Division called the 'clean room' [Basili 1987].
The clean room operates like this. A design team is responsible for constructing a formal specification of the system in consultation with a client group and for turning this specification into executable code, that is, a program. The team have some tool support such as editors, syntax checkers etc. but they do not have access to any compilers. What this means is they cannot try their programs out to see if they 'work'. When they are satisfied that their program is finished they hand it over to an independent testing team that carries out extensive rigorous testing to see if the program is satisfactory. The experience of IBM is that this procedure has dramatically improved quality and productivity and this has been confirmed by carefully designed trials with competing teams of students at the University of Maryland. The reasons are not hard to find. Many years ago, during the age of punched cards, you submitted your program as a stack of cards and returned for the results a few days later. If you had got your program wrong it caused you serious delays and a waste of time and effort. The quality of software in those days was probably much better than is the case today when interactive compilers are available to all. Now the temptation is to run your program, rather than reason about it, to find out if it works. Furthermore, informal testing carried out during the design stage is usually aimed at confirming that the design is correct rather than at discovering whether it is faulty. This often has the result that the testing is inadequate and the software quality is low.
The scientific evaluation of different software design strategies has not been extensive, as I have remarked above. We therefore need to question very carefully the claims of enthusiasts for this method or that. For example, although object-orientated methods have been used, apparently successfully, in the design of user interfaces, no proper evaluation of the method in contrast to other methods, such as the clean room, has been published. Currently the support for such approaches must be on the basis of conviction rather than science.
I have surveyed some of the ideas that are being proposed as a way of beating the problems mentioned earlier. Apart from the appearance of new methods the other trend is towards the use of tools to support the design process. These range from editors to semantic analysers, theorem provers, code generators, test generators and project management environments. Some tools are being developed to support the maintenance and re-engineering of existing systems and this is a vitally important area. However, although good tools are valuable, bad tools or tools supporting poor methods are a liability. The solution does not lie in tools alone. The question of training is an important one and I address this in the next section.
$7 The training dimension.
The problems we face place important responsibilities on us as educators and I want to say a few words about how we, as a Department, are trying to face up to them. At one time Computer Science degrees consisted mostly of programming courses together with some technical material covering the architectural details of machines. Things have changed dramatically in recent years. Now we are much more interested in design in a wider context, the development of professional skills and the construction of a foundation for a lifetime of intellectual activity. We believe that we are in the forefront of developments in this area and we are rapidly gaining a reputation for innovation which is being followed in many other places. The reform of our courses has involved us in a great deal of work but this is essential if we are to face up to the challenges I have discussed. Computing is such a rapidly changing subject that the lifetime of a course is 2 or 3 years at most. Material which was basic research 5 years ago is now firmly established as a large part of our core first year undergraduate syllabus. Even the theoretical foundations are changing rapidly. All of this activity takes place at a time when the number of students wishing to study some aspect of computing is growing rapidly (I was amazed the other day when I added up the number of students currently enrolled with us for something or other - there are 988, of whom 124 are postgraduates).
Our courses are now built on a firm theoretical basis and feature as many aspects of the professional life of a software or computer engineer as possible. Not only do we discuss the implementation of systems in a variety of modern settings using the latest software tools and equipment, we also examine wider issues.
So, for example, students have to do a number of group projects from Day 1 which involve feasibility studies of possible computer applications which look at social, environmental, legal and other issues. Reports and presentations have to be made and are assessed. This type of activity is established throughout the curriculum and is intended to integrate the coursework and provide an opportunity for relating it to the real world. Some of you may have suffered from visits by some of these students from time to time and I would like to thank those of you who have helped in this way.
Another innovative project tries to reproduce the clean room. This project, which we call the 'crossover project', takes place in Year 1 towards the end of term 2 and into term 3 when there are few other lectures. Each group spends a week or so constructing a formal specification for a different system in consultation with their client, their tutor. This is then marked and the resulting document is passed on to another group for the basic design phase. Thus each group gets experience in reading and writing formal specifications. The next phase is the detailed coding, carried out by a third group, with the original group then receiving a supposedly working system for a week's rigorous testing. This last phase is enjoyed by all as they have to write a critical report, which must be backed up with test evidence for their conclusions, and it gives the group a chance to pull the work of others to bits.
The emphasis on professional development is found throughout the course. We also try to encourage comprehension skills by requiring students to read papers and give presentations on them. One interesting idea, which stems from my experiences developing a revolutionary Further Mathematics A level syllabus in Northern Ireland, is to include comprehension questions in exam papers. This is not a new idea (arts subjects have been doing it for years), but it is rather novel in this context.
The involvement of students with real clients from industry is another recent development and we hope to be able to continue this process in the future.
The introduction of the Cognitive Science Degree is another exciting venture. With the Departments of Psychology and Control Engineering we are developing a curriculum that combines developments in artificial intelligence with software engineering, underpinned by the most recent understanding of how we think and interact with the world. With an eye to Europe, we have been discussing plans for a new degree in Computer Science and Modern Languages. This will try to relate natural languages and artificial languages and be a foundation point for work in computational linguistics and machine translation. The professional training of software engineers who can operate across national boundaries will be another goal.
Looking at our educational objectives as a whole there are, alongside the usual ones of developing understanding, appreciation and personal skills, the twin aims of analysis and synthesis so necessary for software engineers. And perhaps orthogonally to all this is the attempt to provide some inspiration and just a little anarchy. Inspiration to progress enthusiastically into an exciting future and the anarchy that might encourage them to continually question why?
$8 The technical bit - a simple example from systems modelling.
The first example is concerned with a computer controlled railway level crossing and the intention is to examine some of the safety issues in this situation.
As in many real-time and safety-critical systems there are a lot of things happening at the same time. If we try to model the situation without taking this concurrency into consideration we will, at best, only be able to analyse part of the system and our conclusions about the safety of the ensuing design may be misguided or even wrong.
A very simple version of the system will be considered here. The gross simplification is necessary in order to explain the underlying approach; in a real example there will be many more aspects to consider.
The system consists of a railway line divided into sections with some sensing device to indicate when a train has entered each section. There is a level crossing gate that must be lowered at the approach of a train, to prevent traffic from crossing the line when a train is coming. A computer will be controlling the system. It will need to know when a train is coming; it will then have to cause the gate to be lowered, and then raised when the train has left the scene.
We can represent the main features of the model with a construction called a Petri net. This technique has been used extensively for the modelling of concurrent systems in many areas.
There are three main 'ingredients' to the model:
(1) a set of processes, such as the sensor recognising the train entering the first section, the gate being lowered etc. These are represented on the diagram by the horizontal bars a, b, ..., f.
(2) a set of places (in the diagram, the circles numbered 1, 2, ..., 9) that control the activities of the processes and contain resources - in this case approval for things to happen.
(3) a collection of arrows that link places to processes and processes to places.
The net can be turned into a dynamic entity by allocating it some resources and defining how these resources are used by the net. The resources are indicated by black dots. In the second picture we have distributed resources to some of the places to indicate an initial state of the system, namely the appearance of a train at the beginning of the system and the gate being up. There are thus dots or 'tokens' at places 1 and 8.
The net operates rather like a weird, parallel pin-ball machine. A process can operate if there is at least one token at each of the places that point to it. Thus process a can operate but no others. When it does operate it will consume the token at 1 and donate new tokens to 2 and 6. Thus we have diagram 3. Now both e and b are ready to operate. In this situation either both can operate together or one after the other. The consequences of this then feed through the net until we have a token at 5, which represents the departure of the train.
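The token game just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text fixes the initial tokens (places 1 and 8) and the arcs of process a (it consumes the token at 1 and donates tokens to 2 and 6), but the arcs given below for b and e are guesses chosen so that both become ready once a has operated. They are not a transcription of the diagrams.

```python
from collections import Counter

# Each process maps to (input places, output places).
# Only the arcs of process "a" are stated in the lecture; those given
# for "b" and "e" are assumptions made for this illustration.
NET = {
    "a": ({1}, {2, 6}),   # sensor detects the train (as described)
    "b": ({2}, {3}),      # assumed: train moves on to the crossing
    "e": ({6, 8}, {7}),   # assumed: gate goes from up (8) to down (7)
}


def enabled(net, marking):
    """Names of the processes that have a token at every input place."""
    return sorted(
        name
        for name, (inputs, _outputs) in net.items()
        if all(marking[place] >= 1 for place in inputs)
    )


def fire(net, marking, name):
    """Consume one token per input place, donate one per output place."""
    inputs, outputs = net[name]
    if any(marking[place] < 1 for place in inputs):
        raise ValueError(f"process {name!r} cannot operate")
    result = Counter(marking)
    result.subtract(inputs)
    result.update(outputs)
    return +result  # unary plus drops places left with no tokens


initial = Counter({1: 1, 8: 1})        # train arriving, gate up
print(enabled(NET, initial))           # ['a']
after_a = fire(NET, initial, "a")
print(sorted(after_a.elements()))      # [2, 6, 8]
print(enabled(NET, after_a))           # ['b', 'e']
```

A marking is held as a multiset of places, so the same functions would also cope with nets in which a place can hold more than one token.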
To represent the various possibilities of the system's behaviour we can construct what is known as a 'reachability tree'. The places with tokens are specified in brackets and the possible process activity is indicated by labelled arrows. Thus we can trace all the possible ways in which the system can behave by following the different paths in diagram 4.
We can identify potentially unsafe states of the system. Clearly one of these is represented by tokens at both 3 and 8, and this can happen if b operates before e. The designer must now attempt to redesign the system to prevent this from happening. A simple device to do this is to introduce an interlock place as indicated in diagram 5. Then b cannot operate until e has done so.
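The search for unsafe states, and the effect of the interlock, can be sketched as a small Python program that explores every reachable marking. The net used here is hypothetical: only the arcs of a, the initial tokens at 1 and 8, the meaning of place 5 and the unsafe combination of 3 and 8 come from the text. All other arcs, and the number 10 chosen for the interlock place, are assumptions made so that the example runs.

```python
from collections import deque

# Hypothetical net: process -> (input places, output places).
# Only the arcs of "a" are taken from the lecture; the rest are assumed.
BASE_NET = {
    "a": ({1}, {2, 6}),    # sensor detects the train (as described)
    "b": ({2}, {3}),       # assumed: train enters the crossing
    "c": ({3}, {4, 9}),    # assumed: train clears it, gate may be raised
    "d": ({4}, {5}),       # assumed: train departs (token at 5)
    "e": ({6, 8}, {7}),    # assumed: gate lowered (up = 8, down = 7)
    "f": ({7, 9}, {8}),    # assumed: gate raised again
}
INTERLOCK = 10             # assumed number for the extra place


def with_interlock(net):
    """Copy of the net in which e donates to, and b consumes from, INTERLOCK."""
    guarded = dict(net)
    b_in, b_out = net["b"]
    e_in, e_out = net["e"]
    guarded["b"] = (b_in | {INTERLOCK}, b_out)
    guarded["e"] = (e_in, e_out | {INTERLOCK})
    return guarded


def successors(net, marking):
    """Yield (process, new marking) for each process that can operate."""
    for name, (inputs, outputs) in sorted(net.items()):
        if all(place in marking for place in inputs):
            tokens = list(marking)
            for place in inputs:
                tokens.remove(place)          # consume one token
            tokens.extend(outputs)            # donate new tokens
            yield name, tuple(sorted(tokens))


def reachable(net, initial):
    """Breadth-first search over markings (terminates for bounded nets)."""
    start = tuple(sorted(initial))
    seen, queue = {start}, deque([start])
    while queue:
        for _name, nxt in successors(net, queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unsafe(marking):
    """Train on the crossing (3) while the gate is still up (8)."""
    return 3 in marking and 8 in marking


initial = (1, 8)
bad_before = sorted(m for m in reachable(BASE_NET, initial) if unsafe(m))
bad_after = sorted(
    m for m in reachable(with_interlock(BASE_NET), initial) if unsafe(m)
)
print(bad_before)   # [(3, 6, 8)]
print(bad_after)    # []
```

With these assumed arcs the unguarded net reaches one unsafe marking, the one produced when b operates before e, while the guarded net reaches none and can still arrive at the marking (5, 8), the train departed and the gate up.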
There are many more detailed issues to be considered before a safe and reliable system can be built. This rather trivial example is just intended to indicate one way of analysing safety in a concurrent system involving computers and mechanical and electrical devices. There is much more to consider, such as the reliability of the power supplies, the way in which the system should behave in the event of the failure of some part of it etc.
However, such nets are not very practical in analysing large systems since they become impossible to handle. The models are overwhelmed by a mass of low-level detail. It has therefore been necessary to introduce higher-level net models that allow for data abstraction, the explicit representation of data processing and the hiding of unnecessary detail during the analysis of systems. Such techniques are now becoming very important in the modelling of concurrent systems.
My colleagues and I are using such techniques for a variety of applications, including the development of functional testing strategies for VLSI systems, the security and maintenance of information in hypermedia, the representation of browsing semantics in such systems, and many more.
Diagram 6 illustrates a model of a hardware component (a shift register) developed by Saleem Rathore in order to evaluate the suitability of these techniques for modelling VLSI. We are optimistic that this will prove very useful, since it is possible to model very large systems, such as microprocessors etc. without getting swamped in detail. We can then use these models to generate functional test strategies which can be used as the basis of much more sophisticated testing of the fabrications than is currently possible.
Other work in my group is concerned with the foundations of a rigorous theory of functional software testing and its relationship with formal methods; the integration of software testing into the design process at all levels; the construction of models for the analysis of users interacting with evolving user interfaces; a temporal semantics for reasoning about the possible behaviours of interfaces of real-time systems etc.
A further interest of mine is in the construction of models of types of biological and biochemical processing. These systems have evolved a highly efficient type of processing that can operate in very unstable environments. There are a number of interesting theoretical questions one could ask about this area. Can we perhaps learn from the way organisms organise their processing? It is intensely parallel and complex and yet stable and in many cases self-repairing. Perhaps one day we will be able to harness this processing in some way.
We hope that these projects, driven as they are by a desire to improve the quality and safety of computer systems, will provide practical and principled solutions to some of the very real problems that face the industry.
$9 Conclusions.
Perhaps this lecture has seemed a little on the pessimistic side. I hope that I haven't left you with the impression that the computer is a complete waste of time. It has had a fundamental influence on life and civilisation will never be the same again. Much of computing is relatively trouble free (although our staff network collapsed while I was in the middle of preparing this lecture; luckily I had a reasonably recent backup!) and the achievements of the industry have been spectacular. I have tried, however, to temper some of the euphoria that infests some areas of the industry and question some of the more outrageous examples of computer applications. As long as I have managed to make people more aware of the possible disadvantages and even dangers of using computers before we really know what we are doing with them then I will have achieved something.
The existence of major problems, such as safety, invariably brings forward many sectors of the industry claiming to have the definitive solutions - using this method, that tool or language. It is nonsense, of course. The problems cannot be solved in such a way. We should strive to understand systems better, to reason about them using suitable formalisms, to test them, to evaluate the design processes and to strive for better quality products. However this should be accompanied by the realisation that we will never be able to build a perfect system and therefore, if we decide to use computers in safety-related situations, we must be prepared for things to go wrong. And when they do go wrong we may not be able to find out why so easily. Thus the lessons from these disasters may not be fully learnt and our methods of building such systems will improve much more slowly than public opinion demands. It is therefore the responsibility of the decision makers, the marketeers and accountants to recognise that they are putting an unknown price tag on safety which may one day be called to book.
So in answer to my question 'Can we be saved from computers?' my best reply is 'maybe - but only if we realise the limitations of the technology at hand and accept the inevitable consequences of trying to run before we can crawl - let alone walk!'
[Basili 1987] IEEE Trans. Soft. Eng.
[Lewin 1982] 'A system is a system is a system is a system.' Inaugural lecture : University of East Anglia.
[Jackson 1988] 'Recent advances in software engineering: an industrial perspective.'
[MOD 1989] Defence standards 00-55.
$4 Shakespeare - Two gentlemen of Verona
$5 Virgil - Aeneid
$6 Guy Fawkes