RT Delphi:
An Efficient, “Round-less” Almost Real Time Delphi Method
This article was published in
Technological Forecasting and Social Change, 73 (2006)321-333
Theodore Gordon and Adam Pease
Abstract. The authors have recently developed a new approach
to performing a Delphi study.
1. History of the Method: The Delphi Technique
The process tends to move the
group’s responses toward consensus, although reaching consensus is not
necessarily the central objective or a measure of success of such studies. It
also produces a set of reasons behind the responses. The method has had myriad
applications but has also drawn a number of criticisms, including the long
times involved in accomplishing such studies [5].
In September 2004, the Defense
Advanced Research Projects Agency (DARPA) awarded a Small Business Innovation
Research grant to Articulate Software, Inc. to develop a Delphi-based method
for improving the speed and efficiency of collecting judgments in tactical
situations where rapid decisions are called for. The grant was based on a
decision-making problem: a hypothetical decision maker, uncertain about the tactics
that might be followed in accomplishing a specific objective, calls on a number
of experts to provide their judgments about the value of the alternative
approaches.
A second aspect of this grant,
which will not be described in detail here, was to utilize advanced artificial
intelligence (AI) and natural language (NL) processing in analyzing the
non-numerical responses of the respondents.
The NL component of this work is called the Controlled
English to Logic Translation system (CELT) [6,7]. CELT accepts a simplified form of grammatical
English. Although the complexity of the linguistic input is limited, the system
has a 100,000 word-sense vocabulary. In order to capture the meaning of the
linguistic input in a form suitable for machine understanding, we use a formal
representation stated in mathematical logic. The logical representation, called
the Suggested Upper Merged Ontology (SUMO) [8], is more general than the
original English, although also more precise and formal. SUMO and its
associated ontologies cover roughly 5,000 formal concepts. An additional
component, called Sigma [7], is a system that can perform automated deduction on the
logical representation. We connected these components to the web-based Delphi software.
2. Description of the Method
Imagine a Delphi questionnaire presented to a respondent as an on-line form.
The respondent sees, for each
question, the average (or median) response of the group so far (1) and the number
of responses (2) involved in arriving at that average or median. In considering
his or her answer to each question, the respondent may refer to the reasons
others have given (3) by pressing a button and opening a “reasons window.”
Considering this information, the
respondent provides an input (5) and instructs the computer to “save” the
answer. The group average or median is updated immediately and presented back to the respondent and anyone
else who has signed on.
If the respondent’s answer to any
question is beyond a pre-specified distance from the average or the median, an
attention-getting indicator flags the question for the respondent. When the
flag is “up” the respondent is asked to give reasons for their response (4),
which, when saved, become an entry in the “reasons window” and are seen
whenever anyone later opens that window (3).
There is no explicit second round.
When the respondent comes back to the study in a minute or a day, the original
input form is presented to him or her. Of course, by then others may have
contributed judgments and the averages or medians may have changed; other
questions may now be flagged, since the group response may have shifted enough
to move the respondent’s previous answers outside the pre-specified distance
from the average or median since the last time the input page was viewed.
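The save-and-flag cycle just described can be sketched as follows. This is a minimal illustration in Python, assuming numeric answers and a median-based flag; the names and the threshold are hypothetical, not the authors' implementation:

```python
from statistics import median

def update_question(responses, threshold):
    """responses: dict mapping respondent id -> numeric answer.
    Returns the current group median, the number of responses, and
    the set of respondents whose answers lie further than `threshold`
    from the median and should therefore be asked for reasons."""
    values = list(responses.values())
    mid = median(values)
    flagged = {who for who, answer in responses.items()
               if abs(answer - mid) > threshold}
    return mid, len(values), flagged

# Each saved answer immediately updates the statistics every
# participant sees -- there is no explicit second round.
answers = {"a": 2025, "b": 2030, "c": 2031}
answers["d"] = 2060                      # a new, outlying estimate arrives
mid, n, flagged = update_question(answers, threshold=10)
print(mid, n, flagged)                   # 2030.5 4 {'d'}
```

Note that an answer flagged on one visit may become unflagged on the next simply because the group statistics have moved, and vice versa.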
In this way the study proceeds continuously, without discrete rounds.
This description applies to a
series of estimates of the attributes of future developments (including
policies), for example their value, timing, impacts, probabilities, or backfire
potential. We call this a “1D RT Delphi.”
In summary: only one round is
involved. Each respondent views their own earlier response when they return to
the study. As they continue to watch their input form (or later on a return
visit) they also see the new averages, medians, distributions, and reasons
given by other panelists for their positions. This information appears whenever
new inputs are received from other participants.
The applications of this system
include forecasting, foresight, and policy studies involving experts, in any
problem in which the synthesis of expert opinion is necessary or desired.
The cutoff point is determined by
the study manager, usually when the feedback has resulted in a steady state.
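As a sketch of how this "steady state" cutoff might be detected automatically (the window size and tolerance below are illustrative assumptions, not part of the method):

```python
def reached_steady_state(median_history, window=5, epsilon=0.5):
    """True when the group median has moved less than `epsilon`
    across the last `window` recorded snapshots."""
    if len(median_history) < window:
        return False
    recent = median_history[-window:]
    return max(recent) - min(recent) <= epsilon

# Early snapshots swing widely; later ones settle around 2035:
history = [2060, 2045, 2040, 2035.0, 2035.2, 2035.1, 2035.0, 2035.2]
print(reached_steady_state(history))  # True: the last five medians span only 0.2
```

In practice the study manager would likely combine such a check with the number of responses received and the richness of the reasons offered.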
3. How to Do It
The method described above can be
implemented via a site on the Internet or on any network. Selecting expert
participants is still extremely important, as it is in all Delphi studies.
Suppose a conventional Delphi study were planned to forecast the date of a manned Mars landing. The steps might include:
• Identify experts from the required disciplines: e.g. rocket
scientists, geologists, and bio-scientists
• In round 1, participants would be asked to provide their
judgment about the date
• In round 2, the range of dates would be presented to the
group, and persons holding opinions at the extremes of the range would be asked
to reassess their opinion in view of the group's range and to provide reasons
for their positions.
• In round 3, the emerging group judgment on a date would be
presented along with reasons for the extreme opinions. Each member of the group
would be asked to reassess his or her position in view of the reasons presented.
In an RT Delphi study of a manned
Mars landing date, experts would be identified as before from appropriate
disciplines. The on-line software would have to be readied for the specific
application; this step is similar to preparing the first-round questionnaire of
a conventional Delphi study.
In RT Delphi, after signing on
with a valid password, the experts might be presented a problem statement
and summary of the present situation (e.g. funding, organization, unsolved
problems, etc.) and, having read this, proceed to their on-line form that might
appear as follows:
Question: Given the summary of the
situation that you have just read, from your perspective, when do you believe a
manned Mars mission might actually launch?
Pressing the “see the reasons”
button would lead to a screen similar to the one below:
When back on the
questionnaire form, the respondent would add his or her estimate of the
expected launch date, press “save,” and the information they had entered would
immediately be used in the calculation of the average (or median) date. Suppose
the respondent entered an estimate of, say, 2060; the form might appear as
follows:
The “reasons” question is shown in
bold italics to call special attention to this request, since the estimate
provided by the respondent is outside the interquartile range. The
respondent might then press that button and type in:
I believe that the project will run into
significant difficulties, both funding and technical. The public won’t tolerate
a failure (look at the shuttle program) and it will be hard to conduct the Mars
project without failure.
Then any later
respondent asking to see “reasons” would see:
If the study is run synchronously
(that is, all participants are on line at the same time) all would see their
forms change as new answers are received. They would see the group average and
interquartile range. If their answers differed by more than a preset number
from the group’s average, they would be asked for reasons and could see reasons
offered by others for their answers. The respondents could change their earlier
responses if they wished to do so.
Now consider asynchronous
applications (that is, respondents join at times convenient to them). When a
respondent signs on to the study at a second, third, or any later time, his or
her original form is presented again, showing the original estimate but
with the new group average and interquartile range, as well as the new compilation
of reasons for prior answers. As before, answers that differ by more than a preset
amount from the group’s average trigger a request for reasons, the respondent can
see the reasons offered by others, and earlier responses can be revised.
In either case, after sufficient
participants had contributed, the administrator could “freeze” the results and
declare the study complete.
Of course, in a real case, many
more questions than those shown in the Mars illustration might be included,
such as estimates of dates for intermediate steps involved in completing the
mission, estimates of funding requirements, and priorities among alternative
strategies and policies.
In preparing for the study it is
necessary to provide a set of “initial conditions” so that the first respondent
does not see a null questionnaire. This can be done by using judgmental
responses from the beta test panel or using plausible and illustrative entries.
4. Conducting an RT Delphi Analysis[iii]
The DARPA contract mentioned earlier called for a decision
making application and the authors chose to use an example from the Millennium
Project. When signing on, the participant was given the following instructions:
Then a statement of the hypothetical
problem was presented:
The analysis technique built into
the process was a utility matrix in which alternate actions were compared on
the basis of the degree to which each was seen to meet a set of previously
stated criteria. (This illustrates a 2D RT Delphi process.) The alternative
strategies included at the initiation of the study were:
• Governments modify school curricula to remove cultural biases
• The world implements a vastly improved disease early warning system
• UN sponsors vigorous anti-terrorist campaign among religious leaders
• Governments build redundancy into societal and technical infrastructure
• Governments establish dialogs with dissidents
• The UN employs advanced detection systems for all of its WMD on-site inspections
• Systematically alter policy to defuse terrorist recruitment
And the criteria were:
• The decision is not likely to have serious negative consequences
• The decision is likely to be effective
• The decision can be implemented quickly
• The decision is plausible
• The decision is likely to provide useful feedback to alter future strategy
• The decision has reasonable cost
The questionnaire was in the form of a matrix (alternative
strategies vs. criteria), and the respondents were asked to fill in the matrix with
judgments about the weights of each criterion and the degree to which each
alternative strategy met each criterion. The instructions read as follows:
And a portion of the matrix into
which judgments could be entered appears below. Some cells are darkened to draw
attention to questions in which the respondent’s answers differ considerably
from the group average. The underlined text indicates a hyperlink.[iv]
Each cell of the on-screen matrix showed the group average so far (Avg.), the number of responses (five throughout this illustration), a Reasons hyperlink, the respondent’s own input, and a Justify hyperlink. In the condensed rendering below, each cell is written as “Avg. / Input,” and the column headings abbreviate the six criteria listed above:

| | No serious negative consequences | Effective | Quick to implement | Plausible | Useful feedback | Reasonable cost |
| Weights | 4 / 4 | 6 / 5 | 7 / 6 | 8 / 7 | 3 / 4 | 6 / 5 |
| Governments modify school curricula to remove cultural biases | 10 / 10 | 6 / 1 | 4 / 6 | 5 / 4 | 9 / 8 | 9 / 10 |
| The world implements a vastly improved disease early warning system | 2 / 1 | 3 / 2 | 1 / 2 | 5 / 7 | 9 / 8 | 2 / 1 |
| UN sponsors vigorous anti-terrorist campaign among religious leaders | 6 / 6 | 7 / 7 | 4 / 4 | 6 / 6 | 9 / 9 | 6 / 1 |
| Governments build redundancy into societal and technical infrastructure | 8 / 9 | 9 / 10 | 3 / 4 | 9 / 10 | 4 / 5 | 9 / 10 |
| Governments establish dialogs with dissidents | 5 / 4 | 1 / 2 | 5 / 7 | 7 / 6 | 4 / 2 | 2 / 3 |
In this implementation, the
respondent’s inputs in each cell were accommodated using a drop-down menu that
ranged from 1 to 10 and included a “no comment” response.
The computation is automatically
performed, and a respondent is presented with two rank order listings of the
highest scoring alternatives, one based on the group averages and the other
based on the respondent’s inputs. This comparison is another source of
information that may lead the respondent to revise their inputs.
These scores are computed as
weighted sums:

Score(y) = Σx [Wt(x) × cell(x, y)]

where Score(y) is the score of alternative y, Wt(x) is the weight accorded
criterion x, and cell(x, y) is the judgment in the cell that depicts how well
alternative y meets criterion x.
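The formula above can be computed directly. In the sketch below, the criterion names are hypothetical labels; the values are the group averages from the “Weights” row and the “redundancy” row of the sample matrix:

```python
def score(weights, cells):
    """Score(y) = sum over criteria x of Wt(x) * cell(x, y)."""
    return sum(weights[x] * cells[x] for x in weights)

# Criterion weights (group averages from the sample matrix):
weights = {"no_harm": 4, "effective": 6, "quick": 7,
           "plausible": 8, "feedback": 3, "cost": 6}
# Group-average judgments for "Governments build redundancy into
# societal and technical infrastructure":
redundancy = {"no_harm": 8, "effective": 9, "quick": 3,
              "plausible": 9, "feedback": 4, "cost": 9}
print(score(weights, redundancy))  # -> 245
```

Ranking every alternative by this score, once using the group averages and once using the respondent’s own inputs, yields the two rank-order listings described above.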
The study is left on line until
sufficient data have been collected. Sufficiency is defined by the
administrator and is likely to be based on the number of responses received,
the spread in judgments, and the richness of the reasons furnished by the
respondents.
The existing version of the software
is available for open-source download at http://delphiblue.sourceforge.net.
5. Variants of RT Delphi
RT Delphi can be used in a
conference room setting. Imagine a conference room equipped with WiFi and the
participants using wireless-equipped laptops. In this
case the participants could sign on to a web site running the software. An inquiry
of the sort described here could be run, roundless, in real time.
If the participants are dispersed,
they could participate by connecting to the appropriate Internet site.
Of course it will be possible to
include “participant calibration” in the computation. Respondents can be asked
to self-appraise their expertise or their confidence in their answers, and the
computation can de-rate a response for lack of expertise or confidence. The
problems introduced by this approach, however, are extensive, since some
participants are self-effacing and tend to call themselves inexpert while
others, less qualified, may consider themselves more expert.
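A minimal sketch of such de-rating, assuming a 1-to-10 self-rating used directly as a weight (a purely hypothetical formula; the method does not prescribe one):

```python
def calibrated_average(entries):
    """entries: list of (answer, self_rated_expertise) pairs.
    Each answer is weighted by its respondent's self-rating."""
    total = sum(rating for _, rating in entries)
    return sum(answer * rating for answer, rating in entries) / total

# A confident expert (rating 9) pulls the average far more than a
# hesitant one (rating 1):
print(calibrated_average([(2030, 9), (2060, 1)]))  # -> 2033.0
```

The self-rating bias noted above would distort exactly this computation, which is why any such weighting scheme would need careful validation.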
6. Existing On-Line Systems
There are several existing systems that also provide frameworks
for conducting Delphi-like studies.
Perseus
provides survey forms that can be shaped into Delphi-like questionnaires.
The
Rotisserie system of the H2O project (http://h2o.law.harvard.edu/index.jsp)
is a forum-like approach that is designed to encourage thoughtful feedback to
comments by other participants in a topical study. In this approach every post
is assigned to one other particular participant to answer. Thus the domination
of a forum by a few aggressive participants is avoided and “shooting from the
hip” responses tend to be replaced with more thoughtful comments. Another
web-based system, CivicSpace, is designed for community building from grass
roots “bottom up” participation. It also offers survey capability. (http://civicspacelabs.org)
Salo
and Gustafsson of the Helsinki University of Technology have described the use
of group support systems in foresight applications [9]. Deme is a web-based
group support platform that is designed to facilitate on-line group discussion
and decision making. (http://groupspace.org/).
It contains a number of features including the capacity to conduct a survey among
participants, together with the usual groupware communication facilities.
Calibrum (http://calibrum.com/)
offers several powerful survey products that have been used in Delphi studies.
7. Strengths and Weaknesses of the
Method
The principal strength of RT
Delphi is its efficiency and applicability to both “standard” Delphi studies
and decision-making applications of the sort described here.
The greatest weakness of RT Delphi
is that only a proof of concept prototype exists. More development is required
to place it into full scale operation, particularly the asynchronous
application. Principal among the development needs is an administrator package
that will permit easy editing of alpha (text) inputs, real-time presentation of
results, and tracking of progress over time.
8. Applications
The design of almost any
multi-round Delphi study can be adapted to this roundless, real-time format.
REFERENCES
[1] Gordon, T.J., and Helmer, O.: Report on a Long-Range
Forecasting Study, The RAND Corporation, P-2982 (1964).
[2] Linstone, H., and Turoff, M.: The Delphi Method: Techniques and Applications, Addison-Wesley, Reading, MA (1975).
[3] Gordon, T.: The Delphi Method, Futures Research
Methodology V2.0, CD-ROM, the Millennium Project, American Council for the United
Nations University.
[4] Turoff, Murray, and Hiltz, Starr Roxanne: Computer-Based
Delphi Processes, an invited chapter in Adler, Michael, and Ziglio, Erio, eds., Gazing into the Oracle: The Delphi Method
and Its Application to Social Policy and Public Health, London, Jessica Kingsley
Publishers (1996).
[5] Huckfeldt, V.E., and Judd, R.C.: Issues in large scale Delphi studies, Technological Forecasting and Social Change, 6 (1974).
[6] Pease, A., and Murray, W.: An English to Logic Translator for Ontology-Based Knowledge Representation Languages, in Proceedings of the 2003 IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing (2003).
[7] Pease, A., (2003). The Sigma Ontology
Development Environment, in Working Notes of the IJCAI-2003 Workshop on
Ontology and Distributed Systems. Volume 71 of CEUR Workshop Proceeding series.
[8] Niles, I., and Pease, A.: Towards a Standard Upper Ontology, in Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Ogunquit, Maine (2001).
[9] Salo, Ahti, and Gustafsson, Tommi: A Group Support System for Foresight Processes, Int. J. Foresight and Innovation Policy,
Vol. 1, Nos. 3/4 (2004), 249. Systems Analysis Laboratory, Helsinki University of Technology.
[10] The Knowledge Society Delphi Survey, PREST,
[11] The FISTERA
BIOGRAPHICAL NOTE
Theodore Gordon is Senior Research Fellow of the Millennium Project of the American Council for the United Nations University. He is co-founder of the Project, founder, and CEO for 20 years of the Futures Group, and developer of several methods of forecasting. He can be reached at tedjgordon@att.net.
Adam Pease is CEO of Articulate Software and the
creator of the Suggested Upper Merged Ontology (SUMO). He can be reached at apease@articulatesoftware.com.
[i] Much of
the work reported here was performed under the DARPA Small Business Innovation
Research project "Group Decision Optimization with
[ii] The Millennium Project intends to use this system in a global “Lookout Panel” application involving its worldwide nodes.
[iii] While much of the decision making study described here was completed in the DARPA contract, it is shown in a form that represents an improved implementation.
[iv] Note that the numbers in this chart are randomized input, not the result of a real study, and are used for illustration only.