RT Delphi:
An Efficient, “Round-less” Almost Real Time Delphi Method
This article was published in
Technological Forecasting and Social Change, 73 (2006)321-333
Theodore Gordon and Adam Pease
Abstract. The authors have recently developed a new approach
to performing a Delphi study.
1. History of the Method: The Delphi Technique
The process tends to move the
group’s responses toward consensus, although reaching consensus is not
necessarily the central objective or a measure of success of such studies. It
also produces a set of reasons behind the responses. The method has had myriad
applications but has also drawn a number of criticisms, including the long
times involved in accomplishing such studies [5].
In September 2004, the Defense
Advanced Research Projects Agency (DARPA) awarded a Small Business Innovation
Research grant to Articulate Software, Inc. to develop a Delphi-based method
for improving the speed and efficiency of collecting judgments in tactical
situations where rapid decisions are called for. The grant was based on a
decision-making problem: a hypothetical decision maker, uncertain about the tactics
that might be followed in accomplishing a specific objective, calls on a number
of experts to provide their judgments about the value of the alternative
approaches.
A second aspect of this grant,
which will not be described in detail here, was to utilize advanced artificial
intelligence (AI) and natural language (NL) processing in analyzing the
non-numerical responses of the respondents.
The NL component of this work is called the Controlled
English to Logic Translation system (CELT) [6,7]. CELT accepts a simplified form of grammatical
English. Although the complexity of the linguistic input is limited, the system
has a 100,000 word-sense vocabulary. In order to capture the meaning of the
linguistic input in a form suitable for machine understanding, we use a formal
representation stated in mathematical logic. The logical representation, called
the Suggested Upper Merged Ontology (SUMO) [8], is more general than the
original English, although also more precise and formal. SUMO and its
associated ontologies cover roughly 5,000 formal concepts. An additional
component, called Sigma [7], is a system that can perform automated deduction on the
logical representation. We connected these components to the web-based Delphi software.
2. Description of the Method
Imagine a Delphi questionnaire presented to a respondent as an on-line form.
The respondent sees, for each
question, the average (or median) response of the group so far (1) and the number
of responses (2) involved in arriving at that average or median. In considering
his or her answer to each question, the respondent may refer to the reasons
others have given (3) by pressing a button and opening a “reasons window.”
Considering this information, the
respondent provides an input (5) and instructs the computer to “save” the
answer. The group average or median is updated immediately and presented back to the respondent and anyone
else who has signed on.
If the respondent’s answer to any
question is beyond a pre-specified distance from the average or the median, an
attention-getting indicator flags the question for the respondent. When the
flag is “up” the respondent is asked to give reasons for their response (4),
which, when saved, become an entry in the “reasons window” and are seen
whenever anyone later opens that window (3).
There is no explicit second round.
When the respondent comes back to the study in a minute or a day, the original
input form is presented to him or her. Of course, by then others may have
contributed judgments and the averages or medians may have changed; other
questions may now be flagged, since the group response may have shifted enough
to move the respondent’s previous answers outside the pre-specified distance
from the average or median since the last time the input page was viewed.
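The save-and-flag cycle just described can be sketched as follows. This is a minimal illustration in Python, assuming numeric answers and a median-based flag; the names and the threshold are hypothetical, not the authors' implementation:

```python
from statistics import median

def update_question(responses, threshold):
    """responses: dict mapping respondent id -> numeric answer.
    Returns the current group median, the number of responses, and
    the set of respondents whose answers lie further than `threshold`
    from the median and should therefore be asked for reasons."""
    values = list(responses.values())
    mid = median(values)
    flagged = {who for who, answer in responses.items()
               if abs(answer - mid) > threshold}
    return mid, len(values), flagged

# Each saved answer immediately updates the statistics every
# participant sees -- there is no explicit second round.
answers = {"a": 2025, "b": 2030, "c": 2031}
answers["d"] = 2060                      # a new, outlying estimate arrives
mid, n, flagged = update_question(answers, threshold=10)
print(mid, n, flagged)                   # 2030.5 4 {'d'}
```

Note that an answer flagged on one visit may become unflagged on the next simply because the group statistics have moved, and vice versa.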
In this way the study proceeds continuously, without discrete rounds.
This description applies to a
series of estimates of the attributes of future developments (including
policies), for example their value, timing, impacts, probabilities, or backfire
potential. We call this a “1D RT Delphi.”
In summary: only one round is
involved. Each respondent views their own earlier response when they return to
the study. As they continue to watch their input form (or later on a return
visit) they also see the new averages, medians, distributions, and reasons
given by other panelists for their positions. This information appears whenever
new inputs are received from other participants.
The applications of this system
include forecasting, foresight, and policy studies involving experts, in any
problem in which the synthesis of expert opinion is necessary or desired.
The cutoff point is determined by
the study manager, usually when the feedback has resulted in a steady state.
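As a sketch of how this "steady state" cutoff might be detected automatically (the window size and tolerance below are illustrative assumptions, not part of the method):

```python
def reached_steady_state(median_history, window=5, epsilon=0.5):
    """True when the group median has moved less than `epsilon`
    across the last `window` recorded snapshots."""
    if len(median_history) < window:
        return False
    recent = median_history[-window:]
    return max(recent) - min(recent) <= epsilon

# Early snapshots swing widely; later ones settle around 2035:
history = [2060, 2045, 2040, 2035.0, 2035.2, 2035.1, 2035.0, 2035.2]
print(reached_steady_state(history))  # True: the last five medians span only 0.2
```

In practice the study manager would likely combine such a check with the number of responses received and the richness of the reasons offered.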
3. How to Do It
The method described above can be
implemented via a site on the Internet or on any network. Selecting expert
participants is still extremely important, as it is in all Delphi studies.
Suppose a conventional Delphi study were planned to forecast the date of a manned Mars landing. The steps might include:
• Identify experts from the required disciplines: e.g. rocket
scientists, geologists, and bio-scientists
• In round 1, participants would be asked to provide their
judgment about the date
• In round 2, the range of dates would be presented to the
group, and persons holding opinions at the extremes of the range would be asked
to reassess their opinion in view of the group's range and to provide reasons
for their positions.
• In round 3, the emerging group judgment on a date would be
presented along with reasons for the extreme opinions. Each member of the group
would be asked to reassess his or her position in view of the reasons presented.
In an RT Delphi study of a manned
Mars landing date, experts would be identified as before from appropriate
disciplines. The on-line software would have to be readied for the specific
application; this step is similar to preparing the first-round questionnaire of
a conventional Delphi study.
In RT Delphi, after signing on
with a valid password, the experts might be presented a problem statement
and summary of the present situation (e.g. funding, organization, unsolved
problems, etc.) and, having read this, proceed to their on-line form that might
appear as follows:
Question: Given the summary of the
situation that you have just read, from your perspective, when do you believe a
manned Mars mission might actually launch?
Pressing the “see the reasons”
button would lead to a screen similar to the one below:
When back on the
questionnaire form, the respondent would add his or her estimate of the
expected launch date, press “save,” and the information they had entered would
immediately be used in the calculation of the average (or median) date. Suppose
the respondent entered an estimate of, say, 2060; the form might appear as
follows:
The “reasons” question is shown in
bold italics to call special attention to this request, since the estimate
provided by the respondent is outside the interquartile range. The
respondent might then press that button and type in:
I believe that the project will run into
significant difficulties, both funding and technical. The public won’t tolerate
a failure (look at the shuttle program) and it will be hard to conduct the Mars
project without failure.
Then any later
respondent asking to see “reasons” would see:
If the study is run synchronously
(that is, all participants are on line at the same time) all would see their
forms change as new answers are received. They would see the group average and
interquartile range. If their answers differed by more than a preset number
from the group’s average, they would be asked for reasons and could see reasons
offered by others for their answers. The respondents could change their earlier
responses if they wished to do so.
Now consider asynchronous
applications (that is, respondents join at times convenient to them). When a
respondent signs on to the study at a second, third, or any later time, his or
her original form is presented again, showing the original estimate but
with the new group average and interquartile range, as well as the new compilation
of reasons for prior answers. As before, answers that differ by more than a preset
amount from the group’s average trigger a request for reasons, the respondent can
see the reasons offered by others, and earlier responses can be revised.
In either case, after sufficient
participants had contributed, the administrator could “freeze” the results and
declare the study complete.
Of course, in a real case, many
more questions than those shown in the Mars illustration might be included,
such as estimates of dates for intermediate steps involved in completing the
mission, estimates of funding requirements, and priorities among alternative
strategies and policies.
In preparing for the study it is
necessary to provide a set of “initial conditions” so that the first respondent
does not see a null questionnaire. This can be done by using judgmental
responses from the beta test panel or using plausible and illustrative entries.
4. Conducting an RT Delphi Analysis[iii]
The DARPA contract mentioned earlier called for a decision
making application and the authors chose to use an example from the Millennium
Project. When signing on, the participant was given the following instructions:
Then a statement of the hypothetical
problem was presented:
The analysis technique built into
the process was a utility matrix in which alternate actions were compared on
the basis of the degree to which each was seen to meet a set of previously
stated criteria. (This illustrates a 2D RT Delphi process.) The alternative
strategies included at the initiation of the study were:
• Governments modify school curricula to remove cultural biases
• The world implements a vastly improved disease early warning system
• UN sponsors vigorous anti-terrorist campaign among religious leaders
• Governments build redundancy into societal and technical infrastructure
• Governments establish dialogs with dissidents
• The UN employs advanced detection systems for all of its WMD on-site inspections
• Systematically alter policy to defuse terrorist recruitment
And the criteria were:
• The decision is not likely to have serious negative consequences
• The decision is likely to be effective
• The decision can be implemented quickly
• The decision is plausible
• The decision is likely to provide useful feedback to alter future strategy
• The decision has reasonable cost
The questionnaire was in the form of a matrix (alternative
strategies vs. criteria), and the respondents were asked to fill in the matrix with
judgments about the weights of each criterion and the degree to which each
alternative strategy met each criterion. The instructions read as follows:
And a portion of the matrix into
which judgments could be entered appears below. Some cells are darkened to draw
attention to questions in which the respondent’s answers differ considerably
from the group average. The underlined text indicates a hyperlink.[iv]
Each cell of the on-screen matrix showed the group average so far (Avg.), the number of responses (five throughout this illustration), a Reasons hyperlink, the respondent’s own input, and a Justify hyperlink. In the condensed rendering below, each cell is written as “Avg. / Input,” and the column headings abbreviate the six criteria listed above:

| | No serious negative consequences | Effective | Quick to implement | Plausible | Useful feedback | Reasonable cost |
| Weights | 4 / 4 | 6 / 5 | 7 / 6 | 8 / 7 | 3 / 4 | 6 / 5 |
| Governments modify school curricula to remove cultural biases | 10 / 10 | 6 / 1 | 4 / 6 | 5 / 4 | 9 / 8 | 9 / 10 |
| The world implements a vastly improved disease early warning system | 2 / 1 | 3 / 2 | 1 / 2 | 5 / 7 | 9 / 8 | 2 / 1 |
| UN sponsors vigorous anti-terrorist campaign among religious leaders | 6 / 6 | 7 / 7 | 4 / 4 | 6 / 6 | 9 / 9 | 6 / 1 |
| Governments build redundancy into societal and technical infrastructure | 8 / 9 | 9 / 10 | 3 / 4 | 9 / 10 | 4 / 5 | 9 / 10 |
| Governments establish dialogs with dissidents | 5 / 4 | 1 / 2 | 5 / 7 | 7 / 6 | 4 / 2 | 2 / 3 |
In this implementation, the
respondent’s inputs in each cell were accommodated using a drop-down menu that
ranged from 1 to 10 and included a “no comment” response.
The computation is automatically
performed, and a respondent is presented with two rank order listings of the
highest scoring alternatives, one based on the group averages and the other
based on the respondent’s inputs. This comparison is another source of
information that may lead the respondent to revise their inputs.
These scores are computed as
weighted sums:

Score(y) = Σx [Wt(x) × cell(x, y)]

where Score(y) is the score of alternative y, Wt(x) is the weight accorded
criterion x, and cell(x, y) is the judgment in the cell that depicts how well
alternative y meets criterion x.
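The formula above can be computed directly. In the sketch below, the criterion names are hypothetical labels; the values are the group averages from the “Weights” row and the “redundancy” row of the sample matrix:

```python
def score(weights, cells):
    """Score(y) = sum over criteria x of Wt(x) * cell(x, y)."""
    return sum(weights[x] * cells[x] for x in weights)

# Criterion weights (group averages from the sample matrix):
weights = {"no_harm": 4, "effective": 6, "quick": 7,
           "plausible": 8, "feedback": 3, "cost": 6}
# Group-average judgments for "Governments build redundancy into
# societal and technical infrastructure":
redundancy = {"no_harm": 8, "effective": 9, "quick": 3,
              "plausible": 9, "feedback": 4, "cost": 9}
print(score(weights, redundancy))  # -> 245
```

Ranking every alternative by this score, once using the group averages and once using the respondent’s own inputs, yields the two rank-order listings described above.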
The study is left on line until
sufficient data have been collected. Sufficiency is defined by the
administrator and is likely to be based on the number of responses received,
the spread in judgments, and the richness of the reasons furnished by the
respondents.
The existing version of the software
is available for open-source download at http://delphiblue.sourceforge.net.
5. Variants of RT Delphi
RT Delphi can be used in a
conference room setting. Imagine a conference room equipped with WiFi and the
participants using wireless-equipped laptops. In this
case the participants could sign on to a web site running the software. An inquiry
of the sort described here could be run, roundless, in real time.
If the participants are dispersed,
they could participate by connecting to the appropriate Internet site.
Of course it will be possible to
include “participant calibration” in the computation. Respondents can be asked
to self-appraise their expertise or their confidence in their answers, and the
computation can de-rate a response for lack of expertise or confidence. The
problems introduced by this approach, however, are extensive, since some
participants are self-effacing and tend to call themselves inexpert while
others, less qualified, may consider themselves more expert.
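A minimal sketch of such de-rating, assuming a 1-to-10 self-rating used directly as a weight (a purely hypothetical formula; the method does not prescribe one):

```python
def calibrated_average(entries):
    """entries: list of (answer, self_rated_expertise) pairs.
    Each answer is weighted by its respondent's self-rating."""
    total = sum(rating for _, rating in entries)
    return sum(answer * rating for answer, rating in entries) / total

# A confident expert (rating 9) pulls the average far more than a
# hesitant one (rating 1):
print(calibrated_average([(2030, 9), (2060, 1)]))  # -> 2033.0
```

The self-rating bias noted above would distort exactly this computation, which is why any such weighting scheme would need careful validation.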
6. Existing On-Line Systems
There are several existing systems that also provide frameworks
for conducting Delphi-like studies.
Perseus
provides survey forms that can be shaped into Delphi-like questionnaires.
The
Rotisserie system of the H2O project (http://h2o.law.harvard.edu/index.jsp)
is a forum-like approach that is designed to encourage thoughtful feedback to
comments by other participants in a topical study. In this approach every post
is assigned to one other particular participant to answer. Thus the domination
of a forum by a few aggressive participants is avoided and “shooting from the
hip” responses tend to be replaced with more thoughtful comments. Another
web-based system, CivicSpace, is designed for community building from grass
roots “bottom up” participation. It also offers survey capability. (http://civicspacelabs.org)
Salo
and Gustafsson of the Helsinki University of Technology have described the use
of group support systems in foresight applications [9]. Deme is a web-based
group support platform that is designed to facilitate on-line group discussion
and decision making. (http://groupspace.org/).
It contains a number of features including the capacity to conduct a survey among
participants, together with the usual groupware communication facilities.
Calibrum (http://calibrum.com/)
offers several powerful survey products that have been used in Delphi studies.
7. Strengths and Weaknesses of the
Method
The principal strength of RT
Delphi is its efficiency and applicability to both “standard” Delphi studies
and decision-making applications of the sort described here.
The greatest weakness of RT Delphi
is that only a proof of concept prototype exists. More development is required
to place it into full scale operation, particularly the asynchronous
application. Principal among the development needs is an administrator package
that will permit easy editing of alpha (text) inputs, real-time presentation of
results, and tracking of progress over time.
8. Applications
The design of almost any
multi-round Delphi study can be adapted to this roundless, real-time format.
REFERENCES
[1] Gordon, T.J., and Helmer, O.: Report on a Long-Range
Forecasting Study, The RAND Corporation, P-2982 (1964).
[2] Linstone, H., and Turoff, M.: The Delphi Method: Techniques and Applications, Addison-Wesley, Reading, MA (1975).
[3] Gordon, T.: The Delphi Method, Futures Research
Methodology V2.0, CD-ROM, the Millennium Project, American Council for the United
Nations University.
[4] Turoff, Murray, and Hiltz, Starr Roxanne: Computer-Based
Delphi Processes, an invited chapter in Adler, Michael, and Ziglio, Erio, eds., Gazing into the Oracle: The Delphi Method
and Its Application to Social Policy and Public Health, London, Jessica Kingsley
Publishers (1996).
[5] Huckfeldt, V.E., and Judd, R.C.: Issues in large scale Delphi studies, Technological Forecasting and Social Change, 6 (1974).
[6] Pease, A., and Murray, W.: An English to Logic Translator for Ontology-Based Knowledge Representation Languages, in Proceedings of the 2003 IEEE International Conference on Natural Language Processing and Knowledge Engineering, Beijing (2003).
[7] Pease, A., (2003). The Sigma Ontology
Development Environment, in Working Notes of the IJCAI-2003 Workshop on
Ontology and Distributed Systems. Volume 71 of CEUR Workshop Proceeding series.
[8] Niles, I., and Pease, A.: Towards a Standard Upper Ontology, in Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Ogunquit, Maine (2001).
[9] Salo, Ahti, and Gustafsson, Tommi: A Group Support System for Foresight Processes, Int. J. Foresight and Innovation Policy,
Vol. 1, Nos. 3/4 (2004), 249. Systems Analysis Laboratory, Helsinki University of Technology.
[10] The Knowledge Society Delphi Survey, PREST,
[11] The FISTERA
BIOGRAPHICAL NOTE
Theodore Gordon is Senior Research Fellow of the Millennium Project of the American Council for the United Nations University. He is co-founder of the Project, founder, and CEO for 20 years of the Futures Group, and developer of several methods of forecasting. He can be reached at tedjgordon@att.net.
Adam Pease is CEO of Articulate Software and the
creator of the Suggested Upper Merged Ontology (SUMO). He can be reached at apease@articulatesoftware.com.
[i] Much of
the work reported here was performed under the DARPA Small Business Innovation
Research project "Group Decision Optimization with
[ii] The Millennium Project intends to use this system in a global “Lookout Panel” application involving its worldwide nodes.
[iii] While much of the decision making study described here was completed in the DARPA contract, it is shown in a form that represents an improved implementation.
[iv] Note that the numbers in this chart are randomized input, not the result of a real study, and are used for illustration only.