A Multi-Agent Adaptive Traffic Signal Control System Using Swarm
Intelligence and Neuro-Fuzzy Reinforcement Learning
Wei Lu, Yunlong Zhang and Yuanchang Xie
Abstract— This research develops and evaluates a new multi-agent adaptive traffic signal control system based on swarm intelligence and the neuro-fuzzy actor-critic reinforcement learning (NFACRL) method. The proposed method combines the desirable attributes of swarm intelligence and the NFACRL method. Two scenarios are used to evaluate the method, and the new NFACRL-Swarm method is compared with its NFACRL counterpart. First, the proposed control model is applied to
counterpart. First, the proposed control model is applied to
isolated intersection signal adaptive control to evaluate its
learning performance. Then, the control system is implemented
in signal control coordination on a typical arterial. At the isolated intersection, the proposed hybrid method outperforms its previous counterpart by learning faster, and it is shown to be insensitive to the reward function parameters. In the network, by introducing a coordination scheme inspired by swarm intelligence, the proposed method improves performance by up to 12% and learns faster.
I. INTRODUCTION
Traffic signal operation in urban areas is a difficult task faced by traffic engineers. Due to the heavy volume of vehicular traffic during peak hours and the stochastic nature of traffic flow, this task requires proper and sophisticated timings, without which significant delay and traffic congestion may occur. Conventional signal control systems use either pre-timed or actuated control. Pre-timed control is the easiest to implement; however, it has the least flexibility to respond to short-term traffic and traffic pattern changes. Actuated control is designed to solve this problem by extending green phases in response to real-time traffic. But this strategy is sometimes too myopic in that it only looks at the movements being served, and thus causes long queues on other movements when traffic is heavy [1].
Adaptive control uses traffic information collected in real time from on-street detectors in signal control. Many adaptive signal control systems have been proposed and proved to outperform both pre-timed and actuated control [2]. However, there are some general limitations associated with these adaptive control systems. One is applicability: many of the existing adaptive control systems achieve optimization by dynamic programming, which may impose restrictions through the problem formulation and solution procedures. Another limitation is the computational burden: adaptive control systems usually incur a huge computational load, which is especially the case in centralized systems. Though some
W. Lu and Y. Zhang are with the Zachry Department of
Civil Engineering, Texas A&M University, TX 77843-3136, USA
[email protected], [email protected]
Y. Xie is with the Civil and Mechanical Engineering Technology, South Carolina State University, Orangeburg, SC 29117, USA
[email protected]
authors [3] proposed approximate dynamic programming to overcome this computational burden, applicability is still a problem and the use is limited to isolated intersections. In addition, hardware implementation for communication is a costly task in adaptive control systems, especially for centralized systems. We can therefore identify a need for a distributed adaptive control system with high computational efficiency that can be implemented in traffic networks.
More recently, Xie and Zhang developed a Neuro-Fuzzy Actor-Critic Reinforcement Learning (NFACRL) control method [1]. Satisfying and promising results were reported in the evaluation after the tuning and training process. However, they did not explicitly consider coordination among the signal controllers located at different intersections, nor did they conduct a comprehensive study of how the parameters and training would influence the results. As mentioned by the authors, the learning results might be sensitive to the reward function parameters. As we will see in Section IV, this sensitivity may render the reinforcement learning inefficient and thus compromise its applicability in large traffic networks. This drawback stems from the assumption that the parameters in the reward function are well tuned. But different traffic demands may call for different optimal reward function parameters. Since finding the optimal parameters in the reward function is another sub-problem, and the number of experiments that needs to be run to achieve that goal is huge, this poses a problem in practice. We resolve this problem by introducing swarm intelligence into the NFACRL method. Swarm intelligence has been applied to traffic signal operations
recently. In 2009, Putha et al. [4] developed an Ant Colony Optimization algorithm to solve the oversaturated network traffic signal coordination problem. Bazzan and de Oliveira [5], [6] used metaphors of pheromone accumulation and task allocation in social insects to model traffic light agents. They treated each intersection (plus its signal light) as a social insect, which grounds its decision on the pheromone trail (accumulating when vehicles are waiting) in the environment. They realized coordination in a network through this distributed approach. However, each of their intersections has only two signal plans to choose from, which is not flexible enough to realize local optimization. Besides, their over-simplified road network and traffic data make the results less convincing and less practical.
In this research, a method combining the desirable attributes of the NFACRL method and swarm intelligence is developed. The proposed hybrid method is named Neuro-Fuzzy Actor-Critic Reinforcement Learning-Swarm (NFACRL-Swarm); it incorporates the simple and effective task allocation and learning process of swarm intelligence while maintaining the accuracy and flexibility of the NFACRL method.
2011 IEEE Forum on Integrated and Sustainable Transportation Systems, Vienna, Austria, June 29 - July 1, 2011. 978-1-4577-0992-0/11/$26.00 ©2011 IEEE
The remainder of the paper is organized as follows.
First, the literature on existing approaches to traffic signal coordination, reinforcement learning methods, and applications of swarm intelligence is reviewed. Then, a signal
control method combining NFACRL and swarm intelligence
is proposed. Simulation scenarios in which the proposed
model is validated are described and performance results
are then presented and analyzed. Concluding remarks then
follow.
II. LITERATURE REVIEW
A. Traffic Signal Control and Synchronization
Current signal control can be classified into pre-timed control, actuated control and adaptive signal control. Most current traffic control systems, such as pre-timed or actuated control, are limited due to their inability to respond to short-term traffic demand and pattern changes [2].
Pre-timed timing plans are determined from off-line average traffic volume data. Techniques such as mixed integer programming, genetic algorithms and stochastic assignment are used in optimizing the timing plans [7]. This type of control is weak in handling traffic flow fluctuations and traffic flow pattern changes, which is often the case in modern cities where business centers are no longer located exclusively downtown [5].
Actuated control can partially resolve this problem. However, it grounds its phase decisions primarily on the immediate arrivals of the movements being served. This myopia may result in unsatisfactory control performance when traffic demand is heavy [5], [8].
Adaptive traffic control systems, whether or not they have
fixed cycle length and phase sequences, determine the basic
operation parameters in real time based on existing and
predicted traffic flow [2], [3]. Adaptive traffic control has
attracted significant attention and researchers have developed
several programs using different optimization techniques and
hierarchical architectures. Some well-known adaptive control
packages are summarized in Table I with their characteristics
in terms of traffic data availability, signal coordination,
objective for optimization and servo mechanisms.
B. Microscopic Simulation and Multi-agent System
Microscopic traffic simulation models have received increasing attention. These models enable researchers to study and evaluate traffic system performance in scenarios that cannot be addressed by traditional approaches due to their complexity (such as congested traffic conditions and incident management). A simulation platform based on cellular automata is proposed and described in [9]. Besides, many commercial microscopic simulation packages have been developed in the last decade, such as MITSIM, PARAMICS, AIMSUN-2, CORSIM and VISSIM.
TABLE I
REVIEW OF ADAPTIVE CONTROL PACKAGES

Program | Traffic Data       | Signal Coordination      | Obj. for Opt.            | Servo
OPAC    | Online, upstream   | Through profile          | Delay                    | Decentralized
UTOPIA  | Online, upstream   | With offset optimization | Stops & delay            | Centralized
SCATS   | Online, downstream | With offset optimization | Capacity                 | Centralized
SCOOT   | Online, upstream   | With offset optimization | Stops, delay, congestion | Centralized
PRODYN  | Online, upstream   | Possible                 | Total delay              | Decentralized
MOVA    | Online, upstream   | Nil                      | Stops, delay, capacity   | Decentralized
DDYPIC  | Off-line           | Nil                      | Delay                    | Decentralized
Though all these packages model vehicles in an object-oriented manner, none of them is a strictly defined agent-based simulation system, because of the essential differences between objects and agents.
Multi-agent systems have recently attracted the attention
of traffic researchers and have been used in traffic operations.
The multi-agent traffic system has been modeled in different
ways by traffic scientists. In 2002, Logi and Ritchie [10] proposed a multi-agent architecture for cooperative congestion management. In 2005, Ossowski et al. [11] outlined the design guidelines for the construction of a multi-agent decision support system and then presented a demo. Kosonen [12] developed a real-time traffic signal control system combining a multi-agent system and fuzzy logic control in 2003. Later, Dresner and Stone [13], [14] proposed a multi-agent traffic management scheme using a reservation-based intersection control mechanism in which the vehicles are controlled by agents. They showed that their method could produce results that closely approximate an overpass and are a hundred times better than those of traffic lights.
III. MODEL DEVELOPMENT
The NFACRL with variable phase sequences (NFACRL-V) structure, reward function and learning procedure used in [2] are also used in this paper; for more details about NFACRL, readers can refer to [1], [2]. However, we use a different action selection method instead of the greedy method, together with a reinforcement learning method inspired by swarm intelligence. In addition, coordination among the signal controllers is explicitly considered.
Bonabeau et al. [15] proposed a mathematical model of how labor is divided in social insect colonies. The task distribution depends on each insect's response threshold and the stimulus for each task. The interactions among insects and the environment give rise to a dynamic distribution of tasks. These concepts are used in our research in the following way: each agent (the traffic signal controller at one intersection) acts like a social insect. It has different tendencies to execute one of its actions (one of its available phases) according to the environmental stimulus, which is related to the action values from NFACRL and to thresholds that are updated in a self-reinforced way.
A. Computation of Stimulus
In NFACRL, the action value $V(A_j; S(t))$ describes the tendency to execute action $j$ at state $S(t)$, so it is a good indicator of the stimulus from the environment. After the action values are obtained, at time $t$, the stimulus of executing action $q$ is:

$$st_{q,t} = (1 - \alpha)\left[V(A_q; S(t)) - V(A_j; S(t))\right] + \alpha \frac{a_j}{|A_s|} \quad (1)$$

where $j = \arg\min_{j \in A_s} V(A_j; S(t))$, meaning the action
with the least tendency at time $t$. The first term of (1) ensures that the stimulus is positive. The second term takes the neighboring signals into consideration, where $\alpha$ is the influence coefficient, $a_j$ is the number of agents performing action $j$ in the area, and $A_s$ is the set of agents in the area that can perform action $j$. In the isolated intersection scenario, $\alpha$ equals 0; in the arterial network, $\alpha$ can be any value between 0 and 1. In this research, $\alpha$ is set to 0.5. The intuitive explanation of the second term is that it encourages controllers to execute the same action, so that they can achieve synchronization.
By inserting the neighbors' influence into the agent's decision, the aim is to gain the advantages of group formation based on direct communication, especially prioritizing global optimization, while also considering $V(A_j; S(t))$, which is focused on a more local optimization.
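As an illustration, the stimulus computation in (1) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function and variable names are our own, and the action values $V(A_j; S(t))$ are assumed to be supplied by the NFACRL module.

```python
def stimulus(q, action_values, a_j, n_agents, alpha=0.5):
    """Compute the stimulus st_{q,t} of executing action q, per Eq. (1).

    action_values: dict mapping each action j in A_s to V(A_j; S(t)).
    a_j: number of neighboring agents performing the least-preferred
         action j (the argmin in Eq. (1)).
    n_agents: |A_s|, the number of agents in the area.
    alpha: influence coefficient (0 for an isolated intersection).
    """
    v_min = min(action_values.values())  # V of the least-tended action
    # First term keeps the stimulus non-negative; second term pulls the
    # agent toward actions that neighboring agents are executing.
    return (1 - alpha) * (action_values[q] - v_min) + alpha * a_j / n_agents
```

With $\alpha = 0$ the second term vanishes, which matches the isolated-intersection setting described above.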
B. Task Allocation
Equation (2) defines the response function (the probability that signal agent $i$ selects action $j$) as a function of the stimulus intensity $st_j$. Let $k = \arg\max_{j \in A_s} V(A_j; S(t))$, meaning the action with the largest tendency at time $t$.

$$T_{\theta_{ij}}(st_j) = \begin{cases} \dfrac{st_j^2}{st_j^2 + \theta_{ij}^2}, & \text{if } j = k \\[2ex] \left(1 - \dfrac{st_k^2}{st_k^2 + \theta_{ik}^2}\right)\dfrac{st_j^2}{st_j^2 + \theta_{ij}^2}, & \text{otherwise} \end{cases} \quad (2)$$

where $\theta_{ij}$ is the response threshold for agent $i$ to execute action $j$ and $st_j$ is the stimulus associated with action $j$.
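The response function in (2) can be sketched as follows (again an illustrative sketch under our own naming; the inner function `T` is the classic threshold response form $s^2/(s^2+\theta^2)$ from Bonabeau et al. [15]):

```python
def response_probability(j, k, st, theta_i):
    """Probability that agent i selects action j, per Eq. (2).

    j: candidate action; k: the action with the largest V(A_k; S(t)).
    st: dict of stimuli st_j; theta_i: dict of thresholds theta_{ij}.
    """
    def T(a):
        # Threshold response function: s^2 / (s^2 + theta^2)
        return st[a] ** 2 / (st[a] ** 2 + theta_i[a] ** 2)

    if j == k:
        return T(j)
    # Non-preferred actions share the probability mass left over after
    # the preferred action k, weighted by the same response form.
    return (1 - T(k)) * T(j)
```

A large threshold $\theta_{ij}$ suppresses the agent's response to a given stimulus, so lowering thresholds (as done in the learning step below) makes the agent more responsive.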
C. Reinforcement Learning
The threshold is updated in a self-reinforced way. Each signal $i$ in the model has one threshold value for each action $j$.
$$\theta_{ij}^{t+1} = \theta_{ij}^{t} - l_{ij}\,\delta t \quad (3)$$

where $\delta t$ is the decision time interval, which is set to 3 seconds in this research, and $l_{ij}$ is the learning coefficient defined by (4):

$$l_{ij} = 1 - 2\sigma_{ij}(t) \quad (4)$$

where $\sigma_{ij}(t)$ is an indicator of learning success at time step $t$, defined by (5):

$$\sigma_{ij}(t) = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}\left[w_{ik}(t) - w_{ik}(t-1)\right]^2} \quad (5)$$

where $w_{ik}(t)$ denotes the action weight at time step $t$ for the $k$-th action output of agent $i$.
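The threshold update in (3)-(5) can be sketched as below. This is a hedged sketch of the update rule as we read it, with our own function names; the action weights $w_{ik}(t)$ are assumed to come from the NFACRL network, and Eq. (5) requires at least two action weights ($n \geq 2$).

```python
import math

def learning_success(weights_t, weights_prev):
    """sigma_{ij}(t) per Eq. (5): spread of the change in the n action
    weights of agent i between consecutive time steps (n >= 2)."""
    n = len(weights_t)
    return math.sqrt(sum((wt - wp) ** 2
                         for wt, wp in zip(weights_t, weights_prev)) / (n - 1))

def update_threshold(theta_ij, weights_t, weights_prev, dt=3.0):
    """Self-reinforced threshold update, per Eqs. (3)-(4).

    dt: decision time interval in seconds (3 s in this research).
    """
    sigma = learning_success(weights_t, weights_prev)
    l_ij = 1.0 - 2.0 * sigma      # learning coefficient, Eq. (4)
    return theta_ij - l_ij * dt   # Eq. (3)
```

When the action weights have stabilized ($\sigma_{ij}(t) \approx 0$), $l_{ij} \approx 1$ and the threshold keeps decreasing, making the agent more responsive; large weight changes can make $l_{ij}$ negative and raise the threshold instead.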
IV. EXPERIMENTAL RESULTS
In this section we conduct two series of experiments. First, we compare the NFACRL-Swarm method with the NFACRL-V method in terms of learning ability, sensitivity to reward function parameters, and performance; these experiments are based on an isolated intersection. Then the NFACRL-Swarm method is applied to an arterial to test the coordination module, and its performance is again compared with its NFACRL-V counterpart. Note that the NFACRL-V method has been shown to be better than traditional traffic control algorithms.
The data set used in this section is the same as that in [2], collected from a chosen segment of FM 2818 (Harvey Mitchell Parkway) in College Station, Texas. The morning peak period traffic data collected on October 7, 2004 from 7:00 A.M. to 8:00 A.M. is used.
The proposed method is coded as a .dll (dynamic link library) file and implemented in VISSIM through the signal control interface. All runs are conducted on VISSIM 5.10 on a desktop computer with a Core 2 CPU @ 3.00 GHz and 8 GB RAM. Table II summarizes the basic model input parameters, i.e., β1, β2, β3, β4, β5, which are non-negative coefficients for each variable in the reward function of the learning process. The variables in the reward function are: the number of vehicles that have passed the intersection from approaches being given a green signal, the number of vehicles in queue, the number of vehicles newly added to queues, the number of vehicles on approaches being given a green signal, and the number of vehicles stopped when the signal is switched from green to red. For more details of the reward function and the reinforcement learning process, readers are referred to [2].
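To make the role of the β coefficients concrete, a reward of the linear form suggested by the description above might look like the following. This is a hypothetical sketch only: [2] gives the actual functional form, and the sign convention here (serving vehicles rewarded, queued and stopped vehicles penalized) and all names are our own assumptions.

```python
def reward(passed, queued, newly_queued, on_green, stopped_at_switch, beta):
    """Hypothetical linear reward; beta = (b1, ..., b5) are the non-negative
    coefficients of Table II. Sign convention is assumed, not taken from [2]."""
    b1, b2, b3, b4, b5 = beta
    return (b1 * passed                # vehicles that cleared on green
            - b2 * queued              # vehicles currently in queue
            - b3 * newly_queued        # vehicles newly joining queues
            + b4 * on_green            # vehicles on approaches shown green
            - b5 * stopped_at_switch)  # vehicles caught by the green-to-red switch
```

Under this sketch, the sensitivity discussed below corresponds to how strongly the learned policy depends on the chosen β values.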
A. Experiments on Isolated Intersection
A three-approach intersection of Rio Grande Boulevard and FM 2818 (Fig. 1) is used in this evaluation. To test their sensitivity to the crucial parameters, both controllers are evaluated with different combinations of reward function parameters, as shown in Table II. For each combination, 90 training runs with random seeds are conducted. In this evaluation, average delay per vehicle (hereinafter referred to as delay) is chosen as the performance criterion. The corresponding performances of the three configurations are shown in Table III. In Fig. 2, the performances of the two methods are compared together with the best performance value that can be obtained by NFACRL-V. The best value is the average of 30 runs after 90 training runs. Note that 30, 60 and 90 in Table III and Fig. 2 represent the averaged performance value of runs 1-30, 31-60 and 61-90, respectively.
From Table III and Fig. 2, the following observations can
be made:
1) In configurations 1 and 3, the NFACRL-V method fails to achieve the goal of reinforcement learning, since the delay goes up after 60 training runs.
Fig. 1. A three-approach intersection
TABLE II
β PARAMETERS
Configuration β1 β2 β3 β4 β5
1 3 0.77 0.25 3 16
2 3 1 0.5 3 8
3 3 0.75 0.5 3 8
2) In all the configurations, the NFACRL-Swarm method learns well. Since the performance of the NFACRL-Swarm method tends to be steady after 30 training runs, it generally has a faster learning speed than NFACRL-V.
3) In two of the three configurations, the performance of NFACRL-Swarm obtained after 30 runs is even better than that of the NFACRL-V method after 90 runs.
B. Experiments on Arterial
The proposed NFACRL-Swarm method is also evaluated on the arterial network shown in Fig. 3 to test the new method's coordination performance. Again, 90 training runs with random seeds are conducted. The performances are presented and compared with the original NFACRL-V method. In addition to delay, stopped delay, and number of stops per vehicle, speed is also considered in this evaluation. The simulation results and comparison are presented in Table IV and Fig. 4.
From Fig. 4 the following observations can be made:
1) The NFACRL-Swarm method consistently outperforms NFACRL-V in all four criteria throughout the whole training process.
2) Compared to the best1 values by NFACRL-V, the NFACRL-Swarm method improves performance in terms of speed, delay and stopped delay by 2.2%, 5.5%, and 12.2% respectively, with the only exception being the number of stops per vehicle.
3) In general, the NFACRL-Swarm method learns faster than its NFACRL-V counterpart.
V. CONCLUSIONS
This research investigates the application of swarm intelligence in distributed adaptive traffic signal control. A new
1 The best value is the average of 30 runs after 90 training runs.
TABLE III
SUMMARY OF DELAY OF TWO METHODS

                         Delay (s/veh.)
Method / Configuration |  30   |  60   |  90
NFACRL-V
  Configuration 1      | 16.6  | 16.05 | 16.87
  Configuration 2      | 15.87 | 15.22 | 15.06
  Configuration 3      | 16.18 | 15.74 | 16.01
NFACRL-Swarm
  Configuration 1      | 15.63 | 14.24 | 14.19
  Configuration 2      | 14.48 | 13.67 | 13.67
  Configuration 3      | 16.44 | 14.8  | 14.74
Fig. 2. Delay comparison of three configurations
method based on swarm intelligence and neuro-fuzzy actor-critic reinforcement learning is developed and evaluated at an isolated intersection and on an arterial. Compared to previous studies, this hybrid method incorporates the simple and effective task allocation and learning process inspired by swarm intelligence together with the elaborate and realistic phase configurations found in the NFACRL method. The proposed new method also considers signal controller coordination in an arterial scenario. A comprehensive comparison of the proposed NFACRL-Swarm method with its NFACRL-V counterpart is conducted based on VISSIM simulation.
This research identifies the sensitivity problem of the reward function parameters of the NFACRL method and shows that the new NFACRL-Swarm method can overcome this problem. In the evaluation at an isolated intersection, for all combinations of parameters the proposed new method produces considerably less delay than NFACRL-V. Some bad combinations of parameters cause NFACRL-V's failure in reinforcement learning, while in all the tests the NFACRL-Swarm method learns well. Compared to the lowest delays obtained by the NFACRL-V method after tuning the parameters and 90 runs of training, the new method produces lower delay after only 30 runs of training. This increase in learning
Fig. 3. A segment of FM 2818
TABLE IV
SIMULATION RESULTS OF TWO METHODS AT DIFFERENT TRAINING STAGES

Model           | Range    | Speed | Delay (s/veh.) | Stopped Delay (s/veh.) | # of Stops /veh.
NFACRL-Swarm    | 30       | 24.88 | 57.42          | 36.18                  | 1.3
                | 60       | 25.91 | 51.36          | 29.75                  | 1.3
                | 90       | 26.16 | 50.01          | 28.19                  | 1.32
NFACRL-V        | 30       | 23.79 | 64.96          | 41.9                   | 1.41
                | 60       | 24.47 | 60.82          | 36.87                  | 1.44
                | 90       | 25.48 | 53.67          | 30.35                  | 1.46
                | best     | 25.6  | 52.9           | 32.1                   | 1.25
Improvement     | 30       | 1.09  | 7.54           | 5.72                   | 0.11
                | 60       | 1.45  | 9.47           | 7.13                   | 0.13
                | 90       | 0.68  | 3.66           | 2.16                   | 0.14
                | vs. best | 0.56  | 2.89           | 3.91                   | -0.07
Improvement (%) | 30       | 4.6%  | 11.6%          | 13.6%                  | 7.8%
                | 60       | 5.9%  | 15.6%          | 19.3%                  | 9.3%
                | 90       | 2.6%  | 6.8%           | 7.1%                   | 9.5%
                | vs. best | 2.2%  | 5.5%           | 12.2%                  | -5.9%
speed can extend the new method's practical applicability and viability. In the evaluation on an arterial, by introducing a coordination scheme inspired by swarm intelligence, the proposed NFACRL-Swarm method outperforms its NFACRL-V counterpart in terms of delay, stopped delay and arterial speed. The new method also learns faster than the original one.
Encouraging results are obtained from this research. To further improve the performance and applicability of the NFACRL-Swarm control, future work will include a clearer explanation of the mechanism of the swarm intelligence implemented in this research. A more comprehensive evaluation of the parameters also needs to be conducted. It would also be interesting to extend the modeling framework to consider different traffic network configurations (e.g., networks with more than two arterials that need to be coordinated).
REFERENCES
[1] Y. Xie, Y. Zhang, and L. Li, “Neuro-fuzzy reinforcement learning
for adaptive intersection traffic signal control,” in TRB Annual Meeting Compendium of Papers, Transportation Research Board Annual
Meeting, Jan. 2010.
Fig. 4. NFACRL-Swarm vs. NFACRL-V
[2] Y. Xie, “Development and evaluation of an arterial adaptive traffic signal control system using reinforcement learning,” Ph.D. dissertation,
Texas A&M University, College Station, TX, 2007.
[3] C. Cai, C. K. Wong, and B. G. Heydecker, “Adaptive traffic signal control using approximate dynamic programming,” Transportation Research Part C: Emerging Technologies, vol. 17, no. 5, pp. 456–474, 2009. Artificial Intelligence in Transportation Analysis: Approaches, Methods, and Applications.
[4] R. Putha, L. Quadrifoglio, and E. Zechman, “Comparing ant colony
optimization and genetic algorithm approaches for solving traffic
signal coordination under oversaturation conditions,” Computer-Aided
Civil and Infrastructure Engineering, pp. no–no, 2011.
[5] A. L. C. Bazzan, “A distributed approach for coordination of traffic
signal agents,” Autonomous Agents and Multi-Agent Systems, vol. 10,
pp. 131–164, 2005.
[6] D. de Oliveira, P. Ferreira, A. Bazzan, and F. Klügl, “A swarm-based
approach for selection of signal plans in urban scenarios,” in Ant
Colony Optimization and Swarm Intelligence, ser. Lecture Notes in
Computer Science. Springer Berlin / Heidelberg, 2004, vol. 3172,
pp. 143–156.
[7] E. Cascetta, M. Gallo, and B. Montella, “Models and algorithms for
the optimization of signal settings on urban networks with stochastic
assignment models,” Annals of Operations Research, vol. 144, pp.
301–328, 2006.
[8] G. Abu-Lebdeh and R. Benekohal, “Signal coordination and arterial capacity in oversaturated conditions,” Transportation Research Record: Journal of the Transportation Research Board, vol. 1727, pp. 68–76, 2000.
[9] B. da Silva, A. Bazzan, G. Andriotti, F. Lopes, and D. de Oliveira,
“Itsumo: An intelligent transportation system for urban mobility,”
in Innovative Internet Community Systems, ser. Lecture Notes in
Computer Science. Springer Berlin / Heidelberg, 2006, vol. 3473,
pp. 224–235.
[10] F. Logi and S. G. Ritchie, “A multi-agent architecture for cooperative
inter-jurisdictional traffic congestion management,” Transportation Research Part C: Emerging Technologies, vol. 10, no. 5-6, pp. 507 – 527,
2002.
[11] S. Ossowski, A. Fernández, J. Serrano, J. Pérez-de-la-Cruz, M. Belmonte, J. Hernández, A. García-Serrano, and J. Maseda, “Designing multiagent decision support systems for traffic management,” in Applications of Agent Technology in Traffic and Transportation, ser. Whitestein Series in Software Agent Technologies and Autonomic Computing. Birkhäuser Basel, 2005, pp. 51–67.
[12] I. Kosonen, “Multi-agent fuzzy signal control based on real-time simulation,” Transportation Research Part C: Emerging Technologies, vol. 11, no. 5, pp. 389–403, 2003. World Congress on Intelligent Transport Systems.
[13] K. Dresner and P. Stone, “Multiagent traffic management: a
reservation-based intersection control mechanism,” in Autonomous
Agents and Multiagent Systems, 2004. AAMAS 2004. Proceedings of
the Third International Joint Conference on, 2004, pp. 530–537.
[14] ——, “Multiagent traffic management: an improved intersection control mechanism,” in Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, ser. AAMAS
’05. New York, NY, USA: ACM, 2005, pp. 471–477.
[15] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm intelligence: from
natural to artificial systems. New York, NY, USA: Oxford University
Press, Inc., 1999.