The goal of behavioural cloning is not only to induce a successful controller, but also to achieve a better understanding of the human operator’s subconscious skill. Behavioural cloning has been used successfully in problem domains such as pole balancing, production line scheduling, piloting (see Claude's page) and operating cranes.
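As a concrete, deliberately simple illustration, the Python sketch below clones a controller from recorded (state, action) pairs with a 1-nearest-neighbour lookup: the clone repeats whatever the operator did in the most similar recorded state. This is not the learning method used in our bicycle and crane experiments, which relied on qualitative induction. The class name `NearestNeighbourClone` and the trace format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical trace format: a time-ordered list of (state, action) samples
# recorded while a human operator controls the system.
Sample = Tuple[Sequence[float], Sequence[float]]


class NearestNeighbourClone:
    """Minimal clone: reproduce the operator's action from the most similar
    recorded state (1-nearest neighbour, Euclidean distance)."""

    def __init__(self, traces: List[List[Sample]]):
        # Flatten all traces into one table of learning examples.
        self.examples = [sample for trace in traces for sample in trace]
        if not self.examples:
            raise ValueError("need at least one recorded sample")

    def act(self, state: Sequence[float]) -> Sequence[float]:
        # Return the action stored with the closest recorded state.
        _, action = min(self.examples,
                        key=lambda sample: math.dist(sample[0], state))
        return action
```

For example, a clone built from `[((0.0, 0.0), (0.0,)), ((0.1, 0.5), (-1.0,)), ((-0.1, -0.5), (1.0,))]` returns `(-1.0,)` for the state `(0.09, 0.4)`. Because real state variables have different units, they would need to be normalised before computing distances; the sketch omits this.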
Below we give demos and brief descriptions of our experiments in three domains: the bicycle, the crane and the Acrobot.
In the bicycle domain, the learning examples for our behavioural cloning were control traces from an experiment (still in progress) in which four students learned to control the bicycle simulator manually and complete the control task. They had to balance the bicycle and ride it to a goal located 100 m from the start position along the x axis, with the bicycle frame initially pointing -pi/2 rad away from the goal. A trial was successful if the bicycle came within 5 m of the goal within 100 seconds without falling on the way.
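The success criterion can be written as a short check over a recorded trial. The Python sketch below assumes the bicycle starts at the origin with the goal at (100, 0), a trace of (t, x, y, tilt) samples, and a fall threshold of 12 degrees (pi/15 rad). The fall threshold, the trace format and the function name `trial_successful` are assumptions, not values taken from our simulator.

```python
import math
from typing import Iterable, Tuple

GOAL = (100.0, 0.0)       # assumed goal position (m), start at the origin
GOAL_RADIUS = 5.0         # m
TIME_LIMIT = 100.0        # s
# Assumption: the bicycle counts as fallen once its tilt exceeds 12 degrees;
# replace with the threshold used by your simulator.
FALL_TILT = math.pi / 15  # rad


def trial_successful(trace: Iterable[Tuple[float, float, float, float]]) -> bool:
    """trace: time-ordered samples (t, x, y, tilt) from one trial."""
    for t, x, y, tilt in trace:
        if t > TIME_LIMIT:
            return False          # ran out of time before reaching the goal
        if abs(tilt) > FALL_TILT:
            return False          # fell on the way
        if math.hypot(x - GOAL[0], y - GOAL[1]) <= GOAL_RADIUS:
            return True           # reached the goal region in time
    return False                  # trace ended without reaching the goal
```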
The state of the bicycle is described by six variables: the tilt angle of the bicycle from vertical and its angular velocity, the angle between the front wheel direction and the frame direction (due to the deflection of the handlebars) and its angular velocity, the distance from the goal, and the angle of the bicycle's frame relative to the goal. The system is controlled by two actions: the torque applied to the handlebars and the displacement of the center of mass.
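One way to hold this representation in code is sketched below in Python, together with a computation of the two goal-related variables from the bicycle's position and heading. The class and field names are hypothetical, and the sign convention for the goal angle (bearing to the goal minus frame heading, wrapped to [-pi, pi]) is an assumption; with it, a bicycle at the origin heading along +y, with the goal on the +x axis, gets a goal angle of -pi/2.

```python
import math
from dataclasses import dataclass


def wrap_angle(a: float) -> float:
    """Map an angle (rad) into the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


@dataclass
class BikeState:
    # The six state variables described above (hypothetical field names).
    tilt: float           # tilt angle of the bicycle from vertical (rad)
    tilt_rate: float      # its angular velocity (rad/s)
    steer: float          # front wheel direction vs. frame direction (rad)
    steer_rate: float     # its angular velocity (rad/s)
    goal_distance: float  # distance from the goal (m)
    goal_angle: float     # angle of the frame relative to the goal (rad)

    def as_vector(self) -> tuple:
        # Flat form, e.g. as the attribute vector of a learning example.
        return (self.tilt, self.tilt_rate, self.steer, self.steer_rate,
                self.goal_distance, self.goal_angle)


@dataclass
class BikeAction:
    torque: float         # torque applied to the handlebars
    displacement: float   # displacement of the center of mass


def goal_features(x: float, y: float, heading: float,
                  goal: tuple = (100.0, 0.0)) -> tuple:
    """Distance to the goal and frame angle relative to it.
    Assumption: angle = bearing to the goal minus frame heading, wrapped."""
    dx, dy = goal[0] - x, goal[1] - y
    return math.hypot(dx, dy), wrap_angle(math.atan2(dy, dx) - heading)
```

For the initial configuration described above, `goal_features(0.0, 0.0, math.pi / 2)` gives a distance of 100 m and an angle of approximately -pi/2.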
We used the simulator with the same parameters as in J. Randløv and P. Alstrøm, "Learning to Drive a Bicycle using Reinforcement Learning and Shaping", ICML-98 (bicycle.ps.gz, 726 kB).
SEE THE CLONE IN ACTION: our simulator is written for MS-DOS and uses VGA graphics. To try it, download the graphical bike demo for MS-DOS, unzip it and run Bike.exe from MS-DOS or Windows. The simulator options are described in BkDemo.txt.
See also:
Technical report about cloning the bike using qualitative induction.
Program code in C for the bike simulator, from Jette Randløv's page.
In the crane domain, the state of the system is specified by six variables: the trolley position X and its velocity, the rope inclination angle Phi and its angular velocity, and the rope length L and its rate of change. Two control forces are applied to the system: force XF on the trolley in the horizontal direction and force YF along the rope.
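Because the load hangs at the end of the rope, its position follows from X, Phi and L. The Python sketch below stores the six state variables and the two forces and computes the load position. The names are hypothetical, SI units are assumed, and it assumes Phi is measured from the downward vertical (positive towards +X) with the vertical axis pointing up from the trolley.

```python
import math
from dataclasses import dataclass


@dataclass
class CraneState:
    # Six state variables of the crane (hypothetical field names, SI units).
    x: float      # trolley position X (m)
    dx: float     # trolley velocity (m/s)
    phi: float    # rope inclination angle Phi from vertical (rad)
    dphi: float   # angular velocity of the rope (rad/s)
    l: float      # rope length L (m)
    dl: float     # rate of change of the rope length (m/s)

    def load_position(self) -> tuple:
        """(horizontal, vertical) position of the load. Assumption: Phi is
        measured from the downward vertical, the trolley is at height 0 and
        the vertical axis points up."""
        return (self.x + self.l * math.sin(self.phi),
                -self.l * math.cos(self.phi))


@dataclass
class CraneControl:
    xf: float     # horizontal force XF on the trolley
    yf: float     # force YF along the rope
```

With Phi = 0 the load hangs directly below the trolley; any rope oscillation shows up as a horizontal offset of the load from X.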
We used data on manual control of the crane from a previous study (Urbancic, 1994), in which six student volunteers learned to control the simulator. Remarkable individual differences were observed in the strategies they used: some operators tended towards fast but less reliable operation, while others were more conservative and slower in order to avoid large rope oscillations.
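To compare such strategies, one could summarise each trace by how long the operation took and how far the rope swung. The Python sketch below is a hypothetical summary of this kind, assuming a trace of (t, Phi) samples; it is not a measure used in the original study.

```python
from typing import Iterable, Tuple


def strategy_profile(trace: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Summarise one manual-control trace of (t, phi) samples as
    (duration in s, largest absolute rope angle in rad).
    Hypothetical summary: fast operators would tend towards short durations
    with large swings, conservative ones towards longer, smoother runs."""
    samples = list(trace)
    if not samples:
        raise ValueError("empty trace")
    duration = samples[-1][0] - samples[0][0]
    max_swing = max(abs(phi) for _, phi in samples)
    return duration, max_swing
```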
SEE THE CLONE IN ACTION: to see the demo, download the graphical crane demo for MS-DOS (presented at QR'99), unzip it and run the .bat or .pif files from DOS or Windows.
See also the corresponding paper (cloning the crane):
D. Suc, I. Bratko: Modelling of control skill by qualitative constraints (zipped postscript). Thirteenth International Workshop on Qualitative Reasoning, editor: C. Price, pages 212-220, Aberystwyth: University of Aberystwyth, Loch Awe, Scotland, June 1999.
See also:
Program code in C++ for the crane simulator.
Acrobot and crane dynamics system and our experiments with human learning (postscript document, zipped postscript)
The state of the Acrobot is defined by the joint angles q1 and q2 and their angular velocities. A difficult and well-known problem is swing-up control: the task is to move the Acrobot from its stable downward position to its unstable inverted position as fast as possible. Since only joint q2 is controllable, a strategy must be found that drives q2 so as to excite oscillation of q1. The oscillation must grow until the system reaches the unstable equilibrium, i.e. until the system's center of mass is directly above the q1 joint.
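The swing-up target condition can be checked from the joint angles alone. The Python sketch below computes the system's center of mass relative to the q1 joint. It assumes the usual two-link geometry with q1 measured from the downward vertical and q2 measured relative to link 1; the link parameters, the tolerance `tol` and all names are hypothetical placeholders to be replaced by the values of an actual simulator.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AcrobotParams:
    # Hypothetical link parameters (SI units).
    m1: float = 1.0    # mass of link 1
    m2: float = 1.0    # mass of link 2
    l1: float = 1.0    # length of link 1
    lc1: float = 0.5   # distance from joint q1 to the center of mass of link 1
    lc2: float = 0.5   # distance from joint q2 to the center of mass of link 2


def center_of_mass(q1: float, q2: float, p: AcrobotParams = AcrobotParams()):
    """System center of mass relative to the q1 joint (x right, y up).
    Assumptions: q1 from the downward vertical, q2 relative to link 1."""
    x1, y1 = p.lc1 * math.sin(q1), -p.lc1 * math.cos(q1)
    x2 = p.l1 * math.sin(q1) + p.lc2 * math.sin(q1 + q2)
    y2 = -p.l1 * math.cos(q1) - p.lc2 * math.cos(q1 + q2)
    m = p.m1 + p.m2
    return (p.m1 * x1 + p.m2 * x2) / m, (p.m1 * y1 + p.m2 * y2) / m


def com_above_joint(q1: float, q2: float,
                    p: AcrobotParams = AcrobotParams(), tol: float = 1e-2) -> bool:
    """True when the center of mass lies (almost) directly above joint q1."""
    x, y = center_of_mass(q1, q2, p)
    return y > 0 and abs(x) <= tol
```

With these conventions the stable downward position is q1 = 0, q2 = 0, and the inverted position is q1 = pi, q2 = 0, where both links point straight up and `com_above_joint` returns `True`.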
SEE THE CLONE IN ACTION: to see the demo, download the graphical acrobot demo for MS-DOS, unzip it and run the .bat or .pif files from DOS or Windows.
See also:
Acrobot and crane dynamics system and our experiments with human learning (postscript document, zipped postscript)