Overview
Oberon RTS provides mechanisms to monitor the running system, including the control programs and processes:
For development and actual usage:
- Run-time error detection, handling, and reporting: traps and aborts
For development:
- Keeping track of certain process state data, such as the procedure call stack, to help find and fix the causes of run-time errors.
- Measuring performance, such as process run-times, to assess the efficiency and feasibility of the running system.
Run-time Errors
See Run-time Errors.
Run-time errors are logged, so that error causes and the system's error-handling responses can be inspected on a timeline.
Example log output: ASSERT violation (trap 7):
2021-12-20 04:00:16 TRAP 7 in proc t1 at pos 601 in TestErrors at 000198F8
2021-12-20 04:00:16 SYS RESET: reset all procs.
2021-12-20 04:00:16 PROC RST: adt 2
2021-12-20 04:00:16 PROC RST: gc 1
2021-12-20 04:00:16 PROC RST: t1 4
2021-12-20 04:00:16 PROC RST: cmd 3
2021-12-20 04:00:21 SYS OK SCR: 00001000
Stack Overflow Monitor
Apart from its role in run-time error detection, the stack overflow monitor also keeps track of the stack memory actually used by each process. Determining the stack space needed by each process is a critical design decision for a control program. The actually used stack space can be displayed using System.ShowProcesses (see below).
Monitoring stack overflow in software, while possible, would come at a substantial processing overhead (and would require a compiler change to do it systematically). Therefore, Oberon RTS uses a hardware device in the FPGA. Apart from setting the correct monitoring limits and recording the used stack space upon coroutine transfer, there is no software overhead.
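The monitor's checks can be sketched in software. The following Python model is purely illustrative (the class and its names are assumptions, not the FPGA register interface): it classifies the stack pointer against the stack's limit and hot zone, and tracks the lowest stack pointer seen, which yields the actually used stack space.

```python
# Conceptual model of the hardware stack overflow monitor.
# Illustrative only; names and structure are not the actual FPGA interface.

class StackMonitor:
    def __init__(self, stack_addr, stack_size, hot_size):
        self.bottom = stack_addr                 # lowest address of the stack
        self.top = stack_addr + stack_size       # the stack grows downwards from here
        self.hot_limit = stack_addr + hot_size   # hot zone assumed at the low end
        self.min_sp = self.top                   # lowest stack pointer observed

    def check(self, sp):
        """Classify the current stack pointer and record usage."""
        if sp < self.min_sp:
            self.min_sp = sp
        if sp < self.bottom:
            return "overflow"                    # would trigger an abort
        if sp < self.hot_limit:
            return "hot"                         # early-warning zone
        return "ok"

    def used(self):
        return self.top - self.min_sp            # actually used stack space

# Values of the cmd process as shown by System.ShowProcesses below:
m = StackMonitor(stack_addr=0x6C000, stack_size=15872, hot_size=512)
assert m.check(0x6C000 + 15872 - 1636) == "ok"
assert m.used() == 1636
```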
Calltrace
Module Calltrace, together with a hardware device in the FPGA, provides the functionality to print a procedure call trace, that is, the stack of procedure calls as they occurred up to the point where Calltrace.Trace is called.
Of course, such a stack needs to be maintained for each process separately. During development it’s helpful to print this call trace when encountering a run-time error.
The call trace stack is maintained by the hardware device in the FPGA. Apart from switching between the separate stacks upon coroutine transfer, there is no overhead in the software for this functionality.
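Conceptually, the device maintains a per-process shadow stack: each procedure call pushes an entry, each return pops it, and Calltrace.Trace prints the remaining entries, most recent call first. A minimal Python model of this bookkeeping (illustrative only; the class and its names are not part of the RTS):

```python
# Conceptual model of a per-process call trace stack, which the real
# system maintains in the FPGA device. Illustrative Python only.

class CallTraceStack:
    def __init__(self):
        self.entries = []        # call addresses, oldest call first
        self.max_depth = 0       # high-water mark of the call depth

    def push(self, call_addr):   # on procedure call
        self.entries.append(call_addr)
        self.max_depth = max(self.max_depth, len(self.entries))

    def pop(self):               # on procedure return
        self.entries.pop()

    def trace(self):             # most recent call first, as Calltrace.Trace prints
        return list(reversed(self.entries))

s = CallTraceStack()
for addr in (0x12C70, 0x12CF8, 0x12BD0, 0x18B1C):  # addresses from the example below
    s.push(addr)
assert s.trace() == [0x18B1C, 0x12BD0, 0x12CF8, 0x12C70]
```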
Example: a simple procedure call chain from a command, with an ASSERT violation in the last called procedure. Each called procedure calls Calltrace.Trace:
call trace stack: 3 id: 50
TestCall1 00018B1C 0000000C 3
Cmds 00012BD0 000004EC 315
Cmds 00012CF8 00000614 389
Cmds 00012C70 0000058C 355
max depth: 13
call trace stack: 3 id: 51
TestCall2 0001897C 0000000C 3
TestCall1 00018B20 00000010 4
Cmds 00012BD0 000004EC 315
Cmds 00012CF8 00000614 389
Cmds 00012C70 0000058C 355
max depth: 13
2021-12-20 05:05:05 TRAP 7 in proc cmd at pos 90 in TestCall3 at 000187D8
call trace stack: 3 id: 99
TestCall3 000187D8 00000008 2
TestCall2 00018980 00000010 4
TestCall1 00018B20 00000010 4
Cmds 00012BD0 000004EC 315
Cmds 00012CF8 00000614 389
Cmds 00012C70 0000058C 355
max depth: 15
2021-12-20 05:05:05 SYS RESET: reset all procs.
2021-12-20 05:05:05 PROC RST: adt 2
2021-12-20 05:05:05 PROC RST: gc 1
2021-12-20 05:05:05 PROC RST: cmd 3
2021-12-20 05:05:10 SYS OK SCR: 00001000
Note that the Calltrace.Trace output is not part of the log. Direct log printing was enabled to see the log entries amid the trace output.
The numbers printed for each trace entry refer to the BL instruction of the corresponding call: first the absolute address, second the module-local address, third the module-local line of assembly code.
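These three numbers are mutually consistent: with one 32-bit instruction per word, the module-local line equals the module-local address divided by four, and subtracting the local from the absolute address yields the module's code base. A quick Python check against the entries from the example above:

```python
# Check the relationship between the three numbers of a trace entry:
# absolute address, module-local address, module-local assembly line.
# Assumes fixed-size 32-bit instructions (one instruction per word).

entries = [
    # (module, absolute, module-local, line) from the trace output above
    ("TestCall1", 0x00018B1C, 0x0000000C, 3),
    ("Cmds",      0x00012BD0, 0x000004EC, 315),
    ("Cmds",      0x00012CF8, 0x00000614, 389),
    ("Cmds",      0x00012C70, 0x0000058C, 355),
]

for mod, absolute, local, line in entries:
    assert line == local // 4        # one 32-bit instruction per line
    base = absolute - local          # module code base address
    print(f"{mod}: base {base:08X}, line {line}")
```

Note that all three Cmds entries yield the same code base, as they must for calls within one module.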
Process Monitor
Module ProcMonitor, together with a hardware device in the FPGA, provides the functionality to measure
- the time spent in each phase of the scheduler, Evaluation and Execution, and
- the time each process uses for its execution.
Measuring the run-times of the two phases is done over a sampling period of one second, recording the maximum value with each pass through Evaluation and Execution, respectively. These values can be inspected using System.Watch.
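The sampling scheme can be sketched as follows: within each one-second period the maximum run-time is recorded, and at the period boundary that maximum becomes the published sample. A small Python model (illustrative only; the names are assumptions, not ProcMonitor's API):

```python
# Conceptual model of the phase run-time sampling: within a sampling
# period, keep the maximum run-time observed; at the period boundary,
# publish that maximum and restart. Illustrative, not the RTS code.

class MaxSampler:
    def __init__(self):
        self.current_max = 0   # maximum within the running sampling period
        self.published = 0     # value of the last completed period
        self.overall_max = 0   # maximum ever observed

    def record(self, run_time_us):   # called after each pass through the phase
        self.current_max = max(self.current_max, run_time_us)
        self.overall_max = max(self.overall_max, run_time_us)

    def end_period(self):            # called once per sampling second
        self.published = self.current_max
        self.current_max = 0

eval_phase = MaxSampler()
for t in (7, 9, 8):                  # hypothetical pass run-times, in microseconds
    eval_phase.record(t)
eval_phase.end_period()
assert eval_phase.published == 9
```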
For each process, the run-time of the latest run is recorded, plus the maximum run-time ever used. These values can be inspected using System.ShowProcesses.
System.ShowProcesses
For the idle system, running only the system processes:
id ty pr tm n st rt rtm ovfl stk adr size hot used unused
adt 0 0 7 2 6 10 196 0 0000F0EC 256 0 120 136
gc 0 0 7 1 6 10 11 0 00014BFC 512 0 40 472
cmd 0 3 4 3 3 497 631 0 0006C000 15872 512 1636 14236
scheduler 0006FE00 512 0 92 420
timers: t0: 5 t1: 10 t2: 20 t3: 50 t4: 100 t5: 200 t6: 500 t7: 1000
- id: process id; adt = audit, gc = garbage collector, cmd = command handler
- ty: type, such as system or essential
- tm: timer used (if timed process)
- n: number in scheduler list
- st: status, such as timed, ready, or delayed
- rt: latest run-time, in microseconds
- rtm: max run-time, in microseconds
- ovfl: number of overflows detected
- stk adr: stack address
- size: stack size, in bytes
- hot: stack hot zone size, in bytes
- used: stack memory used, in bytes
- unused: stack memory unused, in bytes
- timers: timer periods, in milliseconds
Note the special entry for the scheduler, which is just a coroutine, not a full-blown process.
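As a sanity check, the stack columns are self-consistent: used plus unused equals size for every row, including the scheduler entry. A quick Python check with the values from the table above:

```python
# Sanity check of the stack columns of System.ShowProcesses:
# used + unused must equal the configured stack size.

rows = [
    # (id, size, hot, used, unused) from the table above
    ("adt",       256,   0,   120,  136),
    ("gc",        512,   0,   40,   472),
    ("cmd",       15872, 512, 1636, 14236),
    ("scheduler", 512,   0,   92,   420),
]

for pid, size, hot, used, unused in rows:
    assert used + unused == size   # the two columns partition the stack
    assert hot <= size             # the hot zone is part of the stack
    print(f"{pid}: {used}/{size} bytes used")
```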
System.Watch
For the idle system, running only the system processes:
Clock freq 42857142
Total RAM 524288
Modules space 442368 100224 22%
Heap space 65536 4928 7%
Disk sectors 65536 229 0%
Procs 3
MemLim 00080000
stackOrg 00070000
stackSize 00004000 16384
GC limit 49152
Eval 9 12
Exec 10 97820
Eval/Exec 933
- Eval: the latest sampling of the run-time of the Evaluation phase, and the maximum ever sampled, in microseconds
- Exec: the latest sampling of the run-time of the Execution phase, and the maximum ever sampled, in microseconds
- Eval/Exec: the run-time of the Evaluation phase relative to that of the Execution phase, in tenths of a percent
Note that the maximum run-time of process cmd, above, and the maximum of the Execution phase are equal to within two microseconds (due to the control logic).
With Some Load
With an idle system, the Eval/Exec value of 933, corresponding to 93.3%, is high because the three system processes don't put sufficient load onto the Execution phase. When loading the system with 15 processes (the system processes plus 12 test processes), the Eval/Exec ratio drops to a more reasonable 8.7%.
System.ShowProcesses
id ty pr per n st rt rtm ovfl stk adr size hot used unused
p3 1 0 75000 7 6 102 325 0 00018D90 256 32 100 156
p2 1 0 5000 6 6 96 452 0 00018C90 256 32 100 156
p1 1 0 30000 5 6 249 446 0 00018B90 256 32 100 156
p0 1 0 40000 4 6 134 363 0 00018A90 256 32 100 156
adt 0 0 1000000 2 6 7 223 0 0000F410 256 0 164 92
gc 0 0 1000000 1 6 7 8 0 0001403C 512 0 40 472
p7 1 1 15000 11 6 77 490 1 00019190 256 32 100 156
p6 1 1 75000 10 6 32 344 1 00019090 256 32 100 156
p5 1 1 55000 9 6 172 357 1 00018F90 256 32 100 156
p4 1 1 20000 8 6 242 407 1 00018E90 256 32 100 156
p11 1 2 95000 15 6 128 414 0 00019590 256 32 100 156
p10 1 2 20000 14 6 140 395 2 00019490 256 32 100 156
p9 1 2 70000 13 6 255 382 2 00019390 256 32 100 156
p8 1 2 70000 12 6 45 363 2 00019290 256 32 100 156
cmd 0 3 100000 3 3 9 97802 0 0006C000 15872 512 1900 13972
scheduler 0006FE00 512 0 284 228
System.Watch
Clock freq 42857142
Total RAM 524288
Modules space 442368 106944 24%
Heap space 65536 11808 18%
Disk sectors 65536 229 0%
Procs 15
MemLim 00080000
stackOrg 00070000
stackSize 00004000 16384
GC limit 49152
Eval 35 40
Exec 409 97803
Eval/Exec 87
With some more substantial processing by the test processes, the Eval/Exec value decreases even further, to about 2% (scheduler overhead).