Dual RISC5 CPU

Introduction

The idea of adding a second RISC5 CPU to my Arty-A7-100 based system has been brewing in the back of my mind for some time now. For a control system, a second CPU adds interesting possibilities. While multi-CPU (or multi-core) processing on conventional systems mostly provides performance, a controller can profit from hardware and system level redundancy to increase reliability, resilience, and system design flexibility. For example, the controller could run two instances of the same control program and continuously compare results and measurements; it could separate time-consuming tasks, or communications over a slow connection with remote systems or the user, from the real-time controlling unit; or one CPU (and system and control program) could take over should the other fail.

The Oberon system does not have any notion of multi-CPU (or multi-core) processing. But for redundancy, that is not a problem: we want independent subsystems, and one operating system making use of two (or more) CPUs would still be a single point of possible failure.

To run on the RISC5 CPU, the Oberon compiler can produce code

  • as standalone programs, without any need for operating system support, or
  • for dynamic loading and linking by the operating system.

An example of a standalone program is the boot loader. At first sight, this might seem like an option for creating part of a control program to run on one of the CPUs, while the other runs the Oberon system. However, standalone programs are limited: they cannot import any modules, and they don’t include run-time error checks (traps). These limitations make standalone programs suitable only for the simplest control programs.

Which leaves “normal” Oberon programs, dynamically loaded and linked. For redundancy and security reasons the CPUs should not share their RAM, so unless the Oberon system were modified to build the required code structures in the RAM of both CPUs, the straightforward approach is to have two full Oberon systems running concurrently – the memory footprint of Embedded Oberon or Oberon RTS is sufficiently small.

However, for reasons of code maintenance, consistency, and cost, as well as because of the upload function of Astrobe, using one SD card per Oberon system is not ideal.

Hence, I envisioned the following simple use case.

  • One Astrobe console for commands and file uploads.
  • One SD card, one filesystem.
  • Each CPU is able to execute control programs with scheduled processes.
  • Peripheral devices are allocated and wired to a specific CPU.
  • Each CPU has its own, independent BRAM block.
  • Each CPU runs the same operating system code, twice instantiated.

System Structure

  • Each CPU runs the core operating system, consisting of the Inner Core plus module Oberon, which contains the process scheduler as well as the various drivers for system control, process timers, watchdog, etc., as defined by the IMPORT list of Oberon.mod.

  • One CPU handles the Astrobe console interactions, including file uploads.

  • To boot the system, the CPUs take turns to access the filesystem on the SD card. As control programs usually load all required modules at start-up, this seems to be an acceptable approach, at least to get off the ground in this first attempt.

Hardware

In the FPGA, each CPU has its dedicated:

  • RISC5 core (obviously)
  • BRAM block, half of the available 512 KB
  • PROM
  • bus system (inbus, outbus, codebus, adr, auxiliary signals)
  • process controllers (timed, signalled)
  • system control register
  • reset circuitry

The CPUs share the clock signals. Each CPU has a buffered RS232 device for system messages; on CPU 0, this device also handles the Astrobe console interactions. Each CPU has other devices wired to its buses, but those are not relevant here.

The SPI device for accessing the SD card is shared.

Device IO

Each CPU has 1024 bytes of IO address space, of which the top 64 bytes form a shared IO space. Access and switch-over to the shared IO address range are gated by hardware logic in the FPGA, using semaphore-like semantics (claim, release). Unlike with a (software) semaphore, where the requesting process would be queued and suspended until access becomes available, the availability of the shared IO access has to be checked explicitly after a claim. The requesting processor could, of course, simply block while the shared IO space is used by the other processor, but other processes on the requesting processor can still run while the requesting process awaits its turn to access the shared IO.
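To make this claim-and-check pattern concrete, here is a minimal sketch of a procedure using the SD card from a control program. It relies on MCPU.Claim and MCPU.Release as shown further below; MCPU.Granted is a hypothetical check procedure, which could be implemented with SYSTEM.BIT on the controller's status, analogous to the grant check in the boot loader.

MODULE SharedIOUse;  (* sketch only, not part of the actual system *)
  IMPORT MCPU;

  PROCEDURE UseSdCard*;
  BEGIN
    MCPU.Claim;                   (* request access to the shared IO space *)
    REPEAT UNTIL MCPU.Granted();  (* hypothetical: poll until access is granted; a scheduled process could yield here instead of spinning *)
    (* ... access the SD card via the shared SPI device ... *)
    MCPU.Release                  (* hand the shared IO space back *)
  END UseSdCard;

END SharedIOUse.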

The SPI device for the SD card is allocated in this shared IO address space.

All other devices allocated and wired to a specific CPU get an address below the shared IO space, which is controlled exclusively by the corresponding CPU. Corresponding devices for each CPU, for example their aforementioned RS232 device, are wired to the same IO address, so the operating system and control programs don’t have to distinguish on which CPU they run.
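For illustration (the addresses and the status bit below are assumptions, not the actual assignments of my system), a minimal RS232 output procedure looks the same on both CPUs, each writing to its own device at the same address:

MODULE RS232Sketch;  (* illustration only *)
  IMPORT SYSTEM;

  CONST
    DataAdr = -56;  (* hypothetical: same IO address on CPU 0 and CPU 1, each wired to its own RS232 device *)
    StatAdr = -52;  (* hypothetical status address *)
    TxReady = 1;    (* hypothetical status bit: transmitter ready *)

  PROCEDURE Send*(x: INTEGER);
  BEGIN
    REPEAT UNTIL SYSTEM.BIT(StatAdr, TxReady);  (* busy-wait until the transmitter is ready *)
    SYSTEM.PUT(DataAdr, x)
  END Send;

END RS232Sketch.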

Boot Process

Each CPU has its own PROM with the boot loader. The boot loader also defines the RAM layout scheme. Right now, with each CPU getting half of the BRAM, the boot loader in each PROM is exactly the same. With a different BRAM distribution, however, each CPU would need its own specific boot loader – the executed code would still be the same, but the RAM layout parameters would differ.
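As an illustration only: with a hypothetical uneven split of, say, 384 KB for CPU 0 and 128 KB for CPU 1, the two boot loader builds would differ solely in their memory layout constants (names as in the standard boot loader, values made up for this example):

  (* boot loader build for CPU 0: 384 KB of BRAM; hypothetical values *)
  CONST stackOrg = 50000H; MemLim = 60000H;

  (* boot loader build for CPU 1: 128 KB of BRAM; hypothetical values *)
  CONST stackOrg = 18000H; MemLim = 20000H;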

Upon system start, the boot loader simply claims access to the shared IO space, and awaits its turn. Note that the circuitry to claim and release shared IO access is hooked up to IO addresses in the non-shared IO range, of course.

MODULE* BootLoad;
  IMPORT SYSTEM;

  CONST
    (* ... *)
    SysCtrlRegAdr = -68;
    SkipReload = 1;
    SkipCardInit = 2;
    
    MCPUctrlAdr = -180;   (* IO address for claim/release of shared IO circuitry *)
    CPUidAdr = -148;      (* IO address to read the CPU id/number *)
    ClaimCmd = 1;

  VAR cpu: INTEGER;

BEGIN
  (* ... *) 
  IF ~SYSTEM.BIT(SysCtrlRegAdr, SkipReload) THEN  (* if system boot (the abort handler skips reloading) *)
    SYSTEM.GET(CPUidAdr, cpu);                    (* get cpu number/id *)
    SYSTEM.PUT(MCPUctrlAdr, ClaimCmd);            (* claim shared IO access *)
    REPEAT UNTIL SYSTEM.BIT(MCPUctrlAdr, cpu);    (* wait until shared IO access is granted *)
    (* ... *)
  END;
  (* ... *)
END BootLoad.

Then, in the body of Oberon, right before the Loop is called:

MODULE Oberon;
  IMPORT (* ... *) MCPU;
  (* ... *)
BEGIN
  (* ... *)
  MCPU.Release;
  REPEAT UNTIL MCPU.AllDone(); (* await other systems to finish booting *)
  Loop
END Oberon.

That is, the operating system is loaded for each CPU, in sequence, and the Loops in each system start as soon as the systems on all CPUs are up and running.

This boot mechanic would allow for more than two CPUs. Also, there is no defined sequence in which the CPUs (or systems) boot, and such a sequence should not be assumed by the programmer. The first claim request of any boot loader gets handled first. With the current implementation, CPU 0 starts first, but that’s an implementation detail.

If two claim requests are issued at the same time (same clock cycle), the CPU with the lower number gets precedence, similar to interrupt requests in the interrupt controller. In fact, the shared IO controller is very similar to the interrupt controller, issuing an access enable signal in lieu of an interrupt signal.

Without any claim request, the FPGA circuitry gives CPU 0 control of the shared IO space.

MCPU.mod

The very preliminary device driver module for shared IO access is boring: it just directly abstracts the interface to the hardware access arbitration and switch-over circuitry.

MODULE MCPU;

  IMPORT SYSTEM;

  CONST
    CPUadr = -148;      (* IO address to read the CPU id/number *)
    CtrlAdr = -180;     (* IO address of the claim/release circuitry for shared IO *)

    ClaimCmd = 1;       (* claim shared IO access *)
    ReleaseCmd = 2;     (* release shared IO access *)

  PROCEDURE* CPU*(): INTEGER;
    VAR cpu: INTEGER;
  BEGIN
    SYSTEM.GET(CPUadr, cpu);
    RETURN cpu
  END CPU;
  
  PROCEDURE* Claim*;
  BEGIN
    SYSTEM.PUT(CtrlAdr, ClaimCmd)
  END Claim;

  (* ... *)

END MCPU.
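The part elided above contains, among others, the procedures Release and AllDone used during booting. A sketch of what they could look like: Release simply mirrors Claim using ReleaseCmd, while AllDone is shown here reading a hypothetical status bit (AllDoneBit) from the controller, which the actual circuitry may implement differently.

  PROCEDURE* Release*;
  BEGIN
    SYSTEM.PUT(CtrlAdr, ReleaseCmd)  (* release the shared IO space *)
  END Release;

  PROCEDURE* AllDone*(): BOOLEAN;
    CONST AllDoneBit = 31;  (* assumption: bit set by the hardware once all CPUs have released after booting *)
  BEGIN
    RETURN SYSTEM.BIT(CtrlAdr, AllDoneBit)
  END AllDone;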

Reset and Restart

Right now,

  • pressing the abort button causes a hardware reset of both CPUs and their peripherals, and the execution of the abort handlers in both systems (CPUs);

  • restarting one system due to error recovery does not impact the other system, which happily continues to execute its control program while the first one reboots;

  • executing Oberon.Restart reboots both systems.

Oberon.Restart jiggles the System Control Register bits, i.e. the dual restart is solved in hardware.
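As a sketch only, and with a made-up command value: from the software side, such a restart is essentially a single write to the System Control Register (SysCtrlRegAdr = -68, as in the boot loader above); the actual bit layout is defined by the hardware and by the real Oberon.Restart.

MODULE RestartSketch;  (* illustration only *)
  IMPORT SYSTEM;

  CONST
    SysCtrlRegAdr = -68;  (* as in the boot loader *)
    RestartCmd = 1;       (* hypothetical command value *)

  PROCEDURE RestartAll*;
  BEGIN
    SYSTEM.PUT(SysCtrlRegAdr, RestartCmd)  (* the FPGA circuitry then resets both CPUs and their peripherals *)
  END RestartAll;

END RestartSketch.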

Each system has its own watchdog.

CPU-to-CPU Communication

A first implementation of inter-CPU channels is described here.

Command and Program Execution

Commands can be executed on CPU 0 using the Astrobe console.

On CPU 1, commands can be executed using the remote commands facility.

In addition, remote commands can be run using the on-startup feature. I have extended the corresponding on-startup config tool to be able to write onstartup.cfg files for each system.