Fungus is "an architecture designed and optimised for Funge", according to Alexios Chouchoulas, who wrote a very clever paper on it, back in 2001. I read that paper several years ago, and in 2007, having some low-level systems experience under my belt, I decided to go ahead and implement a simulator and full suite of tools for the Fungus virtual machine.
Fungus, like Befunge, is a near-total departure from the one-dimensional world of traditional processor architecture. Its program counter (PC) moves in two dimensions, according to a two-dimensional "delta", in exactly the same way that the Befunge "IP" moves through two-dimensional Funge-space. To complement its forward-looking dimensionality, Fungus also adds some retro styling, in the form of 18-bit vector registers and memory words. (Fungus has a RISC instruction set, in which each instruction occupies exactly one 18-bit word. It could hardly be otherwise, if each instruction is to keep a consistent meaning no matter from which direction the PC encounters it!)
Possibly the cleverest aspect of Fungus, however, is its approach to ASCII. The "zero" opcode is interpreted as TRP, "trap", which invokes a kernel call corresponding to its 9-bit operand. (The Fungus specification never mentions the kernel by name, but the code in low RAM is essentially a kernel, no matter what you call it.) This means that any 18-bit word whose top 10 bits are all zero — i.e., any 7- or 8-bit ASCII character — invokes the kernel. Therefore, it is perfectly possible to write a Befunge-interpreter kernel, such that loading an ASCII Befunge program into high RAM and sending the PC there would actually run the Befunge program!
At this Web site, you can download a mirrored copy of Alexios Chouchoulas' original PDF specification, as well as my Fungus tools: a virtual machine, an assembler (producing bastardized ELF executables), a disassembler, a selection of demo programs, and the above-mentioned Befunge kernel (in the ELF format recognized by the virtual machine).
Chouchoulas' Fungus specification contains a typo: the RET instruction is given in the body text as 0XX 001 XXX XXXXXXXXX2 (colliding with the LI instruction). However, it's given correctly as opcode 1112 in Table 1 of the same document. My Fungus virtual machine implements RET as 1112 consistently, ignoring the apparent typo in the spec.
Chouchoulas' Fungus specification contradicts itself on the encoding of the branching instructions SZ, SNZ, DZ, and DNZ. In the body text, the tested register is specified as being encoded in the A field of the instruction; but in Table 1, the tested register is encoded in the X field, leaving the entire rd of the instruction word as "don't care" bits. The latter encoding makes more sense. (It lets the rd be used for data storage, and makes all of the Group 0 instructions use X instead of some instructions' using A and some X.) Therefore, my Fungus virtual machine implements Table 1's more sensible encoding.
Chouchoulas' Fungus specification is full of mistakes and unclear passages regarding the DZ and DNZ instructions. For example, he describes them as setting ΔPC to (−1,−1) or (1,1), and then describes those directions as "northeast" and "southwest". However, (−1,−1) is really "northwest", and (1,1) is really "southeast". This can be inferred from the normal meanings of those words in Befunge, and also from the description of the TRP instruction on page 11, where Chouchoulas states that going "north" from row 0 takes the PC to row 777.
The spec is also extremely unclear on the effect of the current masking mode on DZ and DNZ — the masking mode certainly affects the setting of ΔPC, but does it also affect the part of A that is tested against zero? and does DZ.X set ΔPCy to zero, or leave it unchanged? Table 1 seems to use the underbar to indicate the parts of an instruction affected by the current masking mode, but the precise meaning of the underbar seems to differ between LV and DZ.
My Fungus virtual machine simulates the instruction DZ $X as follows below. The operation of DNZ is exactly the same, except that it replaces the words "was zero", below, with "wasn't zero".
Pay special attention to the semantics of DZ and DNZ when applied to the ΔPC register itself! DNZ $2 always diverts, and DZ $2 never diverts.
Chouchoulas' specification dictates that in an 18-bit word written as a vector, the y ordinate (the wo, or the most significant half-word) is written after the x ordinate, like so: (x,y).
However, the y ordinate is also the most significant half-word when using with a register's contents as an 18-bit integer value; thus the vector (000,777)8 is equivalent to the 18-bit unsigned integer 7770008.
Chouchoulas' descriptions of the INC and DEC instructions are therefore incorrect; INC actually adds the vector (1,0), and DEC adds (−1,0). Other sections of the specification may be wrong as well. In this documentation, and of course in the virtual machine itself, I have tried to stick to the official convention of writing (x,y)-endian vectors and wo-endian integers.
The fungasm Fungus assembler provides the following pre-defined aliases for useful individual assembly instructions. The GON, GOS, GOE, GOW, JR, and SHL mnemonics were suggested by Chouchoulas; the particular opcode chosen for NOP is based on a footnote in Chouchoulas' paper.
In the table, the notation .m means that the alias takes an optional masking mode. In the interest of clarity and simplicity, some aliases do not accept masking-mode suffixes.
GON — Go North | DZ.Y $0 |
GOS — Go South | DNZ.Y $0 |
GOW — Go West | LI $2, -1 |
GOE — Go East | LI $2, 1 |
GONW — Go Northwest | LV $2, -1 |
GOSE — Go Southeast | LV $2, 1 |
GOB — Go Back | SUB $2, $0, $2 |
JR A — Jump to Register | ADD $1, $0, $A |
MR.m A, B — Move Register | ADD.m $A, $0, $B |
NEG.m A, B — Negate | SUB.m $A, $0, $B |
NOP — No Operation | INC.Y $7, $7 |
SHL.m A, B — Shift Left | ADD.m $A, $B, $B |
IRET — Interrupt Return | SMR $0, #IRET |
Notice that there are no aliases GOSW or GONE, because those operations cannot be represented in a single Fungus instruction. (The instruction set would have been more useful if the DZ and DNZ instructions were defined to load the vectors (1,−1) and (−1,1) instead of (1,1) and (−1,−1), since the latter constants can already be loaded in fewer cycles with LV, as shown above.)
As mentioned in the specification, INC.Y adds the y component of the vector (1,0) to the y component of the given register; thus, it adds zero, producing no effect.
I borrowed the idea of machine-specific registers, or MSRs, from Intel's x86 architecture. MSRs provide a way for the hardware designer to extend the capabilities of the hardware in ways the original designer didn't anticipate. They are a mixed blessing, since two different "Fungus" machines could end up supporting vastly different, and even incompatible, semantics. And I didn't enjoy hacking up Chouchoulas' elegant design. But Fungus, as it stood, needed more functionality, and MSRs were the cleanest way to provide it.
Therefore, my Fungus virtual machine adds two new instructions to the instruction set:
LMR — Load Machine Register | SMR — Store Machine Register | ||
Instruction | LMR x, r | Instruction | SMR x, r |
Format | 1mm 111 xxx 000 rrrrrr | Format | 1mm 111 xxx 001 rrrrrr |
Semantics | X := MSR(R) | Semantics | MSR(R) := X |
Cycles | 5 | Cycles | 5 |
The LMR instruction reads a value from the specified MSR and stores it in the specified register. The write into X is affected by masking modes X and Y; LMR.X writes only the rd of the specified register, and LMR.Y writes only the wo.
The SMR instruction writes the value of register X into the specified MSR. The write may be affected by masking modes X and Y; conceptually, SMR.X writes only the rd of the specified register, and SMR.Y writes only the wo. However, it is possible for a MSR to be read-only, or writeable only in vector or scalar mode; that's up to the hardware or machine simulator.
Since the R field is 6 bits wide, there are 64 unique MSR numbers. However, some MSRs might be read-only, and others write-only; for example, a virtual machine implementation could arrange for reading from MSR #001 to perform keyboard input, while writing to MSR #001 would perform console output. MSR callback functions have access to the entire state of the virtual machine, so a single MSR could provide practically unlimited services. (For example, "writing" a single MSR could trigger a service that would read a series of words from memory and dispatch on their values. However, that kind of service seems to me antithetical to the simplicity and hardwariness of Fungus.)
To make it easier for the programmer when dealing with machines with different capabilities, this specification mandates that reading the value of an unimplemented machine register (i.e., one to which no other semantics have been attached) should always return zero, and writing to an unimplemented machine register should always have no effect.
My Fungus virtual machine provides hooks for the front-end to add machine-specific registers with whatever semantics it likes. Several MSRs are implemented by my virtual machine by default, and they are the subjects of the following subsections.
In general, I have organized the MSRs into the two categories "critical" and
"non-critical"; or equivalently, "hardware-targeted" and "simulator-targeted". The
"critical" MSRs all have numbers in the range 408 to 778, and
implement critical hardware functionality such as operating-system security and
asynchronous interrupts. The "non-critical" MSRs have numbers in the range
008 to 378, and provide helpful things like console I/O
that you wouldn't expect to find on a real hardware platform.
The first two "non-critical" machine-specific registers are INPUT
(MSR #008) and OUTPUT (MSR #018). These provide
a fairly high-level I/O interface, reading and writing ASCII characters on the
simulator's stdin and stdout.
A real hardware implementation would probably have an input system that generated
an asynchronous interrupt whenever a key on the keyboard was
pressed, and then the kernel would be responsible for bookkeeping tasks such as
maintaining the state of the Shift and Caps Lock keys and keeping a buffer
of unread input characters. But that would be really complicated, so in the
interest of keeping the simulator accessible, I've given it the INPUT register.
Reading from INPUT stores a nine-bit ASCII character in the destination register.
In masking mode X, only the rd is written; in masking mode Y, only the
wo is written; in vector mode, both half-words are given a copy of the
input character; and in scalar mode, the rd is sign-extended into the
wo. Reading from a stream that is at end-of-file results in a value of
−1 (7778). Writing to INPUT has no effect.
A real hardware implementation's output mechanism would depend entirely on
what kind of output device was hooked up to the machine. If the device were a
computer monitor, then part of low RAM would probably be set aside as a "video
memory area". The monitor would read data from this array and plot characters
or pixels on the screen according to their values. But that would be really
complicated, so in the interest of keeping the simulator accessible, I've given
it the OUTPUT register.
Writing to OUTPUT prints one or two ASCII characters to stdout.
In masking mode X, only the rd is printed; in masking mode Y, only the
wo is printed; in vector mode, first the wo is printed and then
the rd; and in scalar mode, the behavior is undefined. All characters
are ANDed with 25510 before being printed, so printing 7458
is equivalent to printing 3458. Reading from OUTPUT has no effect.
Finally, the simulator provides a way to halt the program. Writing any
value, in any mode, to the machine-specific register PRGMEXIT
(MSR #028) ends the program. The value written is used as the
simulator's return code to your real operating system, so
SMR $0, #PRGMEXIT is the equivalent of return 0;
in C or C++.
The Fungus specification claims that one of the design aims of Fungus was to support
"hardware contexts", so that programs could run in small sandboxes of the address space,
wrapping at the edges of those boxes, "thus forming a sub-torus of the super-torus that
is Fungus' main memory." I'm not sure what Chouchoulas meant, but I have implemented a
similar concept myself.
My Fungus virtual machine fells two rds with one stone — its implementation
of what I choose to call "hardware contexts" in the same stroke gives us a distinction
between "supervisor mode", in which all of memory is accessible; and "user mode", in
which the PC is restricted to a sandbox as described in the preceding paragraph.
The virtual machine's implementation uses three machine-specific registers,
named HCON, HCAND, and HCOR. (Their MSR numbers are #0408, #0418,
and #0428.) The HCON register is a read-only register; it cannot be changed
by the SMR instruction. Instead, it is set to zero whenever a TRP occurs,
and it is set to 1 whenever a RET occurs. (This happens in addition to their
usual semantics of saving and restoring PC and ΔPC.) The HCAND and HCOR registers
are writeable only when HCON is non-zero; if HCON is zero, then SMR $0,HCAND
has no effect. However, HCAND and HCOR are always readable with LMR.
When HCON is zero, the machine behaves exactly as described in the Fungus spec.
We say that the processor is in kernel mode.
When HCON is non-zero (i.e., 1), all address calculations are changed. Instead
of doing anything with a raw memory address A, we now use the value
(A & HCAND) | HCOR. Also, any time the PC changes,
its value is masked in the same way. In this situation, we say that the processor
is in user mode.
Corollary: If HCAND is 07777778 and HCOR is zero, then the setting
of HCON doesn't have any effect on the calculation of memory addresses. (However,
it still affects whether the processor is considered to be in kernel mode for other
purposes, such as the use of privileged instructions.)
Example: Suppose HCON is 1, HCAND is 0070078, and HCOR is
0705408. Then the only addresses accessible by the PC are of the form
07X54X8, where X means "don't care". If ΔPC is (000,001) and PC
is (076,547)8, then the next value of the PC will be (076,540)8.
The PC is effectively restricted to an 8-by-8 sandbox, until it executes a TRP
instruction.
Example: Suppose HCON is 1, HCAND is 1001008, and HCOR is 0000008.
Suppose ΔPC is (0,1) (i.e., east). Then the instruction LV $1,777
sets the PC to (100,100)8, and the next instruction executed will be fetched from
(100,100)8, because the change due to ΔPC is masked away by HCAND.
The next instruction is not fetched from (100,000)8, nor from
(100,101)8.
Simulator I/O
Hardware contexts
The next built-in MSR provided by my Fungus virtual machine is OSEC (MSR #0438). OSEC provides a facility through which an operating system can ensure that user-mode programs don't perform any actions that threaten the security of the system. For example, it would probably be a bad idea to let user programs use the SMR instruction to change the value of HCAND!
MSR #0438 is organized as a bit vector of 16 bits; the top two bits of the y component (the wo) are ignored. Before executing any instruction I, if HCON is non-zero, the hardware computes the four-bit value ((I.G << 3) | I.OP), and uses it as an index into OSEC. If that bit of OSEC is set (i.e., non-zero), then the hardware takes an asynchronous interrupt to (777,777)8.
In this way, the operating system kernel can protect itself from malicious user-mode code with the following instruction sequence on boot:
LV.Y $3, 0100 (* index 1111: LMR and SMR *) LV.X $3, 0201 (* index 0111: RET; index 0000: TRP *) SMR $3, #OSEC (* disallow those three instructions in user mode *)
User-mode code cannot write outside of its sandbox, and even if it writes an unsafe instruction such as TRP inside its sandbox in an attempt to break out, OSEC will catch the unsafe instruction and turn it into a harmless interrupt. (Of course, the operating system's handler for that interrupt had better be correct, or OSEC's vigilance will have been for naught.)
Notice that kernel mode is still allowed to use the "unsafe" instructions; otherwise, there would be no way at all to regain the full use of the processor, since SMR $X, #OSEC is one of the disallowed instructions!
For a computer to interact with the outside world, it needs some efficient way to accept input. For example, if the user presses a key on the keyboard, the processor needs to stop whatever it's doing and process that keystroke! One naive way to handle the problem is to poll the keyboard input device every few milliseconds, to see whether a key is being pressed; but polling is far too inefficient for real-world use.
Also, if we are to design a multi-processing operating system to run on Fungus, the system must have some way to switch between user processes — or to kill a malicious or runaway user process. To do that, the system needs to be able to wrest control from the current PC, without any special action on the PC's part.
To solve both of these problems, we introduce the notion of asynchronous interrupts. An asynchronous interrupt is a physical event (a keypress, say, or a timer tick) that interrupts the normal flow of control in the hardware, and sends the PC to an interrupt handler, in much the same way that a Fungus TRP instruction sends the PC to a trap handler.
Unlike traps, asynchronous interrupts can be called unexpectedly, at almost any time. Therefore, the usual trick of saving PC and ΔPC in $6 and $7 won't suffice; the program might be storing valuable information in $6 and $7 already, and wouldn't like the hardware to randomly overwrite those registers.
Therefore, the asynchronous interrupt system uses its own distinct mechanism to save and restore the values of the PC and all six other writable machine registers. The machine-specific registers ISTACK (#0508) and DISTK (#0518) are used for this purpose. When an asynchronous interrupt foo is taken, it is as if the following instructions have been executed:
Both ISTACK and DISTK are readable and writable.#ISTACK := #ISTACK + #DISTK SW.V $1, [#ISTACK++] SW.Y $2, [#ISTACK++] SW.X $3, [#ISTACK++] SW $4, [#ISTACK] SW.X $5, [#ISTACK--] SW.Y $6, [#ISTACK--] SW.V $7, [#ISTACK--] LMR $3, #ISTACK TRP #foo
The machine-specific register IRET (#0528) is write-only; reading from it returns the value zero. When a value foo is written to it, it is as if the following instructions have been executed:
LW.V $1, [#ISTACK++] LW.Y $2, [#ISTACK++] LW.X $3, [#ISTACK++] LW $4, [#ISTACK] LW.X $5, [#ISTACK--] LW.Y $6, [#ISTACK--] LW.V $7, [#ISTACK--] #ISTACK := #ISTACK - #DISTK HCON := foo
The last built-in MSR provided by the Fungus virtual machine is TICKS (MSR #0708). TICKS is read-only; writing to it produces no effect. Reading TICKS with LMR returns the number of "clock cycles" since the first instruction. (Each Fungus instruction takes roughly 4 clock cycles, as detailed in the spec. Jumps take a little longer, and traps take 8 cycles.)
The simfunge simulator can't run without a "program image" — something to put in memory so that the PC has something to execute. A memory full of zeros is just boring.
The program image is specified in a special kind of binary file, which contains not just the values of each cell in memory, but also a location for the PC to be at when the simulator starts running. This binary file is called an ELF file. The ELF format is a popular binary format used on most Unix and Linux platforms. simfunge deals with a particular kind of ELF file; clearly, you won't be able to run these ELF files on your home computer, because they contain Fungus instructions instead of x86 instructions! However, you should be able to use a tool such as GNU readelf to examine a FungELF file, if you find that sort of thing interesting. (On the other hand, GNU objdump is not that smart; it just complains "File format not recognized".)
The details of FungELF are not important. The only FungELF files you'll care about are the ones produced by fungasm, the Fungus assembler. It turns a plain-text, two-dimensional assembly file into a binary FungELF file, which can then be loaded back into memory and interpreted by simfunge.
Another way to create a FungELF file is to use the bef2elf utility to convert a plain-text Befunge file. The resulting binary will contain exactly the same information as the original text file, but with an ELF wrapper. It still won't run by itself on simfunge, because all the instructions in the file will be ASCII characters — TRP instructions.
The Fungus assembler is called fungasm. It converts a plain-text file written in Fungus assembly language into a binary FungELF file containing Fungus machine code.
Fungus assembly language, just like Fungus machine code, is two-dimensional. It is written (mostly) in a grid, like this:
.FILL ' ' .MACRO NOP LI.Y $0,777 .EQU OUTPUT 1 WORD 123456 GOE LV 3,DATA SW PC,[$3] GOS LW.S DPC, [5+6] GON NOP (* this is LMR $3, #OUTPUT .ENTRY .-(0,2) a comment *) NOP GOS {comment} SW.X $0, [++$PC] .ORG (4,3) WORD 0 .FILL 999d .ORG (777,000)
An assembly file consists of a series of sections, separated by blank lines. Each section consists of a two-dimensional block of Fungus instruction mnemonics and assembler directives (.EQU, .ORG, and so on). Assembler directives take up space in the grid just as instructions do, and directives refer to their own positions in the text; thus, the .ORG (4,3) directive in the above example indicates that its own space in the grid should be located at (004,003)8 — and therefore that the GOE instruction in the upper left corner will be located at (0,0).
If any section contains no origin — that is, if it consists entirely of assembler directives, and none of those directives is .ORG — then those directives are taken as global to the entire program. The first section in the program above is a global section, and it defines the global fill for the program (e_flags in the ELF header), as well as a global macro called NOP. The last section in the above program contains an .ORG directive, so its .FILL directive is taken as local (p_flags in the section header), not global. (But since the section is only 1x1 anyway, nothing ends up getting filled with 99910 anyway.)
Macros defined in non-global sections are taken as local to those sections, and are not visible in other sections. .EQU symbols are always visible globally, and their value is the same everywhere.
Comments in Fungus assembly follow the Pascal convention; they are bracketed by (* *) or curly braces { }. In the spirit of Funge, multi-line "block" comments are allowed; the assembler will treat two spaces in a comment as a "break" to the next line, and then treat any text under the previous line's comment text as part of the same comment. Comments, unlike instructions and directives, do not need to be lined up on the grid. Every comment must still have one opening delimiter and one closing delimiter, and they must match; it is incorrect to write
SMR $0, #2 {a trivial program} (* This comment is .ENTRY .-(0,1) invalid because it splits (* But this comment is into two *) pieces *) fine because it comes back together at the end *)
The assembler proceeds as follows when processing a file:
For examples of Fungus assembly language, see the asmdemos/ directory. One program that exercises most of the features of fungasm is kernel.asm [text, 22K], a ROM kernel for Befunge-93.
(where "hello.b93" is any Befunge-93 program) will produce the same effect as executing "hello.b93" in your Befunge-93 interpreter of choice.simfunge kernel.elf hello.b93
As explained in the introduction, the kernel works by loading into low memory a set of trap handlers, each of which implements the Befunge-93 instruction corresponding to its ASCII value. The kernel is responsible for maintaining the Befunge stack; doing non-trivial arithmetic such as multiplication and division; and inputting and outputting data. (For the input and output of individual ASCII characters, the kernel must depend on the kindness of the virtual machine. The kernel assumes that reading from MSR #00 will read a character from stdin, and writing to MSR #01 will write to stdout, as described in the section on I/O above. However, the kernel is responsible for algorithmically converting the sequence of ASCII characters "1","2","3" into the number 12310, and vice versa.)
I believe that "kernel.asm" implements all of Befunge-93 according to the specification, with the following exceptions (which may be fixed sometime in the future):