LOAD "PL/CBMBASIC",8,1: Commodore 64 BASIC for PostgreSQL

If you are of a certain age, the words 38911 BASIC BYTES FREE will do something to you that no amount of therapy can undo. You remember the blue screen. You remember typing in three pages of a listing from a magazine, getting ?SYNTAX ERROR IN 2340, and not knowing which of the three pages contained the typo. You remember that the disk drive was device 8, and that it was slower than continental drift.

I have some news. All of that now runs inside PostgreSQL.

PL/CBMBASIC is a procedural language extension that executes function bodies on Commodore 64 BASIC V2. Not a lookalike, not a tribute act: the actual Microsoft/Commodore interpreter from 1982, by way of Michael Steil's cbmbasic project, which statically recompiled the 6502 ROM into C. That C is compiled straight into the extension's shared library, so the interpreter lives inside your backend process. Every function call is an in-memory power cycle: zero the 64KB RAM array, reset the CPU registers, and re-enter the ROM at $E394. The whole ceremony costs about 15 to 20 microseconds, which is roughly a thousand times faster than the hardware ever managed, and quick enough to call per row over a large table without feeling guilty.

CREATE EXTENSION plcbmbasic;

CREATE FUNCTION hello(who text) RETURNS text AS $$
10 PRINT "HELLO, ";WHO$;"!"
$$ LANGUAGE plcbmbasic;

SELECT hello('WORLD');   -- HELLO, WORLD!

Yes, those are line numbers. Yes, they are mandatory. User code starts at line 10, like nature intended, because lines 0 to 9 are reserved: the extension injects your function arguments there as ordinary BASIC assignments before your code runs. A text parameter named who arrives as WHO$, a smallint named lives becomes a genuine 16-bit LIVES%, and everything numeric otherwise lands in a 40-bit CBM float, all nine glorious significant digits of it.

The validator has opinions, because BASIC V2 had opinions

Anyone who programmed a C64 for more than an hour discovered that you could not have a variable called TOTAL. The tokeniser crunched keywords anywhere, including inside identifiers, so TOTAL contained TO and became garbage. SCORE contained OR. BUDGET contained GET. Only the first two characters of a name were significant, so USERNAME and USERID... actually those were fine, US$ and US are different variables, but ALPHA and ALPS silently became the same string. And TI and ST were taken by the system.

The extension ships a validator, so PostgreSQL now delivers these opinions at CREATE FUNCTION time instead of leaving you to rediscover them at runtime:

ERROR:  parameter name "total" contains the BASIC keyword TO
HINT:  This is why nobody could ever have a variable called TOTAL
       on the Commodore 64.

Some traumas deserve better error messages than we got the first time round.

OUT parameters, by walking the variable table

When a BASIC program ends, its variables are still sitting in the emulated 64KB of RAM. So for OUT and INOUT parameters, the handler does the only reasonable thing: it walks BASIC's own simple-variable table, the 7-byte entries between VARTAB at $2D/$2E and ARYTAB at $2F/$30, decodes the type-encoded name bytes, and converts the 5-byte floats, 16-bit integers, and string descriptors back into SQL values.

CREATE FUNCTION divmod(num int, den int, OUT quot int, OUT rmd int) AS $$
10 QUOT=INT(NUM/DEN)
20 RMD=NUM-QUOT*DEN
$$ LANGUAGE plcbmbasic;

SELECT * FROM divmod(47, 5);   --  quot | rmd
                               -- ------+-----
                               --     9 |   2

PEEKing another process's memory for its results is not a pattern I expect to see in the PostgreSQL documentation any time soon, but it feels deeply right here.

The database is device 8

Here is the part I am most pleased with. On a Commodore 64, your data lived on the disk drive, device 8, and you spoke to it with OPEN, INPUT#, GET#, PRINT#, CLOSE, and the ST status variable. So in PL/CBM-BASIC, device 8 is the database. The "filename" you OPEN is an SQL statement, executed through SPI inside your transaction:

CREATE FUNCTION top_scores() RETURNS text AS $$
10 OPEN 1,8,0,"SELECT NAME, SCORE FROM HISCORES ORDER BY SCORE DESC"
20 INPUT#1,N$,S
30 IF ST<>0 AND N$="" THEN 60
40 PRINT N$;" ";S
50 IF ST=0 THEN 20
60 CLOSE 1
$$ LANGUAGE plcbmbasic;

Column values stream back one CR-terminated record at a time, and ST picks up the EOF bit (64) on the final byte, the same way it did from a 1541. The read-until-done loop you wrote in 1985 works unchanged, including the part where you had to be careful about empty files, which is what line 30 is doing.

It gets better. Secondary address 15 was the drive's command channel, the one you would PRINT# DOS commands to and read 00, OK,00,00 back from. That works too:

10 OPEN 15,8,15
20 PRINT#15,"DELETE FROM HISCORES WHERE SCORE < 1000"
30 INPUT#15,EN,EM$,RC,ES

The status record is 0,OK,<rows>,0, in honour of the original. Each PRINT# appends and the terminating CR executes, so you can build statements longer than BASIC's 255-character string limit by sending them in pieces. INSERT, UPDATE, DDL, whatever you like. It is an untrusted language for superusers only, so you were always going to be able to do terrible things; now you can do them with a FOR loop and no lower-case letters.

Runtime errors are trapped through the interpreter's own ERROR vector at $0300, the same plugin mechanism Simons' BASIC used, so a mistake arrives as a proper PostgreSQL error carrying the genuine ROM error code:

ERROR:  BASIC error: ?DIVISION BY ZERO  ERROR IN 20

And because the KERNAL's STOP routine (the RUN/STOP key scan, polled before every statement) is patched to CHECK_FOR_INTERRUPTS(), the immortal 10 GOTO 10 dies cleanly to statement_timeout. The RUN/STOP key finally works reliably. It only took 44 years.

Is it fast? Define fast

I benchmarked identical functions against PL/Python. PL/Python wins, as you would expect from a language that keeps a warm interpreter around instead of rebooting a Commodore 64 for every function call: roughly 14 to 19 times quicker on trivial calls, narrowing to about 6 times on a workload that queries 100 rows from inside the function, where both sides pay the same SPI costs. But the C64's floor is that 15 microsecond power cycle, which puts it in the same cost bracket as a non-inlined SQL function, and the interpreter chews through about a million BASIC statements per second. The machine this ROM shipped on managed about a thousand. A 1982 8-bit interpreter getting within an order of magnitude and a half of CPython, while performing a full power-on reset per call, is a better showing than anyone has a right to expect.

The fine print

It is 1982 in there, and the extension does not pretend otherwise. Everything is uppercased, including your string literals, because the C64's default character set had no lower case worth speaking of. Strings top out at 255 characters. Floats have nine significant digits. There is no NULL, because NULL had not been invented; it arrives as an empty string, and you can COALESCE in the query if you care. INPUT raises an error, because there is no keyboard attached to your database. POKE and SYS work against the emulated 64KB if you enjoy danger, and the worst a wild POKE can do is confuse the machine until the next call's power cycle.

The code is at github.com/darkixion/pl-cbmbasic, with a regression suite and a README containing the rest of the fine print. Superuser required, production deployment... would be a choice, nostalgia mandatory.

READY.
█

Comments

Popular posts from this blog

PostgreSQL docs reloaded

Deprecated features in PostgreSQL - Past to present

Don't let your PostgreSQL get exploited