Long distance runners talk about “hitting the wall” — the point where their body runs out of resources, making it almost impossible to continue. I hit an intellectual wall late Monday night. Fortunately, I not only finished the race but went an unexpected extra mile. (And now my brain circuits are fried.)
Not an actual race (beyond a vague self-imposed deadline), but a hobby project like one of those ships in a bottle. A painstaking and challenging task in the name of fun and exercise of acquired skill. But no race, no ship, no bottle. This was a software project.
My inner geek rampant, I built a virtual 32-bit CPU (in Python).
To in some small way justify two weeks of intense intellectual effort, I’m going to post all about it. In great detail. In multiple posts. Full warning, this is deep geekery, and I expect most readers will sit this one out.
Truth is, I’m insufferably pleased with myself. My virtual CPU works well enough that I implemented a creditable Bloom filter for strings with it. (I won’t get into Bloom filters here. If you’re new to them and curious, see this Wikipedia entry or my recent blog post about them. Then you’ll get the funny behind this xkcd cartoon.)
One thing that made this project delightfully weird was that it’s one thing to write a piece of software that works (or doesn’t at first, usually). But this was software that ran other software, so there were bugs to find in both the Python code and the ASM code. Double the debugging fun. Here’s a sample of some of the ASM code:
@bloom.filter.probe
spbpi 16 ; set BP to SP
push r0
push r1
push r2
loadsf r0, 0 ; parm: word pointer
call @bloom.filter.hash1 ; get first hash
call @bf.test.filter ; r31=hash
jz @bf.test.fail ; jump if fail-->
call @bloom.filter.hash2 ; get second hash
call @bf.test.filter ; r31=hash
jz @bf.test.fail ; jump if fail-->
loadi r31, 1 ; apparent success
putm c3, 5, "=HIT!" ; write message to output
@bf.test.exit
puti c3, EOL ; write EOL to output
pop r2
pop r1
pop r0
ret
@bf.test.fail
loadi r31, 0 ; definite fail
putm c3, 5, "=fail" ; write message to output
jump @bf.test.exit ; jump to exit-->
This is the subroutine that, given a string, determines if that string is in the filter. The full filter and test code is over three hundred lines of ASM.
It was amusing, while debugging the Python CPU code, to get an error only to realize the CPU code was working just fine. It was a bug in my ASM code that was at fault. Nice to know your code works, but even nicer to know it doesn’t take crap from your other code. Made me laugh a lot during development (or maybe I was just getting punchy by then).
§
I should pause here, take a breath, back up, and explain myself. As always, the question is how much or how little is required. The urge towards accuracy and completeness often has me erroring on the too much side, so I hope you’ll indulge me. Skip this section if you know all about CPUs and software sims of them.
Nearly all modern computers use what’s known as the von Neumann architecture. By “nearly all” I mean it would be rare indeed to find a modern digital that didn’t use that basic architecture. (I’ll go so far as to say none are known to me. I only guess some must exist.) At the heart of this architecture is the “brain”, the Central Processing Unit (CPU). It moves the data around and does the number crunching, which is all computers can do.
The rest of a computer is just memory, input/output (I/O), a power supply, and wires connecting things. And maybe some blinkenlights to liven things up for humans. Any “intelligence” a computer has comes from its CPU.
So, most of the fun and action is in the CPU.
Digital computers go back to the 1950s, but microprocessors — an entire CPU on a chip — revolutionized and democratized computers in the early 1970s. These engineering and design marvels gave us the personal computing revolution and enabled the phones and other computing devices most of us now take for granted.
A CPU is a physical device. A simulation is software that acts like a CPU. It allows one to write and test code for a given CPU without requiring the physical device. Detailed and accurate CPU sims have long been used for development of systems that will run on those CPUs.
I wanted to create a sim of an imaginary CPU. Just for fun. A ship in a bottle project to exercise my mind. And to play in a favorite old sandbox.
§
Generally, a CPU simulation is of an existing physical chip, but I made up the CPU, which I’ve named CJ68 (yes, it is obviously an homage, no, I won’t explain it). This meant I got to invent and design three different things:
- The virtual CPU itself, especially the instruction set.
- The software implementing the CPU.
- The assembly language for programming the CPU.
The main reason I still muck about coding things is the chance to design software solutions for “problems” that interest me. For instance, I recently was inspired to write a Sudoku solver. Works well with most Sudoku puzzles (and Tredoku puzzles, which were the inspiration behind the project). Solves most of them instantly.
I was never much interested in puzzles and have never even tried Sudoku (let alone Tredoku). But writing code to solve these puzzles, that appeals to me, and I can get lost for hours or days chasing a good solution to an interesting software puzzle.
This time I got lost for about two weeks. Two intense weeks with long days and short nights. Nice to know I can still get really into a project at my age. (Not my age so much as my cynicism and ennui.). And this is a project I can continue to have fun with when I’m in the mood to play with assembly code.
§
Which itself requires some explanation. CPUs operate according to instructions — numbers — in the memory locations containing the program. At this level, a program for the CPU is just a (usually very long) string of numbers, some representing instructions, some representing data.
In the earliest days of computing, programmers had to create these strings of numbers by hand. In the early days of this project, I had to do that to create simple test programs for my budding CPU to run. To make this less tedious and error-prone, assembly languages offer a (slightly) higher-level abstraction using “op codes” — words rather than numeric codes — and other syntactical sugar to make it easier (or, honestly, possible) for humans to write code for the CPU.
Here’s another example of some assembly code:
@bloom.filter.hash1 ; HASH 1 - r0 points to word
push r0 ; r31 returns hash
loadi r31, 0 ; accum/return
@bf.char.loop1
loadb r30, r0 ; get char [r0]
iadd r30, 0 ; test char
jz @bf.char.exit1 ; jump if ZF-->
putb c3, r30 ; write byte to output
rxor r31, r30 ; XOR
irol r31, 3 ; ROL 3
incr r0 ; next char
jump @bf.char.loop1 ; loop->
@bf.char.exit1
puti c3, EOL ; write EOL to output
iand r31, 0x0fff ; mask out bits we use
call @bf.write.output ; save to IO
pop r0
ret
This is another subroutine from my Bloom filter. This bit implements one of the two simple hash functions the filter uses. At this level, code only does very simple things, such as move data around or perform math on it. This is generally the lowest level of coding humans do. It’s called “programming close to the metal” — using the basic language of the CPU.
Things that begin with an ‘@’ sign are locations (addresses). The first column, indented eight spaces, contains the op codes. For example, “loadi” (LOADI) means “load an immediate value into a register”. This gets turned into the number 79 (4F in hex) by the assembler.
The second column, indented 20 spaces, has the parameters for the instruction. Most instructions require at least one parameter. For example, the LOADI instruction needs to know which register and what immediate value. Registers are specified by an “r” followed by the register number. IO channels are specified by a “c” followed by the channel number. The fragment above writes to “c3” — the third output channel.
Any part of a line starting with a semi-colon is a comment ignored by the assembler.
§
Some specifics about my imaginary CPU, the CJ68 (named following the tradition of old beloved CPUs such as the Z80):
- 32-bit architecture (potentially expandible to 64 bits).
- 32-bit address space (over four trillion addresses).
- 32 registers (configurable; max is 256).
- Any number of I/O channels (configurable at run-time).
- An instruction set with 168 instructions.
- An interrupt facility and a DEBUG wedge.
- Memory management for memory block allocation and protection.
- Has an assembler that can create loadable files.
- Program loader for pre-assembled programs.
- Copious logging and reporting, including register and memory dumps.
It amounts to just under 4000 lines of Python code. I’m willing to share it if anyone is interested. I’ll be writing about the Python code in some detail on my programming blog, but that’s probably a few weeks out. When I do, the post will have a link to a ZIP file containing all the code.
This post is mainly a self-congratulation (and, let’s be honest, brag) and celebration. I’m in that slightly weird “postpartum” stage of completing a project that’s obsessed me for weeks. I have a massive TODO list, but I still feel a bit at loose ends, as if I have nothing to do now (oh, if only).
§
It has been a lot of fun working at the assembly level again. I haven’t gotten seriously “down to the metal” since the 1980s. Since then, it has been high-level languages, even unto SQL and some 4GLs.
I actually got to the point, with C++ and Java (and JavaScript, and even Visual BASIC) that I was beyond sick to death of writing those ubiquitous “for-next” loops:
FOR I = 0 to 9
REM ... do important stuff in the loop ...
NEXT I
Programmers will recognize that as BASIC, an old computer language invented for training programmers. The name stands for Beginner’s All-purpose Symbolic Instruction Code.
The FOR and NEXT statements above are exactly what I saw written on a blackboard the first day I walked into my first college programming class (circa 1977). The rest, as they say, is history. A more modern version, for instance in Java, looks like this:
for (int ix=0; ix <= 9; ix++) {
// ... do important stuff in the loop ...
}
But does the same thing. (There’s a whole discussion about using ix rather than plain old i — mainly that it’s easier to recognize or search for.)
In my current language, Python, the loop looks like this:
002| #… do important stuff in the loop …
003|
Which I find much cleaner and more elegant, and Python has more sophisticated ways of iterating over a list of things. Which removes some of the tedium of oft-repeated code patterns. As such, Python has become my second-favorite language (you’d never guess my favorite). This xkcd cartoon about Python is right on the money. Python is downright liberating. (And the cartoon is accurate about “import antigravity” — try it yourself.
In CJ68.asm, the loop looks like this:
loadi r20, 1
loadi r21, 10
@loop
; ... do important stuff in the loop ...
incr r20
decr r21
jnz @loop
Which is even more tedious, but that’s assembly for you. What’s hysterical to me (but I’m notably warped) is that I originally wrote it like this:
@loop
loadi r20, 1
loadi r21, 10
; ... do important stuff in the loop ...
incr r20
decr r21
jnz @loop
Which is an infinite loop. As happened many times on this project, I first thought my CPU code had a bug (in this case, somehow mysteriously grown in my absence). Then I realized that, once again, the CPU was doing what I’d said, but I’d said badly.
§
While high-level languages are cool, my experience has been that programmers who learned assembly, especially if they learned it early in their career, are usually superior programmers. Learning to understand how a computer functions at the lowest level seems to inform their understanding of high-level languages. One realizes how the high-level stuff has to translate to the low-level stuff.
To be blunt, I judge other programmers on whether they understand assembly-level programming. (Also on how many languages they’re fluent in. The more, the better.) Which is not to say there aren’t experts in, for instance, JavaScript. But I do think more well-rounded programmers, especially those who grok the metal, can run rings around them.
[As with science fiction, there is a question whether certain minds are prone and drawn to assembly-level coding (or science fiction) or does early exposure form essentially tabula rasa minds. Are these more like learning a language fluently or like being a natural athlete?]
§
I had a lot of fun designing the instruction set. I took a first run at this three or four years ago and got something that showed signs of working. But my first pass at the instruction set left much to be desired, and I realized I needed to sit down and think it through. I wanted a thorough and orthogonal instruction set. I set the project aside and never got around to doing that design work.
I decided recently to revive the project. As it turned out, I ended up revising the instruction set several times, once fairly late in the project (which was a pain). But I finally ended up with something I rather like. I may post it somewhere along with some other docs for the project, but right now I need to let my brain cells rest and do other things (like blogging).
And I’m going to bask in the glow of accomplishing a big-ish project that I’m pleased with. I might even open a bottle of champagne to celebrate!
§ §
The answer is Lisp. The first programming language to make me laugh out loud while reading the manual. “Lisp does that? So cool!” From a Programmer Geek’s point of view, an amazing and much imitated language. I’ve used some of its principles in my own designs.
Python approaches it in coolness and elegance. And makes me laugh with delight about its capabilities, too.
Stay assembled, my friends! Go forth and spread beauty and light.
∇











September 10th, 2024 at 1:53 pm
The aforementioned extra mile is that, late last night, I found a solution for relative jumps in the assembler. Fixed jumps are easy, but relative jumps require calculating where the IP needs to be relative to where it currently is.
The solution turned out to be simple and kind of elegant, but I thought solving it was going to have to wait for the version 2 (if ever I revisited the project for another pass). Here’s a sample of what it looks like:
loadi r10, -5 jmpneg $j001 putm c4, 8, "not neg" EOL jmp $j002 @j001 putm c4, 4, "NEG" EOL @j002 iadd r10, 5 jmpz $j003 putm c4, 9, "not zero" EOL jmp $j004 @j003 putm c4, 3, "ZF" EOL @j004 incrw r10 jmpnz $j005 putm c4, 5, "zero" EOL jmp $j006 @j005 putm c4, 3, "NZ" EOL @j006The leading dollar sign signals a relative jump. (A leading at sign indicates a fixed address.)
September 15th, 2024 at 3:56 am
Permission to be insufferably pleased with yourself is granted!
I have my own CPU project ‘work in progress’ – a 16 bit ultra low power CPU where instructions have been largely remapped from an existing CPU so i can use a C compiler with it.
The big difference is…
YOU FINISHED IT!
Well done sir!
BTW there are online AI-based translators from Python to C or C++ available (with a limited number of free conversions) in case you want to get your virtual CPU to run faster, or run it on a microcontroller! You would probably need to make some minor code changes for the conversion to work but it is fairly straightforward. Or maybe “This exercise is left for the reader.”
I look forward to v2.
September 15th, 2024 at 12:13 pm
Thank you! It was fun on multiple levels. I know what you mean about “FINISHED” — I have a lot of projects that aren’t. After just sketching a version, CJ68 sat on the shelf for several years. I’ve finally been retired long enough to start digging into them (and either tossing them or getting serious about them).
I’m very prone to being attracted to an idea, sketching enough of it to get some sense of its shape, but then realizing it’s going to take sitting down and thinking about, and I’m not in the mood for that, so put it on the shelf, and I’ll get back to it. And there it often sits for a long time. In part because I forget and move on, in part because when I do stumble across it, there’s still that, oh, right, gotta do some serious thinking on that one, and not being up for it at that moment. The CPU project was that way for all those years. Oh, right, gotta sit down and figure out good op codes. Eh, maybe later. So, there’s a lot of the shelf, is what it amounts to.
Translating this to C++ would work okay, but you’re right about some mods. I’m using Python’s reflection capabilities to dispatch the op codes to the methods that handle them.
# Read program byte... b = self.memory.read(self.pc) name = OpLabel[b] method_name = f'op_{name}' # Call operation... self.pc += 1 ftn = self.__class__.__dict__[method_name] ftn(self)So, I’d have to turn that into a jump table operation. But most of the rest of it would probably have good enough equivalence to translate pretty easily. (Easy enough that I’d probably just do it myself if I did it. It would be fun to write C++ again. Or maybe the heart is fond because I don’t?)
What 16-chip are you targeting? Anyone I know?
September 16th, 2024 at 3:25 pm
(I don’t seem to be able to reply to your reply so I’m replying to myself!)
I’m generally ensuring a mapping from Texas Instruments MSP430 instructions to my CPU (I’ve done the hard bit of the project which is too choose a name: XaVI!)
September 16th, 2024 at 5:38 pm
Ha, yeah, gotta give it a name so you know what to call it. Not familiar with the TI chip.
September 10th, 2024 at 5:35 pm
FWIW, I posted the CJ68 Op Codes list.
September 11th, 2024 at 12:16 pm
I have no idea what you’re talking about here, but congratulations!
September 11th, 2024 at 4:58 pm
Thanks!
September 11th, 2024 at 6:58 pm
If you ever need a character to speak in technobabble, you have a rich resource here. 😄