Source: www.codeproject.com
How to create your own virtual machine! Part I
Presented by: Alan L. Bryan
A.k.a. Icemanind
Questions? Comments? Email me at
icemanind@yahoo.com
Please leave feedback if you enjoyed this tutorial. The more feedback I get, the more it’ll make me want to write Part II
Introduction
Welcome to my tutorial on virtual machines. This tutorial will introduce you to the concept of a
virtual machine and then we will, step by step, create our own simple virtual machine in C#. Keep in
mind that a production-quality virtual machine is a very complicated thing and can take a team of
programmers years to create. With that said, don’t expect to be able to create your own
language or virtual machine that will take over .NET or Java overnight.
In this tutorial, we will first lay out the plan for our virtual machine. Then we will create a very
simple intermediate language. An intermediate language is the lowest-level language still readable by
humans. It is comparable to assembly language, which is the lowest-level human-readable language on most
computers. The first program we will create will be a very simple intermediate compiler (an assembler)
that will convert our intermediate language to bytecode. Bytecode is a set of binary instructions that
our virtual machine will be able to directly execute. It is comparable to machine language, which is
the set of binary instructions that a particular CPU understands. The virtual machine itself will be
our second project. It will be created from scratch in C# and will execute our bytecode. It will be
very simple at first, but then we will expand it by adding threading support and dual screen outputs
(you’ll find out what I’m talking about later).
All of the code in this tutorial is created using Visual Studio 2008 Professional, targeting the .NET
Framework 2.0. Since I’m targeting the 2.0 framework, you should be able to use Visual Studio 2005 as
well. Since creating a virtual machine really does dive down into the nuts and bolts of how computers
work, I am assuming the reader has at least a basic knowledge of programming,
hexadecimal and binary number systems, and threading. It would also really help to know something
about assembly language, although I will try to help you understand things on a need-to-know basis.
If I haven’t scared you off and you’re still interested in how to make a virtual machine, then let’s
begin!
How to create your own virtual machine in a step-by-step tutorial 2009
Brought to you by icemanind
Planning it out
As described in the introduction, the first thing we will want to do is draw up a rough blueprint
of what our machine will be able to do. I have decided to call our machine B32 (Binary 32), although, for
simplicity’s sake, it will not be a 32-bit machine. It will be a 16-bit machine. B32 will have 64K of memory,
addressable anywhere from $0000 to $FFFF. A B32 executable program can access any part of
that memory. Along with a 64K memory space, we will introduce 5 registers into our virtual machine. All
CPUs and many virtual machines have what are called registers. A register is similar to a variable. Registers
hold numbers, and the size of a register determines how large a number it can
hold. Unlike variables, however, registers do not take up memory space. Registers are “built into” CPUs.
This will make more sense once you see an example, which is coming up real soon.
To keep things simple, we will only implement 5 registers in our virtual machine. These
registers will be called A, B, D, X and Y. The A and B registers are only 8 bits in length, which means each
register can hold any number between 0 and 255 unsigned or between -128 and 127 signed. For now, we
are going to worry only about unsigned integers. We will get into signed later and we will briefly touch
on floating point numbers later. The X, Y and D registers will be 16 bits in length, capable of storing any
number between 0 and 65,535 unsigned or between -32,768 and 32,767 signed. The D register will be
something of a unique register. The D register will hold the concatenated values of the A and B registers.
In other words, if register A has $3C and register B has $10, then register D will contain $3C10. Anytime
a value in the A or B register is changed, the value in the D register is also changed. The same is
true in reverse: if the value in the D register is changed, the A and B registers will be changed accordingly. You will see
later why this is handy to have.
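To make the A/B/D relationship concrete, here is a minimal sketch in C#. The class and member names are my own illustration and are not the code we will build later in the tutorial. It assumes D is never stored on its own, but is computed from A (high byte) and B (low byte) whenever it is read, and split back into A and B whenever it is written.

```csharp
using System;

// Hypothetical sketch of the B32 register file. The names here are
// illustrative only and are not taken from the tutorial's project.
public class B32RegisterSketch
{
    private byte a;   // 8-bit register A (high byte of D)
    private byte b;   // 8-bit register B (low byte of D)

    public ushort X;  // 16-bit register X
    public ushort Y;  // 16-bit register Y

    public byte A { get { return a; } set { a = value; } }
    public byte B { get { return b; } set { b = value; } }

    // D has no storage of its own: reading it concatenates A and B,
    // and writing it splits the 16-bit value back into A and B.
    public ushort D
    {
        get { return (ushort)((a << 8) | b); }
        set
        {
            a = (byte)(value >> 8);
            b = (byte)(value & 0xFF);
        }
    }
}
```

With this sketch, setting A to $3C and B to $10 makes D read back as $3C10, and writing $12AB to D leaves A holding $12 and B holding $AB.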
This has been a lot of dry talk, but here is a picture to represent our B32 registers:
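B32 Registers
+--------+--------+---------+---------+
|   A    |   B    |    X    |    Y    |
| 8 bits | 8 bits | 16 bits | 16 bits |
+--------+--------+---------+---------+
 \_______________/
    D (16 bits)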
Hopefully this makes sense to you. If not, you will catch on as we progress through the tutorial.
Earlier when I told you that our virtual machine had 64K of free memory for an executable to
use, that was not entirely true. Really it’s only about 60K, because 4000 bytes must be reserved for screen
output. I’ve chosen to use $A000 - $AFA0 (4000 bytes is $FA0, so the last byte actually used is $AF9F).
This area of memory will map to our screen. In most computers, this memory is mapped to the video
card’s memory; however, for simplicity, I
am going to share our 64K of memory with our video output. This memory will give us an 80x25 screen
(80 columns, 25 rows). You may be thinking right now, “I think your math is off dude. 80 times 25 is only
2000”. This is true; however, the extra 2000 bytes will hold an attribute byte for each character.
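Here is a rough sketch of the arithmetic behind the screen memory. One thing this section has not spelled out is how the 2000 characters and 2000 attributes are arranged, so the sketch assumes the old DOS text-mode layout: each screen cell takes 2 bytes, the character first and its attribute right after it. That layout, and every name in the code, is an assumption for illustration only.

```csharp
using System;

// Hypothetical helper for the B32 screen area. Assumes a DOS-style
// interleaved layout (character byte, then attribute byte per cell),
// which this section of the tutorial does not confirm.
public static class B32ScreenSketch
{
    public const int ScreenBase = 0xA000;  // first byte of screen memory
    public const int Columns = 80;
    public const int Rows = 25;
    public const int BytesPerCell = 2;     // 1 character + 1 attribute

    // 80 * 25 * 2 = 4000 bytes ($0FA0)
    public const int ScreenSize = Columns * Rows * BytesPerCell;

    // Address of the character byte for a cell; its attribute byte
    // would sit at the next address under the assumed layout.
    public static int CellAddress(int column, int row)
    {
        if (column < 0 || column >= Columns || row < 0 || row >= Rows)
            throw new ArgumentOutOfRangeException();
        return ScreenBase + (row * Columns + column) * BytesPerCell;
    }

    // Last address that belongs to the screen area.
    public static int LastScreenAddress
    {
        get { return ScreenBase + ScreenSize - 1; }
    }
}
```

Under these assumptions the top-left cell is at $A000, the bottom-right cell (column 79, row 24) is at $AF9E, and the last screen byte is $AF9F.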
Those of us old enough to remember programming in assembly language back in the old DOS
days will already be familiar with an attribute byte. An attribute byte defines the foreground and
background color of our text. How it works is the lowest 3 bits of the byte (bits 0-2) make up the RGB, or Red, Green,
Blue, values of our foreground color. The 4th bit (bit 3) is an intensity flag. If this bit is 1 then the color is
brighter. The next 3 bits (bits 4-6) make up the RGB values of our background color. The last bit (bit 7) is not used (back
in DOS days, this bit was used to make text blink, but in B32, it is ignored). You will see later how colors
are created using this method.
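The bit layout just described can be sketched with a few shifts and masks. This is only an illustration with made-up names. It treats each color as an opaque 3-bit value (0 to 7), because which of the three bits stands for red, green or blue has not been pinned down yet.

```csharp
using System;

// Hypothetical helpers for packing and unpacking a B32 attribute byte.
// Bits 0-2: foreground color, bit 3: intensity, bits 4-6: background
// color, bit 7: unused. Colors are treated as opaque 3-bit values.
public static class B32AttributeSketch
{
    public static byte Pack(int foreground, bool bright, int background)
    {
        int value = (foreground & 0x07)          // bits 0-2
                  | (bright ? 0x08 : 0x00)       // bit 3
                  | ((background & 0x07) << 4);  // bits 4-6
        return (byte)value;                      // bit 7 stays 0
    }

    public static int Foreground(byte attribute)
    {
        return attribute & 0x07;
    }

    public static bool IsBright(byte attribute)
    {
        return (attribute & 0x08) != 0;
    }

    public static int Background(byte attribute)
    {
        return (attribute >> 4) & 0x07;
    }
}
```

For example, Pack(7, true, 1) gives $1F: foreground 7 with the intensity bit set, on background 1.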
The final part of this section will define the mnemonics and the bytecode that make up a B32
executable. Mnemonics are the building blocks of our assembly language code that will be assembled to
bytecode. For now, I am only going to introduce enough for us to get started and we will expand on our
list throughout this tutorial. The first mnemonic we will introduce is called “LDA”. “LDA” is short for
“Load A Register” and what it will do is assign a value to the A register. Now in most CPUs and virtual
machines, you have what’s called addressing modes. Addressing modes determine how a register gets
its value. For example, is the value specified directly in the operand (an operand is the data that follows
the mnemonic), is it pulled from somewhere in memory, or is it loaded from a value assigned to
another register? There can be dozens of addressing modes, depending on how complex of a virtual
machine you want to create. For now, our virtual machine will only pull data directly specified in the
operand. We will assign this mnemonic a bytecode value of $01. Since we decided earlier that the A
register can only hold an 8 bit value, we know that the entire length of a “LDA” instruction that pulls
direct data from the operand will be 2 bytes (1 byte for the mnemonic and 1 byte for the data).
The next mnemonic we will discuss will be called “LDX”. “LDX” is short for “Load X Register” and,
just like “LDA”, it will load a value directly into its register, in this case X, from the operand. The main difference
between “LDX” and “LDA” is the length. Since our X register can hold 16 bits of data, that means the
total length of the bytecodes will be 3 bytes instead of 2 (1 byte for the mnemonic and 2 bytes for the
data). We will assign this mnemonic a bytecode of $02. If I lost you guys, keep reading and I promise this
will make sense when we look at some examples.
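As a quick illustration of those lengths, here is a small sketch that turns an “LDA” or “LDX” with a direct operand into its bytes. The names are hypothetical, and there is one assumption to flag loudly: the order of the two data bytes for “LDX” has not been defined in this section, so the sketch puts the high byte first, matching the way D is formed from A and B.

```csharp
using System;

// Hypothetical encoder for the two load mnemonics described so far.
// The byte order of the 16-bit LDX operand (high byte first) is an
// assumption; the tutorial has not specified it at this point.
public static class B32EncodeSketch
{
    public const byte OpLda = 0x01;  // LDA, value taken from the operand
    public const byte OpLdx = 0x02;  // LDX, value taken from the operand

    // LDA is 2 bytes: the bytecode plus one data byte.
    public static byte[] EncodeLda(byte value)
    {
        return new byte[] { OpLda, value };
    }

    // LDX is 3 bytes: the bytecode plus two data bytes.
    public static byte[] EncodeLdx(ushort value)
    {
        return new byte[]
        {
            OpLdx,
            (byte)(value >> 8),    // high byte (assumed to come first)
            (byte)(value & 0xFF)   // low byte
        };
    }
}
```

Under that assumption, loading $41 into A encodes as 01 41, and loading $A000 into X encodes as 02 A0 00.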
The next mnemonic we will discuss will be called “STA”. “STA” is short for “Store A
Register” and its function will be to store the value contained in the A register into a location
somewhere in our 64K memory. Unlike our load mnemonics, which pull their value directly from the
operand, our store mnemonic will take the address to store to from the value held in one of the 16 bit registers. We
will assign this mnemonic a bytecode of $03.
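To see how the three bytecodes defined so far might work together, here is a tiny interpreter loop. It is a sketch under several assumptions, not the virtual machine we will actually build: “STA” is assumed to use the X register as its address and to have no data bytes of its own, the 16-bit operand of “LDX” is assumed to be stored high byte first, the program is kept in its own array instead of being loaded into the 64K memory, and the loop simply stops when it runs out of bytes. All names are made up for illustration.

```csharp
using System;

// Hypothetical mini-interpreter for bytecodes $01 (LDA), $02 (LDX)
// and $03 (STA). The STA addressing (via X, no operand bytes) and the
// high-byte-first operand order are assumptions, not tutorial facts.
public class B32InterpreterSketch
{
    public byte[] Memory = new byte[65536];  // 64K, $0000 - $FFFF
    public byte A;                           // 8-bit register A
    public ushort X;                         // 16-bit register X

    public void Run(byte[] program)
    {
        int pc = 0;  // index of the next byte to read
        while (pc < program.Length)
        {
            byte opcode = program[pc++];
            switch (opcode)
            {
                case 0x01:  // LDA: one data byte follows
                    A = program[pc++];
                    break;
                case 0x02:  // LDX: two data bytes follow
                    X = (ushort)((program[pc] << 8) | program[pc + 1]);
                    pc += 2;
                    break;
                case 0x03:  // STA: store A at the address held in X
                    Memory[X] = A;
                    break;
                default:
                    throw new InvalidOperationException(
                        "Unknown bytecode $" + opcode.ToString("X2"));
            }
        }
    }
}
```

Feeding Run the bytes 01 41 02 A0 00 03 loads $41 into A, loads $A000 into X, and then stores A at $A000, which is the first byte of our screen memory. A program that is cut off in the middle of an operand would throw an index exception in this sketch; a real virtual machine would want to check for that.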
The final mnemonic we will discuss is called “END”. “END” will do exactly that. It will terminate the
application. All B32 programs must have an END mnemonic as the last line of the program. The operand
for the END mnemonic will be a label that will point to where execution of our B32 program will begin.