This is a very long article (36k) of interest to hardware
hackers, assembly language programmers, and machine
architects.  It is a description of how I feel the
65xxx family should evolve.  Don't count on anything
you read here.  Nonetheless, you might find it interesting.
If you're not one of the aforementioned types, you may
want to skip the noise which follows...


The 65C816 Dream Machine

    This essay is an attempt to vent my frustrations.  
While the 65C816 chip is, without question, better than 
the 6502 and 65c02 chips that preceded it,  the 65c816 
leaves a lot to be desired.  Unless you count 
microcontrollers like the 8048, F8, or 8051, I've never 
encountered a chip as difficult to program in assembly 
language as the 65c816.  Those stupid M and X bits cause 
so much trouble I wonder if they're worth the trouble of 
using them.  Attempting to use the 65c816 in native mode 
while attempting to coexist with other 6502 routines 
(requiring emulation mode) such as ProDOS 8 can really 
push one's patience.  But wait!  There's a small chance 
things can be improved.  The WDM (William D. Mensch) 
instruction is reserved by the Western Design Center for 
instruction set expansion.  While I'm sure Mr. Mensch has 
other plans for this opcode, the following treatise 
provides my views on how this single opcode should be 
used.

    The WDM opcode should be used in the next version of 
the 65c816 (let's call it the 65c820, just to be amusing) 
to change the instruction set.  When the 65c820 resets,  
it should come up in the 6502 emulation mode, just like 
the 65c816 does now.  The XCE instruction could be used 
to switch to 65c816 mode just like the existing 65c816 
part.  The WDM opcode, which I'll call NAT (for NATive 
mode) will be used to switch the processor to 65c820 
native mode.  Once in the 65c820 mode, the 65c820 takes 
on a completely different character.  The only bound 
I've placed on the new instruction set is that if you can 
perform an operation with a single instruction on the 
65c816, you can perform the same thing on the 65c820 with 
a single instruction.  All other aspects (including 
timing and instruction size) can vary.  I've also taken 
some liberties with the way certain instructions affect 
the flags.  For the most part however, 65c816 
instructions have an identical counterpart on the 65c820.

    Design Issues:  There are lots of reasons for 
designing a new instruction set. My criteria are as 
follows:

1) The instruction set must mirror the philosophy of the 
   6500 family.  A programmer experienced with the 6502 
   instruction set must feel comfortable with the 65c820 
   instruction set.

2) The new instruction set must support high level 
   language constructs better than the 6502 and 65c816 
   processors.

3) The new instruction set must be easy to learn and fun 
   to use.

4) We must remember that fancy instructions are very 
   difficult to implement in silicon.  Hence super fancy 
   instructions which provide limited functionality must 
   be left out.  For example, the 65c820 doesn't support 
   floating point instructions (although they could be 
   added via a coprocessor).

5) The only (commercially popular) computer system that 
   would ever use the 65c820 is an upgrade of the Apple 
   IIGS.  Hence the instruction set should contain 
   instructions that enhance the operation of an Apple II 
   family machine.

6) The original 6502 instruction set was designed with a 
   small set of basic instructions complemented with a 
   large set of addressing modes.  The 65c816 strayed 
   from this philosophy; the 65c820 returns to it.

Based on these design issues, I offer the following 
machine; the 65c820:


_________________________________________________________

Programmer's Model:

    The 65c820 will contain several additional registers, 
above and beyond those available on the 65c816.  All 
registers are 16 bits.  The register bank includes:

A, AX      -- Accumulator and accumulator extension
X          -- X index register
Y          -- Y index register
F          -- Stack frame pointer
S          -- Stack pointer
D          -- Direct page register
P          -- Program status word
ABR        -- Auxiliary bank register
SBR        -- Stack bank register
DBR        -- Data bank register
PBR        -- Program bank register
PC         -- Program counter
LBound     -- Low bounds register
HBound     -- High bounds register

A, X, Y, S, D, & PC are mostly identical to their 65c816 
counterparts. AX is the accumulator extension used by the 
multiply and divide instructions. F is a special index 
register, useful for accessing local variables and 
parameters. P differs from the 65c816 version in that it 
is 16-bits wide.  Accessing the upper byte of this 
register is a privileged operation (more on that later 
on). DBR and PBR are similar to their 65c816 cousins, 
except they are now 16-bits long and allow you to 
position the program and data banks on any PAGE boundary 
(rather than a bank [64K] boundary). ABR is an auxiliary 
data bank register. SBR lets you locate the stack 
anywhere in the 16Mbyte address space. LBound and HBound 
provide some rudimentary memory management functions.  
All memory addresses are added to LBound to produce the 
true physical address.  If the resulting address is 
greater than HBound, an ABORT trap will be issued.  This 
allows you to load multiple programs into memory and 
protect them from being walked on by other programs.
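
The bounds check can be sketched in a few lines of Python.  This is only an illustration: the names translate and AbortTrap are mine, and I assume LBound and HBound hold plain byte addresses (the text above does not say how the 16-bit registers map onto the 16Mbyte address space).

```python
class AbortTrap(Exception):
    """Stands in for the ABORT trap (hypothetical name)."""


def translate(logical_addr, lbound, hbound):
    # Every memory address is added to LBound to form the physical address.
    physical = logical_addr + lbound
    # A result greater than HBound causes an ABORT trap.
    if physical > hbound:
        raise AbortTrap(f"${physical:06X} is above HBound ${hbound:06X}")
    return physical
```

With LBound = $2000 and HBound = $2FFF, logical address $0010 maps to $2010, while logical address $1000 would trap.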

    As I alluded to earlier, certain operations are 
PRIVILEGED.  The 65c820's program status word takes the 
following form:

 15  14  13  12  11  10   9   8  |   7   6   5   4   3   2   1   0
U/S   I   M fpc   *   *   *   *  |   N   V   *   *   D dir   Z   C

The low order 8 bits are identical to the 6502's P 
register except the B bit isn't present (it's not 
required) and the I bit has been moved to bit 14.  The 
dir bit controls the direction of various string 
instructions (ala 8086).  The low order 8 bits are called 
the USRPSW (user program status word).  The upper 8 bits 
are called the SYSPSW (system program status word) and 
can only be accessed while in the system mode.  Bit 15 
(U/S) is the user/supervisor bit.  This bit determines 
whether or not you are in the user or system (supervisor) 
mode.  Bit 14 is the interrupt disable bit.  For 
protection reasons, a user mode program cannot have 
access to the interrupt disable bit (by turning off all 
interrupts and not turning them back on, a user mode 
program can cause all kinds of havoc). Bit 13 is the 
memory management bit.  If set, the LBound and HBound 
registers determine the location and extent of the 
logical address space.  If clear, then the logical 
address space and physical address space are the same.  
The fpc bit determines if a floating point coprocessor is 
installed.  If not, the floating point expansion 
instructions will cause an illegal instruction trap, 
otherwise, the FP instructions will be routed to the 
floating point coprocessor. The remaining bits in the P 
register are reserved for future use.
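
To make the bit assignments concrete, here is a small Python sketch that splits a 16-bit P value into its named flags.  The helper names (PSW_BITS, decode_psw, usrpsw, syspsw) are my own; only the bit positions come from the layout above.

```python
# Bit positions taken from the program status word layout above.
PSW_BITS = {
    "U/S": 15, "I": 14, "M": 13, "fpc": 12,             # SYSPSW (upper byte)
    "N": 7, "V": 6, "D": 3, "dir": 2, "Z": 1, "C": 0,   # USRPSW (lower byte)
}


def decode_psw(psw):
    """Return a dict mapping each defined flag name to 0 or 1."""
    return {name: (psw >> bit) & 1 for name, bit in PSW_BITS.items()}


def usrpsw(psw):
    """Low byte: visible to user mode programs."""
    return psw & 0xFF


def syspsw(psw):
    """High byte: accessible only in system mode."""
    return (psw >> 8) & 0xFF
```

For example, decode_psw(0x8003) reports U/S, Z, and C set and everything else clear.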


Opcode Format:

The 65c820's instruction set is broken down into 32 
classes.  They are

 0-MOV,   1-LEA,   2-LEAA,  3-LEAD,  4-LEAS,  5-XCHG,  6-ADD,   7-ADC
 8-SUB,   9-SBC,   A-CMP,   B-AND,   C-OR,    D-XOR,   E-ASH,   F-LSH
10-ROT,  11-BIT,  12-ADDQ, 13-CMPQ, 14-exp,  15-exp,  16-exp,  17-exp
18-exp,  19-Scc,  1A-Ccc,  1B-Icc,  1C-Brnch,1D-Brnch,1E-exp,  1F-exp

"exp" refers to expansion.

The "typical" instruction format (for opcodes $00..$11) 
is

 15 14 13 12 11 10  9  8  |  7  6  5  4  3  2  1  0
  a  a  a  a  a  a  s  d  |  r  r  r  o  o  o  o  o

where
      a = addressing mode bits
      s = size (0=byte, 1=word)
      d = direction (0=to addressing mode loc,
                     1=from addressing mode loc)
      r = register
      o = opcode (one of the group values above).

There are 64 possible addressing modes (since there are 
six "a" bits).  The register bits refer to the first 
eight of these addressing modes (0..7).
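
As a sanity check on the field layout, here is a hedged Python sketch that packs and unpacks the 16-bit "typical" instruction word.  The function names encode and decode are mine, and I treat the instruction as a single 16-bit value; the byte order in memory isn't specified here.

```python
def encode(mode, size, direction, reg, opcode):
    """Pack the fields as aaaaaa s d rrr ooooo (bit 15 down to bit 0)."""
    if not (0 <= mode < 64 and 0 <= reg < 8 and 0 <= opcode < 32):
        raise ValueError("field out of range")
    if size not in (0, 1) or direction not in (0, 1):
        raise ValueError("size and direction are single bits")
    return (mode << 10) | (size << 9) | (direction << 8) | (reg << 5) | opcode


def decode(word):
    """Split a 16-bit instruction word back into its fields."""
    return {
        "mode": (word >> 10) & 0x3F,
        "size": (word >> 9) & 1,
        "direction": (word >> 8) & 1,
        "reg": (word >> 5) & 0x07,
        "opcode": word & 0x1F,
    }


# MOVW A,<absolute>: opcode 0 (MOV), register 0 (A), mode 8 (a),
# word size, direction 0 (to the addressing mode location).
word = encode(mode=0x08, size=1, direction=0, reg=0, opcode=0x00)  # 0x2200
```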

0- A        10- d,F        20- d,X        30- d,Y
1- X        11- a,F        21- a,X        31- a,Y
2- Y        12- n(d,F)     22- a,FX       32- a,FY
3- S        13- n(a,F)     23- l,X        33- l,Y
4- F        14- n[d,F]     24- (X)        34- (Y)
5- TOS      15- n[a,F]     25- (d,X)      35- (d),Y
6- Imm      16- (d,F)      26- n(d,X)     36- n(d),Y
7- d        17- [d,F]      27- [d,X]      37- [d],Y
8- a        18- (a,F)      28- n[d,X]     38- n[d],Y
9- l        19- [a,F]      29- n(d,FX)    39- n(d,F),Y
A- d,S      1A- P          2A- n[a,FX]    3A- n[a,F],Y
B- (d,S),Y  1B- D          2B- [a,FX]     3B- [a,F],Y
C- (d)      1C- ABR        2C- (a,X)      3C- (d),Y+
D- [d]      1D- SBR        2D- n(a,X)     3D- (d),-Y
E- n(d)     1E- DBR        2E- [a,X]      3E- [d],Y+
F- n[d]     1F- PBR        2F- n[a,X]     3F- [d],-Y

where:

A, X, Y, S, F, P, D, ABR, SBR, DBR, and PBR are the 
   corresponding 65c820 registers.
Imm refers to an immediate operand.
d refers to an eight-bit value, usually (but not always) 
   a direct page address.
a refers to a 16-bit absolute address.
l refers to a 24-bit long address.
n is a displacement of the form
   one byte, +/- 64 if the H.O. bit is zero.
   two bytes, H.O. byte first, +/- 16383 if the H.O. bit 
   is one.

All addressing modes containing F, FX, or FY are relative 
to the SBR register. Any "d" address appearing in such an 
addressing mode is simply an 8-bit displacement relative 
to the frame pointer.  FX means add F and X and use the 
result as the frame pointer.  FY is the same, but using 
the Y register.

Y+ and -Y are autoincrement and autodecrement addressing 
modes.  For autoincrement, the Y register is incremented 
after the value contained in Y is used.  For autodecrement, 
the Y register is decremented before the value 
is used.  

Addressing modes of the form  n[---]-- compute the 
effective address specified by the indirect operation and 
then add the specified offset to the effective address to 
obtain the true effective address.  For example, if Y 
contains 5 and location $00 points at $1000 in the DBR, 
then  4(0),y refers to location $1009.
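
The arithmetic in that example can be checked with a short Python sketch.  The function name and the dictionary standing in for memory are mine; I assume the pointer is stored low byte first (as on the 6502) and ignore the DBR for simplicity.

```python
def ea_indirect_indexed(memory, d, y, n=0):
    """Effective address for n(d),Y within the current data bank."""
    # Fetch the 16-bit pointer held at direct page location d.
    pointer = memory[d] | (memory[d + 1] << 8)
    # Add the Y index and the displacement n to the pointer.
    return (pointer + y + n) & 0xFFFF


memory = {0x00: 0x00, 0x01: 0x10}   # location $00 points at $1000
address = ea_indirect_indexed(memory, d=0x00, y=5, n=4)   # 0x1009
```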

The TOS addressing mode refers to the Top Of Stack, more 
on this later.


General Instructions:

The general instructions (opcodes $00..$11) all take the 
form:

                      Instr  Source, Dest

Where Instr is the instruction mnemonic,  Source is the 
address of a source operand, and Dest is the address of a 
destination operand.  At least one of the two operands 
must be a "register" addressing mode.  The register 
addressing modes are the first eight addressing modes 
listed above.  If the source operand is a register 
addressing mode, then the direction bit in the 
instruction is zero, otherwise it is one. If the destination 
addressing mode is the immediate addressing mode, the 
flags are set by the result of the operation, but nothing 
else is changed.  For example, MOVB #0,#n  sets the zero 
flag since a zero byte is moved, but the zero isn't 
actually moved anywhere.  Note that a "B" or "W" suffix 
is used on the mnemonics to specify the instruction size.

Three important register addressing modes greatly enhance 
the capabilities of the 65c820 processor: the TOS, Imm, 
and d register addressing modes.  Since d is a register 
addressing mode, any direct page memory location can be 
used as a "register".  This greatly enhances the 
flexibility of the 65c820.  This effectively gives you 
256 registers to play around with.  

    The Imm addressing mode, since it is a register 
addressing mode, lets you perform operations between any 
register or memory location in the machine (addressable 
by a single instruction) with an immediate operand.  For 
example, CMPB  #5,2[3,D],Y   is perfectly legal.  For a 
few instructions, immediate operands don't make much 
sense, such instructions will cause an illegal 
instruction trap (for example, you cannot load the 
effective address of an immediate operand into a 
register).  

    The TOS addressing mode is extremely powerful.  If 
you've looked ahead at the expansion instructions, you'd 
have noticed that there aren't any specific push or pop 
instructions (unless you count ENTER, EXIT, SAVE, and 
RESTORE).  The TOS addressing mode handles all of this 
for you.  You want to push the accumulator onto the 
stack?  No problem,  MOVW A,TOS will do the job.  You 
want to pop the X register off of the stack?  Use MOVW 
TOS,X.  You want to add the item on the top of stack to 
the item below it on the stack (a VERY common operation 
performed by compilers), just use ADDW TOS,TOS.  This 
instruction will pop two words off of the stack, add 
them, and push their sum back onto the stack (leaving two 
bytes on the stack rather than the original four).   With 
the TOS addressing mode, you can push (or pop) any value 
anywhere in addressable memory onto the stack with a 
single instruction.
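
The push/pop behavior of TOS is easy to model.  Below is a rough Python sketch using a list as the stack of 16-bit words; the function names are mine, and which popped word counts as "source" versus "destination" is my guess (it doesn't matter for ADD).

```python
def push_word(stack, value):
    """MOVW reg,TOS : TOS as destination pushes a word."""
    stack.append(value & 0xFFFF)


def pop_word(stack):
    """MOVW TOS,reg : TOS as source pops a word."""
    return stack.pop()


def addw_tos_tos(stack):
    """ADDW TOS,TOS : pop two words, add them, push the sum."""
    source = pop_word(stack)
    dest = pop_word(stack)
    push_word(stack, source + dest)


stack = []
push_word(stack, 3)
push_word(stack, 4)
addw_tos_tos(stack)     # stack now holds the single word 7
```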


Special (but not expansion) Instructions:

There are seven opcode groups in this category: ADDQ, 
CMPQ, Scc, Ccc, Icc, and the two branch groups.

ADDQ (add quick) appears in place of the ubiquitous INC 
and DEC instructions. ADDQ lets you add a four-bit signed 
value to any addressable item.  The register bits, along 
with the direction bit, let you specify a signed four-bit 
value.  This value is added to the specified address.  The 
immediate operand MUST be the source operand.

The CMPQ (compare quick) is similar except a compare 
operation is performed rather than an addition.  
Furthermore, the immediate operand is the destination 
operand rather than the source operand.

The Scc (set on condition), Ccc (clear on condition), and 
Icc (invert on condition) instructions are used to set 
boolean values based on the condition codes.  These go 
hand in hand with the branch instructions so I'll 
describe them all at once.

There are 16 possible conditions, the register and 
direction bits are used to specify the condition.  These 
conditions are

0- RA/A      4- HI       8- GT        C- PL
1- CC/LO     5- LT       9- EQ        D- VC
2- CS/HS     6- GE       A- NE        E- VS
3- LS        7- LE       B- MI        F- SR/N

LO (lower) = unsigned less than
HS (higher/same) = unsigned greater than or equal
LS (lower/same) = unsigned less than or equal
HI (higher) = unsigned greater than
LT = signed less than
GE = signed greater than or equal
LE = signed less than or equal
GT = signed greater than

The Scc, Ccc, and Icc instructions take the form:

         Scc{b|w}  #Imm, Dest
         Ccc{b|w}  #Imm, Dest
         Icc{b|w}  #Imm, Dest

If the immediate operand is zero, then Scc will store a 
one into the specified location if the condition code is 
met, otherwise a zero will be stored.  Ccc does just the 
opposite, it stores a zero if the condition is met, one 
otherwise. The Icc instruction will complement the 
specified location (logical NOT) if the condition code is 
met.  If the immediate operand is not zero, the Scc 
instruction will set the specified bits in the 
destination operand if the condition code is met; the Scc 
instruction will have no effect if the condition is not 
met. The Ccc instruction clears the specified bits in the 
destination operand.  The Icc instruction inverts the 
specified bits.  The destination bits are specified with 
ones in the immediate operand.  For example, SCS #$88, 
$00 will set bits three and seven in memory location zero 
if the carry flag is set,  location $00 will be 
unaffected if the carry flag is clear.
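
A byte-sized model of these three instructions, as a Python sketch.  The function names are mine.  Two things are my own reading rather than something spelled out above: that Ccc and Icc with a nonzero mask act only when the condition is met (like Scc), and that "logical NOT" for Icc with a zero operand means 0 becomes 1 and nonzero becomes 0.

```python
def scc(dest, mask, met):
    """Scc: returns the new byte value of the destination."""
    if mask == 0:
        return 1 if met else 0          # store the boolean result
    return (dest | mask) if met else dest


def ccc(dest, mask, met):
    """Ccc: the opposite sense of Scc."""
    if mask == 0:
        return 0 if met else 1
    return (dest & ~mask & 0xFF) if met else dest


def icc(dest, mask, met):
    """Icc: invert on condition."""
    if not met:
        return dest
    if mask == 0:
        return 0 if dest else 1         # assumption: boolean NOT
    return dest ^ mask
```

With these definitions, scc(0x01, 0x88, True) gives $89 (bits three and seven set), matching the SCS #$88,$00 example when carry is set.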

The SA/CA/IA (Set always, Clear always, Invert always) 
instructions always perform the specified operation.  The 
SN/CN/IN (set never, clear never, invert never) behave as 
though the condition code was not met.

The branch instructions are unusual compared to those 
encountered thus far.  The instruction is only one byte 
long.  It takes the form:

  7  6  5  4  3  2  1  0
  o  o  o  --1C or 1D--

If the opcode is $1C, then the three "o" bits represent 
condition codes 0..7 above. Note that the BRA instruction 
uses opcode bits %000.

If the opcode is $1D, then the three "o" bits represent 
condition codes 8..$F above. There is no BN instruction; 
opcode %111 is the BSR (branch to subroutine) 
instruction.
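
Decoding a branch opcode byte then looks something like this Python sketch.  The names CONDITIONS and decode_branch are mine, and I've picked the first mnemonic of each pair from the condition table (CC rather than LO, and so on).

```python
# Condition names in table order, 0..F.  Entry F is never used as a
# branch condition because that slot is BSR.
CONDITIONS = ["RA", "CC", "CS", "LS", "HI", "LT", "GE", "LE",
              "GT", "EQ", "NE", "MI", "PL", "VC", "VS", "SR"]


def decode_branch(opcode_byte):
    """Return the mnemonic for a one-byte branch opcode."""
    group = opcode_byte & 0x1F          # low five bits: $1C or $1D
    ooo = (opcode_byte >> 5) & 0x07     # high three bits
    if group == 0x1C:
        index = ooo                     # conditions 0..7
    elif group == 0x1D:
        index = 8 + ooo                 # conditions 8..$F
    else:
        raise ValueError("not a branch opcode")
    return "BSR" if index == 0x0F else "B" + CONDITIONS[index]
```

For example, $1C decodes as BRA, $3D as BEQ, and $FD as BSR.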

Unlike the 65c816, branches are not limited to +/- 128 
bytes.  A displacement value, similar to that used by the 
general addressing modes, allows a one-byte displacement 
of +/- 64 bytes or a two-byte displacement of +/- 16383 
bytes.  More than enough for most cases.



Math expansion instructions:

The math expansion instructions (opcode $14) use the 
three register bits as an opcode expansion, yielding eight 
additional instructions.  The instruction format is

 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  a  a  a  a  a  a  s  d  o  o  o  1  0  1  0  0

where "aaaaaa" is a general addressing mode, "s" is the 
size (B/W), "d" is the direction (load/store), and "ooo" 
is the sub-opcode, decoded as follows:

0- MUL    1- DIV    2- MOD    3- REM
4- INDX   5- CHK    6- DIVS   7- FPexp

Sub-opcode 7 is reserved for floating point expansion via 
a coprocessor.  If the FPC bit in the SYSPSW is not set, 
then executing this opcode will cause an illegal 
instruction trap.  If the FPC bit is set, then an 
additional eight bit opcode follows this instruction.  
This opcode value plus the physical address provided by 
the addressing mode, bounds registers, and applicable 
prefix(es) are passed along to the coprocessor.

    All of these instructions use the 65c820 accumulator 
as the register operand. MULW performs an unsigned 16x16 
multiply, leaving the 32-bit result in A, AX. MULB 
performs an unsigned 8x8 multiply, leaving the result in 
A. DIVW performs an unsigned 32/16 division.  The value 
in (A,AX) is divided by the specified operand and the 
quotient is left in (A,AX).  DIVB divides the 16-bit 
accumulator by an eight bit value, leaving the result in 
A.  DIVS{W|B} perform signed divisions.  These two 
instructions operate on the 16-bit accumulator or 8-bit 
accumulator ONLY.  The AX register is not used. MOD and 
REM compute the modulo and remainder functions (MOD is 
unsigned, REM is signed).  Their register usage is 
identical to DIV/DIVS.  There is no need for a signed 
multiply instruction since signed and unsigned 
multiplication produce the same result, assuming you 
ignore the value in AX.
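
The signed/unsigned claim is easy to demonstrate.  In this Python sketch (mulw is my name for it) I assume A receives the low word of the product and AX the high word, which is what the remark about ignoring AX implies.

```python
def mulw(a, operand):
    """Unsigned 16x16 multiply.  Returns (A, AX) = (low word, high word)."""
    product = (a & 0xFFFF) * (operand & 0xFFFF)
    return product & 0xFFFF, (product >> 16) & 0xFFFF


# $FFFF is 65535 unsigned or -1 signed.  Either way the low word of
# the product with 2 is $FFFE (that is, -2 as a signed word).
a, ax = mulw(0xFFFF, 2)     # a == 0xFFFE, ax == 0x0001
```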

    The INDX and CHK instructions are used to perform 
array computations.  The operand of these two 
instructions points at a pair of bytes or words.  The 
INDX instruction multiplies the accumulator by the first 
value and then adds the second value to the accumulator.  
The direction bit in the opcode is ignored. The INDX 
instruction takes two forms: INDXB and INDXW.

    The CHK instruction compares the value in the 
accumulator against the first and second values.  If the 
accumulator lies within these two values (inclusive) then 
the overflow flag is cleared.  If the accumulator is 
outside the range of these two values, then the overflow 
flag is set.  The direction bit in the opcode is used to 
determine whether a signed or unsigned comparison is 
used. The CHK instruction takes four forms: CHKSB, CHKSW, 
CHKUB, and CHKUW.  The "U" and "S" specify unsigned or 
signed.
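
A quick Python model of INDX and CHK follows.  The names are mine, and I'm assuming the first value of the CHK pair is the lower bound, the second is the upper bound, and that INDX wraps at the operand width.

```python
def indx(a, pair, width=16):
    """INDX: A = A * first + second, truncated to the operand width."""
    multiplier, addend = pair
    return (a * multiplier + addend) & ((1 << width) - 1)


def to_signed(value, width):
    """Reinterpret an unsigned value as two's complement."""
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def chk(a, pair, signed=False, width=16):
    """CHK: returns the overflow flag (True if A is out of range)."""
    low, high = pair
    if signed:
        a, low, high = (to_signed(v, width) for v in (a, low, high))
    return not (low <= a <= high)
```

For instance, indx(3, (10, 2)) yields 32, and chk(10, (0, 9)) sets overflow while chk(5, (0, 9)) clears it.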



String expansion instructions:

    Opcode $15 is used for string operations.  The 65c820 
processor provides four basic string operations: MOVS 
(move), CMPS (compare), XLATS (translate), and FILLS 
(fill).  The instruction format is as follows:

 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  s  s  s  d  d  d  l  l  l  o  o  1  0  1  0  1

where "sss" is the source address, "ddd" is the 
destination address, "lll" is the length address, and 
"oo" is the opcode.  Since all of the addresses are three 
bits, they must be register addresses.  The source 
address is a sixteen bit value taken from one of the 
register addressing modes.  The sixteen-bit value 
obtained at said address is the start of the string 
within the data bank (i.e., relative to the DBR 
register).  The destination address is also a sixteen-bit 
register addressing mode value, specifying the start of 
the destination address within the auxiliary bank (i.e., 
relative to the ABR register).  The length value is a 
sixteen-bit quantity obtained directly from the register 
addressing mode location.  Prefix bytes (described later 
on) are not allowed in front of a string instruction.

Opcode assignments:

0- MOVS 1- CMPS 2- XLATS 3- FILLS

  The direction of the string operation is specified by 
the "dir" bit in the USRPSW register.  If the bit is 
clear, then the source and destination operands are 
incremented after each string operation.  If the "dir" 
bit is set, then these operands are decremented after 
each string operation.

    The string instructions take the form:

              MNEMONIC   src, dest, len

where src, dest, and len are any of  A, X, Y, S, F, TOS, 
#value, or a direct page address.   For these 
operands, the sixteen bit value specified by one of these 
addresses is used, relative to the DBR, as the address 
(or length) of the specified block.  An absolute address 
can be specified by an immediate operand.  The direct 
page address is the address of the 16-bit value within 
the direct page, it does not mean that the address of the 
block is that address in the direct page.  Same with the 
TOS, the value on the top of stack contains the address, 
the top of stack is not the block itself.  The len 
operand is always a byte count.  Unless an immediate 
operand is specified, the operands are always updated to 
reflect their new value at the termination of the block 
operation.

    The MOVS instruction is used to move a string of 
bytes from one location to another.  A block of "len" 
bytes specified by DBR/src is moved to ABR/dest.
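
In Python terms, MOVS behaves roughly like the sketch below.  The function name movs is mine; I treat DBR and ABR as byte base addresses into one flat memory array, and I assume a set "dir" bit means decrement (the 8086 convention).

```python
def movs(memory, dbr, abr, src, dest, length, dir_bit=0):
    """Move `length` bytes from DBR/src to ABR/dest.

    Returns the updated (src, dest, length) operand values.
    """
    step = -1 if dir_bit else 1
    while length > 0:
        memory[abr + dest] = memory[dbr + src]
        src = (src + step) & 0xFFFF
        dest = (dest + step) & 0xFFFF
        length -= 1
    return src, dest, length


memory = bytearray(0x300)
memory[0x100:0x104] = b"ABCD"
# Copy four bytes from offset 0 of the data bank (based at $100) to
# offset 0 of the auxiliary bank (based at $200).
final = movs(memory, dbr=0x100, abr=0x200, src=0, dest=0, length=4)
```

Afterwards memory[$200..$203] holds "ABCD" and the operands come back as (4, 4, 0).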

    The MOVS operation is an example of an instruction 
that does not exactly mirror its 65c816 counterpart.  It 
may take two (or more) instructions to perform the same 
operation as the 65c816 MVN and MVP instructions, since 
the direction flag may require adjustment before 
performing a MOVS instruction. Furthermore, the ABR and 
DBR registers may need adjustment before and after the 
MOVS instruction to simulate the MVN and MVP 
instructions.  Finally, the actual count is specified by 
length, not count-1 (as on the 65c816), so this may 
require some adjustment if you are translating 65c816 
code instruction by instruction.

Example:

        MVN  0,1

  can be simulated by

        MOVW #0,DBR
        MOVW #1,ABR
        ADDQ #1, A    ;Since MVN assumes A contains count-1
        MOVS X,Y,A
        MOVW #1,DBR


    The CMPS operation compares the two specified 
strings.  It does a byte by byte comparison until length 
bytes are compared or a character in the source string is 
not equal to the corresponding character in the 
destination string. The condition codes are set to 
reflect the ordinality of the two strings (so you can use 
any of the branch, Scc, Ccc, or Icc instructions to test 
the results). If the z flag is returned set, then the two 
strings are equal (through the specified length), 
otherwise the source and destination operands are updated 
to point at the differing chars and the length operand is 
updated to show the number of characters processed thus 
far (assuming, of course, that these operands weren't 
immediate, in which case they would be ignored).

    The XLATS instruction is used to translate values in 
a string.  The source operand points at a table in the 
DBR.  Each character in the dest string is used as an 
index into this table and the value fetched from the 
table is stored over the original character in the 
destination string.
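
XLATS amounts to a table lookup applied in place.  Here is a Python sketch (xlats is my name for it) that uses flat addresses and ignores the DBR/ABR split and the "dir" bit for brevity.

```python
def xlats(memory, table_addr, dest, length):
    """Replace each byte of the dest string with table[byte]."""
    for offset in range(length):
        index = memory[dest + offset]
        memory[dest + offset] = memory[table_addr + index]


memory = bytearray(0x400)
# Build a 256-entry table at address 0 that maps lower case ASCII
# letters to upper case and leaves every other value alone.
for code in range(256):
    is_lower = ord("a") <= code <= ord("z")
    memory[code] = code - 32 if is_lower else code
memory[0x200:0x205] = b"ab-yz"
xlats(memory, table_addr=0x000, dest=0x200, length=5)
```

After the call the string at $200 reads "AB-YZ".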

    The FILLS instruction is used to initialize a string 
with a fixed value.  The source operand is an eight-bit 
value.  It is stored in successive locations at ABR/dest 
for len bytes.  If an immediate value is specified, a 
sixteen-bit value is encoded into the instruction, but 
only the L.O. eight bits are used.




Single byte expansion instructions:

These instructions take the form:

  7  6  5  4  3  2  1  0
  o  o  o  1  0  1  1  0

Where "ooo" is decoded as:

0- NOP 1- COP 2- BRK 3- SVC 4- RTS 5- RTL 6- RTI 7- EXIT

    SVC is the "supervisor call" instruction.  Its 
intended use is for making operating system calls.  It is 
similar in function to the COP instruction.

    EXIT is used to deallocate local variables in a 
procedure.  It undoes the actions of the ENTER 
instruction.  Basically it performs the following 
operations:

		MOV F,S
		MOV TOS, F

    The remaining instructions in this group are 
identical to their 65c816 counterparts, so they don't 
require any further elaboration.




Single byte w/displacement expansion instructions:

These instructions take the form:

  7  6  5  4  3  2  1  0
  o  o  o  1  0  1  1  1

Where "ooo" is decoded as:

0- SAVE n     1- RESTORE n  2- reserved   3- reserved
4- RTS n      5- RTL n      6- ADJSP n    7- ENTER n

    The "n" value immediately following these 
instructions is a displacement value.  If bit seven of 
the first byte following the opcode is zero, then the 
remaining seven bits are used to specify a signed value in 
the range +/- 64.  If bit seven is one, then the 
following 15 bits are used to specify a value in the 
range +/-16383. Except possibly for ADJSP, none of these 
instructions should ever require more than a single byte 
displacement.

    SAVE is used to quickly push registers from the set 
[A,AX,X,Y,F,D,P] onto the stack.  The instruction is 
followed by a single byte with bits 0..6 corresponding 
to these registers.  Bit seven must always be zero.
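
Decoding the SAVE mask byte might look like the Python sketch below.  The names are mine, and the bit assignment (bit 0 = A through bit 6 = P, in the order the set is listed) is an assumption; the order in which registers are actually pushed isn't specified either.

```python
# Assumed mapping: bit 0 = A, bit 1 = AX, ... bit 6 = P.
SAVE_REGISTERS = ["A", "AX", "X", "Y", "F", "D", "P"]


def decode_save_mask(mask):
    """Return the register names selected by a SAVE/RESTORE mask byte."""
    if mask & 0x80:
        raise ValueError("bit seven must always be zero")
    return [name for bit, name in enumerate(SAVE_REGISTERS)
            if mask & (1 << bit)]
```

Under that assumption, a mask of %0000101 selects A and X.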

    RESTORE does just the opposite of SAVE, it pops the 
specified registers off of the stack.

    RTS n and RTL n perform the specified return from 
subroutine operations and then add the specified 
displacement to the stack pointer after the return 
address has been popped.  This provides a convenient 
mechanism whereby parameters can be removed from the 
stack.

    The ADJSP n instruction adds the displacement value 
to the stack pointer. This is a shorter version of the 
ADD #value,S instruction.  A special case was created for 
this instruction because it gets used all the time in 
languages like "C" or "SDL/65" which allow a variable 
number of parameters.

    The ENTER n instruction is used to set up an 
activation record when a procedure is initially entered.  
It performs the following operations:

	MOVW  F,TOS
	MOVW  S, F
	ADJSP n

The EXIT instruction can be used to undo the effects of 
this instruction.
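
To see that EXIT really does undo ENTER, here is a toy Python model.  The class and its simplified stack (S points at the last word pushed and the stack grows downward) are my assumptions, not part of the design above.

```python
class Cpu:
    """Toy model holding only S, F, and a sparse stack memory."""

    def __init__(self, s=0x01FF):
        self.s = s
        self.f = 0
        self.memory = {}

    def push(self, value):
        self.s -= 2                     # assumed: stack grows downward
        self.memory[self.s] = value & 0xFFFF

    def pop(self):
        value = self.memory[self.s]
        self.s += 2
        return value

    def enter(self, n):
        self.push(self.f)               # MOVW F,TOS
        self.f = self.s                 # MOVW S,F
        self.s += n                     # ADJSP n (negative n allocates locals)

    def exit(self):
        self.s = self.f                 # MOV F,S
        self.f = self.pop()             # MOV TOS,F


cpu = Cpu()
cpu.f = 0x1234
start = cpu.s
cpu.enter(-8)       # save the old frame pointer, reserve eight bytes
cpu.exit()          # S and F are back to their original values
```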



Prefix expansion instructions:

These instructions take the form:

 7  6  5  4  3  2  1  0  o  o  o  1  1  0  0  0

where "ooo" is decoded as:

0- ABR prefix             4- dword index prefix
1- SBR prefix             5- qword index prefix
2- PBR prefix             6- XBA/SWA
3- word index prefix      7- EMU

    XBA and EMU aren't true prefix bytes, they're just 
single byte instructions that didn't conveniently fit 
anywhere else.  So I'll describe them first.  XBA is 
identical to its 65c816 counterpart, it swaps the bytes 
in the accumulator. EMU switches from 65c820 native mode 
to 65c816 emulation mode.  EMU is a privileged 
instruction and will cause a privileged instruction trap 
if executed from the user mode.

   The first three prefix bytes are used to modify the 
bank used for data accesses. Addressing modes that 
normally access memory through the data bank register 
(which are all memory references except direct, long, 
TOS, and those involving F) can be "tweaked" to access 
memory through the auxiliary, stack, or program bank 
registers by prefixing the address with the appropriate 
prefix.  For example,

		MOVW  #275, ABR:$1000

stores 275 into location $1000 in the auxiliary bank 
register rather than the data bank register.  Indirect 
addresses of the form (a,X) and n(a,X)  present a minor 
problem.  Does the prefix specify the bank address of the 
absolute operand or the effective address?  I've opted 
for requiring that the absolute operand reside in the 
data bank and the prefix byte determines the effective 
address bank.

    Any addressing mode utilizing the frame pointer 
register (F) is always relative to the stack bank 
register.  Prefixes are only allowed for the following 
frame-based addressing modes: n(d,F),  n(a,F),  (d,F),  
(a,F), n(d,FX), and n(d,F),Y.  The indirect address 
always comes out of the stack bank, the prefix applies to 
the computed effective address.

    Although the ABR:/SBR:/PBR: lexemes immediately 
precede the address expression to which they apply (on 
the source line), in the object code, the prefix byte 
always precedes the instruction to which the prefix 
applies.  If more than one prefix byte precedes an 
instruction, only the last one is used.  If a prefix byte 
precedes an instruction to which the prefix doesn't make 
sense (a branch, for example), then the prefix byte is 
ignored.  Finally, the prefix byte will be ignored if 
there isn't an applicable addressing mode in the current 
instruction. E.G.:

		byt  $18  ;ABR prefix byte
		MOVW A,X  ;ABR prefix has no meaning here.


    Three additional prefix bytes apply to the X and Y 
index registers.  These are the word index prefix, dword 
index prefix, and qword index prefix.  These prefix bytes 
provide scaled indexed addressing modes for the 65c820.  
Without one of these prefixes, the X and Y registers are 
always byte offsets.  That is, when used as an index 
register, the contents of X or Y is added directly to the 
effective address being computed.  When accessing words, 
pointers (double words), or eight byte values (e.g., 
floating point) you have to manually adjust the index 
registers by a factor of 2, 4, or 8.  The scaled index 
addressing prefix bytes let you avoid this problem.  The 
word prefix multiplies the X or Y register value by two 
before using it in the effective address computation.  
Likewise, the dword and qword prefixes multiply X or Y by 
4 or 8 before using the value.  In the source code, these 
prefix bytes are specified by the ":W", ":D", and ":Q" 
suffixes:

		MOVW   A,LBL,X:W
		MOVW   $0,(PTR),Y:D
		MOVW   $2, 2(PTR),Y:D
		MOVW   F,(TBL,X:W)
		MOVW   A,LBL,Y:Q

If multiple prefixes appear, only the last one is used.  
If the prefix doesn't apply to the next instruction, it 
is ignored.
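
The effect of the scaling prefixes on an indexed address can be sketched in Python as follows.  The names SCALE and indexed_address are mine, and I assume the result wraps within a 64K bank.

```python
# Scale factor for each index prefix; None means no prefix (byte offset).
SCALE = {None: 1, "W": 2, "D": 4, "Q": 8}


def indexed_address(base, index, prefix=None):
    """Effective address for base,X (or base,Y) with an optional prefix."""
    return (base + index * SCALE[prefix]) & 0xFFFF
```

With X = 3 and LBL = $2000, LBL,X addresses $2003, LBL,X:W addresses $2006, and LBL,X:Q addresses $2018.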



Single operand expansion instructions:

    The $1E expansion instructions are dedicated to 
instructions which require a single operand.  The format 
for the opcodes is as follows:

 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  a  a  a  a  a  a  s  o  o  o  o  1  1  1  1  0

where "aaaaaa" is a general addressing mode, "s" is the 
size (B/W), and "oooo" is one of the following opcodes:

0- NOT                  8- LAX    (load AX register)
1- NEG                  9- SAX    (store AX register)
2- ABS                  A- XAX    (exchange AX register)
3- BOOL (0->0 else->1)  B- LLB    (load LBound register)
4- SEX                  C- LHB    (load HBound register)
5- ZEX                  D- SLB    (store LBound register)
6- JMP                  E- SHB    (store HBound register)
7- JSR                  F- VAL    (validate memory location)


All of these instructions are followed by a single 
general address expression. Immediate operands are not 
allowed for any of these instructions.
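
To make the field layout concrete, the following Python sketch pulls the three fields out of a 16-bit $1E expansion opcode.  It treats the opcode as a plain 16-bit integer; the order of the two bytes in memory and which value of "s" means byte or word are not stated here, so the sketch leaves both alone.  The function name is hypothetical.

```python
# Sub-opcode names for the $1E (single operand) group, taken from the
# table above.
SINGLE_OPS = ["NOT", "NEG", "ABS", "BOOL", "SEX", "ZEX", "JMP", "JSR",
              "LAX", "SAX", "XAX", "LLB", "LHB", "SLB", "SHB", "VAL"]


def decode_single_operand(word):
    """Split a 16-bit $1E expansion opcode into (mode, size, name).

    Bits 15..10 = aaaaaa (addressing mode), bit 9 = s (size),
    bits 8..5 = oooo (sub-opcode), bits 4..0 = 11110 ($1E).
    """
    if word & 0x1F != 0x1E:
        raise ValueError("not a $1E expansion opcode")
    mode = (word >> 10) & 0x3F
    size = (word >> 9) & 0x1
    name = SINGLE_OPS[(word >> 5) & 0xF]
    return mode, size, name


if __name__ == "__main__":
    # Addressing mode 5, s = 1, sub-opcode 8 (LAX).
    word = (5 << 10) | (1 << 9) | (0x8 << 5) | 0x1E
    print(hex(word), decode_single_operand(word))
```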

NOT- logically complements the specified value.
NEG- takes the two's complement of the specified value.
ABS- takes the absolute value of the specified location.
BOOL- If the specified location is not zero, a one is stored 
into it.

SEX- (that's sign extension, not what you think).  SEXB 
checks the high order bit of the specified byte and 
copies it into the H.O. byte of the corresponding 
address.  For example, if X contains $0082 then SEXB X 
will store $FF82 into X. If X contains $0002, then SEXB X 
will store $0002 into X.   SEXW sign extends the 
specified location into the AX register.

ZEX- zero extends the specified value.  ZEXW simply 
stores a zero into AX. ZEXB stores a zero into the H.O. 
byte of the specified word.
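
The extension instructions can be modeled in a few lines of Python.  This is only a sketch: the function names are invented, and it assumes AX is a 16-bit register that receives the upper half of the extended value, which is how the descriptions of SEXW and ZEXW read.

```python
def sexb(word):
    """SEXB: copy bit 7 of the low byte into every bit of the H.O. byte."""
    low = word & 0xFF
    return low | 0xFF00 if low & 0x80 else low


def zexb(word):
    """ZEXB: store zero into the H.O. byte of the word."""
    return word & 0x00FF


def sexw_ax(word):
    """SEXW: return the value AX would receive (assumed 16 bits wide):
    all ones if bit 15 of the word is set, otherwise zero."""
    return 0xFFFF if word & 0x8000 else 0x0000


def zexw_ax(word):
    """ZEXW: AX always receives zero, whatever the operand holds."""
    return 0x0000


if __name__ == "__main__":
    print(hex(sexb(0x0082)), hex(sexb(0x0002)))   # the two SEXB X examples
```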

JMP and JSR are like their 65c816 counterparts except any 
valid addressing mode can be used.  Note that, unlike 
most other instructions, the result is assumed to be in 
the current program bank unless a long addressing mode is 
specified.

LAX, SAX, and XAX allow you to load, store, and exchange 
the contents of the AX register.  Note that these three 
instructions plus SEX, ZEX, MUL, DIV, and MOD are the 
only instructions that deal with the AX register.

LLB, LHB, SLB, and SHB let you load and save the contents 
of the bounds registers. These are privileged 
instructions which will cause a privilege trap if 
executed from the user mode.

VAL- This instruction is used to validate a memory 
location.  That is, it tests the specified memory 
location to see if it lies within the range specified by 
the bounds register.  The address is a physical address, 
not a translated address. The overflow flag is set if a 
bounds violation would occur.  Note that the M bit in the 
SYSPSW need not contain a particular value when using 
this instruction. This is a privileged instruction which 
will cause a privilege violation if executed in the user 
mode.
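
A minimal Python sketch of the test VAL performs might look like this.  Two assumptions are baked in and neither comes from the text: the bounds are treated as inclusive, and the function simply returns what the overflow flag would hold.  The name val_overflow is made up.

```python
def val_overflow(address, lbound, hbound):
    """Return True (overflow flag set) if the physical address lies
    outside the range held in the bounds registers.

    ASSUMPTION: both bounds are inclusive; the text does not say.
    """
    return not (lbound <= address <= hbound)


if __name__ == "__main__":
    # A user region from $010000 through $01FFFF.
    print(val_overflow(0x012345, 0x010000, 0x01FFFF))  # inside the region
    print(val_overflow(0x020000, 0x010000, 0x01FFFF))  # one past the end
```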



BIT expansion instructions:

These instructions take the form:

 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
  a  a  a  a  a  a  s  o  o  o  o  1  1  1  1  1

"aaaaaa" is the destination addressing mode. "s" is the 
size (applicable only to MAND, MOR, and MXOR). "oooo" is 
the sub-opcode, decoded as:

0- INS  dest, start, len
1- EXT  dest, start, len
2- FFS  dest, start, len
3- FFC  dest, start, len
4- MAND dest, mask
5- MOR  dest, mask
6- MXOR dest, mask

7..F- reserved.

INS is used to insert a value into a bit field.  The 
value in the accumulator is shifted to the left "start" 
bits and the "len" following bits are stored into 
the specified memory location; the other bits of the 
location are left alone.  For example, if memory 
location $00 contains $F0 and the accumulator contains 
$3,  then INS $00,2,4 would leave location $00 containing 
$CC.  Note that you needn't specify byte or word size as 
this is implied by the length.

EXT- extracts a bit field from some location and stores 
the right justified value into the accumulator (zeroing 
out any unused bits).  For example, if memory location 
$00 contains $CC and the accumulator contains $FFFF, then 
EXT $0,2,4 would leave the accumulator containing 3 and 
location $00 containing $CC.
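
Both worked examples can be checked with a short Python model of the two instructions.  The function names are invented for this sketch; the shift-and-mask arithmetic is the usual way of doing bit field insertion and extraction and reproduces the $CC and 3 results given above.

```python
def ins(dest, acc, start, length):
    """INS: place the low `length` bits of acc into dest at bit `start`,
    leaving every other bit of dest unchanged."""
    field = ((1 << length) - 1) << start
    return (dest & ~field) | ((acc << start) & field)


def ext(src, start, length):
    """EXT: return the `length`-bit field at bit `start`, right justified
    with the unused upper bits zeroed."""
    return (src >> start) & ((1 << length) - 1)


if __name__ == "__main__":
    print(hex(ins(0xF0, 0x3, 2, 4)))   # INS $00,2,4 with $F0 in memory
    print(ext(0xCC, 2, 4))             # EXT $0,2,4 with $CC in memory
```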


FFS finds the first set bit in the specified location.  
The bit position is returned in the accumulator.  If 
there were no set bits, the accumulator contains "len"+1.

FFC finds the first clear bit in a manner identical to 
FFS.

Some notes:  These four instructions are followed by a 
single byte.  The low order four bits contain the start 
value, the high order four bits contain the length-1. 
"start" + "len" must always be less than or equal to 15.  
FFS and FFC use the direction bit in the USRPSW to 
determine which way to progress in the bit field when 
searching for the set or clear bit.
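
The operand byte and the two search instructions can be sketched in Python as follows.  Several details here are assumptions rather than anything stated above: the returned bit position is taken to be relative to the start of the field, a clear direction bit is taken to mean a search from the low order bit upward, and all function names are made up.

```python
def encode_field_byte(start, length):
    """Pack start (low four bits) and length-1 (high four bits)."""
    if not (0 <= start <= 15 and 1 <= length <= 16):
        raise ValueError("start or length out of range")
    if start + length > 15:
        raise ValueError("start + len must be <= 15")
    return ((length - 1) << 4) | start


def decode_field_byte(byte):
    """Unpack the operand byte into (start, length)."""
    return byte & 0x0F, (byte >> 4) + 1


def _find(value, start, length, want, descending):
    # ASSUMPTION: positions are counted from the start of the field.
    order = range(length - 1, -1, -1) if descending else range(length)
    for i in order:
        if (value >> (start + i)) & 1 == want:
            return i
    return length + 1          # nothing found: "len"+1, as the text says


def ffs(value, start, length, descending=False):
    """FFS: position of the first set bit in the field."""
    return _find(value, start, length, 1, descending)


def ffc(value, start, length, descending=False):
    """FFC: position of the first clear bit in the field."""
    return _find(value, start, length, 0, descending)


if __name__ == "__main__":
    print(hex(encode_field_byte(2, 4)))   # operand byte for start 2, len 4
    print(ffs(0xCC, 2, 4), ffc(0xCC, 2, 4))
```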


The MAND, MOR, and MXOR (masked AND, OR, and XOR) 
instructions will AND, OR, or XOR the accumulator into 
the specified memory location.  The difference between 
these three instructions and the standard AND, OR, and 
XOR instructions is that they are followed by a byte or 
word (depending on the instruction size) which contains a 
mask for the operation.  Wherever a one bit appears in 
the mask, the logical operation will take place; wherever 
a zero bit appears, the destination's bits will be 
unaffected.
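
The masking rule boils down to one expression, shown here as a Python sketch.  The function names are hypothetical, and the examples use byte-sized operands.

```python
import operator


def masked_op(dest, acc, mask, op):
    """Apply op(dest, acc) only where mask has one bits; where mask has
    zero bits, the destination's bits pass through untouched."""
    return (dest & ~mask) | (op(dest, acc) & mask)


def mand(dest, acc, mask):
    return masked_op(dest, acc, mask, operator.and_)


def mor(dest, acc, mask):
    return masked_op(dest, acc, mask, operator.or_)


def mxor(dest, acc, mask):
    return masked_op(dest, acc, mask, operator.xor)


if __name__ == "__main__":
    # Flip only the low nibble of $AA, leaving the high nibble alone.
    print(hex(mxor(0xAA, 0xFF, 0x0F)))
```

A mask of all ones turns each of these back into the ordinary AND, OR, or XOR to memory.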


_________________________________________________________

That wraps up my proposed instruction set for the 65c820.  
I'll be happy to discuss my design decisions with anyone 
who's interested.  The next step is to try to convince 
someone to actually build this thing!  In the meantime, 
I might try writing an interpreter and assembler for it.

By the way, many of you have probably recognized certain 
instructions from this processor or that processor 
sprinkled throughout.  To set the record straight,  most 
of my ideas have come from my own frustrations with the 
65c816, the 8086 family, and the National Semiconductor 
32000 family.  Despite the fact that a lot of you think 
that Intel's parts stink because they're used by IBM, 
don't let that prejudice you against many of the design 
issues here.  The 8086 does have a reasonable 
architecture, given the compromises it had to face.  It's 
certainly better than the 65c816.  I've incorporated a 
lot of the better ideas (like segment prefixes) into the 
design of the 65c820.  Once again,  don't downplay these 
powerful features just because you don't like IBM.

*** Randy Hyde
 