-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathArchitecture_Specification.mw
748 lines (519 loc) · 32.9 KB
/
Architecture_Specification.mw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
This page contains information relating to the OpenRISC 1000 architecture and the specification document.
= OpenRISC 1000 architecture 1.1 =
* Adds support for l.lwa and l.swa atomic operations
== Download OR1K 1.1 Specification ==
Architecture specification document: [http://opencores.org/websvn,filedetails?repname=openrisc&path=%2Fopenrisc%2Ftrunk%2Fdocs%2Fopenrisc-arch-1.1-rev0.pdf PDF]
= OpenRISC 1000 architecture 1.0 =
This is the first update to OR1K architecture for many years. It addresses architectural and documentation issues in the manual. A summary of the main features follows:
* jump/branch delay slot is now optional
* improved version tracking
* relocatable exception vector space
* correction of arithmetic overflow and carry detection
* improvements to MAC and integer multiply behaviour
* various clarifications regarding hardware and software behaviour
See the [[Architecture Specification 1.0 Changes Summary]] page for a more extensive summary and list of suggested updates for implementations. See table 1-2 in [http://opencores.org/websvn,filedetails?repname=openrisc&path=%2Fopenrisc%2Ftrunk%2Fdocs%2Fopenrisc-arch-1.0-rev0.pdf the manual] for a full list.
== Download OR1K 1.0 Specification ==
Architecture specification document: [http://opencores.org/websvn,filedetails?repname=openrisc&path=%2Fopenrisc%2Ftrunk%2Fdocs%2Fopenrisc-arch-1.0-rev0.pdf PDF]
== Old specifications ==
These documents are archived. For now it's just the "original" spec, it can be downloaded below:
Old Architecture manual (April 2006): [http://opencores.org/websvn,filedetails?repname=openrisc&path=%2Fopenrisc%2Ftrunk%2Fdocs%2Farchive%2Fopenrisc_arch.pdf PDF]
= Proposed changes =
This section is to keep track of proposed alterations to the OpenRISC 1000.
Feel free to add ideas and comment (please sign your name.) It also helps to post to the [[OR1K:Community_Portal#Mailing_lists|mailing list]] outlining your proposal.
The spec update process will occur from time to time and all proposals largely agreed to by the community will make it in.
The current working copy of accepted proposals for architecture version 1.2 can be found [https://github.com/openrisc/doc/blob/arch-1.2-proposal/openrisc-arch-1.2-rev0.odt here].
== Core Identifier and Number of Cores (P1) ==
To enable multicore systems, a Special Purpose Register 'Core ID' is needed. Although it is principally not necessary, but allows for a self-contained solution, I furthermore propose a 'Number of Cores' register, which contains the number of cores in a SMP cluster.
Proposal: Use system status register address space 128+ for multicore specifics. I think there may further things come up, so that it may make sense to reserve some space.
This is already in mor1kx..
Therefore, in table on page 24 add:
Grp #: 0
Reg #: 128
Reg Name: COREID
USER MODE: -
SUPV MODE: R
Description: Core Identifier Register
Grp #: 0
Reg #: 129
Reg Name: NUMCORES
USER MODE: -
SUPV MODE: R
Description: Number of Cores Register
Add 4.12, explaining those.
== Optional user-mode support (P2) ==
As some processors designed for embedded applications won't necessarily run software with a user/supervisor-mode split of operation, it's not necessary for them to implement user-mode features (eg. protect from unprivileged access to SPRs and the like.) Obviously this lack of support for user mode should be indicated via a bit in the CPUCFGR. So I propose a bit is added so that implementations without user-mode support can be detected by software which requires user-mode.
''Comment:'' I think this is maybe too much of generalization. Except there is a specific plan to also implement a core without it, I don't think we should over-complicate the spec. [[User:Wallento|Wallento]] 11:51, 3 March 2015 (CET)
== SPR access updates (P3) ==
There has been a discussion about different SPRs and the possibility to access them from user space.
The topic can be split in two aspects: Defining privileges on instruction and defining the behavior for insufficient privileges.
=== SPR privileges ===
Essentially the idea is to allow user space access to certain configuration registers: VR, UPR, CPUCFGR, DMMUCFGR, IMMUCFGR, DCCFGR, ICCFGR, VR2, AVR. (Potentially add COREID and NUMCORES)
During discussion it turned out, that we might consider splitting the user space access to SPRs into configuration and state registers. We thereby allow to expose information about the core, but not about the current execution state.
The proposal is to add an extra field CSUMRA to SR for the configuration registers.
=== Accessing SPRs with insufficient privileges ===
At present there is no comment on what should happen in the event that l.mtspr/l.mfspr are used on SPRs which require supervisor mode to access, or if l.rfe is used in user mode.
There are currently two proposals for handling this:
The first is to leave the behavior in these cases implementation defined or completely undefined. If this option is chosen, the manual should be changed to state this explicitly.
The first proposal (by Julius) is for a l.mtspr (write to an SPR) requiring write in supervisor mode while in user mode makes that instruction equivalent to an l.nop. Also that l.mfspr (read from an SPR) with insufficient privileges returns 0, just like an unimplemented SPR would.
The second proposal, by firefalcon and pgavin, is that we keep in line with other ISAs and make the ISA fully virtualizable by throwing an exception in this case.
Some elaboration by pgavin: I think we should actually *use* the definition of "privileged" as given on page 15 of the manual (section 1.6). It is defined, but never used in the manual, from what I've found. Additionally, I think we should raise an exception whenever a privileged instruction has been executed while in user mode. For example, the l.rfe instruction should always raise this exception when executed in user mode. Additionally, in user mode, an attempt to read or write an SPR that is not permitted as indicated on Table 4-2 of the manual should raise this exception. An instruction that accesses a privileged resource in user mode should raise a privileged instruction exception. We can either reuse the current illegal instruction for this purpose, or add a new exception type.
l.rfe should always be privileged. It seems to me that allowing this instruction from user mode makes little sense, and could, in fact, be dangerous. l.mtspr would be privileged whenever the address does not have a W in the "user mode" column of table 4-2, and l.mfspr would be privileged whenever the address does not have an R in that column. The descriptions for these instructions would need to be changed to indicate that they may cause exceptions; it currently says that they do not.
Incidentally, some bits in the SR register (F, CY, OV) are not accessible from user mode, and so these bits should be mirrored in a separate SPR that is R/W from user mode. Or alternatively, we could add new instructions to read/write them.
Also incidentally, I also think l.mtspr/l.mfspr on an unimplemented register should raise an illegal instruction exception. The presence of SPRs that are optional should be indicated by a bit in the UPR, and so this should be checked prior to accessing them. If some registers are not covered by presence bits, the bits should be added. However, the behavior currently defined by the standard is not a problem for virtualization, as far as I can tell, provided that l.mtspr/l.mfspr behave identically on unimplemented registers whether in user or supervisor mode.
=== Summarized proposals ===
Proposal A:
* Add CSUMRA bit (17) to SR: Allow user-mode access to configuration SPRs
* Set of configuration SPRs: VR, UPR, CPUCFGR, DMMUCFGR, IMMUCFGR, DCCFGR, ICCFGR, VR2, AVR
* Add exception at 0xf00: Access violation from user mode
* Add exception 0xf00 to l.mfspr, l.mtspr and l.rfe
== l.lw assembly mnemonic (P4) ==
Add the l.lw assembly mnemonic, which encodes as a l.lwz instruction.
The l.lwz definition page in the spec should have:
Format:
l.lwz rD,I(rA)
or
l.lw rD,I(rA)
''Commentary:'' The downside of this is that it is potentially confusing in a 64-bit implementation, where you want to be explicit about sign extension. If we do go down this route, then for consistency, we should also have l.lh and l.lb as synonyms for l.lhz and l.lbz. [[User:Jeremybennett|Jeremybennett]] 11:32, 16 May 2012 (CEST)
rdiez: For the 32-bit architecture, in order to reduce confusion it would be best to remove the l.lws opcode, and remove the l.lws and l.lwz mnemonics, then add an "l.lw" mnemonic which maps to the old l.lwz opcode. or1200 hasn't implemented (or didn't implement) the l.lws opcode for a long time anyway. By the way, l.extws and l.extwz should also be marked as obsolete, as l.ori can achieve the same results. For the 64-bit architecture, l.lw shouldn't be a valid mnemonic, so the programmer must think whether he wants the l.lwz or the l.lws behaviour.
== Instruction Classes (P5) ==
At present, there are class I and II instructions. Class I must always be implemented. Class II may be optionally implemented.
There are a few problems with the current scheme.
# There is no register where class II instructions are indicated as present or not
# Some fundamental instructions, such as compare-and-set-flag-against-immediate (<code>l.sf*i</code>) are class II and should really be I
# Software multi-lib support with such a configurable instruction set is difficult
Reorganising these will make it clearer which functionality should be tested and expected to be present in an implementation. It will also make it easier for software libraries to be prepared for a particular combination of supported instructions.
=== Current ORBIS32 Class II instructions ===
As of revision 0 of the specification, the following are marked as ORBIS32 class II:
* l.cmov
* l.csync
* l.cust1-8
* l.div*
* l.ext[bhw]*
* l.ff1
* l.fl1
* l.mac*,l.msb
* l.mul*
* l.psync
* l.ror,l.rori
* l.sf*i (l.sfeqi,l.sfgesi etc.)
* l.trap
=== Proposed ORBIS Classifications ===
Class I should remain mandatory to implement.
A new classification is proposed:
* Class II - Optional Maths: l.div*, l.mul* (in OR1200 on Virtex 5, serial l.div, full mult costs 265FF, 550LUT - respectively 71FF, 173LUT and 194FF, 377 LUT)
* Class III - Optional Bit Manipulation: l.ext[bwh]*, l.ff1, l.fl1, l.ror, l.rori (in OR1200 on Virtex 5 they cost 183 LUT)
* Class IV - MAC Instructions - l.mac*, l.msb
* Class V - Remaining Optional Instructions: l.cmov, l.csync, l.msync, l.psync, l.cust1-8
Classes II-IV will be all-or-nothing support classes, with class V individually implementable.
l.sf*i and l.trap should be made class I.
[rth: l.cmov is much more important to good code generation than all of the class III instructions. I would not move it out of class II. l.msync at least should be moved to class I, as it needs to be used in conjunction with l.lwa and l.swa. I see no reason why l.csync and l.psync couldn't be class I as well, since in the simplest implementations they should be nops.]
=== Presence bits ===
Presence bits in a new register should be added for classes II-IV, with a bit for each instruction in class V.
The CPUCFGR should be extended beyond bit 9 to contain:
[25] l.cust1 instruction supported
[24] l.cust2 instruction supported
[23] l.cust3 instruction supported
[22] l.cust4 instruction supported
[21] l.cust5 instruction supported
[20] l.cust6 instruction supported
[19] l.cust7 instruction supported
[18] l.cust8 instruction supported
[17] l.cmov instruction supported
[16] l.csync instruction supported
[15] l.msync instruction supported
[14] l.psync instruction supported
[13] Class IV instruction supported
[12] Class IV instruction supported
[11] Class III instruction supported
[10] Class II instruction supported
== ORFPX32/ORFPX64 ==
=== lf.madd (P6) ===
The description of these instructions needs to make it clear that the result is computed without
intermediate rounding. See the ISO C 99 functions fma and fmaf.
With no other changes, this would suggest a definition of
<pre>
rD <- rD + rA * rB
</pre>
However, there's nearly room in the instruction set to make this a full 3-operand insn.
The bits at fault are l.cust1.s, l.cust1.d, whose opcode spills over into bit 6. As I doubt there
are any users of these opcodes, I propose removing them. One can always use one of the l.cust*
major opcodes to interact with your custom fp hardware.
This lets us define a family of madd instructions as
<pre>
31 26 21 16 11 6 0
[ 0x32 | D | A | B | C | opc ]
Mnemonic NegA NegP BaseOpc
-----------------------------
lf.madds 0 0 0b000111
lf.msub 1 0 0b100000
lf.nmadd 0 1 0b100001
lf.nmsub 1 1 0b100011
Define inv(x,y) to return x with the sign-bit xor'd with y
rD <- (rA * inv(rB,NegP)) + inv(rC,NegA)
</pre>
=== lf.sf* (P7) ===
The current definitions of comparisons do not allow easily testing the "unordered" property, which is
important in the presence of NaNs. There are four conditions for which we might want to test, which
when combined produce all of the possible results. Referencing the ISO C 99 <math.h> functions:
<pre>
31 26 21 16 11 6 0
[ 0x32 | res | rA | rB | res | opc ]
Setup
Mnemonic Opc U E L G X
---------------------------
Existing lf.sfeq.s 0x08 0 1 0 0 0
lf.sfne.s 0x09 1 0 1 1 0
lf.sfgt.s 0x0a 0 0 0 1 1
lf.sfge.s 0x0b 0 1 0 1 1
lf.sflt.s 0x0c 0 0 1 0 1
lf.sfle.s 0x0d 0 1 1 0 1
New lf.sfun.s 0x28 1 0 0 0 0 /* isunordered(rA,rB) */
lf.sfult.s 0x29 1 0 1 0 0 /* !isgreaterequal(rA,rB) */
lf.sfule.s 0x2a 1 1 1 0 0 /* !isgreater(rA,rB) */
lf.sfueq.s 0x2b 1 1 0 0 0 /* !islessgreater(rA,rB) */
un <- isnan(rA) | isnan(rB)
eq <- rA == rB
lt <- rA < rB
gt <- rB < rA
SR[F] <- (u & un) | (e & eq) | (l & lt) | (g & gt)
if (X & un)
raise INVALID_OPERAND exception
</pre>
The hardware that produces the existing comparisons must be able to
produce all of the proper intermediate conditions. It's merely a
matter of exposing all of the conditions we need to be able to test
in order to implement <math.h> properly.
If we drop backward compatibility, i.e. drop lf.sfgt and lf.sfge,
there are only 8 comparisons and we can fit them into 0b0-1xxx.
Which has a nice bit of symmetry to it.
== ORBIS64/ORFPX64 ==
=== Additions ===
==== l.lda, l.sda (P8) ====
We need 64-bit atomic memory ops, akin to the l.lwa and l.swa instructions for ORBIS32.
Use major opcodes 0x3a (l.lda) and 0x3b (l.sda).
==== l.adrp (P9) ====
We need an efficient method of forming 64-bit addresses. The fact that this will improve
-fpic code generation in 32-bit mode too is an added bonus.
<pre>
31 26 21 0
[ 0x03 | rD | N ]
rD <- exts(Immediate << 13) + (InsnAddr & -8192)
</pre>
That is, it computes the address of the page of the target from the address of the page of the insn.
The balance of the address is the 13 bit page offset, computable at link time.
This can be added to the page in several ways:
<pre>
l.adrp r3, foo
l.ori r4, r3, po(foo)
l.lbz r5, po(foo)(r3)
l.sb po(foo)(r3), r6
</pre>
Where the assembler function "po" (page offset) produces the new relocations.
==== l.movl, l.addl (P10) ====
We need an efficient method of forming 64-bit constants. Currently it takes 5 insns and two registers,
or 6 insns and one register to form a full 64-bit constant. Propose extending opcode 0x06 (l.movhi/l.macrc)
to be able to form a 64-bit constant (or add a 64-bit displacement!) in only 4 insns:
<pre>
31 26 21 20 19 18 17 16 0
[ 0x06 | rD | A | U | L | SH1 | SH2 | K ]
shift = (SH1:~SH2) << 4
lower = (L << shift) - 1
const = exts(U:K) << shift | lower
input = (A ? rD : 0) /* l.addl vs l.movl */
rD <- input + const
</pre>
The strange encoding of the shift count (SH1:~SH2) is due to being backward compatible with l.movhi.
By representing the low bit of the shift inverted, we map (0b00) -> 16.
The U ("upper") and L ("lower") bits produce all ones above and below the shifted K, respectively.
Note that the concatenation of U and K essentially forms a 17-bit signed constant.
This allows the easy formation of both large negative numbers and powers-of-two-minus-one. E.g.
<pre>
l.movl r3, 0xfffffffff0600000 (U=1,L=0,SH=16,K=0xf060 : -262144000)
l.movl r3, 0x00007fffffffffff (U=0,L=1,SH=32,K=0x7fff)
l.movl r3, 0xffff0000ffffffff (U=1,L=1,SH=32,K=0x0000)
</pre>
Note that because of the conflict with the existing l.macrc instruction, which overlaps with the SH2 bit,
one must ensure that one of A, U, or L is set when encoding SH=0 (0b00001). Setting L is trivial, as it
makes no change to the constant produced; with a zero shift, "lower" is 0.
One can now form arbitrary 64-bit constants with one l.movl and three l.addl.
<pre>
l.movl r3, 0x1234000000000000 (SH=48,K=0x1234)
l.addl r3, 0x0000567800000000 (SH=32,K=0x5678)
l.addl r3, 0x000000009abc0000 (SH=16,K=0x9abc)
l.addl r3, 0x000000000000def0 (SH=00,K=0xdef0)
</pre>
Note that since the intent of l.addl is both constant formation and address arithmetic, no overflow
and carry bits will be computed. If overflow or carry is desired, form the 64-bit constant into a
register and then use l.add.
==== l.stod, l.dtos (P11) ====
The ORFPX64 extension is missing instructions to convert between single and double precision.
==== ACC instructions (P12) ====
At present, the SPRs are defined to be 32-bits wide, which restricts the width of the MACHI:MACLO
accumulator register. This is emphasized by the definition of the l.macrc instruction. Further,
user-space access to MACHI and MACLO SPRs may be disabled by the kernel via SR[SUMRA]. Better to
introduce another method of performing double-word arithmetic and accumulation that does not depend
on SPRs at all.
For the purposes of discussion, name this set of instructions "ACC" (Alternate aCCumulate).
<pre>
31 26 21 16 11 6 5 4 0
[ 0x14 | rD | rA | rB | rC | SOV | SCY | opc ]
Mnemonic Opc Setup
---------------------------------------------
l.aadd 0000 a=D:A, m=0, c=0, s=0
l.asub 0001 a=D:A, m=0, c=1, s=1
l.aadc 0010 a=D:A, m=0, c=SR[CY], s=0
l.asbb 0011 a=D:A, m=0, c=~SR[CY], s=1
l.amul 0100 a=0, m=1, e=1, c=0, s=0
l.amulu 0101 a=0, m=1, e=0, c=0, s=0
l.amac 0110 a=D:A, m=1, e=1, c=0, s=0
l.amacu 0111 a=D:A, m=1, e=0, c=0, s=0
l.amsb 1000 a=D:A, m=1, e=1, c=1, s=1
l.amsbu 1001 a=D:A, m=1, e=0, c=1, s=1
if (m)
x = (e ? exts(rB) : extz(rB))
y = (e ? exts(rC) : extz(rC))
p = x * y
else
p = rB:rC
rD:rA <- a + (s ? ~p : p) + c
if (~SCY)
SR[CY] <- (s ? ~carry-out : carry-out) (unsigned overflow)
if (~SOV)
SR[OV] <- signed overflow
</pre>
The circuits needed to implement these insns are virtually identical to
that needed to implement the MAC instructions. Only the sources and
destinations differ.
This uses pairs of registers to perform the same double-word arithmetic
that's currently possible with the MAC instructions, but leaves the result
in general registers. This makes it simple to organize coding of general
multi-word arithmetic (libgmp), the performance of which is crucial for
implementing public-key encryption.
Note the l.aadc and l.asbb instructions include the carry-in, and so can
be strung together to produce two words of a multi-word addition each.
Note that unlike l.mac vs l.macu, these instructions produce both SR[CY]
and SR[OV] from the addition stage. Whether or not the multiply stage is
zero-extended or not should not imply anything about the signed-ness of
the addition.
Note the SOV and SCY bits to "suppress" generation of either SR[OV] or
SR[CY] from the addition when they're not desired by the programmer.
The inverted sense of these bits (set to suppress rather than set to
generate) will tie in to another proposal for normal arithmetic.
=== Corrections (P13) ===
==== l.ftoi.s ====
The 64-bit version of this instruction should produce a 64-bit signed integer result, rather
than truncating to a 32-bit unsigned number.
==== l.itof.s ====
The 64-bit version of this instruction should use a 64-bit signed integer as input, rather
than reading only the low 32 bits. This does mean that 64-bit code wanting to convert a
real 32-bit integer to single-precision must sign-extend beforehand, but that is also the
case for more fundamental instructions like l.sfeq, and so should not be a burden.
==== l.ext* ====
These should be class I mandatory for ORBIS64, not relegated to class III optional.
In particular, there are no comparison instructions that ignore the high 32-bits, and one
cannot simply use l.andi to discard the high 32 bits, so l.extw* will be used often prior
to comparisons.
== ORFPX64A32 ==
Adaptation double precision floating point ISA for 32-bit OR1K architecture.
The idea is analogue to other 32-bit architectures. Namely use paired GPRs for perform operations with double precision floating point data.
Example:
<pre>
lf.add.d rD,rA,rB
32-bit Implementation (ORFPX64A32):
{rD[31:0],(rD+1)[31:0]} ← {rA[31:0],(rA+1)[31:0]} + {rB[31:0],(rB+1)[31:0]}
64-bit Implementation (ORFPX64):
rD[63:0] ← rA[63:0] + rB[63:0]
</pre>
If R2 is not used for frame pointer, than the following GPRs could be paired:
<pre>
{R2,R3} … {R6,R7}, {R11,R12} … {R29,R30}
</pre>
''As far as I understand it is the case of current implementation in GCC. GCC operates with 64-bit data as structured type and uses stack to transmit such parameters into a function.''
For case when R2 is used for frame pointer:
<pre>
{R3,R4} … {R7,R8}, {R11,R12} … {R29,R30}
</pre>
''For the case we also able to extend binary interface by allow for a compiler to transmit up to three 64-bit values as function parameters through R3 … R8. Personally, I would like to vote for this variant. Even so, the approach leads to binary backward incompatibility; I don’t think that could be a big problem for such open source project as OpenRISC and its application.''
[[User:BAndViG|BAndViG]] 15:35, 12 March 2015 (CET)
== Fine grained control of SR[OV] and SR[CY] (P14) ==
With more precise control over which instructions set these status bits, the compiler can do a better
job of producing multi-word arithmetic. At present the compiler must represent multi-word arithmetic
as a single indivisible unit until quite late in the compilation process.
The primary factor is that the register allocator must be able to insert instructions to compute addresses
at any point in the function. This happens as the allocator runs out of registers, inserts a spill to
(or fill from) the local stack frame, and grows the size of the stack frame beyond the range addressable
from the stack or frame pointers.
A secondary factor is freedom to schedule instructions without unnecessary constraints. If most arithmetic
modifies the carry flag, then the window in which we can schedule a block of multi-word arithmetic is small.
However if most arithmetic is not interested in carry or overflow, and is able to avoid modifying those
flags, the scheduling window grows significantly.
A third factor unrelated to multi-word arithmetic is being able to indicate to the SR[OVE] and AECR exceptions
which arithmetic insns are signed and which are unsigned. Thus an add with OV disabled, but CY enabled, must
indicate an unsigned add. While the reverse would indicate a signed add.
There are five major opcodes that modify these bits: 0x13 (maci), 0x27 (addi), 0x28 (addic), 0x31 (mac), 0x38 (alu).
The MAC and ALU major opcodes both currently have bits 4 and 5 reserved, and thus currently 0-filled.
As a backward-compatible change, propose decoding bit 4 as SCY (Suppress CY), and bit 5 as SOV (Suppress OV).
Propose changing the assembly to use "/modifier" mnemonic suffixes: "/c", "/co", "/o" to indicate setting of these bits.
<pre>
l.add r2,r3,r4 OV and CY set
l.addc r2,r3,r4
l.sub r2,r3,r4
l.mac r3,r4 OV set
l.msb r3,r4
l.mul r2,r3,r4
l.muld r3,r4
l.div r2,r3,r4
l.add/c r2,r3,r4
l.addc/c r2,r3,r4
l.sub/c r2,r3,r4
l.macu r3,r4 CY set
l.msbu r3,r4
l.mulu r2,r3,r4
l.divu r2,r3,r4
l.muldu r3,r4
l.add/o r2,r3,r4
l.addc/o r2,r3,r4
l.sub/o r2,r3,r4
l.mac/o r3,r4 No bits set
l.msb/o r3,r4
l.mul/o r2,r3,r4
l.muld/o r3,r4
l.div/o r2,r3,r4
l.macu/c r3,r4
l.msbu/c r3,r4
l.mulu/c r2,r3,r4
l.muldu/c r3,r4
l.divu/c r2,r3,r4
l.add/co r2,r3,r4
l.addc/co r2,r3,r4
l.sub/co r2,r3,r4
</pre>
The syntax isn't pretty, but all other alternatives seemed worse. At least the syntax is clear, and backward
compatible with existing assembly.
Note that this introduces some overlap between l.mul and l.mulu, so it might be less confusing to create aliases
<pre>
l.mul/c (l.mul with SOV and SCY clear)
l.mul/o (l.mulu with SOV and SCY clear)
l.mul/co (l.mul with SOV set)
</pre>
and then deprecate the l.mulu mnemonic.
For the the three insns using immediates: l.maci, l.addi, l.addic, all bits are decoded and so we cannot make
any adjustments. However, addition is important enough to register allocation and address formation to make
it worth while to add a "l.addi/co" like l.addi except that both CY and OV are suppressed. The alternative
is to spill yet another register in which to load the offset and then use l.add/co. Propose reserving major
opcode 0x07 for "l.addi/co".
The change in assembly syntax would call for the "l.addl" instruction proposed above to be spelled "l.addl/co".
== FPCSR Extension ==
The proposal is to extend FPCSR (Floating Point Control and Status Register) with mask bits to have an ability to enable/disable each FPU related exception flag independently.
The additional bits:
<pre>
[12] Enable (1) / Disable (0) Overflow Flag
[13] Enable (1) / Disable (0) Underflow Flag
....
[20] Enable (1) / Disable (0) Division by Zero Flag
</pre>
The bit [0] "FPU Exception Enable (FPEE)" keeps it's global behavior.
<pre>
if FPEE == 0 than all FPU related exceptions are disabled
if FPEE == 1 than FPU related exceptions are controlled by Enable / Disable Flags listed above
</pre>
<s>The proposed default value:</s>
<s><pre>
0x1C5001
</pre></s>
* <s>deactivates minimally useful flags ("Quiet NaN", "Zero", "Inexact" and "Underflow")</s>
* <s>activates most useful exceptions ( "Div by zero", "Inf", "Invalid", "Signaling NaN" and "Overflow")</s>
* <s>set mode "round to nearest even"</s>
* <s>enables exceptions listed above as "most useful exceptions"</s>
[rth: The default value should almost certainly remain 0: all exceptions disabled and round to nearest.
That's the same default as virtually every other cpu, including x86. Anything else would be surprising.]
Edited: remove non-zero default value as Richard Henderson (?) stays that “0x0” is unofficial industry standard.
[[User:BAndViG|BAndViG]] 15:35, 12 March 2015 (CET)
== Work in Progress ==
Remove "1.4 Work in Progress". Reason: The architecture revision should be stable and the remaining parts are essentially community stuff, not for the spec.
== Typos, Clarifications ==
The following section should be used to list proposed typos and clarifications which need to be made to future revisions of the architecture spec document.
=== Misspellings, Typos ===
* <s>''Synchronization'' is misspelled as ''Syncronization'' on pages 44, 82, 92, 284, and 340.</s> (fixed in proposal, [[User:Wallento|Wallento]] 11:35, 3 March 2015 (CET))
* <s>In Table 4-7 on page 31, "EEA" label should be "ESR".</s> (fixed in proposal, [[User:Wallento|Wallento]] 11:35, 3 March 2015 (CET))
* <s>At present the fixed-one bit in the supervision register (<tt>SR[FO]</tt>) is listed as readable ''and'' writable. It should be listed as readable only.</s> (fixed in proposal, [[User:Wallento|Wallento]] 11:37, 3 March 2015 (CET))
* Move the section about GPRs from the NPC section (4.10, page 31) into its own section. Better explain how GPRs from other contexts are accessed (section 6.4.3, page 257).
* <s>PPC should be read-only (table 4-2, page 24).</s> (fixed in proposal, [[User:Wallento|Wallento]] 11:58, 3 March 2015 (CET))
* For backwards compatibility, the AECR register (section 15.14, page 326) should be initialized to a state that has behavior identical to a machine without AECR support, instead of the current "0". I believe this would be the value (OVADDE|OVMULE|OVMACADDE).
* ESR (section 4.9, page 30) should have the same bit layout as SR0.
* <s>There exist a typo in the name of l.msbu. The current name is "Multiply and Subtract Signed" but it should be "Multiply and Subtract Unsigned".</s> (fixed in proposal, [[User:Wallento|Wallento]] 13:01, 3 March 2015 (CET))
* <s>There exist a typo in the format of l.ff1 and l.fl1. They both include the rB register, even though it is not part of the instruction.</s> (fixed in proposal, [[User:Wallento|Wallento]] 13:01, 3 March 2015 (CET))
<s>
Format:
before: l.ff1 rD,rA,rB
after: l.ff1 rD,rA
before: l.fl1 rD,rA,rB
after: l.fl1 rD,rA
</s>
* <s>There exist a typo in the bit representation of l.ff1 and l.fl1. They both include the rB register, even though it is not part of the instruction.</s> (fixed in proposal, [[User:Wallento|Wallento]] 13:01, 3 March 2015 (CET))
<s>
Format: l.ff1 rD,rA
bits before: 111000DDDDDAAAAABBBBB-00----1111
bits after: 111000DDDDDAAAAA------00----1111
Format: l.fl1 rD,rA
bits before: 111000DDDDDAAAAABBBBB-01----1111
bits after: 111000DDDDDAAAAA------01----1111
</s>
* <s>There exist a typo in the bit representation of l.muld and l.muldu. They both include the rD register, even though it is not part of the instruction.</s> (fixed in proposal, [[User:Wallento|Wallento]] 13:01, 3 March 2015 (CET))
<s>
Format: l.muld rA,rB
bits before: 111000DDDDDAAAAABBBBB-11----0111
bits after: 111000-----AAAAABBBBB-11----0111
Format: l.muldu rA,rB
bits before: 111000DDDDDAAAAABBBBB-11----1100
bits after: 111000-----AAAAABBBBB-11----1100
</s>
==== Cache Registers Marked as Optional ====
From [http://bugzilla.opencores.org/show_bug.cgi?id=79 bug #79].
Changes to the spec document to make it clearer which cache control registers are optional.
Mark the following registers as optional to implement, as they have a "present" bit in the I/DCCFGR:
* Data Cache Control Register
* Instruction Cache Control Register
* Data Cache Block Write-Back Register
Instruction and data block invalidate and data block flush registers are mandatory, and their presence bits in the I/DCCFGR should be marked as always '1'.
=== Consistency ===
The following section should be used to list proposed consistency issues that should be discussed for future revisions of the architecture specification document. This may refer to naming conventions but also more trivial matters such as use of whitespace and formatting.
==== Operand representations ====
Currently there exist 16 different variations of operand representations for
instructions (excluding l.custN instructions) in the OpenRISC 1000 architecture
specification.
Some of these variations have identical or very similar meaning. For instance
the variation which has a destination register (rD), source register (rA) and an
immediate value (I, K or L).
From a consistency stand point it would be nice if all of these instructions
agreed on using the same letter to represent an immediate value.
<s>For instance why should the opperand representation of l.andi and l.ori differ
from the operand representation of l.xori?</s>
[rth: Because l.andi and l.ori have unsigned immediates and l.xori has a signed immediate.]
Format: l.andi rD,rA,K
Format: l.ori rD,rA,K
Format: l.xori rD,rA,I
The goal is to reach a consensus on how to make the operand representations
consistent. Which letter should be used for immediate values (I, K, L or N) and
which register name should be used in the various cases.
[rth: We're currently fairly consistent in using I for signed 16 bit, K for unsigned 16 bit, and L for the 6-bit shift count.]
The following page contains a list of all instructions arranged by their operand representations:
http://opencores.org/or1k/Operand_representation
==== Naming conventions ====
The naming of l.addi and l.addic are not consistent with the naming of l.andi, l.ori and l.xori.
Currently there exist three naming patterns.
* Instructions which always take halfword immediate values and thus make them implicit in the name and the mnemonic.
l.addi: Add Immediate
l.addic: Add Immediate and Carry
* Instructions which always take halfword immediate values but still makes them explicit in the name, although implicit in the mnemonic.
l.andi: And with Immediate Half Word
l.ori: Or with Immediate Half Word
l.xori: Exclusive Or with Immediate Half Word
* Related instructions which handle values of varying size and thus make them explicit in the name and the mnemonic.
l.lbs: Load Byte and Extend with Sign
l.lhs: Load Half Word and Extend with Sign
l.lws: Load Single Word and Extend with Sign
==== Whitespace ====
To be consistent with l.add and l.addc an empty new line should be inserted between the two paragraphs in the description of l.addi and l.addic.
= Previous proposals =
The suggestions which were accepted, in one form or another, into the 1.0 release of the architecture specification can be found here: [[Architecture_Specification_1.0_proposals]].