review of 0.9.1 #15

sorear · 2024-01-31T07:47:11Z

However, as page-based virtual memory (MMU) is usually unavailable on IoT devices, it is hard to isolate the S-mode OSes (e.g., RTOS) and user-mode applications.

Strange statement. I believe the largest relevant market for this feature is automotive. Many IoT devices are not safety critical or hard real time and do not require a nested PMP system; many of the devices that most strongly need SPMP for security aren't connected to the internet at all.

Like PMP, the granularity of SPMP access control settings is platform-specific and, within a platform, may vary by physical memory region.

Where did this come from? The explanatory note at the end of 3.7.1.1 specifically calls the PMP granularity a global property which software can discover without reference to an address, and I can't find anything elsewhere in the privileged or Smepmp spec contradicting this.

The number of SPMP entries can vary by implementation, and up to 64 SPMP entries are supported in the standard.

The PMP spec is designed to allow the number of physically implemented entries (with a cfg and addr that aren't read-only zero, and connected to a comparator) to vary smoothly. This was clearer in the originally ratified specification, the update that bumped the maximum PMP size from 16 to 64 confused some of the wording.

We need clear terminology to differentiate whether an entry is physically implemented, exists as a pair of CSRs but is read-only zero, or the CSR doesn't exist at all. I recommend requiring that all SPMP CSRs exist in all implementations of the Spmp extension, since we have no compatibility requirement with a max-16-SPMP spec and Spmp is optional, so it does not constrain implementations that do not claim conformance to Spmp.

Assuming that we require all 64+8/XLEN+2 CSRs to be decoded in all Spmp implementations, we need a term for physically implemented SPMP entries. Hereafter in this review, "writable SPMP entries".

I am strongly supportive of ARC's recommendation that a single pool of address registers and comparators be dynamically shared between PMP and SPMP. Has a mechanism for this been decided? I have most of a proposal if you need one.

Fewer address bits may be implemented for specific reasons, e.g., systems with smaller physical address space.

Is the number of address bits the same for all writable SPMP entries?

Implemented address bits must be contiguous and go from lower to higher bits.

Clarify that implemented address bits must extend to the LSB, except as otherwise permitted by the PMP granularity rules, which are in the "Address Matching" section and therefore incorporated by reference.

The layout of SPMP configuration registers is the same as PMP configuration registers, as shown in Figure 5. The register is WARL.

PMP configuration is WARL on a field, not register, basis, with RWX as a single field: trying to write a reserved value to the RWX field, or write A=NA4 when not allowed by the granularity, will not disturb other fields. Is the divergence intentional?

The number of SPMP entries: Implementations may implement zero, 16, or 64 SPMP entries.

Remove this. The PMP spec provides a choice of 0, 16, or 64 decoded CSRs to legitimize implementations of the earlier 16-PMP spec and those that predate PMPs entirely (or wish to trim a dozen gates from the CSR decoder). Spmp is both a new spec and optional as a whole so there's no reason not to require decoding the entire CSR range.

This has nothing to do with the number of writable SPMP entries, which can be 0..64 inclusive.

The reset state: On system reset, the A field of spmp[i]cfg should be zero.

Resetting spmpswitch[i] would be half as many resets for the same effect.

SPMP CSRs should be allocated contiguously starting with the lowest CSR number.

It is unclear whether this refers to decoded SPMP entries or writable SPMP entries. In either case, we don't have an earlier spec to be compatible with, so it should be a "must" (it is a "must" in the PMP spec).

Figure 6. SPMP Encoding Table

The row ordering of this table makes little sense. I would suggest simple numerical order on RWX; I will also accept "numerical order, but with -W- and -WX, in that order, moved to the end" and "order matching the Smepmp table".

We re-use the sstatus.SUM (allow Supervisor User Memory access) bit of modifying the privilege with which S-mode loads and stores access to physical memory.

S-mode should be able to remain unaware of invisible traps to M-mode for instruction or misaligned access emulation. If a misaligned access trap (or illegal instruction trap if mtval=0) is taken into M-mode and the emulation handler needs to read the instruction with MXR=1 MPRV=1, MXR must be able to override execute-only SPMPs at least if the actual execution mode is M.

This is much less of an issue with execute-only PMPs since if M-mode software installs an emulating trap handler and also execute-only PMPs, it has only succeeded in breaking itself. SPMP turns this into a compatibility issue between software in different modes.

M-mode accesses are always considered to pass SPMP checks.

Clarify that this is the effective (after MPRV) mode.

The same behavior may manifest for floating-point stores wider than XLEN bits (e.g., the FSD instruction in RV32D)

Delete "floating-point" to match PMP behavior.

If the privilege mode of the access is U and no SPMP entry matches, but at least one SPMP entry is implemented, the access is denied;

Backward compatibility (important, for instance, if Spmp is added to a core supporting the RVA or RVB profiles) requires entering S-mode in a state where all of physical memory (not forbidden by PMP) is RWX in both U and S mode. Such a state is not expressible with the SRWX rules, so some other mechanism must be added to preserve backward compatibility.

shr RWX regions make very little sense outside of the specific backward compatibility use case, so it is probably not useful to support them in general. Instead, recommend finding a bit somewhere (menvcfg/henvcfg or a new SPMP control CSR) for U-mode SPMP default deny (USDD). A S-mode SPMP default deny (SSDD) bit would save use of a SPMP entry in many common situations and parallel the existing mseccfg.MMWP.

Another bit is needed in the presence of Smstateen, this time in mstateen0/hstateen0, since Spmp adds new state readable and writable in S-mode. If the bit is clear all access to Spmp CSRs will cause an illegal or virtual instruction exception.

It is allowed if it satisfies the permission checking with the S, R, W, or X bit corresponding to the access type.

Not an accurate description since the permission checking is a joint function of SRWX, not one bit per type.

If page-based virtual memory is not implemented, or when it is disabled, memory accesses check the SPMP settings synchronously, so no fence is needed.

This is a major divergence from the PMP spec with major implementation consequences!

A PMP can be added cheaply to any implementation with a TLB by performing PMP checks on page table fetches, and on physical addresses prior to installing them in the TLB. Such an implementation will have a natural PMP granularity of one page, and slightly worse WCET than a no-TLB implementation but far better than the worst case of a page table walk, especially if two stages of paging are in use. It will naturally only have one copy of the PMP comparison logic, as opposed to two or more required for an implementation that lacks a TLB, so it would be difficult and costly for it to switch into a mode where all PMP checks are immediate on satp.MODE = Bare.

Requiring sfence.vma x0, x0 after PMP reconfiguration improves ecosystem security by making it easier for implementations with virtual memory support to add a PMP for optional use.

If this is an intentional change based on implementation experience, it needs a better documented rationale.

Since this is a new extension we can simplify this for software by adding a requirement that sfence.vma be implemented for Spmp conformance. It is always a no-op if satp.MODE = Bare.

2.8. Context Switching Optimization

This will be very helpful for "small" RTOS environments where the total number of memory regions in all tasks is less than the number of available hardware registers.

I don't think we can assume this will always be the case; it would represent a substantial amount of waste if hardware was designed with large margins to prevent SPMP swapping. I want SPMP overflow and swapping to result in graceful performance degradation to the extent possible. The main performance pitfall is that since SPMP is active in S-mode, defined to take effect immediately, and can cause instruction fetch to trap, writes to SPMP need to check if the next instruction is valid to fetch with the new SPMP configuration. This will typically involve refetching the next instruction after a SPMP reconfiguration that affects S-mode instruction fetch.

Defining S-stage memory protection to require a sfence.vma x0, x0 regardless of satp.MODE would provide exactly the required semantics, since after sfence.vma the next instruction must be retranslated. I would prefer this if the memory model people are amenable to it.

Otherwise, recommend that software disable the PMP entries that need to be reprogrammed using spmpswitch, update spmpaddr and spmpcfg, then re-enable them at the end. This will result in a total of typically two flushes if the implementation recognizes that changing spmpaddr and spmpcfg cannot affect instruction execution if the PMP entries being modified are not enabled at the time.

Rename to spmpen[i]; CSR names are nouns, not verbs.

Indirect CSR access

I wonder if it would be better to use use 16 siselect slots, dividing sireg into sireg1 for 8-16 spmpcfgN, sireg[2-5] for the 64 spmpaddrN, and sireg6 for spmpswitch and future expansion? The hardware implications are probably not huge but defining anything in terms of division by six is unpleasant.

Chapter 4. Interaction with other proposals

Smepmp and H are no longer "proposals". The interaction with Smepmp is trivial enough that I don't think it needs to be specified. My thoughts on sharing resources between PMP and SPMP are heavily influenced by H-mode support, and I don't think it can afford to be an afterthought.

Spmp adds new S-mode writable state, and as such it needs to allocate a bit in mstateen0/hstateen0.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

review of 0.9.1 #15

review of 0.9.1 #15

sorear commented Jan 31, 2024

review of 0.9.1 #15

review of 0.9.1 #15

Comments

sorear commented Jan 31, 2024