[Draft] GPA Protection Implementation #35

Taogle2018 · 2018-03-21T22:52:28Z

Hi, Ning,

This is the latest implementation of GPA_PROT call. The whole implementation is split into two. One covers protection using EPT while the other covers protection from access from driver itself. The second part is not tested.
It could be used as a base for discussion.
BTW: I did not finish mac part. But it does not sound a road blocker now.

Haitao

Taogle2018 · 2018-03-21T22:59:52Z

https://android-review.googlesource.com/q/topic:%22On-demand+Snapshot+Windows%22+(status:open%20OR%20status:merged)
This is emulator side patch. However, it is not up-to-date. At least, gpa protection bit definition needs to be updated in order to make it work with driver side changes.

raphaelning

Thanks, Haitao! In general this patch looks pretty clean. So far my comments are mostly about cosmetic issues. But there are also a couple of more important issues that I'd like to give more thought to. If you can explain your design decisions, that would be great.

raphaelning · 2018-03-22T14:53:04Z

core/include/vm.h

@@ -109,7 +109,8 @@ enum exit_status {
    HAX_EXIT_HLT,
    HAX_EXIT_STATECHANGE,
    HAX_EXIT_PAUSED,
-    HAX_EXIT_FAST_MMIO
+    HAX_EXIT_FAST_MMIO,
+    HAX_EXIT_GPAPROT


How about HAX_EXIT_PAGEFAULT? I think that sounds more intuitive and corresponds to HVF's EXIT_REASON_EPT_FAULT. One could argue that we don't allow the user space to handle every EPT page fault, but probably nor does HVF.

Sounds good!

raphaelning · 2018-03-22T14:57:04Z

include/hax_interface.h

@@ -169,6 +174,7 @@ struct hax_module_version {
 #define HAX_CAP_64BIT_RAMBLOCK     (1 << 3)
 #define HAX_CAP_64BIT_SETRAM       (1 << 4)
 #define HAX_CAP_TUNNEL_PAGE        (1 << 5)
+#define HAX_CAP_GPA_PROTECTION     (1 << 6)


How about HAX_CAP_RAM_PROTECTION? We use RAM to refer to the guest RAM in many other places, and so does QEMU.

OK.
The name came from my original design thought about the API. At that time, I was struggling what was used for protection index, hva or gpa. Later I decided to use GPA, although QEMU needs to maintain slots information in this case. But this sounds right direction to go.

We could actually return both the GPA and HVA to QEMU at every HAX_EXIT_PAGEFAULT, and also allow the new IOCTL to be called either with a GPA range or HVA range (e.g. distinguished by a bit in the flags parameter). That way, we wouldn't even need the gpa2hva or hva2gpa conversion functions in QEMU, but it would require some API changes to the current snapshot support code in Android Emulator...

raphaelning · 2018-03-22T15:02:56Z

include/hax_interface.h

@@ -236,6 +242,14 @@ struct hax_set_ram_info2 {
    uint64_t reserved2;
 } PACKED;

+#define HAX_GPA_PROT_MASK    0x7   // one bit each for r/w/e
+#define HAX_GPA_PROT_ALL     0x7   // disable r/w/e all


I was confused at first whether "disable all" means to disallow or allow all accesses. So how about

// No access (R/W/X) is allowed #define HAX_RAM_PERM_NONE 0x0 // All accesses (R/W/X) are allowed #define HAX_RAM_PERM_RWX 0x7 #define HAX_RAM_PERM_MASK 0x7

raphaelning · 2018-03-22T15:05:55Z

include/hax_interface.h

@@ -236,6 +242,14 @@ struct hax_set_ram_info2 {
    uint64_t reserved2;
 } PACKED;

+#define HAX_GPA_PROT_MASK    0x7   // one bit each for r/w/e
+#define HAX_GPA_PROT_ALL     0x7   // disable r/w/e all
+struct hax_gpa_prot_info {


hax_ram_prot_info?

raphaelning · 2018-03-22T15:08:05Z

windows/hax_entry.h

@@ -161,4 +161,8 @@ extern PDRIVER_OBJECT HaxDriverObject;
 #define HAX_VM_IOCTL_NOTIFY_QEMU_VERSION \
        CTL_CODE(HAX_DEVICE_TYPE, 0x910, METHOD_BUFFERED, FILE_ANY_ACCESS)

+/* API version 3.0 */


The current HAXM API version is actually 4. For backward compatibility, since there's already a new capability flag that QEMU can check, do we need to upgrade the API version this time?

I forget why I added this. :( I actually do not know how haxm API version is defined and used.
Perhaps it is something I forget to clean up. Let me remove it.

raphaelning · 2018-03-22T15:37:03Z

core/include/memory.h

+int gpa_space_adjust_prot_bitmap(struct hax_gpa_space *gpa_space,
+                                 uint64 max_gpfn);
+
+int gpa_space_test_prot_bitmap(struct hax_gpa_space *gpa_space, uint64 gfn);


Return type: int => bool, since it returns either true or false.

How about renaming it to gpa_space_is_page_protected? If we ever implement fine-grained protection, we can add is_page_readable, is_page_writable, etc.

raphaelning · 2018-03-22T15:37:18Z

core/include/memory.h

+                                 uint64 max_gpfn);
+
+int gpa_space_test_prot_bitmap(struct hax_gpa_space *gpa_space, uint64 gfn);
+int gpa_space_chunk_protected(struct hax_gpa_space *gpa_space, uint64 gfn,


Return type: int => bool

raphaelning · 2018-03-22T16:02:38Z

include/windows/hax_windows.h

@@ -116,10 +116,10 @@ static inline void hax_mutex_free(hax_mutex lock)
 }

 /* Return true if the bit is set already */
-static int hax_test_and_set_bit(int bit, uint64_t *memory)
+static int hax_test_and_set_bit(uint64 bit, uint64_t *memory)


I have concerns about this change. Is it safe to use a 64-bit integer as an array index (base) on a 32-bit OS? (We still support 32-bit Windows.) We could change int to long, but long is 32 bits on Windows. If the caller could adapt to the current interface, that would be great. After all, the protection bitmap shouldn't be very long, right? If max_gpa is 8GB, the bitmap size will be 256KB.
Anyway, since this change is self-contained, if it's really necessary, we should make a separate patch for it.

raphaelning · 2018-03-22T16:20:17Z

core/gpa_space.c

+    return hax_test_bit(gfn, (uint64 *)pbm->bitmap);
+}
+
+int gpa_space_chunk_protected(struct hax_gpa_space *gpa_space, uint64 gfn,


I see the QEMU patch make use of HAX_CHUNK_SIZE which is currently set to 2MB. When we introduced chunks with EPT2, we considered it an internal implementation detail and didn't expect to expose it to user space. If it's absolutely necessary, we should add an API to allow the user space to query the chunk size.
My question is, is making QEMU aware of HAXM chunks a necessity or an optimization? Admittedly, right now, at each non-MMIO EPT violation, we pin a whole chunk (instead of just the faulting page) in host RAM, and set up multiple EPT entries, so as to reduce the number of future EPT violations. But it's possible to make ept_tree_create_entries() skip the GFNs that are protected, so if the guest accesses one of them, we'll still get a VM exit. If we do that, is there any more reason for QEMU to know about chunks?

In theory, (I mean I did not test it), QEMU does not need to know the chunk size. It can resolve only one page one time, or 1M, etc. It simply means in that case, QEMU would receive more EPT_FAULT_EXIT at the same place with different gpa reported. I did not design it in a way that user space has to rely on the exact chunk size.
For QEMU side patch, I have to agree I write it that way because I have knowledge on HAXM.
I think as long as driver does not make assumptions on user-space implementation, it should be good.

So basically it's an optimization. I'd prefer that we process/fill one guest page per exit to QEMU, which is the same page that hardware (EPT violation) gave us, because

Having user space depend on a driver implementation detail is not a good practice, since the interface is not stable, although it's not as bad as having the driver depend on user space.

Chunks are created in HVA space (by dividing a RAM block), not GPA space. So technically, rounding a GPA down to the nearest 2MB boundary is not guaranteed to give you a GPA that maps to the same chunk.

Some chunks are smaller than 2MB, e.g. in case of a tiny RAM block, or the last chunk in a RAM block, so __gpa < gpa + 0x200000 doesn't always work.

HVF also fills one page at a time and takes this many VM exits. If we fill too many pages at a time, we may be doing kind of eager loading rather than on-demand loading (there's a trade-off).

If the performance is not good enough, we could add the chunk-based optimization in a later patch.

raphaelning · 2018-03-22T16:24:15Z

core/ept2.c

@@ -70,7 +70,8 @@ void ept_handle_mapping_changed(hax_gpa_space_listener *listener,
 }

 int ept_handle_access_violation(hax_gpa_space *gpa_space, hax_ept_tree *tree,
-                                exit_qualification_t qual, uint64 gpa)
+                                exit_qualification_t qual, uint64 gpa,
+                                uint64 *fault_gfn)


So this is the intrusive change you were talking about :) It didn't look very intrusive to me until I peeked at the second patch. I need to spend more time tomorrow to understand the rationale better and think about possible alternatives. But maybe I can ask you this for now: if we don't have to worry about chunks (see my other long comment below), do we still need this extra parameter?

Ah, yes.
I also think about something like:
__try {
hax_vcpu_run();
--------> call RaiseException when we failed to pass protection check
} __except (xxxx) {
//fill related htun and exit
}
But I have two concerns and I do not know the answer:
1> it seems too windows-specific, is it possible for mac (I did not work on mac so far). I believe it is OK not to implement this on mac from google point of view.
2> I am not sure what exception I should raise when checking for protection bitmap. My concern is if I choose existing exceptions defined (page fault, for example) and later a real page fault happened due to bug in code, it could be interpreted as our ram protection fault. And I do not know whether user-defined exception code is acceptable.

My question was probably not clear enough, so you must have mistaken it for something else - but thanks to that, I get to see this very interesting idea in your answer :)

I totally understand that Google's priority is to bring on-demand snapshot loading to Windows, so it's fine to leave out Mac for the QEMU (Android Emulator) side patch. But I still hope the new API we add this time works on both Windows and Mac, with the same behavior, as all the existing HAXM APIs do. So I agree with you that any Windows-specific API design should be avoided.

Back to my original question: I know this new fault_gfn parameter is a result of the call to gpa_space_chunk_protected(). My point is, if we don't worry about chunks and just stick to processing (filling) one page per exit to QEMU (via HAX_EXIT_PAGEFAULT), the pagefault GPA we return to QEMU will be the same as gpa, so there's no need to call gpa_space_chunk_protected() just because it might return a different GPA.

By the way, I've thought about handling HAXM's own accesses to guest RAM, and your "intrusive" approach makes more sense to me. But I also came to realize how complicated the problem is - for example, since it's possible for HAXM to access a protected GPA in the middle of its MMIO or PIO handler (instruction emulation) code, we have to make sure the code before the GPA access does nothing "irreversible", because as soon as we realize the GPA is protected, we'll abort the handler and return to QEMU with the new exit reason (instead of HAX_EXIT_FAST_MMIO or HAX_EXIT_IO), and must be able to emulate the same MMIO/PIO instruction again when QEMU re-executes it.

Taogle2018 added 2 commits March 21, 2018 15:44

IOCTL_GPA_PROT implementation v1.

c48de3e

Protection of HAXM's accessing guest GPA.

c835f4f

raphaelning reviewed Mar 22, 2018

View reviewed changes

intel deleted a comment from xianxiaoyin Aug 23, 2019

intel deleted a comment from yiqisiben Aug 23, 2019

intel deleted a comment from xianxiaoyin Aug 23, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Draft] GPA Protection Implementation #35

[Draft] GPA Protection Implementation #35

Taogle2018 commented Mar 21, 2018

Taogle2018 commented Mar 21, 2018

raphaelning left a comment

raphaelning Mar 22, 2018

Taogle2018 Mar 23, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 23, 2018

raphaelning Mar 27, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 23, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 23, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 23, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 23, 2018

raphaelning Mar 22, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 26, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 26, 2018

raphaelning Mar 27, 2018

raphaelning Mar 27, 2018

raphaelning Mar 22, 2018

Taogle2018 Mar 26, 2018

raphaelning Mar 27, 2018

raphaelning Mar 27, 2018

[Draft] GPA Protection Implementation #35

Are you sure you want to change the base?

[Draft] GPA Protection Implementation #35

Conversation

Taogle2018 commented Mar 21, 2018

Taogle2018 commented Mar 21, 2018

raphaelning left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment