Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oneplus 3T: Please fix broken off screen gestures #21

Open
dustin84 opened this issue Dec 2, 2016 · 13 comments
Open

Oneplus 3T: Please fix broken off screen gestures #21

dustin84 opened this issue Dec 2, 2016 · 13 comments

Comments

@dustin84
Copy link

dustin84 commented Dec 2, 2016

No description provided.

@qwtaming
Copy link

1st you must open the gestures
BLACK_ENBALE_PATH = "/proc/touchpanel/gesture_enable";
FileUtils.writeIntLine(BLACK_ENBALE_PATH, mBlackKeySettingState);
the every bit is for each function:
mDoubleScreenOn = getOffset(mBlackKeySettingState, 7) == 1;
mStartCamera = getOffset(mBlackKeySettingState, 6) == 1;
mMusic_control = getOffset(mBlackKeySettingState, 5) == 1;
mMusic_prev = getOffset(mBlackKeySettingState, 4) == 1;
mMusic_next = getOffset(mBlackKeySettingState, 3) == 1;
mMusic_pause = getOffset(mBlackKeySettingState, 2) == 1;
mMusic_play = getOffset(mBlackKeySettingState, 1) == 1;
mStartFlash = getOffset(mBlackKeySettingState, 0) == 1;
2sec you need too register listen keyevent process
#define KEY_DOUBLE_TAP 249 // double tap to wake
#define KEY_GESTURE_CIRCLE 250 // draw circle to lunch camera
#define KEY_GESTURE_TWO_SWIPE 251 // swipe two finger vertically to play/pause
#define KEY_GESTURE_V 252 // draw v to toggle flashlight
#define KEY_GESTURE_LEFT_V 253 // draw left arrow for previous track
#define KEY_GESTURE_RIGHT_V 254 // draw right arrow for next track

@dustin84
Copy link
Author

No. I'm not jumping through hoops that may or may not work. I have tried something similar and it didn't work. The code is broken and needs to be fixed. If devs on XDA can fix it, I don't see why OP can't.

@qwtaming
Copy link

OP offer two methodes open gesture. First u can open gesture by default in file kernel/msm-4.4/drivers/input/touchscreen/synaptics_driver_s3320.c .Second u can add the switch into the setting. OP use the second. u can banding which app too open,example v open flashlight or other app. so OP no too banding whith it. It‘s easy to change and use it.

@dustin84
Copy link
Author

You can change any file you want or add it to the settings, but they won't work with a broken kernel. And the kernel is broken because the source is broken.

@qwtaming
Copy link

The kernel is ok. Beforce release have test.I think you problem is no enable the gestrure.

@dustin84
Copy link
Author

It's broken. If you don't believe me go find any dev on XDA and ask them if it's broken. They will tell you it is. Or build it from source yourself, flash it to 3.5.3 or 3.5.4, and see for yourself.

@brucechiu1113
Copy link

@dustin84 Dev's solution for this issue is: http://pastebin.com/0Enek94f , right? If so, we could look into it for detailed information.

@dustin84
Copy link
Author

Yes, I think that is the same fix as this: engstk/op3t@1f8a6d0
The one I linked is the one I cherry-picked to fix it but I believe they are the same.

@amartinz
Copy link

amartinz commented Dec 30, 2016

Edit: scratch that, key handler uses scan code, overlooked that.
still somehow it does not fully like to work.

My fault was, it checked if CM setup wizard completed but i do not ship it...


on my device, i get the proper scan code, but it should get it as key code instead.

oneplus3:/ # getevent -l                                                   
add device 1: /dev/input/event7
  name:     "msm8996-tasha-mtp-snd-card Button Jack"
add device 2: /dev/input/event6
  name:     "msm8996-tasha-mtp-snd-card Headset Jack"
add device 3: /dev/input/event2
  name:     "fpc1020"
add device 4: /dev/input/event0
  name:     "qpnp_pon"
could not get driver version for /dev/input/mice, Not a typewriter
add device 5: /dev/input/event1
  name:     "tri-state-key"
add device 6: /dev/input/event4
  name:     "gpio-keys"
add device 7: /dev/input/event5
  name:     "synaptics"
add device 8: /dev/input/event3
  name:     "synaptics,s3320"
/dev/input/event3: EV_SYN       0004                 00000072            
/dev/input/event3: EV_SYN       0005                 0dcad92e            
/dev/input/event3: EV_SYN       SYN_REPORT           00000000            
/dev/input/event3: EV_KEY       00fc                 DOWN                
/dev/input/event3: EV_SYN       SYN_REPORT           00000000            
/dev/input/event3: EV_KEY       00fc                 UP                  
/dev/input/event3: EV_SYN       SYN_REPORT           00000000 

i did the (V) gesture, 0xFC -> 252

but the key handler at CM / Lineage based roms expects a key code, which in this case is then 0x00 :(

12-30 21:23:50.130  1278  1601 D InputDispatcher: notifyKey - eventTime=217835621000, deviceId=8, source=0x101, policyFlags=0x2, action=0x0, flags=0x8, keyCode=0x0, scanCode=0xfc, metaState=0x0, downTime=217835621000
12-30 21:23:50.135  1278  1600 D InputDispatcher: dispatchKey - eventTime=217835621000, deviceId=8, source=0x101, policyFlags=0x2000002, action=0x0, flags=0x48, keyCode=0x0, scanCode=0xfc, metaState=0x0, repeatCount=0, downTime=217835621000
12-30 21:23:50.135  1278  1601 D InputDispatcher: notifyKey - eventTime=217835652000, deviceId=8, source=0x101, policyFlags=0x2, action=0x1, flags=0x8, keyCode=0x0, scanCode=0xfc, metaState=0x0, downTime=217835621000
12-30 21:23:50.135  1278  1600 D InputDispatcher: Dropped event because policy consumed it.
12-30 21:23:50.137  1278  1600 D InputDispatcher: dispatchKey - eventTime=217835652000, deviceId=8, source=0x101, policyFlags=0x2000002, action=0x1, flags=0x48, keyCode=0x0, scanCode=0xfc, metaState=0x0, repeatCount=0, downTime=217835621000
12-30 21:23:50.137  1278  1600 D InputDispatcher: Dropped event because policy consumed it.


so i created a custom key layout for the synaptics driver synaptics_s3320.kl which goes into /system/usr/keylayout/

key 250   TV_INPUT_COMPONENT_2 VIRTUAL
key 251   TV_INPUT_VGA_1 VIRTUAL
key 252   TV_AUDIO_DESCRIPTION VIRTUAL
key 253   TV_AUDIO_DESCRIPTION_MIX_UP VIRTUAL
key 254   TV_AUDIO_DESCRIPTION_MIX_DOWN VIRTUAL

These key codes map to 250, 251 ... and so on.


Using that, it allows to detect those keys.
I get the haptic feedback vibration BUT it does not do anything else

adb shell dumpsys input

...

  Device 7: synaptics
    Generation: 6
    IsExternal: false
    HasMic:     false
    Sources: 0x00000101
    KeyboardType: 1
    Keyboard Input Mapper:
      Parameters:
        HasAssociatedDisplay: true
        OrientationAware: true
        HandlesKeyRepeat: false
      KeyboardType: 1
      Orientation: 0
      KeyDowns: 0 keys currently down
      MetaState: 0x0
      DownTime: 0
  Device 8: synaptics,s3320
    Generation: 20
    IsExternal: false
    HasMic:     false
    Sources: 0x00001103
    KeyboardType: 1
    Motion Ranges:
      X: source=0x00001002, min=0.000, max=1079.000, flat=0.000, fuzz=0.000, resolution=0.000
      Y: source=0x00001002, min=0.000, max=1919.000, flat=0.000, fuzz=0.000, resolution=0.000
      PRESSURE: source=0x00001002, min=0.000, max=1.000, flat=0.000, fuzz=0.000, resolution=0.000
      SIZE: source=0x00001002, min=0.000, max=1.000, flat=0.000, fuzz=0.000, resolution=0.000
      TOUCH_MAJOR: source=0x00001002, min=0.000, max=2202.907, flat=0.000, fuzz=0.000, resolution=0.000
      TOUCH_MINOR: source=0x00001002, min=0.000, max=2202.907, flat=0.000, fuzz=0.000, resolution=0.000
      TOOL_MAJOR: source=0x00001002, min=0.000, max=2202.907, flat=0.000, fuzz=0.000, resolution=0.000
      TOOL_MINOR: source=0x00001002, min=0.000, max=2202.907, flat=0.000, fuzz=0.000, resolution=0.000
    Keyboard Input Mapper:
      Parameters:
        HasAssociatedDisplay: true
        OrientationAware: true
        HandlesKeyRepeat: false
      KeyboardType: 1
      Orientation: 0
      KeyDowns: 0 keys currently down
      MetaState: 0x0
      DownTime: 217835621000
    Touch Input Mapper:
      Parameters:
        GestureMode: multi-touch
        DeviceType: touchScreen
        AssociatedDisplay: hasAssociatedDisplay=true, isExternal=false
        OrientationAware: true
      Raw Touch Axes:
        X: min=0, max=1079, flat=0, fuzz=0, resolution=0
        Y: min=0, max=1919, flat=0, fuzz=0, resolution=0
        Pressure: unknown range
        TouchMajor: min=0, max=255, flat=0, fuzz=0, resolution=0
        TouchMinor: min=0, max=255, flat=0, fuzz=0, resolution=0
        ToolMajor: unknown range
        ToolMinor: unknown range
        Orientation: unknown range
        Distance: unknown range
        TiltX: unknown range
        TiltY: unknown range
        TrackingId: min=0, max=65535, flat=0, fuzz=0, resolution=0
        Slot: min=0, max=9, flat=0, fuzz=0, resolution=0
      Calibration:
        touch.size.calibration: geometric
        touch.pressure.calibration: none
        touch.orientation.calibration: none
        touch.distance.calibration: none
        touch.coverage.calibration: none
      Affine Transformation:
        X scale: 1.000
        X ymix: 0.000
        X offset: 0.000
        Y xmix: 0.000
        Y scale: 1.000
        Y offset: 0.000
      Viewport: displayId=0, orientation=0, logicalFrame=[0, 0, 1080, 1920], physicalFrame=[0, 0, 1080, 1920], deviceSize=[1080, 1920]
      SurfaceWidth: 1080px
      SurfaceHeight: 1920px
      SurfaceLeft: 0
      SurfaceTop: 0
      SurfaceOrientation: 0
      Translation and Scaling Factors:
        XTranslate: 0.000
        YTranslate: 0.000
        XScale: 1.000
        YScale: 1.000
        XPrecision: 1.000
        YPrecision: 1.000
        GeometricScale: 1.000
        PressureScale: 0.000
        SizeScale: 0.004
        OrientationScale: 0.000
        DistanceScale: 0.000
        HaveTilt: false
        TiltXCenter: 0.000
        TiltXScale: 0.000
        TiltYCenter: 0.000
        TiltYScale: 0.000
      Last Raw Button State: 0x00000000
      Last Raw Touch: pointerCount=0
      Last Cooked Button State: 0x00000000
      Last Cooked Touch: pointerCount=0
      Stylus Fusion:
        ExternalStylusConnected: false
        External Stylus ID: -1
        External Stylus Data Timeout: 9223372036854775807
      External Stylus State:
        When: 9223372036854775807
        Pressure: 0.000000
        Button State: 0x00000000
        Tool Type: 0
  Configuration:
    ExcludedDeviceNames: []
    VirtualKeyQuietTime: 0.0ms
    PointerVelocityControlParameters: scale=1.000, lowThreshold=500.000, highThreshold=3000.000, acceleration=3.000
    WheelVelocityControlParameters: scale=1.000, lowThreshold=15.000, highThreshold=50.000, acceleration=4.000
    PointerGesture:
      Enabled: true
      QuietInterval: 100.0ms
      DragMinSwitchSpeed: 50.0px/s
      TapInterval: 150.0ms
      TapDragInterval: 300.0ms
      TapSlop: 20.0px
      MultitouchSettleInterval: 100.0ms
      MultitouchMinDistance: 15.0px
      SwipeTransitionAngleCosine: 0.3
      SwipeMaxWidthRatio: 0.2
      MovementSpeedRatio: 0.8
      ZoomSpeedRatio: 0.3

...

While on another device, using the same zip to install the rom, everything works without doing any changes.

Using the above patches linked here do not show any difference too.

NewEraCracker pushed a commit to NewEraCracker/android_kernel_oneplus_msm8996 that referenced this issue Jan 11, 2017
tests/generic/251 of fstest suit complains us with below message:

------------[ cut here ]------------
invalid opcode: 0000 [OnePlusOSS#1] PREEMPT SMP
CPU: 2 PID: 7698 Comm: fstrim Tainted: G           O    4.7.0+ OnePlusOSS#21
task: e9f4e000 task.stack: e7262000
EIP: 0060:[<f89fcefe>] EFLAGS: 00010202 CPU: 2
EIP is at write_checkpoint+0xfde/0x1020 [f2fs]
EAX: f33eb300 EBX: eecac310 ECX: 00000001 EDX: ffff0001
ESI: eecac000 EDI: eecac5f0 EBP: e7263dec ESP: e7263d18
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b76ab01c CR3: 2eb89de0 CR4: 000406f0
Stack:
 00000001 a220fb7b e9f4e000 00000002 419ff2d3 b3a05151 00000002 e9f4e5d8
 e9f4e000 419ff2d3 b3a05151 eecac310 c10b8154 b3a05151 419ff2d3 c10b78bd
 e9f4e000 e9f4e000 e9f4e5d8 00000001 e9f4e000 ec409000 eecac2cc eecac288
Call Trace:
 [<c10b8154>] ? __lock_acquire+0x3c4/0x760
 [<c10b78bd>] ? mark_held_locks+0x5d/0x80
 [<f8a10632>] f2fs_trim_fs+0x1c2/0x2e0 [f2fs]
 [<f89e9f56>] f2fs_ioctl+0x6b6/0x10b0 [f2fs]
 [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
 [<c10b4281>] ? trace_hardirqs_off_caller+0x91/0x120
 [<f89e98a0>] ? __exchange_data_block+0xd30/0xd30 [f2fs]
 [<c120b2e1>] do_vfs_ioctl+0x81/0x7f0
 [<c11d57c5>] ? kmem_cache_free+0x245/0x2e0
 [<c1217840>] ? get_unused_fd_flags+0x40/0x40
 [<c1206eec>] ? putname+0x4c/0x50
 [<c11f631e>] ? do_sys_open+0x16e/0x1d0
 [<c1001990>] ? do_fast_syscall_32+0x30/0x1c0
 [<c13d51df>] ? __this_cpu_preempt_check+0xf/0x20
 [<c120baa8>] SyS_ioctl+0x58/0x80
 [<c1001a01>] do_fast_syscall_32+0xa1/0x1c0
 [<c178cc54>] sysenter_past_esp+0x45/0x74
EIP: [<f89fcefe>] write_checkpoint+0xfde/0x1020 [f2fs] SS:ESP 0068:e7263d18
---[ end trace 4de95d7e6b3aa7c6 ]---

The reason is: with below call stack, we will encounter BUG_ON during
doing fstrim.

Thread A				Thread B
- write_checkpoint
 - do_checkpoint
					- f2fs_write_inode
					 - update_inode_page
					  - update_inode
					   - set_page_dirty
					    - f2fs_set_node_page_dirty
					     - inc_page_count
					      - percpu_counter_inc
					      - set_sbi_flag(SBI_IS_DIRTY)
  - clear_sbi_flag(SBI_IS_DIRTY)

Thread C				Thread D
- f2fs_write_node_page
 - set_node_addr
  - __set_nat_cache_dirty
   - nm_i->dirty_nat_cnt++
					- do_vfs_ioctl
					 - f2fs_ioctl
					  - f2fs_trim_fs
					   - write_checkpoint
					    - f2fs_bug_on(nm_i->dirty_nat_cnt)

Fix it by setting superblock dirty correctly in do_checkpoint and
f2fs_write_node_page.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
NewEraCracker pushed a commit to NewEraCracker/android_kernel_oneplus_msm8996 that referenced this issue Apr 20, 2017
commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 OnePlusOSS#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 OnePlusOSS#9 [] tcp_rcv_established at ffffffff81580b64
OnePlusOSS#10 [] tcp_v4_do_rcv at ffffffff8158b54a
OnePlusOSS#11 [] tcp_v4_rcv at ffffffff8158cd02
OnePlusOSS#12 [] ip_local_deliver_finish at ffffffff815668f4
OnePlusOSS#13 [] ip_local_deliver at ffffffff81566bd9
OnePlusOSS#14 [] ip_rcv_finish at ffffffff8156656d
OnePlusOSS#15 [] ip_rcv at ffffffff81566f06
OnePlusOSS#16 [] __netif_receive_skb_core at ffffffff8152b3a2
OnePlusOSS#17 [] __netif_receive_skb at ffffffff8152b608
OnePlusOSS#18 [] netif_receive_skb at ffffffff8152b690
OnePlusOSS#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
OnePlusOSS#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
OnePlusOSS#21 [] net_rx_action at ffffffff8152bac2
OnePlusOSS#22 [] __do_softirq at ffffffff81084b4f
OnePlusOSS#23 [] call_softirq at ffffffff8164845c
OnePlusOSS#24 [] do_softirq at ffffffff81016fc5
OnePlusOSS#25 [] irq_exit at ffffffff81084ee5
OnePlusOSS#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@maddalakum
Copy link

Can any one help the how exactly fix this issue

@dustin84
Copy link
Author

dustin84 commented Jun 10, 2017

@maddalakum I have submitted a pull request. Just look there.

@maddalakum
Copy link

Can you please tell this path didnt exist

/android/kernel/drivers/input/touchscreen/synaptic_s3320.c

@dorimanx
Copy link

Try drivers/input/touchscreen/synaptics_driver_s3320.c

NewEraCracker pushed a commit to NewEraCracker/android_kernel_oneplus_msm8996 that referenced this issue Jul 13, 2017
commit 4dfce57db6354603641132fac3c887614e3ebe81 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 OnePlusOSS#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
OnePlusOSS#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
OnePlusOSS#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
OnePlusOSS#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
OnePlusOSS#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
OnePlusOSS#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
OnePlusOSS#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
OnePlusOSS#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
OnePlusOSS#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
OnePlusOSS#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
OnePlusOSS#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
OnePlusOSS#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
OnePlusOSS#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
OnePlusOSS#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Martinusbe referenced this issue in GZR-Kernels/kernel_oneplus_msm8996 Jul 24, 2017
commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <[email protected]>
Cc: Hannes Sowa <[email protected]>
Signed-off-by: Jon Maxwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Signed-off-by: Martinusbe <[email protected]>
Martinusbe referenced this issue in GZR-Kernels/kernel_oneplus_msm8996 Jul 24, 2017
commit 4dfce57db6354603641132fac3c887614e3ebe81 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Dave Chinner <[email protected]>
Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Signed-off-by: Martinusbe <[email protected]>
nathanchance pushed a commit to android-linux-stable/op3 that referenced this issue May 30, 2018
[ Upstream commit 2bbea6e117357d17842114c65e9a9cf2d13ae8a3 ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 OnePlusOSS#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 OnePlusOSS#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 OnePlusOSS#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 OnePlusOSS#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 OnePlusOSS#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 OnePlusOSS#8 [ffff880078447da8] mount_bdev at ffffffff81202570
 OnePlusOSS#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
OnePlusOSS#10 [ffff880078447e28] mount_fs at ffffffff81202d09
OnePlusOSS#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
OnePlusOSS#12 [ffff880078447ea8] do_mount at ffffffff81220fee
OnePlusOSS#13 [ffff880078447f28] sys_mount at ffffffff812218d6
OnePlusOSS#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 OnePlusOSS#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 OnePlusOSS#4 [ffff8800784179d0] down_read at ffffffff8168cde0
 OnePlusOSS#5 [ffff8800784179e8] get_super at ffffffff81201cc7
 OnePlusOSS#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 OnePlusOSS#7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 OnePlusOSS#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 OnePlusOSS#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
OnePlusOSS#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
OnePlusOSS#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
OnePlusOSS#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
OnePlusOSS#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
OnePlusOSS#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
OnePlusOSS#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
OnePlusOSS#16 [ffff880078417d00] do_last at ffffffff8120d53d
OnePlusOSS#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
OnePlusOSS#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
OnePlusOSS#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
OnePlusOSS#20 [ffff880078417f70] sys_open at ffffffff811fde4e
OnePlusOSS#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants