drops off network #13

Open
Rolzzz opened this issue Jun 20, 2022 · 24 comments

Comments

@Rolzzz

Rolzzz commented Jun 20, 2022

Hello there, thank you for providing the drivers.
Unfortunately, when I do large file transfers (a 40 GB vmdk file), my ESXi host drops off the network.
I have to unplug the network cable and plug it back in for connectivity to resume... I tried other cables and switches, but the issue persists.

esxcli network nic get -n vmnic0

Advertised Auto Negotiation: true
Advertised Link Modes: 10BaseT/Half, 10BaseT/Full, 100BaseT/Half, 100BaseT/Full, 1000BaseT/Full, 2500BaseX/Full
Auto Negotiation: true
Cable Type: Twisted Pair
Current Message Level: 51
Driver Info:
   Bus Info: 0000:02:00.0
   Driver: r8125
   Firmware Version:
   Version: 9.007.01-NAPI
Link Detected: true
Link Status: Up
Name: vmnic0
PHYAddress: 0
Pause Autonegotiate: true
Pause RX: true
Pause TX: true
Supported Ports: TP
Supports Auto Negotiation: true
Supports Pause: true
Supports Wakeon: true
Transceiver: internal
Virtual Address: 00:50:56:5a:e2:95
Wakeon: MagicPacket(tm)

@KajLehtinen

Hi!

I have the same issue. What machine are you running ESXi on? I've tried drivers up to 9.009.01 on an ASUS PN51. I've read somewhere that a 2.5 GbE network card with a Realtek chip, although USB-based, had heat issues and started dropping connections and throttling down as the temperature rose. That might be the culprit, since for me it happens when there is a lot of transfer going on.

/Kaj

@Rolzzz
Author

Rolzzz commented Jul 17, 2022

Yeah, I decided to move away from the onboard NIC in my ASUS PN50-E1, so I went with an external USB NIC with the ASIX AX88179 chipset (https://flings.vmware.com/usb-network-native-driver-for-esxi). Not ideal, but I wanted a stable ESXi box... Funnily enough, even though it's USB, this one (https://www.amazon.com.au/gp/product/B00AQM8586) actually had faster throughput than the native NIC with this driver... before it would drop off, of course... If the driver gets updated, I'd be all too happy to test again.

@KajLehtinen

KajLehtinen commented Jul 18, 2022

And have you tried the version located here: https://github.com/lengfwang/r8125-esxi6.7 - which seems to be the newest someone has compiled and put up here?

@Rolzzz
Author

Rolzzz commented Jul 18, 2022

> And have you tried the version located here: https://github.com/lengfwang/r8125-esxi6.7 - which seems to be the newest someone has compiled and put up here?

I have not seen this one and will have to try it 👍

@Rolzzz
Author

Rolzzz commented Jul 19, 2022

> > And have you tried the version located here: https://github.com/lengfwang/r8125-esxi6.7 - which seems to be the newest someone has compiled and put up here?
>
> I have not seen this one and will have to try it 👍

@KajLehtinen sadly still same issue

@Haxiboy

Haxiboy commented Aug 16, 2022

> > > And have you tried the version located here: https://github.com/lengfwang/r8125-esxi6.7 - which seems to be the newest someone has compiled and put up here?
> >
> > I have not seen this one and will have to try it 👍
>
> @KajLehtinen sadly still same issue

I have the same issue. Is it really overheating?

@Rolzzz
Author

Rolzzz commented Aug 16, 2022

I'd be surprised if it were an overheating hardware issue; we'd hear more from the normal Windows users if that were the case.

@Haxiboy

Haxiboy commented Aug 30, 2022

> I'd be surprised if it were an overheating hardware issue; we'd hear more from the normal Windows users if that were the case.

I thought my issue had gone away with lengfwang's fork, but it happened again today. It could be heat, as I noticed it only happens when I put a heavy workload on the NIC. After I took off my rack's side panel, I had to wait 2 weeks for the issue to happen again. (I turned off the climate control in the room next to the rack.)
I'll borrow a thermal camera and monitor what's happening around the NIC and the controller; maybe a small heat sink will solve the problem.

@Rolzzz
Author

Rolzzz commented Aug 30, 2022

> > I'd be surprised if it were an overheating hardware issue; we'd hear more from the normal Windows users if that were the case.
>
> I thought my issue had gone away with lengfwang's fork, but it happened again today. It could be heat, as I noticed it only happens when I put a heavy workload on the NIC. After I took off my rack's side panel, I had to wait 2 weeks for the issue to happen again. (I turned off the climate control in the room next to the rack.) I'll borrow a thermal camera and monitor what's happening around the NIC and the controller; maybe a small heat sink will solve the problem.

I can get it to crash every time I send an 80 GB vmdk file over via WinSCP... then I have to unplug the NIC from the switch, wait, then plug it back in and it starts working again... until, after x minutes, resuming the WinSCP transfer makes it fall over again.

I'd be interested to hear if you can replicate that...
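For anyone trying to replicate this, a minimal sketch of a drop logger that could run on another machine on the LAN while the big transfer is in progress. It only records when the host stops or starts answering pings, so the timestamps can be compared with the transfer. The function names, the `PING_CMD` variable and the `ping -c 1 -W 2` flags (Linux-style) are assumptions for illustration and are not part of the driver or of ESXi.

```shell
#!/bin/sh
# Probe command; "-c 1 -W 2" assumes a Linux-style ping. Override PING_CMD if yours differs.
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

# Print "up" if a single probe to $1 succeeds, "down" otherwise.
probe() {
    if $PING_CMD "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Probe target $1 every $2 seconds (default 5), $3 times (default: forever),
# and print a timestamped line only when the state changes.
watch_link() {
    target="$1"
    interval="${2:-5}"
    remaining="${3:--1}"
    last=""
    while [ "$remaining" -ne 0 ]; do
        state="$(probe "$target")"
        if [ "$state" != "$last" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $target $state"
            last="$state"
        fi
        if [ "$remaining" -gt 0 ]; then
            remaining=$((remaining - 1))
        fi
        if [ "$remaining" -ne 0 ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

Usage would be something like `watch_link <esxi-management-ip> 5 >> drops.log`, started before the WinSCP copy.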

Here is where mine sits in my study... I don't think it's heat related.
image

@Haxiboy

Haxiboy commented Aug 31, 2022

Mine is in a standard 4U rack with a ton of Noctua fans. I had an issue with an overheating Intel NIC before. I have a dual gigabit NIC lying around, so I'll try with that too. Or maybe some load balancing would work. The strange thing is that we watch movies all day and have torrents downloading 24/7, but I only got the issue when downloading via Sonarr. It could be a coincidence, though.

@Sushifix

I have the same issue when using this driver under ESXi 6.7 on an ASUSTOR AS6702t. Because of it, ESXi is not usable on the device. Under Windows on the same device this does not happen! So I suspect a driver bug or a configuration issue within this driver. Looking forward to any solutions that are found.

@jakubsuchybio

I have the same problem.
My setup: I have pfSense running inside my ESXi host. One Intel NIC is passed through to pfSense; the second, an onboard Realtek NIC, is managed by the ESXi host.
Same behaviour. After heavy load (downloading tens of GBs from Steam) the connection drops randomly. The only fix is to disconnect the network cable and reconnect it.
Not gonna wait for a fix from the driver side. I'll buy a new PCIe NIC with an Intel chip and do it that way...

@mcr-ksh

mcr-ksh commented Mar 24, 2023

Unfortunately, the same issue here. It drops every time there is load. Disabling and re-enabling the interface on the switch reconnects it. I've built my own custom driver for the latest 9.011.00 and the issue persists.

@Rolzzz
Author

Rolzzz commented Mar 25, 2023

Gave up on the onboard NIC... POS for ESXi. Got a USB-C one and it's been rock solid ever since.

@chrisp250

> Gave up on the onboard NIC... POS for ESXi. Got a USB-C one and it's been rock solid ever since.

Can you get a USB NIC without the CPU penalty? I read somewhere that USB-based NICs don't have access to DMA and therefore they load the CPU.

@Rolzzz
Author

Rolzzz commented Mar 25, 2023

> > Gave up on the onboard NIC... POS for ESXi. Got a USB-C one and it's been rock solid ever since.
>
> Can you get a USB NIC without the CPU penalty? I read somewhere that USB-based NICs don't have access to DMA and therefore they load the CPU.

On my home system, I haven't noticed any unexplained extra CPU load under normal use... I see the CPU go up when I'm downloading some big (high-seed) torrent files, but I also saw that on physical boxes before I virtualised my torrenting machine.

@mcr-ksh

mcr-ksh commented Mar 25, 2023

In the release there are a few scripts mentioned which I cannot find anywhere, nor do I know how to properly turn on/off these settings.
Anyone tried/found them?

/opt/r8125/temp.sh : Show NIC chipset temperature.
/opt/r8125/tx-off.sh: Turn off Tx offloading, when you cannot open guest openwrt web page, or lagging Windows network neighbor file copy.
/opt/r8125/tx-on.sh: Turn on Tx offloading, default.
/opt/r8125/tso-off.sh: Turn off TSO, default.
/opt/r8125/tso-on.sh: Turn on TSO, try this when you have a nice host PC.
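The scripts themselves are not reproduced in this thread. As a rough stand-in for checking what the host currently reports, here is a small sketch that prints (or, with `DRY_RUN=0`, runs) the stock `esxcli network nic tso get` and `esxcli network nic cso get` commands. The `run` helper, the `DRY_RUN` switch and the function name are illustrative assumptions, and this is not claimed to be what the release's scripts do.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 on the ESXi host to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the TSO and checksum-offload state the host reports for its physical NICs.
show_offload_state() {
    run esxcli network nic tso get
    run esxcli network nic cso get
}

show_offload_state
```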

@chrisp250

> > > Gave up on the onboard NIC... POS for ESXi. Got a USB-C one and it's been rock solid ever since.
> >
> > Can you get a USB NIC without the CPU penalty? I read somewhere that USB-based NICs don't have access to DMA and therefore they load the CPU.
>
> On my home system, I haven't noticed any unexplained extra CPU load under normal use... I see the CPU go up when I'm downloading some big (high-seed) torrent files, but I also saw that on physical boxes before I virtualised my torrenting machine.

No worries. I switched to Proxmox and haven't had an issue since. The driver seems to be a lot more stable in Debian.

@Rolzzz
Author

Rolzzz commented Mar 26, 2023

> In the release there are a few scripts mentioned which I cannot find anywhere, nor do I know how to properly turn on/off these settings. Anyone tried/found them?
>
> /opt/r8125/temp.sh : Show NIC chipset temperature.
> /opt/r8125/tx-off.sh: Turn off Tx offloading, when you cannot open guest openwrt web page, or lagging Windows network neighbor file copy.
> /opt/r8125/tx-on.sh: Turn on Tx offloading, default.
> /opt/r8125/tso-off.sh: Turn off TSO, default.
> /opt/r8125/tso-on.sh: Turn on TSO, try this when you have a nice host PC.

No, I haven't seen these.

@mcr-ksh

mcr-ksh commented Mar 30, 2023

I think I just found the issue. I'm currently doing a full rewrite of the driver. I was able to nail it down to DAC (PCI Dual Address Cycle, i.e. 64-bit DMA addressing).
image
[http://gauss.ececs.uc.edu/Courses/c4029/lectures/dma.pdf]

TSO doesn't work with DAC, and maybe ESXi 6.7 doesn't properly support it. Until I release mine, it can be tested via:
vmkload_mod r8125 enable_tso=1 enable_tx_csum=1 eee_enable=0 hwoptimize=1 tx_no_close_enable=1 enable_double_vlan=1 use_dac=0 autoneg_mode=1
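A minimal sketch of how the same options might be stored so they survive a reboot, assuming the stock `esxcli system module parameters set` command accepts the parameter names used with `vmkload_mod` above. The parameter string is copied from that command; the `run` helper, the `DRY_RUN` switch and the function name are illustrative, and this has not been verified against the r8125 driver.

```shell
#!/bin/sh
# Parameter string copied verbatim from the vmkload_mod command above.
R8125_PARAMS="enable_tso=1 enable_tx_csum=1 eee_enable=0 hwoptimize=1 tx_no_close_enable=1 enable_double_vlan=1 use_dac=0 autoneg_mode=1"

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 on the ESXi host to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode (argument quoting is not preserved in the printout),
# otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_r8125_params() {
    # Store the options so they are applied whenever the module is loaded.
    run esxcli system module parameters set -m r8125 -p "$R8125_PARAMS"
    # List the parameters now stored for the module.
    run esxcli system module parameters list -m r8125
}

apply_r8125_params
```

When running for real, the whole parameter string has to be passed to `-p` as a single quoted argument, and the module needs to be reloaded (or the host rebooted) before the stored values take effect.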

@mcr-ksh

mcr-ksh commented May 8, 2023

https://github.com/mcr-ksh/r8125-esxi/releases/tag/net-r8125-9.011.00

@rustiferch

rustiferch commented Sep 29, 2024

Hi Team,
Did this network drop-off issue ever get solved? Is it something that can be worked around?

@Rolzzz
Author

Rolzzz commented Sep 29, 2024

> Hi Team, Did this network drop-off issue ever get solved? Is it something that can be worked around?

After the Broadcom acquisition, I moved my home VM lab over to Proxmox. RIP VMware.

@mcr-ksh

mcr-ksh commented Sep 30, 2024

> Hi Team,
>
> Did this network drop-off issue ever get solved? Is it something that can be worked around?

Hi, I pretty much got it under control after disabling IPv6. I found that the screenshot dumps were mainly related to MLD (Multicast Listener Discovery), so I disabled IPv6 and the crashes stopped or became very rare. On top of that, with the settings of my own driver implementation, I'm quite stable now.
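For reference, a minimal sketch of one way to turn IPv6 off host-wide from the ESXi shell, using the stock `esxcli network ip` commands. The comment above does not say which method was actually used, so treat this as an assumption; the `run` helper, the `DRY_RUN` switch and the function name are illustrative.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 on the ESXi host to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_ipv6() {
    # Show whether IPv6 is currently enabled on the host.
    run esxcli network ip get
    # Disable IPv6 for the whole host.
    run esxcli network ip set --ipv6-enabled=false
    echo "Reboot the host for the change to take effect."
}

disable_ipv6
```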
