Personal Tech Space

e1000e NIC driver frequently reports "Detected Hardware Unit Hang" errors

Recently, while using an Intel NUC as a software router, I kept running into e1000e NIC hangs that caused brief network outages.

Environment:

OS: Ubuntu 24.04 LTS
Host: Intel NUC
NIC driver: e1000e

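The system log contained a large number of e1000e hangs, each reported as "Detected Hardware Unit Hang". Around them, NETDEV WATCHDOG reports a transmit queue timeout and the driver resets the adapter before the link comes back up: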

2024-07-12T18:03:29.990617+08:00 i3-5010u kernel: PCI Status             <10>
2024-07-12T18:03:31.269512+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: NETDEV WATCHDOG: CPU: 3: transmit queue 0 timed out 7657 ms
2024-07-12T18:03:31.269578+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly
2024-07-12T18:03:34.942432+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
2024-07-12T18:03:50.982493+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
2024-07-12T18:03:50.982530+08:00 i3-5010u kernel:   TDH                  <1d>
2024-07-12T18:03:50.982535+08:00 i3-5010u kernel:   TDT                  <37>
2024-07-12T18:03:50.982537+08:00 i3-5010u kernel:   next_to_use          <37>
2024-07-12T18:03:50.982540+08:00 i3-5010u kernel:   next_to_clean        <19>
2024-07-12T18:03:50.982542+08:00 i3-5010u kernel: buffer_info[next_to_clean]:
2024-07-12T18:03:50.982546+08:00 i3-5010u kernel:   time_stamp           <fffc09cf>
2024-07-12T18:03:50.982549+08:00 i3-5010u kernel:   next_to_watch        <1d>
2024-07-12T18:03:50.982552+08:00 i3-5010u kernel:   jiffies              <fffc1300>
2024-07-12T18:03:50.982556+08:00 i3-5010u kernel:   next_to_watch.status <0>
2024-07-12T18:03:50.982558+08:00 i3-5010u kernel: MAC Status             <80083>
2024-07-12T18:03:50.982594+08:00 i3-5010u kernel: PHY Status             <796d>
2024-07-12T18:03:50.982597+08:00 i3-5010u kernel: PHY 1000BASE-T Status  <3800>
2024-07-12T18:03:50.982600+08:00 i3-5010u kernel: PHY Extended Status    <3000>
2024-07-12T18:03:50.982603+08:00 i3-5010u kernel: PCI Status             <10>
2024-07-12T18:03:52.966493+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
2024-07-12T18:03:52.966531+08:00 i3-5010u kernel:   TDH                  <1d>
2024-07-12T18:03:52.966536+08:00 i3-5010u kernel:   TDT                  <37>
2024-07-12T18:03:52.966539+08:00 i3-5010u kernel:   next_to_use          <37>
2024-07-12T18:03:52.966542+08:00 i3-5010u kernel:   next_to_clean        <19>
2024-07-12T18:03:52.966544+08:00 i3-5010u kernel: buffer_info[next_to_clean]:
2024-07-12T18:03:52.966547+08:00 i3-5010u kernel:   time_stamp           <fffc09cf>
2024-07-12T18:03:52.966549+08:00 i3-5010u kernel:   next_to_watch        <1d>
2024-07-12T18:03:52.966552+08:00 i3-5010u kernel:   jiffies              <fffc1ac0>
2024-07-12T18:03:52.966555+08:00 i3-5010u kernel:   next_to_watch.status <0>
2024-07-12T18:03:52.966558+08:00 i3-5010u kernel: MAC Status             <80083>
2024-07-12T18:03:52.966593+08:00 i3-5010u kernel: PHY Status             <796d>
2024-07-12T18:03:52.966598+08:00 i3-5010u kernel: PHY 1000BASE-T Status  <3800>
2024-07-12T18:03:52.966600+08:00 i3-5010u kernel: PHY Extended Status    <3000>
2024-07-12T18:03:52.966603+08:00 i3-5010u kernel: PCI Status             <10>
2024-07-12T18:03:55.014492+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
2024-07-12T18:03:55.014530+08:00 i3-5010u kernel:   TDH                  <1d>
2024-07-12T18:03:55.014535+08:00 i3-5010u kernel:   TDT                  <37>
2024-07-12T18:03:55.014538+08:00 i3-5010u kernel:   next_to_use          <37>
2024-07-12T18:03:55.014540+08:00 i3-5010u kernel:   next_to_clean        <19>
2024-07-12T18:03:55.014543+08:00 i3-5010u kernel: buffer_info[next_to_clean]:
2024-07-12T18:03:55.014545+08:00 i3-5010u kernel:   time_stamp           <fffc09cf>
2024-07-12T18:03:55.014548+08:00 i3-5010u kernel:   next_to_watch        <1d>
2024-07-12T18:03:55.014551+08:00 i3-5010u kernel:   jiffies              <fffc22c0>
2024-07-12T18:03:55.014555+08:00 i3-5010u kernel:   next_to_watch.status <0>
2024-07-12T18:03:55.014557+08:00 i3-5010u kernel: MAC Status             <80083>
2024-07-12T18:03:55.014600+08:00 i3-5010u kernel: PHY Status             <796d>
2024-07-12T18:03:55.014604+08:00 i3-5010u kernel: PHY 1000BASE-T Status  <3800>
2024-07-12T18:03:55.014607+08:00 i3-5010u kernel: PHY Extended Status    <3000>
2024-07-12T18:03:55.014610+08:00 i3-5010u kernel: PCI Status             <10>
2024-07-12T18:03:56.357539+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: NETDEV WATCHDOG: CPU: 3: transmit queue 0 timed out 7707 ms
2024-07-12T18:03:56.359427+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly
2024-07-12T18:04:00.026401+08:00 i3-5010u kernel: e1000e 0000:00:19.0 enp0s25: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
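To gauge how often this happens, the hang reports and adapter resets can be counted from the kernel log. A minimal sketch, assuming the log text arrives on stdin (for example from `journalctl -k` or /var/log/syslog); the function name `count_e1000e_events` is hypothetical:

```shell
# Hypothetical helper: count e1000e hang reports and unexpected adapter
# resets in kernel log text read from stdin.
count_e1000e_events() {
    awk '
        /Detected Hardware Unit Hang/ { hangs++ }
        /Reset adapter unexpectedly/  { resets++ }
        END { printf "hangs=%d resets=%d\n", hangs, resets }
    '
}

# Example usage (log sources are assumptions about the setup):
#   journalctl -k | count_e1000e_events
#   grep e1000e /var/log/syslog | count_e1000e_events
```

Run on the log excerpt above, it would print hangs=3 resets=2.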

After some searching, I found the following fix.

First, check the NIC's offload features:

root@i3-5010u:~# ethtool -k enp0s25
Features for enp0s25:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
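With this many features listed, it can help to pull out only the ones that matter here. A minimal sketch that reads `ethtool -k` output on stdin and prints the state of one named feature; `feature_state` is a hypothetical helper name, not part of ethtool:

```shell
# Hypothetical helper: print the state ("on" or "off") of one feature
# from `ethtool -k` output read on stdin. Suffixes such as "[fixed]"
# are dropped; indented sub-features are matched after stripping whitespace.
feature_state() {
    awk -F': ' -v want="$1" '
        {
            name = $1
            sub(/^[ \t]+/, "", name)   # sub-features are indented with a tab
            if (name == want) {
                split($2, parts, " ")
                print parts[1]
            }
        }
    '
}

# Example: show the offloads relevant to this problem.
#   for f in rx-checksumming tx-checksumming tcp-segmentation-offload; do
#       printf '%s: %s\n' "$f" "$(ethtool -k enp0s25 | feature_state "$f")"
#   done
```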

Both rx-checksumming and tx-checksumming are on. These checksum offload features turned out to be the cause: they do not work reliably on this system.

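Turning them off with the following command made the network stable again. In the output, the lines marked [not requested] show that ethtool also disabled TCP segmentation offload (tx-tcp-segmentation and tx-tcp6-segmentation), because TSO depends on transmit checksum offload: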

root@i3-5010u:~# ethtool -K enp0s25  tx off rx off
Actual changes:
tx-checksum-ip-generic: off
tx-tcp-segmentation: off [not requested]
tx-tcp6-segmentation: off [not requested]
rx-checksum: off

Note that settings changed with ethtool do not survive a reboot.

To make them persistent, the command has to be run at boot: https://www.aliencn.net/archives/413
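The linked article describes how the author made this persistent. As one possible alternative, assuming the system boots with systemd (the default on Ubuntu 24.04), a templated oneshot unit can re-run the same ethtool command at boot. The unit name `ethtool-offload-off@.service` and the helper `make_offload_unit` are hypothetical; the sketch only prints the unit so it can be reviewed before installing:

```shell
# Hypothetical helper: print a templated systemd unit that re-applies the
# ethtool workaround at boot. %i is replaced by the instance name, i.e. the
# interface in ethtool-offload-off@enp0s25.service.
make_offload_unit() {
    cat <<'EOF'
[Unit]
Description=Disable checksum offload on %i (e1000e hang workaround)
BindsTo=sys-subsystem-net-devices-%i.device
After=sys-subsystem-net-devices-%i.device

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/ethtool -K %i tx off rx off

[Install]
WantedBy=multi-user.target
EOF
}

# Installation (run as root; path and unit name are assumptions):
#   make_offload_unit > /etc/systemd/system/ethtool-offload-off@.service
#   systemctl daemon-reload
#   systemctl enable --now ethtool-offload-off@enp0s25.service
```

The BindsTo=/After= dependencies on the device unit make the command wait until the interface exists, and RemainAfterExit=yes keeps the unit shown as active after the one-shot command finishes.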

Created: 2024-07-12T18:04:00+08:00, Updated: 2024-07-12T18:29:23+08:00
License: Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
The comment board is under development. Webmaster email: admin@aliencn.net. Feel free to get in touch.