Tuesday, December 1, 2009

Troubleshooting Linux networking||Module||Driver problems


Network-related problems on your Linux machine can be hard to resolve because they go beyond the trusted environment of your Linux box. But, as a Linux administrator, you can help your network administrator by applying the right tools. In this article you'll learn how to troubleshoot network-related driver problems.
It's easy to determine that a problem you're encountering is a network problem -- if your computer can't communicate with other computers, something is wrong on the network. But, it may be harder to find the source of the problem. You need to begin by analyzing the chain of elements involved in network communication.
If your host needs to communicate with another host in the network, the following conditions need to be met:
  1. The network card is installed and available in the operating system, i.e., the correct driver is loaded.
  2. The network card has an IP address assigned to it.
  3. The computer can communicate with other hosts in the same network.
  4. The computer can communicate with other hosts in other networks.
  5. The computer can communicate with other hosts using their host names.
Troubleshooting network driver issues
To communicate with other computers on the network, your computer needs a network interface, and Linux sets one up in a few well-defined steps. During the system boot, the kernel probes the different buses that are available and, typically on the PCI bus, finds a network card. Next, it determines which driver is needed to address the network card and, if the driver is available, loads it. Following that, the udev daemon (udevd), which is started in the initial boot phase of your computer, creates the network device for you. In a simple computer with only one network interface, this will typically be the eth0 device, but other interface names can also be used. Once the interface is available, the next stage can begin, in which the network card gets an IP address.
To summarize, the following steps are involved in loading the driver for the network card correctly:
  1. The kernel probes the PCI bus.
  2. Based on the information it finds on the PCI bus, a driver is loaded.
  3. Udev creates the network device (for example, eth0) that you need to actually use the network card.
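Whether the last step succeeded can be checked by listing the interfaces that the kernel exposes under /sys/class/net. The following is a minimal sketch, not a definitive tool: the function name list_interfaces is only an illustration, and it assumes that sysfs is mounted on /sys.

```shell
# Print the name of every network interface found in a sysfs-style directory.
# The directory defaults to /sys/class/net; passing another path is useful for testing.
list_interfaces() {
    net_dir=${1:-/sys/class/net}
    for entry in "$net_dir"/*; do
        # Skip the unexpanded pattern if the directory is empty or missing.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

# On a live system this prints names such as lo and eth0.
list_interfaces
```

If the interface you expect (for example, eth0) is missing from the list, continue with the checks below.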
To fix network card problems, begin by determining whether the network card was really found on the PCI bus. To do that, use the lspci command. Here is an example of lspci output:
JBO:~ # lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (rev 01)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge (rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:07.7 System peripheral: VMware Inc Virtual Machine Communication Interface (rev 10)
00:0f.0 VGA compatible controller: VMware Inc Abstract SVGA II Adapter
00:10.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 01)
02:00.0 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB
02:01.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)
02:02.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 02)
02:03.0 USB Controller: VMware Inc Abstract USB2 EHCI Controller
JBO:~ #
Here, at PCI address 02:01.0, an Ethernet network card is found. The network card is an AMD 79c970. The text between square brackets, PCnet32 LANCE, is part of the device name and indicates that the pcnet32 kernel module is the driver to expect for this network card.
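On a machine with many PCI devices, it helps to filter the lspci output down to the network cards. The sketch below assumes that the card is reported with the class name "Ethernet controller", as in the example above; wireless and other network devices use different class names. The function name find_ethernet_addresses is hypothetical.

```shell
# Read lspci-style output on stdin and print the PCI address
# (the first field) of every line that describes an Ethernet controller.
find_ethernet_addresses() {
    awk '/Ethernet controller:/ { print $1 }'
}

# Demo on one line taken from the example output; prints 02:01.0.
echo '02:01.0 Ethernet controller: Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE] (rev 10)' \
    | find_ethernet_addresses

# On a live system (assumes lspci from pciutils is installed):
#   lspci | find_ethernet_addresses
```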
The next step is to check the hardware configuration as reflected in the /sys tree. Every PCI device has its configuration stored there. For the network card in this example, it is stored in the directory /sys/bus/pci/devices/0000:02:01.0, which reflects the address of the device on the PCI bus. Here is an example of the contents of this directory:
JBO:/sys/bus/pci/devices/0000:02:01.0 # ls -l
total 0
-rw-r--r-- 1 root root 4096 Oct 18 07:08 broken_parity_status
-r--r--r-- 1 root root 4096 Oct 17 07:50 class
-rw-r--r-- 1 root root 256 Oct 17 07:50 config
-r--r--r-- 1 root root 4096 Oct 17 07:50 device
lrwxrwxrwx 1 root root 0 Oct 17 07:51 driver -> ../../../../bus/pci/drivers/pcnet32
-rw------- 1 root root 4096 Oct 18 07:08 enable
lrwxrwxrwx 1 root root 0 Oct 18 07:08 firmware_node -> ../../../LNXSYSTM:00/device:00/PNP0A03:00/device:06/device:08
-r--r--r-- 1 root root 4096 Oct 17 07:50 irq
-r--r--r-- 1 root root 4096 Oct 18 07:08 local_cpulist
-r--r--r-- 1 root root 4096 Oct 18 07:08 local_cpus
-r--r--r-- 1 root root 4096 Oct 17 07:53 modalias
-rw-r--r-- 1 root root 4096 Oct 18 07:08 msi_bus
drwxr-xr-x 3 root root 0 Oct 17 07:50 net
-r--r--r-- 1 root root 4096 Oct 18 07:08 numa_node
drwxr-xr-x 2 root root 0 Oct 18 07:08 power
-r--r--r-- 1 root root 4096 Oct 17 07:50 resource
-rw------- 1 root root 128 Oct 18 07:08 resource0
-r-------- 1 root root 65536 Oct 18 07:08 rom
lrwxrwxrwx 1 root root 0 Oct 17 07:50 subsystem -> ../../../../bus/pci
-r--r--r-- 1 root root 4096 Oct 17 07:51 subsystem_device
-r--r--r-- 1 root root 4096 Oct 17 07:51 subsystem_vendor
-rw-r--r-- 1 root root 4096 Oct 17 07:51 uevent
-r--r--r-- 1 root root 4096 Oct 17 07:50 vendor
JBO:/sys/bus/pci/devices/0000:02:01.0 #
The most interesting item for troubleshooting is the symbolic link to the driver directory. In this example it points to the pcnet32 driver, which matches the PCnet32 device name that lspci reported, so we know this is the correct driver. If the driver link is missing, no driver has been bound to the device.
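Reading the driver link can be scripted as well. The following sketch resolves the link with readlink and prints only the driver name. The function name pci_driver_name is an illustration, and the PCI address in the usage comment is the one from this example; substitute the address that lspci reports on your system.

```shell
# Print the name of the kernel driver bound to a PCI device, given the
# device's sysfs directory. Returns 1 if no driver link is present.
pci_driver_name() {
    dev_dir=$1
    if [ -L "$dev_dir/driver" ]; then
        # The link target ends in the driver name, e.g. .../drivers/pcnet32
        basename "$(readlink "$dev_dir/driver")"
    else
        echo "no driver bound to $dev_dir" >&2
        return 1
    fi
}

# Usage with the device from this article:
#   pci_driver_name /sys/bus/pci/devices/0000:02:01.0
```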
In most cases, the driver that Linux installs will work fine. In some cases it doesn't. When configuring a Dell server with a Broadcom network card, I have seen severe problems, where a ping command that used a jumbo frame packet was capable of causing a kernel panic. One of the first things to suspect in that case is the kernel driver for the network card. A good troubleshooting approach is to start by finding out which version of the driver you are using. You can accomplish this by using the modinfo command on the driver itself. Here is an example of modinfo on the pcnet32 driver:
JBO:/ # modinfo pcnet32
filename: /lib/modules/2.6.27.19-5-pae/kernel/drivers/net/pcnet32.ko
license: GPL
description: Driver for PCnet32 and PCnetPCI based ethercards
author: Thomas Bogendoerfer
srcversion: 261B01C36AC94382ED8D984
alias: pci:v00001023d00002000sv*sd*bc02sc00i*
alias: pci:v00001022d00002000sv*sd*bc*sc*i*
alias: pci:v00001022d00002001sv*sd*bc*sc*i*
depends: mii
supported: yes
vermagic: 2.6.27.19-5-pae SMP mod_unload modversions 586
parm: debug:pcnet32 debug level (int)
parm: max_interrupt_work:pcnet32 maximum events handled per interrupt (int)
parm: rx_copybreak:pcnet32 copy breakpoint for copy-only-tiny-frames (int)
parm: tx_start_pt:pcnet32 transmit start point (0-3) (int)
parm: pcnet32vlb:pcnet32 Vesa local bus (VLB) support (0/1) (int)
parm: options:pcnet32 initial option setting(s) (0-15) (array of int)
parm: full_duplex:pcnet32 full duplex setting(s) (1) (array of int)
parm: homepna:pcnet32 mode for 79C978 cards (1 for HomePNA, 0 for Ethernet, default Ethernet (array of int)
The modinfo command shows different fields for each module. Not every module reports a version field; the pcnet32 module in this example does not. If a version number is included, check whether an updated version is available, and if so, download and install it.
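To pick a single field out of the modinfo output, a small filter is enough. The sketch below is one way to do it; the function name modinfo_field is hypothetical. On a live system, modinfo can also print a single field directly with its -F option, for example modinfo -F version followed by the module name.

```shell
# Read modinfo-style "field: value" lines on stdin and print the value
# of every line whose field name matches the first argument exactly.
modinfo_field() {
    field=$1
    awk -v f="$field" 'index($0, f ":") == 1 {
        sub(/^[^:]*:[ \t]*/, "")   # strip the field name and the padding after it
        print
    }'
}

# Demo on three lines taken from the modinfo output above; prints mii.
printf '%s\n' 'license: GPL' 'depends: mii' 'supported: yes' | modinfo_field depends

# On a live system:
#   modinfo pcnet32 | modinfo_field version
```

If the second command prints nothing, the module does not report a version field.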
When working with some hardware, you should also check what kind of module is used. If the module is open source, it is generally fine, because open source modules are reviewed by the Linux community. If the module is proprietary, there may be incompatibilities between the kernel and the particular module, and loading it flags your kernel as "tainted." A tainted kernel is a kernel that has some modules loaded that are not controlled by the Linux kernel community (the kernel can also be flagged as tainted for other reasons). To find out if this is the case on your system, check the contents of the /proc/sys/kernel/tainted file. If this file contains 0, the kernel is not tainted and no proprietary modules are loaded. Any other value is a bitmask of taint flags. If the lowest bit is set, which is the case for 1 and for any other odd value, a proprietary module has been loaded, and you may be able to fix the situation by replacing the proprietary module with an open source module.
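Because the taint value is a bitmask, a script has to test the lowest bit instead of comparing the value with 1. The following is a minimal sketch of that check; the function name has_proprietary_taint is only an illustration.

```shell
# Succeed (exit status 0) if the given taint value has bit 0 set,
# which is the flag for a loaded proprietary module.
has_proprietary_taint() {
    [ $(( $1 & 1 )) -ne 0 ]
}

# Check the running kernel, if the taint file is readable on this system.
taint_file=/proc/sys/kernel/tainted
if [ -r "$taint_file" ]; then
    value=$(cat "$taint_file")
    if has_proprietary_taint "$value"; then
        echo "taint value $value: a proprietary module has been loaded"
    else
        echo "taint value $value: the proprietary module flag is not set"
    fi
fi
```

A value such as 4096 is non-zero, so the kernel is tainted, but the lowest bit is clear, so the cause is something other than a proprietary module.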
The information in this article should help you fix driver-related issues.