Network Troubleshooting Guide: Fix Issues at Every Layer

Network issues can be frustrating and disruptive, especially when you don’t know where to start troubleshooting. This guide breaks network troubleshooting down into manageable steps using the OSI and updated TCP/IP models. Whether you're dealing with a faulty physical connection, IP addressing issues, or DNS errors, it walks you through practical, command-line-based ways to pinpoint and resolve network problems efficiently. For seasoned IT professionals and curious beginners alike, it offers actionable tools and techniques for common network challenges.

OSI and TCP/IP Model

Layer 1: Network Interface

Start with the physical layer: a loose connection can easily turn troubleshooting into a rabbit hole, so before we overthink things, let's rule this out first.

We can start troubleshooting the physical layer with CLI commands such as:

ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[...]
6: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:15:5d:9e:17:3a brd ff:ff:ff:ff:ff:ff

This command shows the status of each interface and tells us whether it is up or down. If an interface shows DOWN in the output, we can bring it up with the command:

ip link set <interface> up

Example: ip link set eth0 up

We can then confirm the link status with the command:

ip -br link show
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
[...]
eth0             UP             00:15:5d:9e:17:3a <BROADCAST,MULTICAST,UP,LOWER_UP>
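
On a host with many interfaces, scanning the brief output by eye gets tedious. As a rough sketch, the Python snippet below parses text in the format of ip -br link show and lists the interfaces whose state column reads DOWN. The function names and the sample text (with eth0 shown as DOWN) are illustrative assumptions, not output captured from a real host.

```python
def parse_brief_links(output: str) -> dict[str, str]:
    """Map interface name -> state column from `ip -br link show` text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and elision markers such as "[...]".
        if len(fields) < 2:
            continue
        # VLAN-style names look like "eth0.10@eth0"; keep the part before "@".
        name = fields[0].split("@")[0]
        states[name] = fields[1]
    return states


def down_interfaces(output: str) -> list[str]:
    """Return the names of interfaces whose state column reads DOWN."""
    return [name for name, state in parse_brief_links(output).items() if state == "DOWN"]


# Hypothetical sample in the same layout as the output above.
sample = """\
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
[...]
eth0             DOWN           00:15:5d:9e:17:3a <BROADCAST,MULTICAST>
"""
print(down_interfaces(sample))  # ['eth0']
```

On a live system, the input text could come from subprocess.run(["ip", "-br", "link", "show"], capture_output=True, text=True).stdout.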

For a more detailed description, we can use the ethtool utility. This tool describes the interface speed and connectivity and can indicate incorrect cabling/hardware issues.

ethtool <interface>

Example:

ethtool eth0
Settings for eth0:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Current message level: 0x000000f7 (247)
                               drv probe link ifdown ifup rx_err tx_err
        Link detected: yes
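
The fields that matter most for a quick health check are Link detected, Speed and Duplex. The sketch below, under the assumption that the ethtool output keeps the "Key: value" layout shown above, turns that text into a dictionary and reports anything suspicious. The helper names and the optional expected_speed parameter are made up for illustration.

```python
from typing import Optional


def parse_ethtool(output: str) -> dict[str, str]:
    """Collect the "Key: value" lines of ethtool output into a dict."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon, or with nothing after it, carry no setting.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def link_problems(settings: dict[str, str], expected_speed: Optional[str] = None) -> list[str]:
    """Return human-readable warnings for common cabling/hardware symptoms."""
    problems = []
    if settings.get("Link detected") != "yes":
        problems.append("no link detected")
    if settings.get("Duplex") not in (None, "Full"):
        problems.append(f"duplex is {settings['Duplex']}")
    if expected_speed is not None and settings.get("Speed") != expected_speed:
        problems.append(f"speed is {settings.get('Speed')}, expected {expected_speed}")
    return problems


# A shortened copy of the example output above.
sample = """\
Settings for eth0:
        Speed: 10000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Link detected: yes
"""
print(link_problems(parse_ethtool(sample), expected_speed="10000Mb/s"))  # []
```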

In the updated TCP/IP model, the data link layer is included in the Network Interface layer and is responsible for local network connectivity.

One example of a problem you can encounter here is an error with ARP (Address Resolution Protocol).

Address Resolution Protocol

ARP maps IP addresses to MAC addresses, and if there is a mismatch you can run into problems.

We can check the entries to our ARP table with the command:

ip neighbor show
172.30.192.1 dev eth0 lladdr 00:15:5d:2a:6e:f6 REACHABLE

The last column shows the state of each entry. REACHABLE means ARP resolution is working for that neighbor, while a state of FAILED means the address could not be resolved.
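
To pick out unresolved neighbors from a long table, something like the following Python sketch could be used. It assumes the state is the last word on each line, as in the output above; the second sample line (172.30.192.7) is a hypothetical entry added only to show a failed lookup.

```python
def unresolved_neighbors(output: str) -> list[str]:
    """Return IP addresses whose neighbor entry ends in FAILED or INCOMPLETE."""
    unresolved = []
    for line in output.splitlines():
        fields = line.split()
        # The address is the first field, the state the last one.
        if len(fields) >= 2 and fields[-1] in ("FAILED", "INCOMPLETE"):
            unresolved.append(fields[0])
    return unresolved


sample = """\
172.30.192.1 dev eth0 lladdr 00:15:5d:2a:6e:f6 REACHABLE
172.30.192.7 dev eth0 FAILED
"""
print(unresolved_neighbors(sample))  # ['172.30.192.7']
```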

Layer 2: Internet Layer

This layer is heavily involved with IP addresses. One of the first things to look at is whether you have a network connection.

Start by pinging the local IP address, which verifies that your adapter is responding. As a second step, ping an IP address on your LAN. If that works, you can go further and check an IP address on the internet.

We can test our connection using ICMP echo requests with the ping command:

ping <hostname>

Example:

ping -c4 croit.io
PING croit.io (2.59.44.8) 56(84) bytes of data.
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=1 ttl=54 time=205 ms
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=2 ttl=54 time=229 ms
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=3 ttl=54 time=176 ms
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=4 ttl=54 time=170 ms
--- croit.io ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 170.219/195.190/229.116/23.577 ms

-c limits the number of ICMP echo requests sent.

Without -c, ping runs endlessly until you press CTRL+C to end it.
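
When ping is run from a script, the summary line is usually what matters. The sketch below extracts the packet counts and loss percentage from ping output. The regular expression is an assumption based on the two summary formats that appear in this guide ("4 received" and "3 packets received"); other ping implementations may word the line differently.

```python
import re
from typing import NamedTuple, Optional


class PingSummary(NamedTuple):
    sent: int
    received: int
    loss_percent: float


# Matches e.g. "4 packets transmitted, 4 received, 0% packet loss, time 3004ms"
# and "3 packets transmitted, 3 packets received, 0.0% packet loss".
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received,.*?([\d.]+)% packet loss"
)


def parse_ping_summary(output: str) -> Optional[PingSummary]:
    """Return the summary figures, or None if no summary line is present."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    return PingSummary(int(match[1]), int(match[2]), float(match[3]))


line = "4 packets transmitted, 4 received, 0% packet loss, time 3004ms"
print(parse_ping_summary(line))
```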

Next, we should ensure the interface is associated with an IP address:

ip -br address show
lo               UNKNOWN        127.0.0.1/8 ::1/128
[...]
eth0             UP             172.30.205.236/20 fe80::215:5dff:fe9e:16df/64

ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:5D:9E:1B:5E
          inet addr:172.30.204.25  Bcast:172.30.207.255  Mask:255.255.240.0
          inet6 addr: fe80::215:5dff:fe9e:1b5e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:930 (930.0 B)  TX bytes:656 (656.0 B)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

This output ties each interface to its status of UP or DOWN and prints the IP address assigned to it. A missing IP address here could easily be the result of problems with the DHCP configuration.

Another useful command is tracepath. This utility is used in real-time network troubleshooting to find the path that packets take as they travel across the internet to their destination. It also shows an estimate of the time packets take to reach each intermediate router and prints the names or addresses of the routers and devices along the path. This can help you pinpoint where your network issue is occurring.

tracepath <hostname>

Example:

tracepath croit.io
 1?: [LOCALHOST]                      pmtu 1500
 1:  _gateway                                             44.866ms 
 1:  _gateway                                             47.798ms 
 2:  194.110.112.73                                       51.200ms 
 3:  193.27.15.25                                         53.795ms 
 [...]
16:  be2814.ccr42.fra03.atlas.cogentco.com               227.861ms asymm 15 
17:  149.14.210.66                                       167.909ms 
18:  149.14.210.66                                       204.027ms asymm 17 
19:  212.224.99.102                                      179.786ms asymm 10 
20:  lb.int.croit.io                                     169.923ms reached
     Resume: pmtu 1500 hops 20 back 11
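
For long paths it can help to pull the hop timings out of the text. The following sketch parses lines in the tracepath layout shown above and returns the hop with the highest reported time. Treat the result as a hint only: a single slow reply does not prove that hop is at fault. The class and function names are illustrative.

```python
import re
from typing import NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    host: str
    rtt_ms: float


# "<hop>: <host> <time>ms"; lines such as "1?: [LOCALHOST] pmtu 1500" do not match.
_HOP = re.compile(r"\s*(\d+)\??:\s+(\S+)\s+([\d.]+)ms")


def parse_tracepath(output: str) -> list[Hop]:
    """Extract (hop number, host, round-trip time) from tracepath text."""
    hops = []
    for line in output.splitlines():
        match = _HOP.match(line)
        if match:
            hops.append(Hop(int(match[1]), match[2], float(match[3])))
    return hops


def slowest_hop(hops: list[Hop]) -> Optional[Hop]:
    """Return the hop with the largest reported time, or None for no hops."""
    return max(hops, key=lambda hop: hop.rtt_ms) if hops else None


# A few lines taken from the example output above.
sample = """\
 1?: [LOCALHOST]                      pmtu 1500
 1:  _gateway                                             44.866ms
 2:  194.110.112.73                                       51.200ms
16:  be2814.ccr42.fra03.atlas.cogentco.com               227.861ms asymm 15
20:  lb.int.croit.io                                     169.923ms reached
     Resume: pmtu 1500 hops 20 back 11
"""
print(slowest_hop(parse_tracepath(sample)))
```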

We can also display the list of gateways for the routes stored in the routing table with the command:

ip route show table all
default via 172.30.192.1 dev eth0
172.30.192.0/20 dev eth0 proto kernel scope link src 172.30.204.25

The output shows the configured default gateway. A missing or incorrect default gateway is a common issue.
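
One quick sanity check for an "incorrect" gateway is whether it lies inside the subnet of the interface that is supposed to reach it. Here is a small sketch using Python's standard ipaddress module, with the address and gateway from the outputs above as example values; the function name is an illustrative choice.

```python
import ipaddress


def gateway_is_on_link(interface_cidr: str, gateway: str) -> bool:
    """Return True if the gateway address falls inside the interface's subnet."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(gateway) in network


# 172.30.204.25/20 belongs to 172.30.192.0/20, which contains 172.30.192.1.
print(gateway_is_on_link("172.30.204.25/20", "172.30.192.1"))  # True
print(gateway_is_on_link("172.30.204.25/20", "172.30.208.1"))  # False
```

A gateway outside the subnet is not always wrong (static on-link routes exist), but on a typical DHCP-configured host it usually points to a misconfiguration.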

Layer 3: Transport Layer

The transport layer consists of the TCP and UDP protocols, and TCP/UDP errors are the most common errors we will encounter on this layer. Applications listen on sockets, which consist of an IP address and a port. To display the port on which an application is listening on the local host (in order to troubleshoot a particular service on the machine, such as SSH or a web server), we will use the ss command:

ss -tunlp4
Netid      State       Recv-Q      Send-Q            Local Address:Port             Peer Address:Port      Process
udp        UNCONN      0           0                       0.0.0.0:68                    0.0.0.0:*
tcp        LISTEN      0           128                     0.0.0.0:897                   0.0.0.0:*

A common issue is a service that won’t start because something else is already listening on its port. The ss command above shows what is listening on which port, which helps diagnose the issue.
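
To answer "is anything already listening on port N?" from a script, the ss output can be filtered by its local address column. The sketch below assumes the column order shown above (Netid, State, Recv-Q, Send-Q, Local Address:Port, ...); the function name is made up for illustration.

```python
def listeners_on_port(output: str, port: int) -> list[tuple[str, str]]:
    """Return (protocol, local address) pairs from ss output bound to the port."""
    matches = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to hold a local address.
        if len(fields) < 5 or fields[0] == "Netid":
            continue
        local = fields[4]
        # rpartition splits on the last colon, so IPv6 forms like [::]:22 also work.
        _, _, local_port = local.rpartition(":")
        if local_port == str(port):
            matches.append((fields[0], local))
    return matches


# The example output above, reused as input.
sample = """\
Netid      State       Recv-Q      Send-Q            Local Address:Port             Peer Address:Port      Process
udp        UNCONN      0           0                       0.0.0.0:68                    0.0.0.0:*
tcp        LISTEN      0           128                     0.0.0.0:897                   0.0.0.0:*
"""
print(listeners_on_port(sample, 897))  # [('tcp', '0.0.0.0:897')]
```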

We can also use the nc utility. Netcat is capable of just about anything involving TCP/UDP, including port scanning and determining whether remote ports are closed or filtered.

nc <hostname> <port(s)>

Examples:

nc -z croit.io 80  # -z disables I/O and limits to port scanning
Connection to croit.io (2.59.44.8) 80 port [tcp/http] succeeded!

nc -z croit.io 20-30
Connection to croit.io (2.59.44.8) 22 port [tcp/ssh] succeeded!
Connection to croit.io (2.59.44.8) 23 port [tcp/telnet] succeeded!
Connection to croit.io (2.59.44.8) 24 port [tcp/*] succeeded!
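
Where nc is not installed, a similar single-port check can be approximated with Python's standard socket module. This is a minimal sketch, not a replacement for nc: it treats a refused connection as "closed" and a timeout as "filtered", which is a common heuristic rather than a guarantee. The function name and the 3-second default timeout are assumptions.

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the result as open/closed/filtered/error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on the port.
        return "closed"
    except socket.timeout:
        # No answer at all, often because a firewall silently drops the packets.
        return "filtered"
    except OSError:
        # Name resolution failure, unreachable network, and similar problems.
        return "error"
```

For example, probe_tcp_port("croit.io", 80) would correspond to the first nc command above.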

If ping fails even though the host is reachable in other ways, one of the reasons could be that ICMP is blocked by a firewall as a security practice. We can confirm that by temporarily stopping the firewall:

systemctl stop firewalld

Then rerun the ping command to verify whether this fixes the issue:

ping croit.io

If this doesn’t fix your issue, be sure to bring firewalld back online:

systemctl start firewalld

More information for this can be found at Firewalld - ArchWiki

Layer 4: Application Layer

DNS

One of the topics that falls under this layer is DNS (Domain Name System), a system that translates hostnames to IP addresses. It allows us to type in a URL such as www.duckduckgo.com rather than 40.89.244.232 to reach the website. With DNS we don't have to memorize each website's IP address to make a connection; we simply need to know the name and DNS does the rest for us.

To find your local DNS/forwarder and some upstream DNS servers, look in your /etc/systemd/resolved.conf.
More information can be found at systemd-resolved - ArchWiki

A great utility for this is the nslookup command. It allows you to query your DNS service from the CLI to obtain the IP address mapped to a domain name and to look up other DNS records.

nslookup <domain or host> <dns server to use>

Example:

nslookup www.croit.io 9.9.9.9
Server:		9.9.9.9
Address:	9.9.9.9#53
Non-authoritative answer:
Name:	www.croit.io
Address: 2.59.44.8

The dig utility returns the same answer in more detail:

dig www.croit.io
; <<>> DiG 9.18.1 <<>> www.croit.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28995
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.croit.io.			IN	A
;; ANSWER SECTION:
www.croit.io.		240	IN	A	2.59.44.8
;; Query time: 20 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Wed Apr 06 09:16:12 CDT 2022
;; MSG SIZE  rcvd: 57

If you do not wish to use a specific DNS resolver, you can omit it and nslookup will use your system DNS (the default DNS server your system is configured to use):

nslookup croit.io
Server:		9.9.9.9
Address:	9.9.9.9#53
Non-authoritative answer:
Name:	croit.io
Address: 2.59.44.8
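
The same system-resolver lookup can be done from Python, which is handy when checking many names at once. The sketch below wraps socket.getaddrinfo and returns an empty list when the name cannot be resolved. Note that, unlike nslookup, the standard library offers no way to choose a specific DNS server here; the function name is an illustrative choice.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Resolve a name with the system resolver; return [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry ends with a sockaddr tuple whose first item is the address.
    return sorted({info[4][0] for info in infos})


# An IP literal resolves to itself without any DNS query.
print(resolve("127.0.0.1"))  # ['127.0.0.1']
```

An empty result for an external name such as croit.io, combined with a successful result for an internal name, points toward the forwarding DNS server.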

When troubleshooting DNS issues, we need to find out how widespread the problem is: is it only one device on the network, or more? If only one device is having issues, the problem is likely on the client side; if multiple devices are affected, it is likely on the network side. With the nslookup utility we can verify whether it is the local DNS or the forwarding DNS that is causing the issues. If names resolve internally but not externally, we can pivot to troubleshooting the forwarding DNS server, which in most cases will be your ISP's DNS.

A few examples of public DNS servers are Google (8.8.8.8), Cloudflare (1.1.1.1), and Quad9 (9.9.9.9).

Additional information for DNS troubleshooting can be found at:
Domain name resolution - ArchWiki

Let’s have a look at a network connectivity error.

ping -c3 croit.io
Pinging croit.io with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Ping statistics for croit.io:
  Packets: Sent = 3, Received = 0, Lost = 3 (100% loss)

Possible reasons:

The remote host might be down
Incorrect network adapter or firewall settings on your PC
Loopback error on localhost caused by improper network adapter settings. A loopback error occurs when a keepalive packet is looped back to the port that sent it; a device can loop packets back to the source interface, usually because there is a logical loop in the network that spanning tree has not blocked.
The firewall installed on the destination host may be blocking your ICMP requests
Your device did not have an internet connection at the time the replies should have arrived

Let’s go ahead and check whether we can ping another address to establish if it is an issue with the remote host.

ping -c3 www.archlinux.org
Pinging 95.217.163.246 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Ping statistics for archlinux.org:
  Packets: Sent = 3, Received = 0, Lost = 3 (100% loss)

Since the second ping to a separate domain also gets no replies, we can assume this is an internal network issue and not a case of the remote host being down.

Another way to verify whether a domain is up is to check it through https://www.isitdownrightnow.com/

With both external domains unreachable, let’s see if we can reach another device on our own network to help isolate the issue:

ping -c3 192.168.1.135   
PING 192.168.1.135 (192.168.1.135): 56 data bytes
64 bytes from 192.168.1.135: icmp_seq=0 ttl=64 time=10.224 ms
64 bytes from 192.168.1.135: icmp_seq=1 ttl=64 time=8.914 ms
64 bytes from 192.168.1.135: icmp_seq=2 ttl=64 time=10.375 ms
--- 192.168.1.135 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.914/9.838/10.375/0.656 ms

In this situation, we can further establish that internal connections are working but connections to external domains are not. At this point we can assume the culprit to be our DNS.
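
The isolation steps used so far (ping a LAN address, ping an external IP address, resolve a name) can be summarized as a small decision function. This is only a sketch of the reasoning under simplified assumptions: the three boolean inputs and the wording of the returned hints are illustrative, and real faults can overlap.

```python
def diagnose(lan_ping_ok: bool, external_ip_ping_ok: bool, name_resolves: bool) -> str:
    """Suggest where to look next, checking the lowest layer first."""
    if not lan_ping_ok:
        return "local network: check cabling, interface state and IP configuration"
    if not external_ip_ping_ok:
        return "routing or upstream: check the default gateway, firewall and ISP link"
    if not name_resolves:
        return "DNS: check the local resolver and the forwarding DNS server"
    return "no fault found at these layers"


# LAN and external IP addresses answer, but names do not resolve.
print(diagnose(lan_ping_ok=True, external_ip_ping_ok=True, name_resolves=False))
```

Pinging an external IP address directly (for example 9.9.9.9) before blaming DNS separates a routing problem from a name resolution problem.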

To fix this, we need to restart the DNS resolver the system uses. In this case, we assume it to be systemd-resolved.
Restarting the service will flush the DNS cache.

systemctl restart systemd-resolved.service && systemctl status systemd-resolved.service

After this command, press CTRL+C to return to the prompt.

Rerun the first command to verify whether this fixes the issue:

ping -c3 croit.io
PING croit.io (2.59.44.8): 56 data bytes
64 bytes from 2.59.44.8: icmp_seq=0 ttl=50 time=132.893 ms
64 bytes from 2.59.44.8: icmp_seq=1 ttl=50 time=136.201 ms
64 bytes from 2.59.44.8: icmp_seq=2 ttl=50 time=137.400 ms
--- croit.io ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 132.893/135.498/137.400/1.906 ms

Effective network troubleshooting can feel like solving a complex puzzle, but with the right approach and tools, it becomes a systematic process rather than guesswork. By following the steps outlined in this guide, you’ll be well on your way to identifying and resolving issues across all layers of your network. Networking problems don’t have to be daunting—master the process, and they’ll soon become another solvable challenge in your IT toolkit.
