Network Troubleshooting Guide: Fix Issues at Every Layer
Network issues can be frustrating and disruptive, especially when you don’t know where to start troubleshooting. This guide is designed to break down the complexities of network troubleshooting into manageable steps using the OSI and updated TCP/IP models. Whether you're dealing with a faulty physical connection, addressing IP-related issues, or diagnosing DNS errors, this post walks you through practical, command-line-based solutions to pinpoint and resolve network problems efficiently. Whether you're a seasoned IT professional or a curious beginner, this guide will equip you with actionable tools and techniques to address common network challenges.
Layer 1: Network Interface
Start with the physical layer. Loose connections can easily turn troubleshooting into a rabbit hole, so before we overthink things, let's rule them out first.
To check the physical layer, we can use CLI commands such as:
ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[...]
6: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
link/ether 00:15:5d:9e:17:3a brd ff:ff:ff:ff:ff:ff
This command shows whether each interface is up or down. If an interface shows DOWN in the output, we can bring it up with the command:
ip link set <interface> up
Example: ip link set eth0 up
We can then confirm the link status with the command:
ip -br link show
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
[...]
eth0 UP 00:15:5d:9e:17:3a <BROADCAST,MULTICAST,UP,LOWER_UP>
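On a machine with many interfaces, it can help to filter the brief listing down to the interfaces that need attention. The following is a minimal sketch, not part of the original guide: the function name down_interfaces is hypothetical, and it assumes the column layout of ip -br link show as shown above (name, state, MAC address, flags).

```shell
# Print the names of interfaces whose state column is not UP.
# Reads `ip -br link show` output on stdin.
# The loopback device (lo) is skipped because it normally reports
# UNKNOWN even when it is working.
down_interfaces() {
    awk '$1 != "lo" && $2 != "UP" { print $1 }'
}
```

It could be used as ip -br link show | down_interfaces, and each name it prints is a candidate for ip link set <interface> up.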
For a more detailed description, we can use the ethtool utility. It reports the interface speed, duplex mode, and whether a link is detected, which can point to incorrect cabling or hardware issues.
ethtool <interface>
Example:
ethtool eth0
Settings for eth0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: Other
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Current message level: 0x000000f7 (247)
drv probe link ifdown ifup rx_err tx_err
Link detected: yes
In the updated TCP/IP model, the data link layer is part of the Network Interface layer and is responsible for local network connectivity.
A common source of problems here is ARP (Address Resolution Protocol).
Address Resolution Protocol
ARP maps IP addresses to MAC addresses on the local network. If a mapping is missing or incorrect, you can run into connectivity problems.
We can check the entries to our ARP table with the command:
ip neighbor show
172.30.192.1 dev eth0 lladdr 00:15:5d:2a:6e:f6 REACHABLE
The last column shows the state of each entry. REACHABLE means the neighbor was resolved successfully; if ARP resolution did not succeed, the entry is marked FAILED instead.
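To spot problem entries quickly in a long neighbor table, the output can be filtered by its state column. This is a rough sketch under a few assumptions: the function name failed_neighbors is hypothetical, the state is taken to be the last field of each line, and INCOMPLETE (resolution still pending) is treated as a problem alongside FAILED.

```shell
# Print neighbors whose ARP/NDP resolution did not succeed.
# Reads `ip neighbor show` output on stdin, where each line looks like:
#   <ip> dev <interface> [lladdr <mac>] <STATE>
failed_neighbors() {
    awk '$NF == "FAILED" || $NF == "INCOMPLETE" { print $1, "on", $3, "(" $NF ")" }'
}
```

It could be used as ip neighbor show | failed_neighbors. An empty result means no entry is currently in a failed state.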
Layer 2: Internet Layer
This layer deals with IP addressing and routing. One of the first things to check is whether you have a network connection at all.
Start by pinging your own local IP address, which verifies that your adapter is responding. As a second step, ping an IP address on your LAN. If that works, go further and ping an IP address on the internet.
We can test our connection using ICMP echo requests with the ping command:
ping <hostname>
Example:
ping -c4 croit.io
PING croit.io (2.59.44.8) 56(84) bytes of data.
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=1 ttl=54 time=205 ms
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=2 ttl=54 time=229 ms
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=3 ttl=54 time=176 ms
64 bytes from lb.int.croit.io (2.59.44.8): icmp_seq=4 ttl=54 time=170 ms
--- croit.io ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 170.219/195.190/229.116/23.577 ms
-c limits the number of ICMP echo requests sent. Without -c, ping runs endlessly until you stop it with CTRL+C.
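The local IP, then LAN, then internet sequence described above can be sketched as a small loop that stops at the first target that does not answer. This is an illustration rather than a finished tool: the function name check_reachability and the PING_CMD override are hypothetical, and the -W 2 timeout (in seconds) assumes the Linux iputils version of ping.

```shell
# Ping each target in order and stop at the first one that fails.
# PING_CMD can be set to substitute another command for ping (useful for testing).
check_reachability() {
    for target in "$@"; do
        if ${PING_CMD:-ping} -c 2 -W 2 "$target" >/dev/null 2>&1; then
            echo "ok: $target"
        else
            echo "unreachable: $target"
            return 1
        fi
    done
}
```

For example, check_reachability 127.0.0.1 172.30.192.1 croit.io walks outward from the local machine, so the first target reported as unreachable shows roughly where to look next. The addresses here are taken from the sample output in this guide and should be replaced with your own.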
Next, we should make sure the interface has an IP address assigned:
ip -br address show
lo UNKNOWN 127.0.0.1/8 ::1/128
[...]
eth0 UP 172.30.205.236/20 fe80::215:5dff:fe9e:16df/64
The legacy ifconfig command shows similar information:
ifconfig
eth0 Link encap:Ethernet HWaddr 00:15:5D:9E:1B:5E
inet addr:172.30.204.25 Bcast:172.30.207.255 Mask:255.255.240.0
inet6 addr: fe80::215:5dff:fe9e:1b5e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5 errors:0 dropped:0 overruns:0 frame:0
TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:930 (930.0 B) TX bytes:656 (656.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
This output ties each interface to its status (UP or DOWN) and shows the IP address assigned to that interface on the local machine. A missing IP address here could easily be the result of problems with the DHCP configuration.
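An interface that is UP but has no IPv4 address is a typical sign of a DHCP problem, and it can be picked out of the brief listing automatically. The sketch below is an assumption-laden example: missing_ipv4 is a hypothetical name, and it relies on the ip -br address show layout shown above (name, state, then addresses in CIDR notation).

```shell
# Print interfaces that are UP but have no IPv4 address assigned.
# Reads `ip -br address show` output on stdin.
missing_ipv4() {
    awk '$2 == "UP" {
        found = 0
        # Addresses start at the third column; look for dotted-quad/prefix.
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\//) found = 1
        if (!found) print $1
    }'
}
```

It could be used as ip -br address show | missing_ipv4. Interfaces that only carry an IPv6 link-local address (fe80::...) are reported, because that address is assigned automatically and does not show that DHCP succeeded.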
Another useful command is tracepath. It traces the path that packets take across the network to their destination and shows the round-trip time to each intermediate router, printing the name or address of every hop along the way. This can help you pinpoint where your network issue is occurring.
tracepath <hostname>
Example:
tracepath croit.io
1?: [LOCALHOST] pmtu 1500
1: _gateway 44.866ms
1: _gateway 47.798ms
2: 194.110.112.73 51.200ms
3: 193.27.15.25 53.795ms
[...]
16: be2814.ccr42.fra03.atlas.cogentco.com 227.861ms asymm 15
17: 149.14.210.66 167.909ms
18: 149.14.210.66 204.027ms asymm 17
19: 212.224.99.102 179.786ms asymm 10
20: lb.int.croit.io 169.923ms reached
Resume: pmtu 1500 hops 20 back 11
We can also display the routing table, including the gateway used for each route, with the command:
ip route show table all
default via 172.30.192.1 dev eth0
172.30.192.0/20 dev eth0 proto kernel scope link src 172.30.204.25
The output includes the configured default gateway. A missing or incorrect default gateway is a common cause of connectivity problems.
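Checking for a default gateway can also be scripted. The following sketch uses a hypothetical function name, default_gateway, and assumes routes in the format shown above, where the default route reads default via <gateway> dev <interface>.

```shell
# Print the first default gateway found in `ip route show` output (stdin).
# Exits with status 1 if no default route is present.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; found = 1; exit }
         END { if (!found) exit 1 }'
}
```

A possible use is ip route show | default_gateway || echo "no default gateway configured". If a gateway is printed, pinging it is a reasonable next step before testing hosts on the internet.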
Layer 3: Transport Layer
The transport layer consists of the TCP and UDP protocols, and TCP/UDP errors are the most common errors we will encounter on this layer. Applications listen on sockets, which consist of an IP address and a port. To display which ports applications are listening on locally (for example, to troubleshoot a particular service on the machine, such as SSH or a web server), we can use the ss command:
ss -tunlp4
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
udp UNCONN 0 0 0.0.0.0:68 0.0.0.0:*
tcp LISTEN 0 128 0.0.0.0:897 0.0.0.0:*
A common issue is a service that won't start because something else is already listening on its port. The command above shows what is listening on which port, which helps diagnose the conflict.
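When the listing is long, it helps to narrow it down to the one port a service is trying to bind. The sketch below is illustrative only: listeners_on_port is a hypothetical name, and it assumes the ss column layout shown above, where the fifth column is Local Address:Port.

```shell
# Print the lines of `ss -tunlp` output (stdin) whose local port matches $1.
listeners_on_port() {
    port="$1"
    awk -v port="$port" '{
        # The port is whatever follows the last colon of the local address,
        # which also works for IPv6 addresses such as [::]:22.
        n = split($5, parts, ":")
        if (parts[n] == port) print
    }'
}
```

It could be used as ss -tunlp | listeners_on_port 897. Running ss as root fills in the Process column, so the output then also names the program holding the port.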
We can also use the nc (netcat) utility. It is capable of just about anything involving TCP and UDP, including port scanning and checking whether remote ports are open, closed, or filtered.
nc <hostname> <port(s)>
Examples:
nc -zv croit.io 80 # -z scans without sending any data, -v prints the result
Connection to croit.io (2.59.44.8) 80 port [tcp/http] succeeded!
nc -zv croit.io 20-30
Connection to croit.io (2.59.44.8) 22 port [tcp/ssh] succeeded!
Connection to croit.io (2.59.44.8) 23 port [tcp/telnet] succeeded!
Connection to croit.io (2.59.44.8) 24 port [tcp/*] succeeded!
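For checking a handful of specific ports rather than a range, a short loop around nc gives one line of output per port. This is a sketch under stated assumptions: check_ports and the NC_CMD override are hypothetical names, and the -w 3 option is used as a three-second connection timeout so that filtered ports do not hang the loop.

```shell
# Report whether each given TCP port on a host accepts connections.
# Usage: check_ports <host> <port> [<port> ...]
# NC_CMD can be set to substitute another command for nc (useful for testing).
check_ports() {
    host="$1"
    shift
    for p in "$@"; do
        if ${NC_CMD:-nc} -z -w 3 "$host" "$p" >/dev/null 2>&1; then
            echo "$p open"
        else
            echo "$p closed or filtered"
        fi
    done
}
```

For example, check_ports croit.io 22 80 443 tests the SSH, HTTP, and HTTPS ports in turn. The sketch does not try to tell a closed port from a filtered one; it only reports whether the connection succeeded.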
If a ping or connection attempt fails, one possible reason is the firewall: ICMP in particular is often blocked as a security practice. We can confirm this by temporarily stopping the firewall:
systemctl stop firewalld
Then rerun the ping command to verify whether this fixes the issue:
ping croit.io
Whether or not this fixes your issue, be sure to bring firewalld back online once you have finished testing:
systemctl start firewalld
More information for this can be found at Firewalld - ArchWiki
Layer 4: Application Layer
DNS
One of the topics that falls under this layer is DNS (Domain Name System), the system that translates hostnames into IP addresses. It allows us to type in a URL such as www.duckduckgo.com rather than 40.89.244.232 to reach the website. With DNS, we don't have to memorize each website's IP address to make a connection; we simply need to know the name, and DNS does the rest for us.
To find your local DNS/forwarder and some upstream DNS, look in your /etc/systemd/resolved.conf file.
More information can be found at systemd-resolved - ArchWiki
A great utility for this is the nslookup command. It lets you query a DNS server from the CLI to find the IP address a domain name maps to and to look up other DNS records.
nslookup <domain or host> <dns server to use>
Example:
nslookup www.croit.io 9.9.9.9
Server: 9.9.9.9
Address: 9.9.9.9#53
Non-authoritative answer:
Name: www.croit.io
Address: 2.59.44.8
Or, using the dig utility:
dig www.croit.io
; <<>> DiG 9.18.1 <<>> www.croit.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28995
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.croit.io. IN A
;; ANSWER SECTION:
www.croit.io. 240 IN A 2.59.44.8
;; Query time: 20 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Wed Apr 06 09:16:12 CDT 2022
;; MSG SIZE rcvd: 57
If you do not wish to use a specific DNS resolver, you can omit it and nslookup will use your system DNS (the default DNS server your system is configured to use):
nslookup croit.io
Server: 9.9.9.9
Address: 9.9.9.9#53
Non-authoritative answer:
Name: croit.io
Address: 2.59.44.8
When troubleshooting DNS issues, we first need to find out how widespread the problem is: does it affect only one device on the network, or several? If only one device is having issues, the problem is likely on the client side; if multiple devices are affected, it is more likely on the network side. With the nslookup utility we can verify whether the local DNS or the forwarding DNS is causing the issue. If names resolve internally but not externally, we can pivot to troubleshooting the forwarding DNS server, which in most cases is your ISP's DNS.
A few examples of public DNS servers are Google (8.8.8.8), Cloudflare (1.1.1.1), and Quad9 (9.9.9.9).
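One way to tell whether a single resolver is at fault is to ask several resolvers the same question and compare the answers. The sketch below is a rough illustration: compare_resolvers and the DIG_CMD override are hypothetical names, and it uses dig with the +short, +time, and +tries options to keep the output brief and to avoid long waits on an unresponsive server.

```shell
# Query one name against several DNS servers and print the first answer from each.
# Usage: compare_resolvers <name> <server> [<server> ...]
# DIG_CMD can be set to substitute another command for dig (useful for testing).
compare_resolvers() {
    name="$1"
    shift
    for server in "$@"; do
        # Lines starting with ';' are dig diagnostics (for example timeouts), not answers.
        answer=$(${DIG_CMD:-dig} +short +time=2 +tries=1 "$name" "@$server" 2>/dev/null \
            | grep -v '^;' | head -n 1)
        echo "$server: ${answer:-no answer}"
    done
}
```

For example, compare_resolvers croit.io 9.9.9.9 1.1.1.1 8.8.8.8 queries the three public resolvers listed above. If they all answer but your system resolver does not, the problem is most likely local. Note that for names behind a CNAME, the first line printed by dig is the alias target rather than an IP address.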
Additional information for DNS troubleshooting can be found at:
Domain name resolution - ArchWiki
Let’s have a look at a network connectivity error.
ping -c3 croit.io
Pinging croit.io with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Ping statistics for croit.io:
Packets: Sent = 3, Received = 0, Lost = 3 (100% loss)
Possible reasons:
- The remote host might be down
- Incorrect Network Adapter or Firewall Settings on your PC
- Loopback error on localhost caused by improper network adapter settings. A loopback error occurs when the keepalive packet is looped back to the port that sent the keepalive. A device can loop the packets back to the source interface, which usually occurs because there is a logical loop in the network that the spanning tree has not blocked.
- The firewall installed on the destination host may be blocking your ICMP request
- Your device did not have an Internet connection at the time of receiving replies
Let’s go ahead and check whether we can ping another address to establish if it is an issue with the remote host.
ping -c3 www.archlinux.org
Pinging 95.217.163.246 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Ping statistics for archlinux.org:
Packets: Sent = 3, Received = 0, Lost = 3 (100% loss)
With the second ping to a separate domain also not getting replies, we can assume this is an internal network issue rather than a case of the remote host being down.
Another way to verify whether a domain is up is to check it through https://www.isitdownrightnow.com/
With both external domains unreachable, let's see if we can reach another device on our own network to help isolate the issue:
ping -c3 192.168.1.135
PING 192.168.1.135 (192.168.1.135): 56 data bytes
64 bytes from 192.168.1.135: icmp_seq=0 ttl=64 time=10.224 ms
64 bytes from 192.168.1.135: icmp_seq=1 ttl=64 time=8.914 ms
64 bytes from 192.168.1.135: icmp_seq=2 ttl=64 time=10.375 ms
--- 192.168.1.135 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.914/9.838/10.375/0.656 ms
In this situation, we have established that internal connections work but external domains do not. At this point we can assume the culprit to be our DNS.
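Before settling on DNS, it is worth separating "cannot reach the internet at all" from "cannot resolve names". The sketch below does this by pinging a public IP address directly and then trying to resolve a name. It rests on several assumptions: dns_or_routing, PING_CMD, and RESOLVE_CMD are hypothetical names, name resolution is tested with getent hosts (available on glibc-based Linux systems), and -W 2 is the iputils ping timeout in seconds.

```shell
# Distinguish a routing/firewall problem from a DNS problem.
# Usage: dns_or_routing <hostname> <public-ip>
# PING_CMD and RESOLVE_CMD can be set to substitute other commands (useful for testing).
dns_or_routing() {
    name="$1"
    ip="$2"
    if ! ${PING_CMD:-ping} -c 2 -W 2 "$ip" >/dev/null 2>&1; then
        echo "cannot reach $ip: check routing or firewall first"
    elif ! ${RESOLVE_CMD:-getent} hosts "$name" >/dev/null 2>&1; then
        echo "$ip is reachable but $name does not resolve: DNS is the likely culprit"
    else
        echo "$ip is reachable and $name resolves: DNS looks fine"
    fi
}
```

For example, dns_or_routing croit.io 9.9.9.9 uses the Quad9 address as the known-good public IP. Only the second message supports the conclusion that DNS is at fault.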
To fix this, we need to restart the DNS resolver the system uses. In this case, we assume it to be systemd-resolved. Restarting the service will flush the DNS cache.
systemctl restart systemd-resolved.service && systemctl status systemd-resolved.service
After running this command, press CTRL+C to exit the status output.
Rerun the first command to verify whether this fixes the issue:
ping -c3 croit.io
PING croit.io (2.59.44.8): 56 data bytes
64 bytes from 2.59.44.8: icmp_seq=0 ttl=50 time=132.893 ms
64 bytes from 2.59.44.8: icmp_seq=1 ttl=50 time=136.201 ms
64 bytes from 2.59.44.8: icmp_seq=2 ttl=50 time=137.400 ms
--- croit.io ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 132.893/135.498/137.400/1.906 ms
Effective network troubleshooting can feel like solving a complex puzzle, but with the right approach and tools, it becomes a systematic process rather than guesswork. By following the steps outlined in this guide, you’ll be well on your way to identifying and resolving issues across all layers of your network. Networking problems don’t have to be daunting—master the process, and they’ll soon become another solvable challenge in your IT toolkit.