A Better Way to Test Network Connectivity

It’s pretty much inevitable: I run across Reddit posts, support requests at work, or general complaints from friends saying that they can ping something but still can’t connect to it (or some similar story). People always seem confused when this happens. It underscores a problematic teaching that I see perpetuated on a regular basis: the idea that ICMP connectivity somehow speaks for the “rest of the connectivity” to a device on the network. Don’t get me wrong; there’s a place for ICMP echo (ping) requests. From my perspective, though, the results can be misleading for so many reasons that there are better tools and ways to evaluate connectivity, so you can stop wasting time on problems without evidence.

Problems with Using ICMP Pings

From my perspective, the core issue with using ICMP echo to test connectivity to remote systems is that it is most likely the wrong test for what someone is looking to achieve. Very often, the question that someone is looking to address is: “Can I connect to this remote system?” And within that question, you want to know whether or not TCP connectivity is working (because TCP is the most common Layer 4 protocol used for non-media connectivity between machines). Let’s take a look at this scenario:

  1. I would like to RDP into a remote system at 192.168.41.55. I try the RDP client of my choice and it fails (we’ll ignore why it fails right now because that’s what most people do).
  2. I open a shell interface and I run ping 192.168.41.55. I get no response.
  3. If this machine is across a large network, I might even be inclined to run traceroute 192.168.41.55 and get a mix of * * * results with a few IP addresses scattered among them.

At this point, I really haven’t learned much of use. Even if I had received some ICMP echo replies, I still wouldn’t have established anything useful about my actual problem. In most practical contexts, this statement holds:

The success of your ICMP echo requests is largely unrelated to your desire to connect via TCP to a system.

This is roughly the equivalent of making sure the trunk opens on your car in order to drive it. Yes, you do establish some knowledge that you might have appropriate Layer 3 connectivity to that system. However, firewalls often treat ICMP differently from TCP (either by default or because of poor configuration), so these results tell you little about whether a TCP connection will succeed. In this post, I’ll present some better tools to help you answer your questions about connectivity (with a focus primarily on TCP).

So What’s Better?

If you’re looking to connect to a remote system over TCP, it would seem most useful to actually do your testing using TCP. This is relatively simple but I do find (very often) that the tools needed aren’t included on many systems by default. Here are the two tools that I recommend for initial TCP troubleshooting:

  • netcat (called nc or ncat on several systems) - This tool is extremely useful for doing the initial/direct connectivity testing for TCP.
    • Think of this as the TCP-equivalent of using ping.
  • tcptraceroute (use tracetcp on Windows) - This tool is useful for determining how a connection routes (or fails) via TCP.
    • Think of this as the TCP-equivalent of using traceroute.

We can revise the scenario that I used as an example in the previous section to use these methods so that we take an informed approach to what we’re doing. Here’s a revision on the steps:

  1. I would like to RDP into a remote system at 192.168.41.55. I try the RDP client of my choice and it fails (we’ll ignore why it fails right now because that’s what most people do).
  2. I open a shell interface and I run nc -v 192.168.41.55 3389. I get no response (connection timed out). This tells me that this port is unavailable to me (my traffic is being dropped somewhere along the line).
    1. If I had actually received a rejection (connection refused), that would tell me that the machine or something in the path is actively refusing my connection.
  3. Since I know this machine is at a remote site, I would run tcptraceroute 192.168.41.55 3389. I’m able to see that we hit the router at the remote site but the traffic stops there.

I’m now at a point where I have practical information about the situation:

  • I can reach the remote site with my TCP traffic but only get as far as the router.
  • I know that my connectivity problem with the remote system is not in OSI Layers 5+ because I’m still stuck in Layer 4.
  • I am reasonably confident that there are no issues on my side in connecting to the remote system.

I now know that I need to request assistance on the remote side, and I have a way to verify whether a change there improves conditions. Even if it does, I may still run into a Layer 5+ issue, but I can identify that easily and know it’s not part of the Layer <5 transport to the system. I’m now informed and better positioned to provide guidance on this problem.
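As a rough illustration, the nc check from step 2 can be approximated in Python using only the standard socket module. This is a minimal sketch, not a replacement for netcat: the function name check_tcp and the five-second default timeout are assumptions made for the example. It attempts a full TCP connection and separates the three outcomes discussed above (connected, timed out, refused).

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a full TCP connection and classify the outcome.

    Hypothetical helper for illustration; returns a short status string.
    """
    try:
        # create_connection performs the complete TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: traffic is probably being dropped somewhere.
        return "timed out"
    except ConnectionRefusedError:
        # A reset came back: the host or a device in the path said no.
        return "refused"
    except OSError as exc:
        # Anything else (no route, name resolution failure, and so on).
        return f"error: {exc.strerror or exc}"
```

Calling check_tcp("192.168.41.55", 3389) in the scenario above would be expected to return "timed out", which matches what nc reported.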

Practical Implementation

In general, this approach works well to replace your ICMP-based habits. However, there are some things to consider about testing with TCP tools versus ICMP:

  • TCP requires a handshake in order to be successful. This means that you must be able to complete the SYN, SYN-ACK, ACK process in order for an actual session to be established. Some testing tools, however, only validate the SYN and SYN-ACK exchange (and never send the final ACK) and thus can report misleading data about the readiness of a connection. The tool I’ve recommended for specific checks (netcat) will handle this properly but other tools (including tcptraceroute) will not.
    • There are also some major managed service network providers out there that specifically block the packet that completes the handshake (the final ACK). This results in tools like tcptraceroute reporting a success but netcat reporting a failure. I have no understanding of why this is done but it is one of the most frustrating things I’ve seen from a troubleshooting standpoint.
  • tcptraceroute is useful on its own without additional action. However, you can get more out of this tool if you know ahead of time what the successful path to a remote system should look like. This gives you a point of comparison when a problem occurs so that you can actually pinpoint the likely point of failure (whereas, without this, you’ll only know the last hop before the problem).
    • In a continuous monitoring context, I wrote scripts to integrate tcptraceroute results into the availability testing for remote systems and always had a good point of comparison for when it worked and when it didn’t.
    • This approach is not as useful in an environment where a multi-path routing scheme is used for traffic (for example, over the Internet or across complex networks), though I have found it useful when one of the multiple paths is broken (because this helps identify the broken path).
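The baseline comparison described above can be sketched in Python. This assumes the tcptraceroute output has already been reduced to a plain list of hop addresses, with "*" standing in for any hop that did not answer; the parsing itself is left out. The function name first_divergence is hypothetical, as are all sample addresses other than 192.168.41.55.

```python
from typing import List, Optional, Tuple


def first_divergence(
    baseline: List[str], current: List[str]
) -> Optional[Tuple[int, str, str]]:
    """Return (hop_number, expected, observed) for the first hop that
    differs from the known-good baseline, or None if every baseline hop
    matches. Hops missing from the current trace count as "*".
    """
    for index, expected in enumerate(baseline):
        observed = current[index] if index < len(current) else "*"
        if observed != expected:
            return (index + 1, expected, observed)
    # Extra hops beyond the baseline are not treated as a divergence here.
    return None


# Hypothetical known-good path recorded while the connection worked.
known_good = ["10.0.0.1", "172.16.5.1", "192.168.41.1", "192.168.41.55"]
# Hypothetical trace taken during the outage: the last hop never answers.
during_outage = ["10.0.0.1", "172.16.5.1", "192.168.41.1", "*"]

print(first_divergence(known_good, during_outage))
```

With a stored baseline, the comparison points at the hop that should have answered rather than only at the last hop that did.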

Getting These Tools

Here are some places to get these tools:

  • Windows
    • There are currently no options here that I consider both good and secure. A search will turn up candidates but I decided against including them.
  • Linux
    • Use your OS package manager; both tools are widely available but may be in additional repositories that are not enabled by default.
  • macOS
    • netcat
      • BSD version included on macOS by default.
    • tcptraceroute
      • Install using Homebrew: brew install tcptraceroute
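Before installing anything, it can help to see which of these tools are already on a machine. Here is a small Python sketch using the standard library’s shutil.which; the function name find_tools and the list of command names it checks are assumptions for illustration, since the netcat binary goes by different names on different systems.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_tools(
    names: Sequence[str] = ("nc", "ncat", "netcat", "tcptraceroute"),
) -> Dict[str, Optional[str]]:
    """Map each command name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found'}")
```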

Summary

I hope this post has encouraged you to revisit the tools you rely on daily for checking connectivity. Changing habits to use tools that really inform our processes is very important for anything that actually requires manual work. As we move more and more into automation in our environments (and yes, this is applicable to automation as well), the time we spend hands-on should go to the tasks that actually require it. We should spend as little time as possible fighting with tools and environments and ensure that we get the right information when we ask for it. Thanks for your time!