Explaining why network analysis is important has been interesting. We go through some examples, but people often get hung up on the technical problems we describe (the ones NetMRI detects). For a business person, those problems often don’t mean much: what’s the business impact?
For the network engineer, the problems are interesting, but they need to be related to the business in order to communicate their importance to the business people.
While each problem is numbered, the numbers themselves don’t indicate relative ranking. They are simply a means by which we can reference them.
1. Configuration not saved:
A reboot will cause the new configuration to be lost. For example, after a power outage the device restarts with the old saved configuration in place of the new one, and the operation of the network changes unexpectedly.
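One way to picture the check is a plain text comparison of the running configuration against the saved one. The sketch below is a minimal illustration, not how NetMRI does it; it assumes both configurations have already been retrieved as strings, and the function name `unsaved_changes` is hypothetical.

```python
import difflib

def unsaved_changes(running: str, startup: str) -> list[str]:
    """Return the lines that differ between the saved and running configs.

    Lines prefixed "+ " exist only in the running config (they would be
    lost on reboot); lines prefixed "- " exist only in the saved config.
    """
    diff = difflib.ndiff(startup.splitlines(), running.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]

if __name__ == "__main__":
    startup = "hostname r1\ninterface Gi0/1\n description uplink\n"
    running = startup + " shutdown\n"
    for change in unsaved_changes(running, startup):
        print(change)
```

An empty result means the running configuration is already saved.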
2. Saved configurations don’t meet corporate policy:
Source of many problems, from performance to reliability to security. Corporate policy may stem from regulatory requirements (PCI, HIPAA, SOX), or may be based on accepted best practices. Checking that the policies are consistently applied across hundreds of routers and switches is nearly impossible to do with manual processes.
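An automated policy check can be as simple as testing every saved configuration against a list of required and forbidden lines. The following is a minimal sketch under stated assumptions: the two pattern lists are a made-up example policy, not a recommendation from PCI, HIPAA, or SOX, and `policy_violations` is a hypothetical helper name.

```python
import re

# Hypothetical example policy: lines every config must contain,
# and lines that must never appear.
REQUIRED = [r"^service password-encryption$", r"^no ip http server$"]
FORBIDDEN = [r"^snmp-server community public\b"]

def policy_violations(config: str) -> list[str]:
    """Return a description of each policy rule the config breaks."""
    violations = []
    for pattern in REQUIRED:
        if not re.search(pattern, config, flags=re.MULTILINE):
            violations.append(f"missing: {pattern}")
    for pattern in FORBIDDEN:
        if re.search(pattern, config, flags=re.MULTILINE):
            violations.append(f"forbidden: {pattern}")
    return violations

if __name__ == "__main__":
    config = "service password-encryption\nsnmp-server community public RO\n"
    for violation in policy_violations(config):
        print(violation)
```

Running the same function over every device's saved configuration gives the consistency check that is impractical to do by hand.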
3. Bloated firewall rule set, unused ACL entries:
Poor firewall performance; open, unused rules create potential security problems. Identifying unused firewall rules shows which ones can be safely removed, which makes the rule set much easier to understand and maintain and improves network security.
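As an illustration, unused entries can be found by looking at per-entry hit counts. The sketch below assumes output in the style of Cisco IOS `show access-lists`, where each entry starts with a sequence number and ends with `(N matches)` when it has been hit, and carries no suffix when it has not. Other platforms format this differently, and `unused_entries` is a hypothetical name.

```python
import re

# Assumed line shape: "<seq> permit|deny <rule> [(N matches)]"
ENTRY = re.compile(
    r"^\s*(\d+)\s+(permit|deny)\s+(.*?)(?:\s+\((\d+) match(?:es)?\))?\s*$"
)

def unused_entries(show_output: str) -> list[str]:
    """Return ACL entries that have no recorded matches."""
    unused = []
    for line in show_output.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # ACL header or unrelated line
        seq, action, rule, hits = m.groups()
        if hits is None or int(hits) == 0:
            unused.append(f"{seq} {action} {rule}")
    return unused

if __name__ == "__main__":
    sample = (
        "Extended IP access list 101\n"
        "    10 permit tcp any host 10.0.0.5 eq 443 (1520 matches)\n"
        "    20 permit tcp any host 10.0.0.9 eq 23\n"
        "    30 deny ip any any (12 matches)\n"
    )
    print(unused_entries(sample))
```

Hit counters reset on reboot or when cleared, so a rule should show no matches over a long observation period before it is treated as removable.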
4. Firewall connection count exceeded:
New connections via the firewall fail; business applications exhibit intermittent failure at high firewall loads; VPNs begin to fail. When the connection count of a busy firewall is exceeded, new connections are refused. Applications experience intermittent network connectivity as the connection count hits the limit and then drops back below it, which makes the problem difficult to troubleshoot.
5. Link hog – downloading music or videos:
Slower application response, impacting user productivity. When one application or user is consuming most of the bandwidth on a link, it impacts the other applications and users of that link. NetMRI uses Getflow to immediately collect NetFlow data on a link that’s suddenly running at high utilization and to identify the applications and users of the link, allowing the network engineer to quickly understand the cause of the slowdown to other applications and take action if necessary.
6. Interface traffic congestion:
Unpredictable application performance, impacting user productivity. When a router interface is congested, it starts discarding packets, so monitoring packet discards is an early indicator that the applications using the link need more bandwidth, or that a rogue application is now consuming bandwidth that’s needed by business applications.
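To make "monitoring packet discards" concrete, the sketch below computes a discard rate from two successive polls of an interface's counters. It assumes each sample is a pair of cumulative counters (discards, packets offered to the interface), ignores counter wrap, and uses a hypothetical function name.

```python
def discard_rate(prev: tuple[int, int], curr: tuple[int, int]) -> float:
    """Fraction of offered packets discarded between two counter polls.

    Each sample is (cumulative_discards, cumulative_packets_offered).
    Counter wrap and resets are not handled in this sketch.
    """
    dropped = curr[0] - prev[0]
    offered = curr[1] - prev[1]
    if offered <= 0:
        return 0.0  # no traffic in the interval
    return dropped / offered

if __name__ == "__main__":
    earlier = (100, 10_000)
    later = (150, 15_000)
    print(f"{discard_rate(earlier, later):.2%} of packets discarded")
```

A rate that stays above zero during busy periods is the early indicator described above. What threshold deserves an alert depends on the applications using the link.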
7. Link problems & stability:
Physical or DataLink errors cause slow or intermittent application performance; Link or interface stability can impact routing and spanning tree (see 13, 14, 15, 16, 20). Whenever a link has high errors or is unstable, applications will have problems making effective use of the link. When routing or spanning-tree protocols are impacted, the effects may spread to other parts of the network, depending on the network’s design.
8. Environmental limits exceeded:
Fan failure, power supply problems, and high temperatures are indicators of problems that will likely cause a network device to reboot, affecting any applications relying on the device. Identifying and correcting environmental problems will make the network, and the applications that depend on it, more reliable.
9. Memory utilization increasing:
A bug in the device’s operating system steadily consumes memory. When no free memory remains, the device will reboot, disrupting applications whose traffic is transiting the device. Imagine troubleshooting a network problem that occurs every two weeks as the device runs out of memory and reboots. We’ve seen this happen in production networks. The business impact depends on how often it occurs and what applications are affected.
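A steady leak shows up as a downward trend in free memory, and a simple line fit can estimate when the device will run out. The sketch below assumes periodic samples of (hour, free bytes) and a roughly linear leak. The function name is hypothetical, and the least-squares fit is only one possible way to detect the trend.

```python
from typing import Optional

def hours_until_exhaustion(samples: list[tuple[float, float]]) -> Optional[float]:
    """Estimate hours from the last sample until free memory reaches zero.

    samples: (hour, free_bytes) pairs. Returns None when there are too few
    samples or free memory is not shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same time
    # Least-squares slope: change in free bytes per hour.
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None  # memory is stable or growing
    intercept = mean_f - slope * mean_t
    zero_at = -intercept / slope
    return zero_at - samples[-1][0]

if __name__ == "__main__":
    daily = [(0, 1000.0), (24, 900.0), (48, 800.0), (72, 700.0)]
    print(hours_until_exhaustion(daily))
```

With the sample data the device loses 100 units of free memory per day, so the estimate is about one week from the last sample.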
10. Incorrect serial bandwidth setting:
Causes routing protocols to make suboptimal routing decisions. If the configured bandwidth is too low, it can affect the operation of the routing protocol itself, making routes unstable. Remote branches will experience unreliable application operation, which will be difficult to troubleshoot because you’ll have to catch it while it is happening. As applications begin using more link bandwidth, the routing protocol can become unstable.
If you need to alter network traffic paths, use policy-based routing mechanisms instead of changing link bandwidth parameters. Also make sure tunnels have accurate bandwidth settings.
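A worked example shows how a wrong bandwidth value skews path selection. The text above does not name a routing protocol, so OSPF is used here purely as an illustration: on Cisco routers the default OSPF interface cost is the reference bandwidth (100 Mbps by default) divided by the configured interface bandwidth, with a minimum of 1. The function name is hypothetical.

```python
def ospf_cost(bandwidth_kbps: int, reference_kbps: int = 100_000) -> int:
    """Default OSPF interface cost: reference bandwidth / interface bandwidth.

    The result is truncated to an integer and never drops below 1.
    """
    return max(1, reference_kbps // bandwidth_kbps)

if __name__ == "__main__":
    # A T1 link (1544 kbps) configured correctly versus mistakenly as 128 kbps.
    print("correct setting:", ospf_cost(1544))
    print("wrong setting:  ", ospf_cost(128))
```

With the wrong setting the link looks roughly twelve times more expensive than it really is, so the protocol may prefer a path that is actually slower.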
11. No QoS:
Important business applications are not prioritized, yielding unpredictable or poor performance during times of interface congestion. Applications like VoIP or SAP are susceptible to high jitter and packet loss when QoS is not used. Configurations that match corporate policy for QoS deployment are important (see 2).
12. QoS Queue Drops:
Important business applications are slow; business needs have changed since the queue definitions were created. A network designed for four concurrent VoIP calls will not perform well when more people are hired and the number of concurrent calls increases. Similar conditions exist for other applications. Queue drops are an early indicator of potential problems that require a network change.
13. Route flaps:
Poor application performance as packets take the wrong or inefficient paths in the network. Route flaps may be caused by unstable links or improperly configured routing protocol timers (see 2, 7). Packets may also arrive out of order, which some applications cannot tolerate. Varying paths will also cause high jitter, which affects time-sensitive applications like VoIP and SAP. Studies have shown that people can deal with relatively high delay as long as the delay is consistent. But high variance in application response will drive people crazy.
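The difference between high delay and high variance in delay can be shown with a few numbers. The sketch below uses a deliberately simple definition of jitter, the mean absolute difference between consecutive delay samples. That definition is an assumption chosen for clarity, not the smoothed estimator used by RTP, and the function name is hypothetical.

```python
def mean_jitter(delays_ms: list[float]) -> float:
    """Mean absolute difference between consecutive delay samples."""
    if len(delays_ms) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    stable_slow_path = [200.0, 200.0, 200.0, 200.0]   # high but steady delay
    flapping_path = [20.0, 80.0, 20.0, 80.0]          # lower but varying delay
    print("stable path jitter:  ", mean_jitter(stable_slow_path))
    print("flapping path jitter:", mean_jitter(flapping_path))
```

The stable path has far higher delay but no jitter, while the flapping path has lower average delay and high jitter, which is the case that hurts VoIP and frustrates users.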
Identifying and correcting these problems will allow your network to better serve your business’s network requirements.