Saturday, November 23, 2024

Perf counters - Network

 From a strictly SQL Server perspective, a few things can be done to improve network performance.  The first tunable item is the SQL Server network packet size.  Ignore the common but incorrect advice to match this size to your network’s Maximum Transmission Unit (MTU), typically just 1500 bytes.  TCP, which SQL Server uses, accepts chunks of up to 64 KB (with IPv4) and handles the task of dividing them into smaller pieces for optimal transmission.  Setting the SQL Server packet size to such a small value actually causes more network overhead, because each packet must carry its own TDS header and SQL Server has to do the dividing and reassembly of the data.  As a result, the maximum setting of 32 KB can be the best choice even if your environment only occasionally sends large query results. 
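To make the header overhead concrete, here is a minimal arithmetic sketch. It assumes only that every TDS packet carries a fixed 8-byte header; the 10 MB payload and the function names are hypothetical, chosen for illustration. It ignores TCP/IP framing and any per-row protocol overhead, so treat the numbers as a rough comparison rather than a measurement.

```python
TDS_HEADER_BYTES = 8  # fixed size of the header on every TDS packet


def tds_packets_needed(payload_bytes: int, packet_size: int) -> int:
    """Return how many TDS packets are needed to carry payload_bytes."""
    usable = packet_size - TDS_HEADER_BYTES  # bytes left for data per packet
    if usable <= 0:
        raise ValueError("packet_size must be larger than the TDS header")
    # Integer ceiling division: a partly filled last packet still counts.
    return (payload_bytes + usable - 1) // usable


def tds_header_overhead(payload_bytes: int, packet_size: int) -> int:
    """Return the total header bytes sent for payload_bytes."""
    return tds_packets_needed(payload_bytes, packet_size) * TDS_HEADER_BYTES


if __name__ == "__main__":
    payload = 10 * 1024 * 1024  # hypothetical 10 MB result set
    # MTU-sized, the 4096-byte server default, and the 32767-byte maximum
    for size in (1500, 4096, 32767):
        packets = tds_packets_needed(payload, size)
        overhead = tds_header_overhead(payload, size)
        print(f"packet size {size:>5}: {packets:>5} packets, "
              f"{overhead:>6} header bytes")
```

The header bytes themselves are small in every case; the larger cost of a small packet size is the number of packets SQL Server has to split and reassemble.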

Other things you can do to improve SQL Server network performance include: 

  • Use the very latest SQL driver available.  Microsoft has greatly improved the performance of its TDS drivers over time, so staying current is important. 

  • Avoid the connection string setting “Multiple Active Result Sets” because it is horribly inefficient.  Instead, use multiple connections with connection pooling. 

  • Use asynchronous processing with threads in your applications. 

  • For connections that manipulate large datasets, request 32 KB packets. 

 

Common Network Performance Problems 

1. High CPU Usage 

The CPU, or Central Processing Unit, is the key component of a computer, responsible for receiving and processing instructions for systems and applications. High CPU usage on the devices in a network is a warning sign of slow network performance. The most common cause is the network being bogged down by enormous amounts of traffic. CPU usage spikes when processes take longer to execute or when many network packets are exchanged throughout the network. When the CPU is overused, latency, jitter and packet loss may increase, which degrades the entire IT infrastructure. 

The Fix: Devices such as switches have hardware components (ASICs and NPUs) that can take some of the load off the CPU and process packets quickly. 

2. High Bandwidth Usage 

Bandwidth is the network’s capacity to exchange data between devices within a given period. Higher bandwidth enables faster data exchange and allows more devices to be connected at once.  When someone, or something, monopolizes the bandwidth by downloading gigabytes of data, it creates congestion. Congestion leaves too little bandwidth for other users and systems, which leads to problems like slow speeds. 

The Fix: Before increasing the bandwidth, find out what is consuming it. If a faulty system is eating up the bandwidth and you simply add more, it will consume that too and you will end up with the same problem. Monitoring the network to get to the root of the problem is therefore always a good start. 

3. DNS Problems 

DNS, the Domain Name System, is essentially a directory that matches domain names to IP addresses. Every website has an IP address, and computers use DNS to look up that address so they can connect to the site over the internet. DNS errors occur when a name cannot be resolved to an IP address, which can look as though you have lost your internet or network access. For instance, your site may appear online to you, but offline to your visitors.  The inability to reach the internet or specific sites can have a significant impact on your business, so it is very important to find and fix problems as soon as possible. 

The Fix: Network monitoring solutions proactively monitor all devices, equipment, systems and applications on a network. A complete network overview makes it easier to spot and fix DNS and other network problems. 

You can also identify these network problems by measuring a variety of network performance metrics, such as: 

  • Latency 

  • Jitter 

  • Packet Loss 

  • Throughput 

  • Packet Duplication 

  • Packet Reordering 
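As a rough illustration of how three of these metrics can be derived from raw measurements, the sketch below summarizes a series of round-trip-time probes. Everything here is an assumption made for the example: the function name is hypothetical, a lost probe is represented as None, and jitter is taken as the mean absolute difference between consecutive successful probes (other definitions exist). Throughput, duplication and reordering need packet-level data and are not covered.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize round-trip probes.

    rtts_ms: list of round-trip times in milliseconds, None for a lost probe.
    Returns a dict with latency, jitter and packet loss.
    """
    sent = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]

    # Packet loss: share of probes that never came back.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    # Latency: average round-trip time of the probes that did come back.
    latency_ms = mean(received) if received else None

    # Jitter (assumed definition): mean absolute change between
    # consecutive successful probes.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0

    return {
        "latency_ms": latency_ms,
        "jitter_ms": jitter_ms,
        "packet_loss_pct": loss_pct,
    }


if __name__ == "__main__":
    # Hypothetical probe results; the third probe was lost.
    print(summarize_probes([10, 14, None, 12]))
```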

 

Network Bytes Received/sec 

The Network Bytes Received/sec counter shows the rate, in bytes per second, at which data is received over each network adapter. The bytes used for data packet framing are also counted and included in the value. 

The counter shows “how many bytes you get from the NIC. This is a measure of the inbound traffic” [1] 

You can use this counter to express the incoming data rate as a percentage of the total available network bandwidth, which is easier to interpret and makes potential network congestion easier to spot. 

There is no specific threshold value. However, keep in mind that Network Bytes Received/sec is a component of the Bytes Total/sec counter, which should stay below 50% of total network bandwidth. 

It’s recommended to watch this counter over time and determine trends. Any unexpected peak in network activity should be investigated and the source of the heavy network traffic identified. 
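One simple way to flag unexpected peaks in a series of collected counter samples is to compare each sample with a trailing baseline. The sketch below is only an illustration: the function name, the window of 12 samples and the factor of 3 standard deviations are arbitrary assumptions, not recommended values, and the sample data is made up.

```python
from statistics import mean, pstdev


def find_peaks(samples, window=12, k=3.0):
    """Return indexes of samples that exceed the trailing baseline.

    A sample is flagged when it is greater than mean + k * stdev of the
    `window` samples immediately before it. The first `window` samples
    are never flagged because they have no full baseline yet.
    """
    peaks = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        threshold = mean(baseline) + k * pstdev(baseline)
        if samples[i] > threshold:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    # Hypothetical Bytes Received/sec samples with one obvious spike.
    received = [100, 110, 90, 105, 95, 100, 110, 90, 105, 95, 100, 100,
                1000, 100]
    print(find_peaks(received))
```

A flagged index only says where to start looking; identifying the process or host behind the traffic still requires the investigation described above.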

Network Bytes Sent/sec 

Similar to the Network Bytes Received/sec counter, the Network Bytes Sent/sec counter shows the rate at which bytes are sent over each network adapter. Again, it’s useful to calculate the outgoing data rate as a percentage of the total network bandwidth. 

“This is how many bytes of data are sent to the NIC. This is a raw measure of throughput for the network interface. We are really measuring the information sent to the interface which is the lowest point we can measure. If you have multiple NIC, you will see multiple instances of this particular counter.” [1] 

As with Network Bytes Received/sec, there is no specific threshold, but you should keep the Bytes Total/sec value in mind. 

Network Bytes Total/sec 

As expected, the Bytes Total/sec value equals the sum of Bytes Received/sec and Bytes Sent/sec. It is the rate, in bytes per second, at which data is received and sent over each network adapter. Framing bytes are included. 

“This is simply a combination of the other two counters. This will tell you overall how much information is going in and out of the interface. Typically, you can use this to get a general feel, but will want to look at the Bytes Sent/sec and the Bytes Received/sec for a more exact detail of the type of traffic.” [1] 

When investigating high Bytes Total/sec values, also check the % Disk Time counter for the physical disk and the % Processor Time counter. If those two are normal, the high value points to a network capacity problem, which can be solved by reconfiguring the network interface or adding another network adapter. Reconfiguring the network and creating subnets can also solve the problem. 

Current Network Bandwidth 

Network bandwidth is how much data can be transferred through a network interface over time. Some resources define it as the maximum rate of network data transfer, or network capacity. The network bandwidth depends on the network infrastructure. What you can control and configure is the amount of network bandwidth used by your servers. 

The Current Network Bandwidth metric “Shows an estimate of the current bandwidth of the network interface in bits per second (BPS). For interfaces that do not vary in bandwidth or for those where no accurate estimation can be made, this value is the nominal bandwidth.” [2] 

A high-bandwidth network is a prerequisite for good SQL Server performance over the network. 

For performance monitoring, compare the Current Network Bandwidth value to Bytes Total/sec. The ratio of Bytes Total/sec to Current Network Bandwidth is also called network utilization. Because Bytes Total/sec is measured in bytes and Current Network Bandwidth in bits per second, multiply the byte rate by 8 before dividing. 

The Bytes Total/sec value for all servers on a network should be less than 50% of the total network bandwidth. If it is constantly higher, additional investigation is required. The first step is to determine which applications are saturating the network. Keep in mind that some operations, such as creating SQL Server database backups on remote storage, use a lot of network bandwidth. 

Ratio values higher than 80% indicate very high network utilization that should be addressed immediately. If the ratio is close to 100%, you’re using almost the maximum network capacity, and additional traffic will cause bottlenecks, which will show up as delays. This situation can be solved by increasing the bandwidth (which is sometimes not easy to achieve, or not possible at all), or by segmenting the network. 
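The utilization calculation can be sketched as follows. The only facts used are the ones stated above: Bytes Total/sec is in bytes, Current Network Bandwidth is in bits per second, and 50% and 80% are the thresholds to watch. The function names and the labels returned are hypothetical.

```python
def network_utilization_pct(bytes_total_per_sec, current_bandwidth_bps):
    """Return network utilization as a percentage.

    bytes_total_per_sec: value of the Bytes Total/sec counter (bytes).
    current_bandwidth_bps: value of Current Network Bandwidth (bits/sec).
    """
    if current_bandwidth_bps <= 0:
        raise ValueError("current_bandwidth_bps must be positive")
    # Convert bytes to bits (x 8) before comparing with the bandwidth.
    return bytes_total_per_sec * 8 * 100 / current_bandwidth_bps


def classify_utilization(pct):
    """Map a utilization percentage to a hypothetical status label."""
    if pct > 80:
        return "very high: address immediately"
    if pct > 50:
        return "high: investigate if sustained"
    return "ok"


if __name__ == "__main__":
    one_gbit = 1_000_000_000  # nominal 1 Gbit/s interface
    for rate in (30_000_000, 75_000_000, 112_500_000):  # sample byte rates
        pct = network_utilization_pct(rate, one_gbit)
        print(f"{rate:>11} B/s -> {pct:5.1f}%  {classify_utilization(pct)}")
```

Forgetting the factor of 8 understates utilization by the same factor, which can make a saturated link look almost idle.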

The most important events for measuring the round trip are SQL:BatchCompleted and RPC:Completed.  

 
