Are you (like me) an IT person with many years of experience? Have you been there, seen it all, and faced the constant onslaught of technical change, to the point that earning a new certification seems pointless because it will be outdated within a year? If only we could get ahead of the curve, and ahead of the endless drone of unhappy customers. But it just goes to show that technology is not flawless, and neither, frankly, are the people who use it, right?
I won’t tell you that I found the one magic bullet that solves it all. But what I will tell you is that after two decades in the business, I finally found my “V8” moment. I felt like an idiot for not realizing it sooner, but as I checked around, most of my peers didn’t get it either -- which made me feel a little better. So, I’ve committed to sharing what I learned, and don’t worry, you don’t have to sign up for anything or join another email list. It’s just this: “The performance of laptops, PCs, and servers depends far more on disk drive performance than on memory or the CPU.” You might think, “Well, duh!”, but I have to ask you -- how often have you looked at a computer and concluded from its sluggishness that perhaps it needs more RAM? Maybe it does, but I’m here to tell you that your first thought should be, “What drive performance am I getting?” Your second, third, and fourth thoughts should be about RAM, antivirus, and CPU.
And that’s it! My eureka moment from a few years ago. And frankly, you probably already have these instincts. After all, isn’t that why you switched from buying laptops with a spinning disk to SSDs? It’s not just because they are more popular now; it’s because the laptop drives of yesteryear were only rated for 50 IOPS (input/output operations per second). And I’ll bet that you already spend more money on drives for servers than for PCs -- but why? Is it because more money means more reliability? Or is it because a SATA drive for a PC is capable of 75 IOPS, while the SAS drives for your servers deliver 140-180 IOPS? Have you noticed that your virtualization efforts are not quite what you expected in terms of performance? It isn’t that you didn’t throw enough RAM at the problem; yet that third or fourth server you virtualized is definitely slowing the system down. Why? It’s the drives, my friend. Spend a little more on the controller and SSDs and you’ll be able to virtualize like a rockstar.
Maybe you already know all this… and if that’s the case, then why are you still using a SAN that has spinning disks, or isn’t at least a hybrid with a well-optimized balance of SSDs and spinning disks? True, there are a lot of things that can contribute to poor workstation performance -- and I am going to systematically touch upon them in the rest of this eBook -- but the bottom line is this: When network performance lags across the board, you have to look at the common denominator, and that means storage.
Whenever you get that dreaded, “My computer just seems way too slow” call, you have to make the obligatory house call, even if your Spidey Senses are leading you elsewhere. The first step in troubleshooting is to start locally and gradually work your way up the network chain. For an experienced IT person, the first thing you want to rule out is a PEBKAC error (Problem Exists Between Keyboard and Chair). In the case of a poor performance complaint, you will have to prove to yourself whether it’s justified or all in the user’s head. Once you eliminate that from the list of possibilities, you can move straight to the hardware.
As you well know, performance issues can be caused locally by insufficient resources, outdated drivers, software issues from a recent update, or even a badly fragmented hard drive. I like to use the OS’s diagnostics to verify that drivers are up to date, and all connected devices are functioning normally. While I’m in there, I verify there is enough free drive space to support virtual memory as well as sufficient RAM. I have found 8GB to be the optimal amount for a workstation that runs typical office applications, like databases, word processors, spreadsheets, etc. If the computer is used for high-end image editing, CAD, video or audio production, then 16GB should be the baseline. You can determine if a workstation is under excessive load by looking at the % Utilization of the CPU and the amount of Available Memory. If all the above checks out, then you need to consider network latency.
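The free-drive-space check above is easy to script. The sketch below is a minimal illustration using Python’s standard library; the 15% free-space threshold is an assumption chosen for illustration, not an official guideline.

```python
import shutil

def check_free_space(path="/", min_free_fraction=0.15):
    """Warn if the drive backing `path` is low on free space, which can
    starve the OS's virtual memory (paging/swap).
    The 15% threshold is an illustrative rule of thumb, not a standard."""
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        return f"Low disk space: only {free_fraction:.0%} free on {path}"
    return f"OK: {free_fraction:.0%} free on {path}"

print(check_free_space("/"))
```

On Windows, pass a drive root like `"C:\\"` instead of `"/"`. A check like this can run from a login script so you spot low-space workstations before the complaints start.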
In networked environments, the “C” in PC stands for Client (or it should). The client relies heavily on the performance of the servers it communicates with, which connects back to storage, but let’s not get ahead of ourselves. In wired environments, don’t overlook the Ethernet adapter card. Make sure drivers are current, and use the OS’s diagnostics to observe network activity and confirm that no background application is consuming bandwidth, memory resources, or clock cycles. On older computers, make sure the network adapter is capable of 1Gbps full-duplex operation. If the networking components check out, the problem is at the other end of the cable… but take a moment to check the cable itself.
Cabling issues are rare, but the minute it takes to check the physical integrity of the cable and its connector will allow you to confidently check that box and move on. Verify the cable type (e.g., Cat 5e) is sufficient to handle the network speeds and distances for which it is purposed. If you have reason to suspect a cable issue, perform a continuity test across the full length of the cable; this can rule out a whole class of transmission problems. Also check for sources of electromagnetic interference (EMI), which can result when a network cable runs parallel to a power line or a power cord. Once you have ruled that out, you can safely conclude that the problem is either with the servers, the storage, or somewhere in between.
If you have concluded that poor performance originates from the servers and storage, it should be affecting everyone on that network. If you haven’t been receiving many IT complaints, there is nothing wrong with doing a little investigating. Walk around the office and casually ask people how things are going. Let them know that there have been complaints about slow speed issues and you are curious if they have experienced it. Many people will not complain overtly (at least to IT) about performance issues if everything is basically still working, but when you open the door for them to have that conversation, they will usually provide some useful information.
I referenced this earlier, but it’s a point worth repeating. If your workplace is experiencing a pandemic of slow computing and you -- the IT person -- are receiving a lot of complaints, resist the urge to placate them with quick fixes, like RAM upgrades. Poor performance across the board is an indicator of a network problem, and unless you have reason to suspect a problem with switching, it is most likely caused by excessive latency between the servers and storage.
The only way to know for sure is to do some storage I/O benchmarking, using tools like IOMeter or an equivalent tool that runs on your OS. There are several tools that focus on stress testing different types of databases if you suspect that is your problem area, and there are also tools that specialize in testing virtualized environments. A quick Google search will get you there. If you are new to I/O testing you will need to investigate how to simulate common application workloads, and then know how to analyze the benchmark results. You cannot go by -- or compare your results to -- vendor performance claims, which are always based on “optimal conditions” that don’t reflect the workplace environment.
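To make the idea concrete, here is a rough sketch of a random-read latency probe in Python. It is not a substitute for IOMeter or a dedicated benchmark: real tools bypass the OS page cache and can test raw devices, while this sketch does not, so treat its numbers as relative indicators only. The file size and read counts are arbitrary illustrative choices.

```python
import os
import random
import statistics
import tempfile
import time

def random_read_latency(file_size_mb=64, block_size=4096, reads=500):
    """Measure median latency (ms) of small random reads against a scratch
    file. Illustrative only: results include page-cache effects, unlike
    dedicated benchmarks such as IOMeter or fio."""
    # Create a scratch file filled with random data.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(file_size_mb * 1024 * 1024))
        path = f.name
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        max_offset = (file_size_mb * 1024 * 1024) - block_size
        for _ in range(reads):
            offset = random.randrange(0, max_offset, block_size)
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # read one block at a random offset
            latencies.append((time.perf_counter() - start) * 1000)  # ms
    finally:
        os.close(fd)
        os.unlink(path)
    return statistics.median(latencies)

print(f"median 4K random read: {random_read_latency():.3f} ms")
```

Run the same probe against a local disk and a SAN-backed volume and compare the medians; a large gap points you toward the storage network.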
When analyzing storage performance, you are looking specifically for sources of latency. In general, the cause for this is too many data requests all trying to fit through the same front-end port on the storage array. But requests can also get stacked up at the controllers and disk drives too, so you should first try to discern which of these is causing the applications to perform poorly. Below are a few common causes to look for.
When poor performance comes on suddenly, it could indicate a drive failure in the RAID group. While the firmware should have alerted you about this, it is worth verifying the performance of the disks in the array. This is because when a drive fails, the rebuild operation can impact performance across the array and cause the sudden performance drop. The intensity and duration of it depends on the RAID implementation and the number of disks.
If too many users are accessing an application at the same time, it generates a large number of small random I/O requests, typically with more reads than writes. Unless those requests are spread over many drives, they end up getting queued -- the data equivalent of waiting in line. A healthy storage network should process I/O requests within a few milliseconds, but once queued, this can turn into hundreds of milliseconds. Once you get around 500ms, you can bet the IT complaints are going to start stacking up!
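The jump from a few milliseconds to hundreds of milliseconds is exactly what basic queueing arithmetic predicts. The sketch below uses the textbook single-server (M/M/1) response-time formula purely as an illustration; real storage arrays are more complex, and the 5 ms service time is an assumed example value.

```python
def mm1_response_time(service_time_ms, utilization):
    """Average response time for a single-server (M/M/1) queue:
    R = S / (1 - U). Illustrates how queueing inflates latency as a
    device approaches saturation; real arrays are more complex."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)

# An example drive that services a single request in 5 ms:
for u in (0.50, 0.90, 0.95, 0.99):
    print(f"{u:.0%} busy -> {mm1_response_time(5, u):.0f} ms average response")
```

Note the nonlinearity: doubling utilization from 50% to 99% busy does not double the wait; it multiplies it fifty-fold, which is why a drive that was "fine last month" can suddenly feel broken.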
There are some applications common to every user, such as email, that see high activity throughout the work day. If another application happens to occupy a volume on the same physical drives as the email, then there will be a performance trade-off between these two. This means you will see intermittent drops in the performance of the email server throughout the day as I/O requests are shared with the other application on that drive.
If you have an active Virtual Machine (VM) environment, there are a few potential pitfalls that could wreak havoc with the storage system, such as too many VMs assigned to the same array port, or a couple of VMs running I/O-intensive tasks that consume the lion’s share of resources. Even though VMs are configured to use only a fixed percentage of resources, if those requirements are not properly balanced across the servers and the storage, they can become a source of bottlenecks.
Let’s assume that you -- the IT person -- have dutifully performed the network diagnostics and there was no obvious red flag and hence, no easy fix. But you did identify a lot of latency between the servers and the storage, as is often the case when a SAN reaches its operating capacity. Make no mistake, this is the source of your performance issue. If you already have hybrid storage, the most common (and least expensive) solution is to add more SSD drives and get rid of the inherent latency that comes from HDDs. On the more-expensive side, if you don’t have a hybrid storage system, you can add an all-flash array to your existing SAN, or add/upgrade to a hybrid that will offer the best “bang for the buck” performance as well as raw storage capacity. It might be a good conversation to start having with your manager, using your recent diagnostics to make the case.
If you find yourself SAN shopping, remember that the core problem you are trying to solve is performance. Do not get distracted by capacity at this level. You have already measured the IOPS and latency across your existing storage network, so you have some idea what kind of performance you need. Now take it a step further and find a solution that gives you ample headroom for the future, otherwise you may find yourself in this same place a few short years from now.
Many vendors will push all-flash storage solutions, but even though SSDs are dropping in price, they are still considerably more expensive (per megabyte) than an HDD, and they still lack the extensive storage capacity per drive. For most business applications, a hybrid storage array provides the best of both worlds with a much friendlier price tag. Just to give an idea, a “good” hybrid can deliver 200,000 IOPS per workload with sub-millisecond latency.
Beyond performance, compare the total capabilities of each storage system. There are a lot of questions you need to ask, but here are the top nine questions:
Do NOT think that all SANs are the same. Recent innovations in technology have significantly changed the way storage is done.
These features and many more are the result of superior software development, where the company has made its largest investment.
For decades there was no major innovation in the enterprise storage arena. Drives got bigger and cheaper and increased overall array capacity, and then SSDs joined the scene and gave a nice nudge to performance. That’s about it. The way data is processed, backed up, and restored really hasn’t changed. A lot of this placidity was driven by the big storage vendors who, at the behest of their major clients, had to maintain backwards compatibility with all their legacy products. It’s hard to move forward when you’re always looking back, and the whole industry fell into this decades-long lull. That is, until recently.
Even when you find the “right solution”, you want to really research the vendor’s history of providing customer support, particularly to a business your size. There are a few big storage vendors who have name recognition and reputation, and they will gladly sell you a box. The question is, when you need support, will you always be playing second fiddle to their billion-dollar clients? This is another area where going with an emerging, fast-growing vendor can be advantageous. In general, they have everything to prove, and should be much more deeply invested in your success. Again, check references and talk to other clients. Compare the vendor’s service plan to others in terms of cost and deliverables. This is a major purchase decision, so you don’t want to leave any part of it to chance.
CNS Partners is an authorized and dedicated reseller of NetApp storage arrays, and also offers Cloud Storage solutions. Whether you are just browsing the market or seriously seeking to implement a comprehensive storage solution, CNS Partners brings decades of industry experience working with VMware, Microsoft Hyper-V, and Citrix XenServer. CNS Partners also offers broad network, switching, and application-level technical assistance.
To learn more about the ways we’ve helped other companies just like yours build resilience into their IT environments cost-effectively, download CNS Partners' Expert Guide to High-Performing IT Systems today.