I come across more and more often companies that struggle to get good performance using SQL Server, it surprises me that people employed in these companies try to solve the issues in very odd way, which usually is caused by having certain understanding of concepts but no real experience and in depth knowledge of performance tuning.
The funniest approach I come across is to expect SQL Server to perform well without ANY performance tuning at all and saying SQL Server is just slow :)
I have decided to write SQL Server Performance Tuning overview. I hope that people who are asked to improve the performance will find this overview helpful so they can focus on areas that really will make difference or make decisions that will work in long term.
Our article is written with SQL Server 2012 and SQL Server 2008 R2 in mind but this should be applicable to previous version.
Subject: My experience is focused on building BI systems using Microsoft BI Stack with Kimball Data Warehouse Design therefore this article will focus on these areas and may not be applicable in certain cases to other approaches.
- Accidental DBAs.
- Certain companies don't have dedicated and experienced DBAs which is majority of companies that have performance issues. Accidental DBA is a person that is not a DBA and does not have skills or experience that standard DBA has but was asked to take a role of DBA. This is most likely a SQL developer or someone who has some knowledge about SQL Server (Architect, Tester and so on).
- Fresh DBAs
- If you are DBA and have certain knowledge but lack experience or try to find other ways that might help you in you job.
When it comes to issues with performance than generally they can be grouped together and below are several situation that I come across myself:
- I paid a lot of money for you SQL Server! Why don't you perform well!
- This is situation where people develop (without DBA/Performance knowledge) a database, load data and write SQL queries against it or use some software that auto-generates SQL queries. Problem occurs fairly quickly when they develop a report or dashboard and it takes 20, 30 or more seconds just to get data from SQL Server. They generally blame SQL Server or sometimes hardware.
- Everything is slow, let's upgrade software and hardware!
- These kind of situations occur often with accidental DBAs where the system is overloaded and they think that upgrading software or hardware will solve all the problems. Generally design of BI solution is ignored and certain best practices as well, which is usually caused by low budgets and trying to find "magic" way to solve the problem.
There are systems that are simply large and contain hundreds of GBs of data (or TBs of data) and these situation are not covered here. I focus on relatively small databases up to let's say 10-50 GBs and are slow (if database is structured incorrectly 10GB might take 200GB of disk space) ... but believe me I've come across databases that are 10MBs and were slow (typical situation one).
I think the best way to give you good overview of performance tuning is to start with compare it to something else. BI systems can be compare to a company that has a website and sales large amounts of good. This kind of companies are amazon, Tesco and so on.
Imagine you have set up a website that sells goods. You have different suppliers of goods and small warehouse to store it.
Let's translate it:
Factory: Source of data
Scheduled Delivery: Data travelling from point A (source) to Point B (Destination)
Warehouse: Delivery arrives and is stored in a warehouse
The second part is where a customer places an order. It is picked in a warehouse, travels to customer and is delivered to customer.
Aim is to deliver goods to client in the shortest amount of time.
Your probably experienced or heard of someone who placed an order:
- With next day delivery and didn't get it!
- And didn't get it after 4 weeks!
- And it was "lost" in the system
- but delivery Van broke down and it was delayed
- and it was unavailable (out of stock)
Those situation are very similar to the ones in data warehouse and companies has to deal with it the same as you have to deal with it. There is a actually a reason why Data warehouse is called data warehouse and it is because it is very similar to a normal warehouse with good and has similar process that supply chain has.
Imagine a warehouse with goods and someone who places an order for several products. Can you image how the goods are stored in a warehouse and what would be the process of retrieving them?
Your probably think about some structured way to do that. If yes that is good and in SQL Server database when you create a table it is created in two ways one is structured (clustered index) and non-structured (heap table). Choosing the right one will improve the performance (design decision).
Let's come back to the warehouse image. Even if your warehouse stored data in a structured way you still need a way to retrieve the data in the most efficient way.
Image you have 3 products to products to retrieve. How would you do that in the most efficient way?
Would you go through the entire warehouse and try to find it? How long would that take? Very long? That is precisely what happens if your tables are not optimized. It will take long time to retrieve data and you waste your resources.
Maybe you thought about organizing your warehouse and splitting it into section from A to Z and then having a list of products that shows you which section has your products and where exactly it is located for instance "A-2 P24".
Can you do the same with the table? Of course and you should! It's called non-clustered index in SQL Server 2012 Enterprise edition you can also create read-only Column Based Index. This will allow you to locate your product easily and drastically reduce time to retrieve it.
Our task also involved 3 products so you should have some kind of way to identify the best way to get 3 products. It might be that you need some special equipment to retrieve one of the products or carry it. Comparing to SQL this may be compare to JOINs which need special lists (indexes) to get optimal performance.
Creating proper indexes is not easy but it is mandatory to get good performance! Unless you want your resources to check every single box in your data warehouse just to find one product!
Now that we covered the basics let's discuss how you should approach performance tuning. Our example has different stages and works as a system. It is important to understand that you need to measure your performance in KEY areas. The important bit is KEY as you can optimize system is many different ways but you need to focus on things that really matter for your users or system.
Start with measuring your system and creating some kind of Key Performance Indicators (KPIs). Below are several examples:
- Average time per query
- Target: 2 seconds
- Actual: ??
- Number of queries (excluding exceptions) during one day that take more than 30 seconds between 9am and 5:30pm
- Available disk space:
- Average Index fragmentation
The list can be very long but you should take time to identify KEY KPIs for you on focus on them first.
Once you have KPIs you need to review your hardware, software, workload and database design and ensure your KPIs are achievable but at the same time determine which part is most important to achieve your KPI and plan future changes accordingly.
You can monitor you Server using Perfmon (high level), DMVs (detailed view) and several other tools or you may need to write T-SQL Scripts.
There are several options for performance improvement but remember each one will give different results depending on your environment and often skills and experience of the person performing the optimization.
- Software upgrade can improve performance or give you more features to do that and make management easier. Upgrades are serious and you should take into consideration many aspects. Latest version are generally recommended for new projects and projects that are easy to migrate. Remember that with SQL Server you can test the difference using SQL Server 2012 Developer edition which is equivalent of Enterprise but for 50£ yes 50.00£ as long as you don't use it on production server or for production purposes you can use it for testing. Licensing is important aspect so ensure you consult your decision with an expert.
- Upgrading hardware can improve the performance but it all depends (like can you afford it?). If you have good hardware than changing it to even better might have sometime little effect, on the other side if you have really low spec hardware and you get an expert to buy & configure you really good one than you may see very big difference in performance (bear in mind that without indexes even with very expensive hardware you might not see improvement that you expected).
- This is an important aspect that can really boost your system. If you install SQL Server on one physical disk you might get much better performance but splitting it into multiple physical disks (system tables in particular). By limiting memory usage on SQL Server you may avoid server going really slow. Going outside of SQL Server you can enable hyper-threating (on in some PCs) which can give you in theory 30% improvement however average case is 10-15% improvement and sometimes worse than before. Remember every settings needs to be carefully considered and needs to be tested properly to ensure you get expected performance improvement and avoid decreasing performance.
- SQL Server Features
- In order for your system to perform well you need maintain indexes and statistics. These are KEY to get best performance. Indexes are simply but few people can use them very efficiently and this is where you may get the biggest difference. Some people prefer to focus on other aspects and ignore indexes because they need to be applied on most tables. This is not a good idea and during design you should think about creating indexes or allocate certain amount of time to refine them. Once indexes are in place it takes some time before they need to be readjusted so although initially you might spend some time on it. It really is worth the effort. REMEMBER to re-organize or defragment indexes on regular basis (daily or weekly).
- In data warehouses you use ETL to do heavy lifting and cubes (multidimensional or in-memory) to get very good retrieval times. Without them you system might be always very slow. Design is KEY element of performance tuning. I've come across system that would "bypass" ETL and use multiple views and stored procedure (full reload of data) and end results was 15x bigger cubes that it should be and load time of 8h instead of literally 10 minutes.
- Design requires a lot of skills and knowledge and ensure you do it properly first time. If you need to create Proof of Concept (POC) you can bypass certain best practice but remember that the cost of doing that is re-design later on (for instance replacing load process with proper ETL process).
I hope you found this article useful and gave you several good ideas of how you can handle it. Performance tuning can get complex but if you identify bottleneck (using KPIs), allocate reasonable amount of time to maintain it then you might find that you just need to focus on several KEY aspects and you can master them in relatively short period of time.