I decided to write a blog post about Scrum (Agile) as it applies to Kimball Data Warehousing projects. Kimball Data Warehouse projects are the only projects I have (deliberately) been doing for a few years at a number of companies, and I have noticed a few interesting and recurring aspects.
According to an article published on The Data Warehousing Institute (TDWI) website http://tdwi.org/articles/2010/06/16/agile-data-warehousing.aspx:
"Agile data warehousing (ADW) uses scrum as an alternative to waterfall project planning, providing a streamlined framework for building DWBI applications that regularly delivers modules faster with one fourth the developer hours, cuts project costs in half, and drives project defect rates toward zero."
The above statement is quite strong and not everyone will agree with it, but in most cases the people who disagree are people who don't fully understand Scrum and either misuse its methods or "forget" to use the key ones.
In this blog post I will present my interpretation of Scrum for Kimball Data Warehouse projects.
Let's start by explaining what Scrum is, beginning with the word itself. Scrum is a word, not an acronym, so it does not stand for anything. I actually don't know why it is called Scrum, but Scrum is a method to deliver software and, in my case, Data Warehouses. Scrum is part of Agile development; you can think of it as a branch of Agile.
To be more precise, Scrum is an iterative and incremental agile development method for managing projects and development.
Let's break this sentence down:
- Iterative
"Iteration means the act of repeating a process usually with the aim of approaching a desired goal or target or result. Each repetition of the process is also called an "iteration," and the results of one iteration are used as the starting point for the next iteration."
- Incremental (in my own words)
Incremental means that you have a project to deliver, you break it down into small pieces, and you deliver it one piece at a time. For instance, in data warehousing you have multiple data marts (business processes = measure groups), and incremental in this case means you will deliver ONE data mart (= business process / measure group) at a time. Some people think of a data mart as multiple business processes / measure groups, which is fine, but I prefer to think of a data mart (and I believe the Kimball books do the same) as one business process / measure group with all its associated dimensions.
- Managing projects and development
Scrum provides guidelines for managing projects and development, and I will go into more detail about them later in this blog post.
Scrum provides you with methods and guidelines, but YOU will have to implement and configure them. By configure I mean that Scrum will be slightly different depending on the project, the environment, the people and so on. Scrum does not guarantee the success of a project, but if you use it properly it can help you make your project a successful one.
You can think of Scrum as a set of tools and plans. Tools will not build a house, but they will help you do it quickly and efficiently if you use them properly. You need a plan in order to build a house, but building a house is a major investment, so you should use your plan as a guide and accept that you might need to change some details, for instance:
- You need to comply with regulations (and your house is 0.5 meters too tall)
- Your situation changed and you need to "forget" about nice-to-have features.
- You built part of it and noticed that it would actually be better to make some changes to the plan.
Data Warehouse projects are similar to building a house, but you get more flexibility: you can build one part (a data mart), start using it, and build the rest later using the experience you gathered from the first build.
A while ago I attended a webinar provided by Axosoft http://www.axosoft.com/, a company that specializes in Scrum software (OnTime), and I really liked one of the points the presenter made: Scrum is simple but not easy. In my opinion this sentence perfectly describes Scrum.
It's simple because you can learn the main aspects literally within several hours, but it is not easy because you need to apply it properly, in your own style and in the environment you are in, and that takes experience and deeper knowledge. You can follow Scrum methods and still not get the efficiency you expected. For instance, if you don't have good communication with the team or, let's say it openly, the team doesn't particularly like your style, then you have a problem and Scrum will not solve it. If you follow the Scrum guidelines, though, it can help you minimize conflicts and build a good relationship with the team.
The reason I chose this example is that there is sometimes someone within a team who is excessively worried about the project and "pollutes" the atmosphere, which can create a "blame culture" (direct or indirect), or someone who is scapegoating (finding someone to blame). This tends to happen simply because processes or proper management are not in place. Scrum helps to create an environment where team effort is KEY to productivity and to results that largely exceed anything else, and it achieves that mainly through the great team atmosphere that following Scrum methods can create (building atmosphere and productivity takes time & effort).
Now that we have covered many aspects of Scrum at a high level, let's move on to the details of the Scrum methodology.
I will now describe Scrum and focus on comments that relate to Data Warehouse projects using the Kimball methodology. This blog post is my interpretation of Scrum with Kimball, but I hope you will find it useful in your case.
I will start with the roles (there are only three), then cover how to create the plan and estimate effort and how you manage it, including self-organization of team members, and finish off with a very important aspect: reviews, and how you can use them to make projects more efficient.
There are only three roles in Scrum and they are:
- Scrum Master
- Scrum Team
- Scrum Product Owner
Scrum Master: This person is responsible for managing the project but should be fairly neutral. The Scrum Master organizes the daily standup (short 15-minute updates; more on this later on), ensures the Scrum team is not interrupted and can simply do their work (very important), and helps to solve any difficulties the team or project has. As you can see, the Scrum Master is almost like a Project Manager but with specific Scrum responsibilities.
My comments: In my opinion the Scrum Master should NOT be technical (with few exceptions) but should have at least general knowledge of Data Warehousing projects. A technical person will generally get involved in the Scrum team's "business" and may tell people what to do and how to do it, and that is very dangerous (and not Scrum). The Scrum team was employed to do the job, and it's best to leave them alone and give them the opportunity to use their skills and experience. Scrum is a process to manage the team and organize the workload, NOT to tell the Scrum team how to do their job. The Scrum Master should first of all listen to the Scrum team and act on their concerns and problems, so basically keep the team happy. The Scrum Master should be objective and help to resolve issues, including those between team members, which may occur from time to time. Scrum Master is a very important role (mainly because it is just one person), and choosing the right person matters a lot.
Scrum Team: These people will actually do the job. Scrum Team members are responsible for attending the daily standup and organizing their own work.
My comments: The team is obviously key. These are the people who actually do the work. In a data warehouse project the team will usually consist of a Business Analyst, Data Steward, Data Architect, Developers (ETL, Reports) and Testers. There might be several other roles involved, but these are generally the must-haves. I say must-have, but very often two of these roles are "missed". A Data Steward, who understands the data and talks to the business, is generally not employed (big mistake! especially since this person already exists in the business; you just need to find them and convince someone to give you their time). Testers are also often not employed, and some testing responsibility is moved to developers (big mistake again! No testers = often lower quality, as developers are not given time to test properly and are usually not so keen to do it).
Making these mistakes will only result in wasting the other roles' time and creating problems, and I'm very open about that because it is ALWAYS the case; I have no doubts about it. Obviously not everyone has the budget, and sometimes companies prefer to be inefficient. That is ok, because politics plays a rather big part in data warehouse projects and it is up to the company (= always certain people) how they want to do it.
Scrum Product Owner: The Product Owner is responsible for prioritizing workload items (after getting estimates) and providing feedback on implemented features (very important for reviews and future implementations). The Product Owner is also responsible for giving the "green light" to deploy a piece of work to the production environment. Sometimes you don't want to deploy new work/features even if you have finished the implementation, and this decision is left to the Scrum Product Owner.
My comments: In my opinion this role is extremely important in data warehouse projects. You have a single person who makes the decision. This will generally save you huge amounts of time otherwise spent talking to the business to agree on what to do and when to do it, and it will prevent you from committing to unrealistic deadlines (we will talk more about workload management later on). It should also largely increase "satisfaction" levels for both business users and the Scrum Team. Let's face it, business and technical people (especially database people) speak two different languages, and there is really no point in having a discussion at this level. Obviously you do need discussion during requirements gathering, but that is handled by the Business Analyst and the Developers/Architects, and the Business Analyst is the person who "translates".
As you can see, the structure of roles is very simple and each role has a few simple responsibilities. Let's move on to the actual process and methods of Scrum and see in detail how the roles fit into this process.
Some people say that agile is an approach where you don't have a plan. Obviously that is not true: in Scrum you have Product Backlog Items, a collection of items (features/requirements, bugs and so on) that need to be completed. Scrum focuses on development, so outside of Scrum you still need road maps, procedures, standards and so on; without them any serious development might not be as effective as it could be (in the long run).
The way I think about a Product Backlog Item is that it is anything that should be completed. Simple as that. You can describe items in the following way:
- Details: Simply what needs doing, plus any supporting documents etc.
- Effort: How much time is needed to develop it. This is generally provided by the Scrum Team.
  - Effort (from my point of view) can be split into buckets of 1, 3 or 5 hours (or 1, 3 or 5 days).
  - Items above 5 days should be split into smaller items.
  - If an item falls between two buckets, for example between 1 and 3 days, choose the higher number, in this case 3.
- Priority: How important is the item (for business or for development)?
The above is just a simple list of the three most important attributes of an item.
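To make the bucket rule concrete, here is a minimal Python sketch. The names `BacklogItem` and `to_bucket` are my own illustration (they are not part of Scrum or of any tool), and the convention that priority 1 is the most important is an assumption.

```python
from dataclasses import dataclass

# Bucket sizes from the 1/3/5 scheme described above (hours or days).
BUCKETS = (1, 3, 5)


@dataclass
class BacklogItem:
    """Hypothetical minimal PBI holding the three attributes listed above."""
    details: str
    effort: int    # one of the BUCKETS values
    priority: int  # assumed convention: 1 = most important


def to_bucket(raw_estimate: float) -> int:
    """Round a raw estimate up to the next bucket.

    Raises ValueError above the largest bucket, because such items
    should be split into smaller items instead of being estimated as-is.
    """
    if raw_estimate <= 0:
        raise ValueError("estimate must be positive")
    for bucket in BUCKETS:
        if raw_estimate <= bucket:
            return bucket
    raise ValueError("estimate exceeds 5; split the item into smaller items")


# An item estimated at 2 days lands in the 3-day bucket.
item = BacklogItem("Load customer dimension", to_bucket(2), priority=1)
print(item.effort)  # 3
```

Rounding up rather than to the nearest bucket follows the "choose the higher number" rule; whether you count in hours or days is up to the team.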
The process is as follows:
- You gather requirements (and record bugs) and add them as Product Backlog Items
- The Scrum team estimates effort
- The Product Owner adds priority
- The Product Owner decides in which order the items should be implemented.
- A Sprint (more on that later on) is created using the chosen items
- Scrum sprints last 1 to 4 weeks, so PBIs (Product Backlog Items) are chosen to fit this timeframe.
- Once the Sprint is complete (or slightly in advance) the process is repeated
- The Product Owner decides if the completed sprint should be released to Production
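The step of choosing items for a sprint can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `plan_sprint` is a hypothetical helper, a lower priority number means more important, and an item that does not fit the remaining capacity is simply skipped in favour of the next one. In a real sprint planning session the Product Owner and the team make this call together.

```python
def plan_sprint(backlog, capacity_days):
    """Pick backlog items for one sprint.

    backlog: list of (name, effort_days, priority) tuples, where a lower
             priority number means more important (assumed convention).
    capacity_days: total development days available in the sprint.
    Returns the chosen item names and the unused capacity.
    """
    ordered = sorted(backlog, key=lambda item: item[2])
    chosen, remaining = [], capacity_days
    for name, effort, _priority in ordered:
        if effort <= remaining:  # skip items that no longer fit
            chosen.append(name)
            remaining -= effort
    return chosen, remaining


backlog = [
    ("Sales fact ETL", 5, 1),
    ("Customer dimension", 3, 2),
    ("Fix date bug", 1, 3),
    ("Sales report", 5, 4),
]
chosen, unused = plan_sprint(backlog, capacity_days=10)
print(chosen)  # ['Sales fact ETL', 'Customer dimension', 'Fix date bug']
print(unused)  # 1
```

The items left over ("Sales report" here) stay in the Product Backlog and are considered again when the process repeats for the next sprint.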
My comments: In a Data Warehouse project, Product Backlog Items might be managed slightly differently. First of all, you will most likely need some kind of tool to manage the entire process, and if you are using Microsoft products you can use TFS with the new template that was designed specifically for Scrum development. Once you have a tool you need to establish a process.
From my experience the process above can be followed in data warehouse projects, but certain adjustments are recommended.
Priority: First of all, the Product Owner should set priority based on Scrum team feedback, as in a data warehouse there are generally dependencies between items (development or release dependencies).
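To illustrate why dependencies matter for priority, here is a small Python sketch that orders items so that nothing is scheduled before the items it depends on, and otherwise follows the Product Owner's priority. The function `order_items` and the example items are hypothetical; the sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) and assumes that every dependency is itself an item with a priority.

```python
from graphlib import TopologicalSorter


def order_items(priorities, depends_on):
    """Return item names in a buildable order.

    priorities: dict of item name -> priority number (lower = more important).
    depends_on: dict of item name -> set of item names that must come first.
    """
    graph = {name: depends_on.get(name, set()) for name in priorities}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready, ordered = [], []
    while sorter.is_active():
        # Collect every item whose dependencies are already completed.
        ready.extend(sorter.get_ready())
        # Among those, take the one the Product Owner ranked highest.
        ready.sort(key=lambda name: priorities[name])
        next_item = ready.pop(0)
        ordered.append(next_item)
        sorter.done(next_item)
    return ordered


priorities = {
    "Sales report": 1,
    "Sales fact ETL": 2,
    "Customer dimension": 3,
    "Date dimension": 4,
}
depends_on = {
    "Sales report": {"Sales fact ETL"},
    "Sales fact ETL": {"Customer dimension", "Date dimension"},
}
print(order_items(priorities, depends_on))
# ['Customer dimension', 'Date dimension', 'Sales fact ETL', 'Sales report']
```

Even though the business ranks the sales report first, the dimensions and the fact load have to be built before it, which is exactly the kind of feedback the Scrum team should give the Product Owner.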
Effort: You may want to establish a process for estimating effort. A Junior Developer might need 5 days (if it's within her/his skills) while a Senior Developer might need 1 day. This is a big difference. You might want to choose the higher estimate when an item could be completed by either of two developers with very different experience. It's much better to complete something earlier than planned than later.
Sprint: A sprint (discussed in more detail in the next section) should last between 1 and 4 weeks. Remember that a sprint is for development and does not equal a release to production. Very often you should have several short sprints before you perform a release to production, and most likely your last sprint will be preparation for the production release, including testing.
What do I like about Scrum Product Backlog Items? Data Warehouse projects are literally never-ending, so the amount of things you could do is very large, but PBIs allow you to break the work down into manageable items. This fits very well with the Kimball Data Warehousing methodology, where you aim to deliver one business process (single measure group = data mart) at a time.
Another aspect I like is that it is BUSINESS DRIVEN: the Product Owner, after consultation with business users and the Scrum Team, decides which are THE most valuable items for the next sprint (not necessarily release). The challenge here is that the Data Warehouse process (ETL + reporting) can get complex, and the Scrum Team should make the Product Owner aware of the consequences of each choice.
Changing the State of a Product Backlog Item.
Microsoft has a page dedicated to the Product Backlog Item (Scrum) that is part of the TFS template, and one of the diagrams describes the process of changing status. For more information see http://msdn.microsoft.com/en-us/library/ff731576.aspx
Please be aware that the diagram below describes ONLY the process of changing the status; later on I will try to draw my own diagram of Scrum (from my point of view).
I will finish here for today and later on I will cover:
and most likely several other ones