SQL Clarity: August 2018

Sunday, 12 August 2018

SQL Server DBA: The worst days

In a recent blog post Steve Jones posed the question: what was the worst day in your career? Great idea, by the way.

A couple of experiences that occurred early on in my DBA career sprang to mind. There was the rather nasty corruption of a critical but not highly available application database that happened mid-afternoon, which led to a very manual, overnight full restore (legacy system means very legacy hardware).

The subsequent post-restore checks were also quite lengthy, meaning the entire recovery process concluded at around 5.30AM the next morning, which actually wasn't that far from my ETA for a working system. Operationally the effects weren't too bad; transactions were captured using a separate system and then migrated into the restored database when it came back online. I'll never forget the post-incident discussion either: no finger pointing, no blame whatsoever, just a well done to all for a successful recovery operation and a genuine interest in how we could further minimise any impact in future.

Then there was the time the execution of an application patch with a slight (and undiscovered until then) code imperfection brought down an entire production server that just happened to be attempting to process some rather critical financial workloads from various systems at the same time. In truth it was a completely freak event that had happened on a combination of very old systems that were considered flaky at best.

The systems were brought online quickly enough but tying together the results of the various processes that may or may not have worked took hours and hours of querying with lots of manual updates. It might sound terrible, but because of the coordinated effort between different teams and individuals it had actually taken a fraction of the time that it could have done and not only that, data was confirmed to be 100% accurate.

Want another corruption tale? Why not. How about the time a system database on a 2005 instance went all corrupt rendering the instance completely useless? Of course it never happens to a system that nobody cares about, no, yet another critical system. The operational teams went to plan B very quickly but even better, a solution that avoided large restores was implemented quickly so the downtime, although handled well, was still significantly reduced.

Looking back there's plenty more; I think it's fair to say that disaster is a real occupational hazard for database professionals. And yet despite being labelled "worst days" I actually look back on them with a large degree of genuine fondness.

You see, disasters are always going to happen when databases are involved. It's a fact, and how we deal with them at the time is just as important as how we learn from these events. In each of these examples a recovery plan was in existence from both technical and operational viewpoints. As well as that, everyone involved knew what was happening, what to do and, critically, not to add any additional pain to the situation but to arrive at the solution as quickly as possible.

Learning from these events meant asking the right questions and not taking a viewpoint of blame. How can we prevent this, how can we make a recovery process more robust and what can we implement technically and operationally to improve our response times and also critically, when can we schedule the next Disaster Recovery test?

Worst days? In one sense most definitely yes. Nobody wants to be in the middle of a technical disaster that's going to take hours to resolve, but a solid recovery plan, collaborative effort towards a solution and an open forum to analyse and learn from the event make these memories much less painful!

Building a DevOps culture

In my last post I described some of the reasons why organisations fail to implement a successful DevOps methodology. Often there is a misunderstanding of what DevOps actually is, but often existing working cultures can be the thing hindering progress.

From webopedia: "DevOps (development and operations) is an enterprise software development phrase used to mean a type of agile relationship between development and IT operations."

Being a consultant I often work in the "space" between different technical roles, which gives me an ideal view of how well companies are utilising DevOps practices or, sometimes, where they're going wrong.

For me the most crucial part is building strong collaborative working relationships between teams. In the database world this isn't just between developers and DBAs but also any team who in some way interacts with SQL Server. This includes support teams, testers, release and change teams, architects and technical management.

How we seek to build these relationships is pivotal. As I mentioned in the last post, forced collaboration is a common approach that ends up being counterproductive. Organisations in their rush to build a DevOps culture can be too rigid in how they look to develop increased inter-team working, often over-formalising and creating very process-driven activities.

Instead organisations should look to encourage rather than dictate, and I've seen many successful ways that this is achieved, often in a management hands-off style that lets technical teams freely integrate and discuss innovative ways of doing things in much more open forums. When consulting with database professionals we explore common pain points that are shared between teams and how solutions to them are, in some way, arrived at by leveraging one another's expertise.

I say in some way because often the issue isn't strictly technical but comes down to process instead. Release and change management are great examples of this; developers naturally want to make more and more frequent changes to systems, which is against the better nature of traditional DBAs.

Understanding each other's objectives is the first stage of developing a collaborative effort to build upon existing processes (not work around them) to help each other achieve common aims. The word I never use is compromise, and it should never feel like that. All involved should feel like they are building solutions together and not feel like they are required to relinquish something to get there.

This is a common side effect where the approach to DevOps is unbalanced, with teams becoming involved at different stages. Instead organisations must involve all parties as early as possible and avoid maintaining those traditional silos.

Increased cross-functional working means that teams work much faster together, and this affects both development and problem management. One of the obstacles to moving to a platform of more frequent deployment is the risk of introducing much more failure to production systems. Done correctly, a DevOps methodology negates this by increasing the stability of systems and reducing the complexity of releases to environments, which in turn makes faults much easier not just to recognise but also to rapidly back out from.

It sounds like a case of everyone wins and typically I would honestly agree with that statement. A DevOps methodology has benefits for both teams and businesses alike: better employee engagement, many more personal development opportunities, increased productivity, more stable environments, more frequent enhancements and improved response times to defects.

Issues that are preventing a DevOps methodology from being implemented can often be resolved from a cultural perspective. A key starting point for organisations is to encourage collaborative relationships early on and for teams/individuals to seize the initiative and start talking about common pain points, desired solutions and building shared knowledge.

Wednesday, 8 August 2018

Reasons why DevOps implementations fail.

Over the last few years we have seen a monumental rise in the number of organisations adopting a DevOps working culture. This trend shows no signs of slowing down whatsoever, and whilst many are now realising the benefits of these working practices, many are also struggling with the adoption, and in some cases it has either stopped completely or not even started.

There are a number of reasons why this happens and I've seen some common causes, which is actually a good thing because we can recognise where these are occurring or even prevent them before they start to become a problem.

No clear direction.

It's very difficult to drive to a destination if you don't know where you're actually going (trust me on this, I've tried it). This is obviously very reasonable advice; however, many DevOps projects fail because of a lack of direction. It's actually a common issue with many buzzing trends, particularly in IT, where organisations rush into a technology stack or movement just for the sake of doing it. Inevitably this ~~often~~ always leads to failure.

Organisations need to fully understand what a DevOps culture is, its objectives and its close relationships with their existing business processes and frameworks. A common misconception is people viewing DevOps as a direct replacement for ITIL when in actual fact it's more of a cultural change built on top of ITIL principles. By fully understanding DevOps the benefits of adoption become much more viable and ultimately the path to its success becomes much simpler.

Adopting a silo approach to DevOps.

I often see individual teams being very successful in implementing components of DevOps practices, only for other teams to be behind in terms of adoption and/or understanding. The classic case is the Developer and DBA; the developer is pushing for much more frequent releases (a realised benefit of DevOps) but then the DBA, who perhaps isn't on board, is trying their best to slow down all of these changes to their production databases. In the words of Bon Jovi, "we're half way there".

This lack of cohesion or a shared direction can result in a significant bottleneck and the DevOps practices start to creak a little. Then other unintended side effects start to creep in, such as blame and finger pointing (some of the things that a healthy DevOps culture seeks to eliminate) and then it can all start to fall apart.

DevOps for databases is one particular area that is so heavily reliant on lots of people in different roles working together in a collaborative manner. An organisation must identify this and start to encourage teams to engage and build with each other in the early phases of a DevOps implementation, but organisations also have to be very careful in how they seek to develop this collaborative culture...

Forced collaboration.

I believe collaboration underpins the entire DevOps methodology, so it makes perfect sense for organisations to work towards developing much closer working relationships between teams. However, organisations can also over-formalise things, even making the activity seem very process-driven, which often leads to much less buy-in from individuals, even entire teams.

This causes obvious problems, not least the silo approach mentioned in the previous point, so organisations have to find the balance between being almost relaxed in how they let relationships build and, at the same time, providing a certain degree of steer. This isn't as easy as it sounds and it is certainly reliant on strong leadership. In my experience successful implementations have been led by those who enable positive change rather than those who try to dictate it.

Rushing into new tools.

New tools are awesome, fact, and in a DevOps ecosystem there are so many to pick and choose from, each bringing new ways of doing things and [hopefully] improving productivity. The advantages are great, without a doubt, but often tools can be implemented way too early without a focus on the underlying processes. This can significantly reduce the effectiveness of what a particular toolset/platform is trying to achieve; a release management tool, for example, won't improve a change/release process if the process is fundamentally flawed.

The most successful DevOps implementations focus on people and process first, leveraging the strengths and principles of existing frameworks and building strong collaborative working practices. New tools are going to be an important factor of a system designed with DevOps principles in mind but they need to be leveraged correctly and for the right reasons.

These are some of the common pitfalls that I've seen occur during DevOps implementations, many of which are deeply rooted in the working culture, not the technologies. There is undoubtedly so much to gain by adopting the principles and often it requires organisations to step back and observe their working practices first and spend time exploring the benefits of working with DevOps.

In my next post I'll cover how to address some of these issues and offer some insights into the benefits of building collaborative relationships between data professionals.