Sunday, 12 August 2018

SQL Server DBA: The worst days

In a recent blog post Steve Jones posed the question: what was the worst day in your career? Great idea by the way.

A couple of experiences from early on in my DBA career sprang to mind. There was the rather nasty corruption of a critical but not highly available application database that happened mid-afternoon and led to a very manual, overnight full restore (a legacy system means very legacy hardware).

The subsequent post-restore checks were also quite lengthy, meaning the entire recovery process concluded at around 5.30 the next morning, which actually wasn't that far off my original estimate for a working system. Operationally the effects weren't too bad; transactions were captured using a separate system and then migrated into the restored database when it came back online. I'll never forget the post-incident discussion either; no finger pointing, no blame whatsoever, just a well done to all for a successful recovery operation and a genuine interest in how we could further minimise any impact in future.

Then there was the time the execution of an application patch with a slight (and until then undiscovered) code imperfection brought down an entire production server that just happened to be attempting to process some rather critical financial workloads from various systems at the same time. In truth it was a complete freak event that occurred on a combination of very old systems considered flaky at best.

The systems were brought back online quickly enough, but tying together the results of the various processes that may or may not have worked took hours and hours of querying with lots of manual updates. It might sound terrible, but because of the coordinated effort between different teams and individuals it actually took a fraction of the time it could have done, and not only that, the data was confirmed to be 100% accurate.

Want another corruption tale? Why not. How about the time a system database on a SQL Server 2005 instance became corrupt, rendering the instance completely useless? Of course it never happens to a system that nobody cares about; no, it was yet another critical system. The operational teams went to plan B very quickly, and even better, a solution that avoided large restores was implemented quickly too, so the downtime, although handled well, was still significantly reduced.

Looking back there's plenty more, and I think it's fair to say that disaster is a real occupational hazard for database professionals. Yet despite being labelled "worst days" I actually look back on them with a large degree of genuine fondness.

You see, disasters are always going to happen when databases are involved; it's a fact, and how we deal with them at the time is just as important as how we learn from them. In each of these examples a recovery plan existed from both technical and operational viewpoints, and everyone involved knew what was happening, what to do and, critically, not to add any additional pain to the situation but to arrive at the solution as quickly as possible.

Learning from these events meant asking the right questions rather than apportioning blame. How can we prevent this? How can we make the recovery process more robust? What can we implement, technically and operationally, to improve our response times? And, critically, when can we schedule the next Disaster Recovery test?

Worst days? In one sense most definitely yes. Nobody wants to be in the middle of a technical disaster that's going to take hours to resolve, but a solid recovery plan, a collaborative effort towards a solution and an open forum to analyse and learn from the event make these memories much less painful!

Building a DevOps culture

In my last post I described some of the reasons why organisations fail to implement a successful DevOps methodology. Often there is a misunderstanding of what DevOps actually is, but just as often it is the existing working culture that hinders progress.

From webopedia: "DevOps (development and operations) is an enterprise software development phrase used to mean a type of agile relationship between development and IT operations."

Being a consultant I often work in the "space" between different technical roles, which gives me an ideal view of how well companies are utilising DevOps practices or, sometimes, where they're going wrong.

For me the most crucial part is building strong collaborative working relationships between teams. In the database world this isn't just between developers and DBAs but also any team who in some way interacts with SQL Server. This includes support teams, testers, release and change teams, architects and technical management.

How we seek to build these relationships is pivotal. As I mentioned in the last post, forced collaboration is a common approach that ends up being counterproductive. Organisations, in their rush to build a DevOps culture, can be too rigid in how they look to develop increased inter-team working, often over-formalising it and creating very process-driven activities.

Instead organisations should look to encourage rather than dictate, and I've seen many successful ways this is achieved, often in a hands-off management style that lets technical teams freely integrate and discuss innovative ways of doing things in much more open forums. When consulting with database professionals we explore common pain points that are shared between teams and how solutions are, in some way, arrived at by leveraging one another's expertise.

I say in some way because often the issue isn't strictly technical but comes down to process instead. Release and change management are great examples of this; developers naturally want to make ever more frequent changes to systems, which goes against the better nature of traditional DBAs.

Understanding each other's objectives is the first stage of developing a collaborative effort to build upon existing processes (not work around them) to help each other achieve common aims. The word I never use is compromise, and it should never feel like that. All involved should feel like they are building solutions together, not that they are required to relinquish something to get there.

That feeling is a common side effect of an unbalanced approach to DevOps, where teams become involved at different stages. Instead organisations must involve all parties as early as possible and avoid maintaining those traditional silos.

More cross-functional teams means teams work much faster together, and this affects both development and problem management. One of the obstacles to moving to a platform of more frequent deployment is the risk of introducing much more failure into production systems. Done correctly, a DevOps methodology negates this by increasing the stability of systems and reducing the complexity of releases to environments, which in turn makes faults much easier not just to recognise but also to rapidly back out from.

It sounds like a case of everyone wins, and typically I would honestly agree with that statement. A DevOps methodology has benefits for teams and businesses alike: better employee engagement, more personal development opportunities, increased productivity, more stable environments, more frequent enhancements and improved response times to defects.

Issues that prevent a DevOps methodology from being implemented can often be resolved from a cultural perspective. A key starting point for organisations is to encourage collaborative relationships early on, and for teams and individuals to seize the initiative and start talking about common pain points, desired solutions and building shared knowledge.

Wednesday, 8 August 2018

Reasons why DevOps implementations fail.

Over the last few years we have seen a monumental rise in the number of organisations adopting a DevOps working culture. This trend shows no signs of slowing down whatsoever, and whilst many are now realising the benefits of these working practices, many others are struggling with the adoption; in some cases it has either stopped completely or never even started.
There are a number of reasons why this happens and I've seen some common causes, which is actually a good thing because we can recognise where they are occurring, or even prevent them before they start to become a problem.

No clear direction.
It's very difficult to drive to a destination if you don't know where you're actually going (trust me on this, I've tried it). This is obviously very reasonable advice, yet many DevOps projects fail because of exactly that: a lack of direction. It's actually a common issue with many buzzing trends, particularly in IT, where organisations rush into a technology stack or movement just for the sake of doing it. Inevitably this almost always leads to failure.

Organisations need to fully understand what a DevOps culture is, its objectives and its close relationship with their existing business processes and frameworks. A common misconception is to view DevOps as a direct replacement for ITIL when in actual fact it's more of a cultural change built on top of ITIL principles. By fully understanding DevOps the benefits of adoption become much more apparent and ultimately the path to its success becomes much simpler.

Adopting a silo approach to DevOps.
I often see individual teams being very successful in implementing components of DevOps practices, only for other teams to be behind in terms of adoption and/or understanding. The classic case is the developer and the DBA: the developer is pushing for much more frequent releases (a realised benefit of DevOps) but the DBA, who perhaps isn't on board, is trying their best to slow down all of these changes to their production databases. In the words of Bon Jovi, "we're halfway there".

This lack of cohesion or a shared direction can result in a significant bottleneck and the DevOps practices start to creak a little. Then other unintended side effects start to creep in, such as blame and finger pointing (some of the things that a healthy DevOps culture seeks to eliminate) and then it can all start to fall apart. 

DevOps for databases is one particular area that is heavily reliant on lots of people in different roles working together in a collaborative manner. An organisation must recognise this and start to encourage teams to engage and build with each other in the early phases of a DevOps implementation, but organisations also have to be very careful in how they seek to develop this collaborative culture...

Forced collaboration.
I believe collaboration underpins the entire DevOps methodology, so it makes perfect sense for organisations to work towards developing much closer working relationships between teams. However, organisations can also over-formalise things, making the activity seem very process-driven, which often leads to much less buy-in from individuals and even entire teams.

This causes obvious problems, not least the silo approach mentioned in the previous point, so organisations have to find the balance between being relaxed in how they let relationships build and providing a certain degree of steer. This isn't as easy as it sounds and is certainly reliant on strong leadership. In my experience successful implementations have been led by those who enable positive change rather than those who try to dictate it.

Rushing into new tools.
New tools are awesome (fact), and in a DevOps ecosystem there are so many to pick and choose from, each bringing new ways of doing things and [hopefully] improving productivity. The advantages are great, without a doubt, but tools are often implemented way too early without a focus on the underlying processes. This can significantly reduce the effectiveness of what a particular toolset or platform is trying to achieve; a release management tool, for example, won't improve a change/release process if the process is fundamentally flawed.

The most successful DevOps implementations focus on people and process first, leveraging the strengths and principles of existing frameworks and building strong collaborative working practices. New tools are going to be an important factor of a system designed with DevOps principles in mind but they need to be leveraged correctly and for the right reasons.


These are some of the common pitfalls that I've seen occur during DevOps implementations, many of which are deeply rooted in the working culture, not the technologies. There is undoubtedly so much to gain by adopting the principles, and often it requires organisations to step back, observe their working practices first and spend time exploring the benefits of working with DevOps.

In my next post I'll cover how to address some of these issues and offer some insights into the benefits of building collaborative relationships between data professionals.

Sunday, 11 February 2018

The DBA and the Data Platform

I've been consulting now for well over a year and it's been a fantastic experience so far; I've had the pleasure of meeting and working with some truly incredible people, and it's been amazing to help organisations get the most from their SQL Server platforms.

The last 12 months have been extremely interesting considering the current technical shift towards a seemingly endless list of new platforms and capabilities. For many organisations this shift has fundamentally changed how they do things (or how they are going to do things), and many are going through their own journey of digital transformation, understanding these new technologies and how to leverage them to improve their existing processes.

Right at the centre is data, nowadays considered to be a company's greatest asset, so how we use data, or to be more exact how we extract more value from it, is a frequent aim of these transformation projects. As a result the data professional roles within an organisation have changed too, and many teams and individuals are having to broaden their technical skill sets to design, deliver and support these new capabilities.

There is no truer example of this than the role of the DBA, and yet the question remains: which direction(s) should they be focusing their future on? By that I mean development; which skills should they be learning, what certifications, what products, platforms etc? It's easy to see why it's confusing; there are lots and lots of new technologies available to us, but that also means there is no consistent path for administrators to set out on.

It was easier in the not so distant past; the DBA role tended to come in two distinct flavours, development or operational, but now the line between the two has become far less apparent and in many cases non-existent. The DBA role has without question become much more rounded, containing core elements from both "sides", but it goes beyond that. Modern approaches to database administration have reduced the footprint of the operational side of the role, so a more DevOps-orientated DBA is an essential part of the roadmap, for organisations and individuals alike.

This shared roadmap is actually an integral component of an evolving data platform. I've worked with many different organisations at different stages of development, but a key component in each has always been a collaborative working practice between data professionals, and I'd say the success of a platform depends on it. Over the past year I've worked alongside many different teams: architects, DBAs, developers, release analysts, change and testing specialists; when working together they all have a great influence on the effectiveness of a data solution.

Whilst it's fair to say organisations should be encouraging this type of working practice, it is very much dependent on individuals to pursue and implement it. For DBAs this means engaging with other teams much more frequently, understanding their skills and processes but also adding their own expertise and skill set to produce a much more co-operative function.

It might not fully answer the common question of what's next for DBAs; there are a lot of factors involved, not least your personal career aspirations, but it does show that when looking for your next phase of self-development the paths you need to take could be very close to your current role.

Your skills as a DBA will always be of paramount importance to availability, security and performance, but widening your technical scope by working much more closely with those around you not only improves you as a data professional but also the overall effectiveness of your data platform. The starting point is to seek out these opportunities; start building relationships, start knowledge sharing, and develop new ideas and different ways of working.

In many ways the data platform is a reflection of the people tasked with shaping it. If we choose not to broaden ourselves and learn new ideas then at some point we come to a standstill, especially in this rapidly evolving technical landscape. If that happens a data platform stops growing, not in terms of data quantity but most certainly in terms of capability and, perhaps most critical of all, value.

This puts a big emphasis on self-development, but we've always had that; it's nothing new. The key is to seek out those opportunities: start close, and work with teams that you probably already have a working relationship with. As these relationships grow so does the collaborative skill set, and as a result business processes improve and technical solutions get faster and more productive.

This also cuts down on reactive measures; less time firefighting means more time delivering enhancements and, of course, more learning. Any learning process shouldn't have a cut-off point; the more we broaden our horizons the more effective we become and, in turn, our products, our services and our platforms keep on evolving.

This is the true nature of a technical platform, and we as data professionals play such an important part not just in supporting or developing them but in shaping their growth, effectiveness and value. Perhaps the key to all of this is realising the value of our own development alongside it.

Wednesday, 7 June 2017

Why I became a SQL Consultant.

This post is kind of a follow-up to my recent "How I became a..." article that I wrote about my early days as a DBA. You can read that post and plenty of others from database professionals on this page provided by Kevin Hill. The posts are seriously good reads and, needless to say, I cannot recommend them enough!

This post though is centred on the biggest and most difficult decision that I have ever had to make about my career, and that is to quit working as an employee and become a consultant. I've been wanting to write this post for some time, and because lately I seem to be getting asked at least half a dozen times a day why I made this decision, now seems an ideal time to get it out there.

Firstly I need to paint a bit of a picture of my situation prior to making this change. I was in a great place: brilliant role, fantastic manager (you're welcome), using lots and lots of technology, pretty much everything you could want. Which of course does beg the question, why did I decide to move away from this?

Don't get me wrong, I'd had this sort of vague aspiration for a long time, largely because of the amazing people out there already doing seriously great things, and whilst they were (and still are) a big influence on me it wasn't just a case of "hey, I want to do that" and off I went.

I did a lot of thinking about what I enjoyed the most from working with SQL Server and how I could take that forward. For me the most appealing thing has always been helping people get the most from their SQL Servers. As a DBA this can take many forms; it could be installing and configuring SQL Server as part of an entire enterprise solution, or it could be an upgrade to a newer version, performance tuning, bug fixing, implementing HA/DR, training, mentoring etc. The point being that whatever I was doing or whoever I was working with, the motive was always the same: to make a particular SQL platform better in some shape or form.

Which is exactly how I think as a consultant.

In fact as a consultant I operate in a very similar way only now it's on a larger scale. I still have exactly the same motives as before but whereas in a permanent role I was working within a single instance, now my driver is to help as many organisations as I can with their data platforms. 

In many ways it's like scaling out a database (bear with me), but rather than distributing data I'm distributing my services and (hopefully) doing more work. Of course that comes with its own set of challenges, which I will go into in another post.

Since I made the move to consultancy the #1 question I have been asked is why, hence this blog post, and the answer is always the same. I've always loved working with SQL Server, with different people and different systems, so becoming a consultant was a way to expand on that and reach out to even more people and help them achieve what they want with their data platforms.

If you'd like to know more about the consultancy and the services we provide then you can follow this link, and if you have any questions or feedback then by all means get in touch; it would be great to hear from you.

Thursday, 2 February 2017

Torn Pages and using DBCC PAGE

Today's post is about torn pages and using the undocumented DBCC PAGE command within SQL Server. Occasionally you might find you need to run this command and in this example it's because I have received an error message that looks a bit like this:

SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0xaaaaaaaa; actual signature: 0xa9aaaaaa). It occurred during a read of page (1:25756919) in database ID 9 at offset 0x000031209ee000 in file 'F:\DATA\ScratchDatabase.mdf'

This is an error that was picked up on one of my test systems and indicates that SQL Server has detected a torn page, that is, a page that has not been written correctly by SQL Server, which possibly indicates a problem in the IO subsystem.

The problem here is that whilst we know the database and the page where the error has occurred, we don't know which table the page belongs to and, importantly, what type of page is in error. The page type matters because it will drastically impact our recovery process, but the first thing we will do is check a system table to see if any other page errors have been reported:

SELECT * FROM msdb..suspect_pages


database_id  file_id  page_id    event_type  error_count  last_update_date
-----------  -------  ---------  ----------  -----------  -----------------------
9            1        25756919   3           1            2017-01-29 08:24:05.760

Here we can see that only one row has been returned and its event_type refers to a torn page; here are the full event type descriptions:

1 = An 823 error that causes a suspect page (such as a disk error) or an 824 error other than a bad checksum or a torn page (such as a bad page ID).

2 = Bad checksum.

3 = Torn page.

4 = Restored (page was restored after it was marked bad).

5 = Repaired (DBCC repaired the page).

7 = Deallocated by DBCC.
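
As a side note, suspect_pages only records the database ID, so a quick way to make the output more readable is to resolve the database name at the same time (a minimal sketch):

SELECT DB_NAME(database_id) AS database_name, file_id, page_id, event_type, error_count, last_update_date
FROM msdb..suspect_pages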


To identify the table in error we will use DBCC PAGE. This command takes a database id (or name), a file id and a page id and will return the actual page information to the results window, although we need to enable trace flag 3604 first for the output to be returned to our session.
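
Enabling the trace flag is a one-liner for the current session:

DBCC TRACEON (3604)

With that on, for my example I would use the following command: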

DBCC PAGE (9, 1, 25756919, 0)

This returns a lot of information to the results window (I can use the option WITH TABLERESULTS to return it as a result set instead), so I have removed a fair chunk of it and highlighted the two parts that I really need.


PAGE: (1:25756919)

PAGE HEADER:

Page @0x0000000728010000

m_pageId = (1:25756919)             m_headerVersion = 1                 m_type = 2
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x104
m_objId (AllocUnitId.idObj) = 266296                m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057611489902592
Metadata: PartitionId = 72057607060652032           Metadata: IndexId = 20
Metadata: ObjectId = 567673070      m_prevPage = (1:25756918)           m_nextPage = (1:38369560)

The first is Metadata: ObjectId = 567673070, which refers to the table that this page belongs to. The second is Metadata: IndexId = 20 which, as it isn't 0 (a heap) or 1 (a clustered index), tells me that the page error is within a non-clustered index, and I can use the following code to identify the actual index where the problem lies.

SELECT * FROM sys.indexes WHERE index_id = 20 AND object_id = 567673070


This returns the suspect index and to resolve the issue we can very simply drop and recreate it, effectively reallocating the pages, suspect and all. This is potentially much quicker than performing a full restore, and I have lost count of the times I have seen people resort to a full restore without first checking the actual type of page corruption that they're attempting to resolve.
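
As a rough sketch, the fix looks like the following; the index and column names here are hypothetical, so script out the real definition returned by sys.indexes before dropping anything:

-- Hypothetical index definition for illustration only
DROP INDEX IX_ETL_TestLoad_LoadDate ON dbo.ETL_TestLoad
CREATE NONCLUSTERED INDEX IX_ETL_TestLoad_LoadDate ON dbo.ETL_TestLoad (LoadDate)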


Another option here is that because we know the affected table we can run a DBCC CHECKTABLE command (rather than a DBCC CHECKDB on the entire database), passing in the full table name as a parameter; to get the name of the table we use our ObjectId again that was returned from DBCC PAGE with the following:

SELECT OBJECT_NAME (567673070)

This would return the table name that we can use to check the structure to see if our page issue has been resolved:

DBCC CHECKTABLE ('dbo.ETL_TestLoad')

DBCC results for 'ETL_TestLoad'.
There are 341465 rows in 93513 pages for object "ETL_TestLoad".
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


Using DBCC CHECKTABLE in this scenario may have proven that our torn page has been resolved, but you should always, without question, perform a full database consistency check to look for further errors. In the event of IO issues this could easily return more and more page problems, maybe irrecoverable data page ones at that. Any page read/write errors must result in both a full consistency check and a check with the storage people to see if they can find any underlying issues.
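
A minimal sketch of that full check, using the database from the error message at the top of this post:

DBCC CHECKDB ('ScratchDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS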

For some further reading check out my post: How to consistency check large databases

Thursday, 8 December 2016

SQL Server vNext CTP1


My last post was about the new adaptive query processing family of features coming to SQL Server. Its first introduction is the Memory Grant Feedback functionality now added to our execution plans… all good stuff, I promise you.

You can enable this functionality now if you have CTP1 of SQL Server vNext. You read that right, this new technology is there for you to play with right now over at the evaluation centre!
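
If you want to give Memory Grant Feedback a try, my understanding is that the adaptive query processing features are gated behind the newest database compatibility level; a quick sketch against a test database (the database name here is just a placeholder):

ALTER DATABASE ScratchDatabase SET COMPATIBILITY_LEVEL = 140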

That isn't all there is to it either; amongst other things you can now scale out Integration Services, meaning you can execute packages over multiple machines using a master/worker(s) scenario. I really like this, having been in situations previously where this kind of functionality would have been really useful!

SQL Server is changing (would evolving be OTT?) and getting your hands on the CTP (and its future editions) is a must. Personally, I'm getting other teams involved (we are in the DevOps era, right?) and getting heads together on how we can use the new technology and take advantage of it. Seriously! It's a really innovative time for SQL Server and personally I don't think there is a better time to be working with the product.