Considering Long-Term Data Strategies – Why a Database Is for Life, Not Just for Christmas

By Dave Stokes, Technology Evangelist, Percona

In 1978, the UK charity Dogs Trust started a campaign to raise awareness that a dog is for life, not just for Christmas. While it might make sense to get a dog for a special occasion, there are long-term consequences to that decision, and some of them can last for years.

The same is true for databases. These platforms are incredibly important within your application, so once they are in place they tend to be there for years, even decades. This can be hugely beneficial if you make the right choices at the start, but can lead to significant problems or additional costs if your assumptions are wrong.

While you might be looking at the latest and greatest new technologies for your applications, how can you know that you are making the right decision when you implement them? And what are the longer-term consequences around picking databases that you should be aware of?

What are your choices?

There are plenty of databases available – according to DB-Engines, there are 420 database and data management products currently in use. Some are suited to specific niches like time-series or vector data, while others are general-purpose databases suitable for multiple uses. Whatever application you are putting together, there is a plethora of options available.

Alongside this, all the database providers are looking to expand the number and types of workloads that they can support. Relational and non-relational databases alike are adding support for newer use cases like JSON and vector data. So, what should you choose and how should you make your decision?

The first consideration should be around your team and what they are familiar with, and whether they will need any support or training around the project that you are embarking on. Is this something that you will extend your existing approach to, or will you have to adopt a new technology to support the deployment? If this does involve a new technology, you will have to ensure that you have budgeted for additional training and education support over time.

For new deployments, you should check how far any existing products or open source projects you have deployed can support the use case. For example, vector data is a new area due to the growth of Generative AI projects within businesses, and it requires specific functionality. However, databases like PostgreSQL and MongoDB have added support for vector data and search, so you may be able to use a more general-purpose database that your team is familiar with rather than having to adopt a new technology from scratch.

The key aim here is to make sure that you are picking the right technology for the right reasons. We have all encountered software projects where something has been implemented using a ‘new and shiny’ technology option where an older, tried and tested option would have been a better fit. This is often a case of someone choosing a technology simply to gain experience with it, so that they can reference it on their CV in the future.

The importance of open source

The second element here should be to look at your costs and budgets. Are you looking at proprietary software or open source projects for a particular service, and how much will those options cost over time? This is not as simple as proprietary software being more expensive than open source software.

Total cost of ownership (TCO) describes the long-term cost for running a particular software asset or component within a project. By looking at the overall lifetime value and cost for a project rather than the initial investment alone, you can get a better comparison between products and make better decisions on what solutions to choose. This includes looking at the additional costs that might be associated with deployments such as consulting and support costs, rather than just software licenses.
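As a rough illustration of the TCO comparison described above, the following Python sketch adds one-off costs to recurring yearly costs over a chosen lifetime. The `CostProfile` fields, the `total_cost_of_ownership` helper and every figure in it are hypothetical, chosen only to show the arithmetic; they do not come from any real product or price list.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    """Hypothetical cost inputs for one database option."""
    name: str
    upfront: float            # one-off costs such as migration or initial consulting
    licence_per_year: float   # software licences or subscriptions
    support_per_year: float   # independent support and consulting
    infra_per_year: float     # cloud or data centre resources
    training_per_year: float  # training and education for the team


def total_cost_of_ownership(profile: CostProfile, years: int) -> float:
    """Return one-off costs plus all recurring costs over the given lifetime."""
    recurring = (
        profile.licence_per_year
        + profile.support_per_year
        + profile.infra_per_year
        + profile.training_per_year
    )
    return profile.upfront + recurring * years


# Illustrative numbers only (assumptions, not real pricing).
proprietary = CostProfile("proprietary", 20_000, 50_000, 0, 30_000, 5_000)
open_source = CostProfile("open source", 30_000, 0, 25_000, 30_000, 5_000)

for option in (proprietary, open_source):
    print(option.name, total_cost_of_ownership(option, years=5))
```

Changing the `years` argument shows how the comparison shifts as the expected lifetime of the deployment grows, which is the point of looking beyond the initial investment.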

Using TCO, you should still see lower costs with open source software compared to proprietary software. Open source software should be available for free to trial and test, and to use in production, which avoids a substantial cost. However, you should probably consider getting independent advice and support for any solution that you implement to run production workloads. Alongside this, you have the choice to run in the cloud, on a cloud service that you manage, or in your own data centre environment – getting some advice early can help you implement the right kind of environment and avoid under- or over-estimating the resources that you will need.

Going back to the dog analogy at the start, Scott Hanselman has described open source as “free as in puppy”: the software itself is free, but there is a cost associated with supporting it. However, it does produce much more value than that management overhead.

Whatever your approach, looking at your deployment model can help you when it comes to improving your efficiency in running your application infrastructure over time. However, don’t underestimate the impact that looking at your query design can have too. As your application evolves and you get asked for more data out of your systems, the queries that you put together at the beginning may not be fit for purpose over time. Similarly, you may find that you don’t have the right indexes in place, or are actually gathering too much data, which can affect your query performance.

Auditing your current queries and checking whether they still meet your needs can show up potential problems. Updating your query design can improve efficiency and may mean that you can avoid a platform upgrade or the need for more cloud resources, which can have a significant impact on the cost of running your service over time.
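One small way to see the effect of a missing index during such an audit is to compare the query plan before and after adding it. The sketch below uses SQLite only because it ships with Python; the `orders` table, the `idx_orders_customer` index and the `query_plan` helper are hypothetical names for illustration. Other databases provide their own EXPLAIN output, which will look different.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# After adding an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, so treat the printed text as a guide rather than a fixed format.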

Stick or twist?

You will also probably inherit existing projects where someone else went through this exercise to determine their enterprise IT architecture. Hopefully, you will have services in place that make life easy and that fit their roles properly. Equally, your new project may carry on successfully for years. Either way, you will eventually have to evaluate whether to stick with that existing architecture or replace it over time.

As the technology market evolves and organisations want new functionality in their applications, the technology you have relied on for years may eventually become out of date or not be the right option any longer. Software will hit end-of-life status and be outside support – for example, the open source database MySQL will see version 5.7 hit end-of-life status in October 2023 while version 8.0 is predicted to reach end-of-life during 2026. Other databases will have their own dates to track where they may need to be replaced.
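Tracking these dates can be as simple as keeping a small table of versions and checking it regularly. The Python sketch below is one hypothetical way to do that. The `END_OF_LIFE` table and `support_status` function are illustrative names, and the exact days are placeholders: the text above gives only "October 2023" for MySQL 5.7 and "during 2026" for 8.0, so confirm the real dates with the vendor before relying on them.

```python
from datetime import date

# Placeholder dates for illustration only (assumptions, see the note above).
END_OF_LIFE = {
    ("MySQL", "5.7"): date(2023, 10, 31),
    ("MySQL", "8.0"): date(2026, 12, 31),
}


def support_status(product, version, today, warn_days=365):
    """Classify a version as unsupported, due for upgrade planning, or supported."""
    eol = END_OF_LIFE[(product, version)]
    days_left = (eol - today).days
    if days_left < 0:
        return "unsupported"
    if days_left <= warn_days:
        return "plan upgrade"
    return "supported"


# A fixed date keeps the example reproducible; use date.today() in practice.
check_date = date(2024, 1, 1)
for (product, version) in END_OF_LIFE:
    print(product, version, support_status(product, version, check_date))
```

The `warn_days` threshold is an arbitrary choice; a team with long migration lead times may want a wider window.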

Even if it is not end-of-life, you may find that your technology stack is not delivering what you or the organisation wants. At this point, you will have to consider whether the cost of re-implementing the system is worth the effort, or whether you can instead replace specific components with ones that are still supported. Your decision process here will be similar to building a new project, but there are some other variables that you will have to consider. For straight replacement projects, will you be making a change to keep things running in exactly the same way as before, or will you have the chance to add more features that users want? Straight swap projects can be harder to get signed off, as they add cost but deliver no additional value.

The alternative is that you look to make a bigger change – can you implement new functionality and new infrastructure as part of a combined plan? This can work in theory, but in reality it can lead to potential problems. Breaking down potential issues into smaller elements keeps teams focused on specific goals and deadlines, and removes some of the risk that can be found in larger technology projects.

Over time, you will have to look at your application infrastructure and decide on what you think is the best approach. The important element here is that you make your decisions based on the right priorities.

Dave Stokes is a Technology Evangelist for Percona and the author of MySQL & JSON – A Practical Programming Guide. Dave has a passion for databases and teaching. He has worked for companies ranging alphabetically from the American Heart Association to Xerox, in roles ranging from anti-submarine warfare to web development. www.percona.com