My one advice for every software developer…

Greg Balajewicz
6 min readMar 22, 2021

--

I have one advice, that can change the course of the professional life of any software developers, juniors or seniors. It applies some architects and technical managers too. Especially to those, who have not had the good fortune of suffering through computer science or equivalent education.

:: the advice

My advice — Learn how SQL database systems work! Learn it from the ground up, from the way they handle the file system, to the way they optimize transactions. If you are really brave, and understand the pitfalls of distributing computing, then brave on all the way up how these systems handle “high availability”.

:: why take the advice

One may object to the focus on SQL database systems, since they may seems like dinosaurs destined for extinction. Aren’t NoSQL and Graph DB taking over? NO!

Aren’t NoSQL and Graph DB taking over? NO!

If you think this way, please, take my advice — study how rational databases work. I have seen my fair share of systems, teams, developers and screw ups. These new database technologies are nothing but specialized versions of SQL databases. Mostly dropping what is not needed for speed or ease of use, adding what is necessary for specialized use. In fact, you will never understand these new database systems, if you do not understand SQL systems. You will always be puzzled why the dinosaurs are still around, you will never be able to discriminate between them in your particular use case, and eventually choose the wrong one for the job.

:: relational database systems, a crush course

SQL database systems, otherwise known as “relational database systems” are the natural starting point for storing data, because the world is relational. World is made up of objects and relationships thus we represent the world, in terms of objects and relationships. Relational databases do that very nicely.

For example: Wheels belong to a car. A car is made of up 4 wheels. A car has an engine, sometimes has a driver, and passengers. Bank Account must be linked to a Customer. A customer has to have certain properties to be valid, like name, contact information, identifying information etc.

The world is relational, therefore we model the world in relational terms. This is why relational (aka SQL) databases are the starting point, the mother of all databases. I would say, in the current cloud terminology, that a relational database, is a “relational PaaS”. Using this platform, you define the objects(Customer, Account), define constraints (Customer must have a name), defined relationships (Account must belong to a customer) and these platforms ensures that your data is always consistent and follows the rules you so eloquently defined in your data model.

But like any such system, it comes at a cost. When you create a new Customer, a bunch of checks have to be performed — Does the Customer have a name defined? Contact info, identify info? Those checks take time, even if fractions of milliseconds.

:: relational database systems — the things they do

When an Account is created, and linked to a customer, the system has to verify that that customer exists. But how to accomplish that? Shall the system read the “customer file” from disk, on line at a time, and see if the customer is found? Certainly reading the file from disk is slow, so that is a no-go. Why not keep the file in memory? Reading from RAM is about 92,000X faster than from disk, so that’s good! But what if my bank is successful and has 100 millions of customers … do I store every customer in RAM? Do I heck every customer one by one? That could take a long time… and a lot of RAM. 10GB if each customer took just 100 bytes (rather optimistic size)

Indexes can help here. Why store the entire customer record in RAM, if all we need is the customer identifier. If we sort that list (and apply a bunch of optimizations that only geeks can understand) rather than having to do check 50,000,000 customers on average to find the right one, we only have to do 27 checks. Indexes are powerful! Assuming our Customer Identifier is a 10 characters string, our RAM footprint drops from 10 to 1GB. Pretty cool!

Can we do even better? Yea, the little geek inside me says. Why should the Customer Identifier be a 10 characters string, why not a number? Or perhaps we can add a number identifier. We can drop from 10 bytes, to perhaps 4, thus lower our RAM requirement from 1 to 0.4GB. That is cool too. In addition, this drops the cost of the single customer identifier compare since comparing integers is much faster than comparing strings — a very useful fact to remember when designing any database

Nonetheless, even behemoth of a server’s RAM, that only oil sheiks can afford, will run dry, and the system will have to decide what to keep in memory, and what to not, rendering some actions lighting fast, and some — at unexpected times, of course — snail slow.

Enforcing the relationships and constraints comes at a cost, and database vendors compete on optimizing that cost. We will not get into further complications like locking, I hope this illustration is sufficient to make my point — SQL databases systems enforce data integrity at a cost of performance, and hardware.

:: NoSQL database systems —the things they do

NoSQL databases for example, are specialized versions of their SQL counterpart, that throw out certain functionality, for the sake of speed. For example, creating an Account, without any integrity checks what so ever; just append the record to the end of a file. As we know from messaging solutions like Kafka, such operations are extremely fast.

:: SQL or noSQL ?

There is a place for both in system design. However, without a very good understanding of the differences, poor decisions will be made. You may find that properly designed SQL database may give you noSQL performance in certain areas, thus freeing you from over complicating your system.

:: the root of all evil

I have done a lot of consulting in my career and been called to deal with horrible consequences of poor choices often. Scene is set for developing PTDS! I have seen NoSQL databases blindly adopted for relational data, and much that was free in SQL databases, implemented in-house with catastrophic fervor and enthusiasm. I have seen many people argue for noSQL databases for speed, without much understanding of why SQL systems are slower and what is lost in the choice. I have sat at meeting with calls to replace a SQL database with noSQL database as a solution to issues experience with the current SQL database. However, 99% of performance issues I see, are related to poorly used SQL systems, not the system it self. Too often by people that never learned how SQL databases work and thus used the technology poorly. In almost all cases, the issues could have been avoided. In many cases, the issues can be fixed. Many developers most work with relational data and system, yet few understand how they work resulting in disastrous decisions and horrible code.

::the unpopular fix-all solution

There is no fix-all solution I can give. I cannot offer design principles, pitfalls to avoid that would be sufficient. There is only one real advice — take the time to learn how relational database systems work, from the ground up.

Pick any major system, and take the time to learn. You can pick one that you are likely to work with most — but this is not critical, since most of what you learn will apply to all. If you can, work towards a developer-level certification exam in one of those system.

I know full well, that for a new software developer, it may seem like an overwhelming task to learn all the needs to be learned. I think starting with data, how it is stored and handled, because it is so central to much of what we do, is a real good starting point. Once you understand how relational databases will work, everything else will come a lot easier. NoSQL databases, GraphQL will require a lot less study. The purpose of tech like REDIS and memcached will make a lot more sense, and, hopefully, result in a healthy fear of caching in general. Code-first approach may lose its luster for production code. Database design review may not seem like such a drag anymore.

Learn how SQL systems work, and you will learn to understand noSQL systems in the process

--

--

Greg Balajewicz
Greg Balajewicz

Written by Greg Balajewicz

Software engineer and an entrepreneur with 20+ years of financial and video game industry experience. A minimalist and non-dogmatic pragmatist.

No responses yet