Friday, 10 February 2012

SQL/PostSQL/NoSQL

As one addicted to database technology since dBASEII, and having used just about everything since then and seen promising companies and their tech-children come and go, and also having read E.F. Codd, Chris Date, Fabian Pascal and Michael Stonebraker, to name a few, I like to think I know my way around relational databases, and a lot about pre-relational databases (dBASE, FoxPro, etc.), and a little about post-relational databases (Cache, etc.).

I've read a few things about NoSQL (technology or movement? That is the question!). As my colleague Stuart McLachlan rephrased the question: If it's a movement, the real question is whether it's a religious or bowel movement. The piece that prompted his excellent witticism can be found at:


The author does make a few good points, among them the difficulty of bending traditional SQL systems to handle massively scalable sub-second retrieval of the sort required for huge systems such as what might be found at Facebook, Amazon, Google, etc. Even in organizations much smaller than those, the problem remains. 

What seems to be ignored in this discussion, however, is the distinction between transactional databases (OLTP) and retrieval/analysis (OLAP) databases. In the former, accuracy and timeliness are of paramount importance. In the latter, timeliness takes second place to performance, and it is generally accepted that such systems will not deliver instantaneous accuracy, but rather snapshots of how the data looked at some previous moment. The granularity of the moments of interest is decided upon or settled for by some combination of hardware costs, need for close-to-now data, and so on.

I think that the NoSQL people blur this distinction purposely. In the case of Amazon, where there is a lot of buying and selling going on, the assumption is that inventory is, by and large, unlimited. Therefore the need for sub-second retrieval trumps the need for accuracy. Amazon can always re-order. 

That is not always the case. For several years, I was the database developer/DBA and programmer for a company that sold event-travel packages, and did deals with U2, Madonna, the Rolling Stones and so on. The basic scenario went like this: we bought several hundred tickets to every event in a tour, then bundled them with travel, hotel, car rentals and pre- and post-event parties. In this type of system, inventory is fixed. There are only on average 400 tickets available for any given event, and only so many hotel rooms, and so on. It is therefore paramount for the sales people in all four offices, plus the customers booking on the web site, to know exactly how many tickets remained in inventory, at all times. The acceptable time lag is essentially zero. 

The paean to NoSQL cited above ignores another critical factor: the rapid descent of solid state disks (SSDs), which are now available in sizes up to 10 terabytes (TB). A company called Fusion-io offers systems combining up to four of these SSDs for a total of 40 TB. That's large enough for lots and lots organizations with equal needs for performance and accuracy. The performance specs are awesome:
The 10 TB ioDrive Octal offers:
  • More than 1.3 million IOPS with 6.7 GB/s bandwidth
  • Extreme capacity density and lightweight footprint delivering up to 20 TB of ioMemory in a 1U server
  • Process multiple terabytes of data in a fraction of the time
  • Optimized performance for read-heavy environments such as data warehousing, scale-out architectures, supercomputing and research applications
  • More work per unit of processing thanks to Fusion's cut-through architecture, which delivers performance increases as CPUs become more powerful
  • Consolidate data-intensive infrastructure by an order of magnitude
  • Reduce failure points and system complexity
  • Reduce power and cooling costs of expensive, data-intensive server farms

Granted, these systems are not exactly cheap, but they are within the budget of many organizations. 
In short, rumours of the death of classic relational databases are premature. And in my opinion, the intellectual validity of the arguments in favour of RDMSes remains.


For more information about these SSDs, visit  www.fusionio.com. Regrettably, I own no shares in this company.

I do not mean that there is no place for NoSQL databases, but rather that their purpose is retrieval rather than transactions (Insert, Update, Delete). In fact, it could well be that the best system design might be a combination of the two approaches: use an OLTP database for transactions and a NoSQL database for retrievals. The problem remains, How and When to update the NoSQL database to reflect changes in the OLTP database. The answer to that question, I think, will depend on several factors: size of the databases, frequency of transactions, permissible lag time and so on.
A.


No comments:

Post a Comment