In-memory databases are a fascinating field for programmers, architects, and technology leaders. In this article, let us try to understand what an in-memory database is and why such systems are so useful.
As a programmer, designer, or architect working with traditional databases, I always have to be conscious that database access must be designed carefully. Traditional database systems store everything on disk, so any read or update of that data can be one of the slowest operations in the application flow. Undoubtedly, a lot of research has gone into improving the performance of disk-based database systems through better storage, search, and retrieval logic. Still, disk access, being a mechanical operation, always has limitations. This is one of the important reasons to be fascinated by in-memory database systems.
Visualize a system where all data (or all data required for the current context) is sitting in memory. This means we can expect much faster and more efficient access to data. The database software can use much simpler logic to manage the data, as there is no need to manage the loading and unloading of pages in memory. There is far less need for time-consuming locking mechanisms to safeguard data on disk, because access in memory is much faster and the chances of conflict are extremely small, which changes the way we design database algorithms. It also means that we can exploit the best available CPU power (which is increasing continuously) to process the data in memory, without being limited by disk access. It could even mean that there is no need for a separate caching layer, since the database itself works like an in-memory cache. There are many more benefits if we design a system around an in-memory database.
Let us understand what an In-Memory Database System (IMDS) is. It is a DBMS that keeps all of its data primarily in main memory. The data is loaded into memory even if it runs to gigabytes or terabytes. A few highlights:
• With a 64-bit architecture, a system can in principle address 16 EB of data (2^64 bytes, or about 16 million TB in binary units). 82% of enterprise application databases are below 1 TB and grow at an average rate of 10% per year, which means that in-memory database systems can serve most applications now and for the foreseeable future.
• Normal data access does not need to read from or write to disk, hence there is no dependency on mechanical parts and their performance limitations. When all data is stored in a single address space, storage algorithms become less complex, since pages no longer need to be loaded into and unloaded from memory.
• It is much faster than a traditional disk-based DBMS. Having all data in memory means that the data is at our fingertips, only microseconds or nanoseconds away.
• It supports the ACID properties of a database, including the D (durability).
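The address-space and growth figures above can be checked with a little arithmetic. The following is only a back-of-the-envelope sketch: the function names are made up for this illustration, it assumes a flat 64-bit address space and steady compound growth of 10% per year, and it ignores the fact that real hardware supports far less physical memory than the theoretical limit.

```python
import math

ADDRESSABLE_BYTES = 2 ** 64   # theoretical limit of a flat 64-bit address space
TIB = 2 ** 40                 # one terabyte in binary units
EIB = 2 ** 60                 # one exabyte in binary units


def addressable_eib() -> int:
    """Size of the 64-bit address space in binary exabytes."""
    return ADDRESSABLE_BYTES // EIB


def years_until(start_bytes: int, limit_bytes: int, yearly_growth: float) -> int:
    """Whole years of compound growth before start_bytes reaches limit_bytes."""
    ratio = limit_bytes / start_bytes
    return math.ceil(math.log(ratio) / math.log(1 + yearly_growth))


if __name__ == "__main__":
    print("Addressable space (EB, binary):", addressable_eib())
    print("Addressable space (TB, binary):", ADDRESSABLE_BYTES // TIB)
    print("Years for 1 TB to reach the limit at 10%/year:",
          years_until(TIB, ADDRESSABLE_BYTES, 0.10))
```

Under these assumptions, a 1 TB database growing at 10% per year would take well over a century to reach the addressing limit, which is the point the first bullet makes.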
With the above in mind, it is also important to understand what an IMDS is not, to dispel some common myths.
· An IMDS is not a traditional disk-based DBMS with all of its data simply loaded into memory (the way a cache works). Its internal design and algorithms are quite different, and ideally much improved, because they can assume that the whole data set is in memory and that there is no disk access overhead. Hence an IMDS is not a caching technology but a full-fledged DBMS solution.
· An IMDS is not volatile. It can support the ‘D’ of ACID with full durability, and it offers a choice of flexible durability options.
· An IMDS is not only an embedded database. It works well for embedded applications thanks to its small footprint, but it is equally efficient, and in some respects better, for large time-critical applications, and it can work in a client-server architecture.
· A common myth is that an IMDS needs a long time to populate the in-memory store on startup. In practice, this is not the case.
To give a better perspective on how the technology works, here are a few of the architectural attributes of an IMDS.
• An IMDS can work either as an embedded single process or in a client-server architecture.
• An IMDS is partition aware. Vertical partitioning is normally achieved through design strategies such as normalization. An IMDS also supports horizontal partitioning, which splits a table's data by rows. For example, rows can be split on demographic attributes.
• An IMDS supports a ‘shared nothing’ architecture, which is useful for high-availability and fault-tolerant designs.
• An IMDS typically uses data structures better suited to memory, such as T-trees rather than B-trees.
• It needs simpler concurrency control, because locks are held for a shorter time when all data is in memory.
• With an IMDS, data is stored on several nodes, with multiple copies of the same data. This enables a scalable infrastructure model: new nodes can be added easily, and data is replicated to a new node according to the data design.
• Disaster recovery is also easier, because multiple copies of the data are available on different nodes by design.
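To make horizontal partitioning, shared-nothing nodes, and replicated copies more concrete, here is a minimal sketch. It is not how any particular IMDS product works: `Node`, `PartitionedTable`, and the `copies` parameter are hypothetical names, and rows are placed by hashing the key, which stands in for a business criterion such as the demographic split mentioned above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """One shared-nothing node: it owns its rows and shares no state."""
    name: str
    rows: dict = field(default_factory=dict)


class PartitionedTable:
    """Splits a table by rows across nodes and keeps several copies of each row."""

    def __init__(self, nodes, copies=2):
        if not 1 <= copies <= len(nodes):
            raise ValueError("copies must be between 1 and the number of nodes")
        self.nodes = nodes
        self.copies = copies

    def _owners(self, key):
        # A stable hash, so the same key always maps to the same nodes.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(first + i) % len(self.nodes)]
                for i in range(self.copies)]

    def put(self, key, row):
        # Write the row to every node that owns a copy of this partition.
        for node in self._owners(key):
            node.rows[key] = dict(row)

    def get(self, key, down=()):
        # Read from the first owning node that is not marked as failed.
        for node in self._owners(key):
            if node.name not in down and key in node.rows:
                return node.rows[key]
        raise KeyError(key)


if __name__ == "__main__":
    nodes = [Node("n1"), Node("n2"), Node("n3")]
    table = PartitionedTable(nodes, copies=2)
    table.put(42, {"name": "Asha", "region": "APAC"})
    primary = table._owners(42)[0].name
    # The row is still readable when its primary node is down.
    print(table.get(42, down={primary}))
```

Because each row lives on two nodes, a read can be served by the second copy when the first node fails, which is the basis of the high-availability and disaster-recovery points above. A real system would also have to rebalance data when nodes are added, which this sketch leaves out.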
Considering all of the above, here are a few of the advantages of an IMDS:
· An IMDS can provide extremely fast transaction processing by overcoming the traditional database's need to read and write data through mechanical disk operations. In a proof of concept by McObject, reads were 420 times faster and writes 4 times faster than the equivalent disk-based operations.
· They are highly scalable, both horizontally and vertically.
· They can ensure high availability by replicating data among multiple nodes.
· They can also support highly fault-tolerant designs with active-active or active-standby strategies.
· They support SQL standards.
· They support most database connectivity standards, e.g. JDBC.
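As a small illustration of SQL access through a standard connectivity API, the sketch below uses Python's built-in `sqlite3` module with the special `:memory:` database name. This is an assumption made for convenience: SQLite is a general-purpose embedded engine rather than a purpose-built IMDS, so the example only shows what transactional SQL on memory-resident data looks like from the application side. The table and function names are invented for the example.

```python
import sqlite3


def demo_in_memory_sql():
    """Run a small transactional workload against a memory-only database."""
    conn = sqlite3.connect(":memory:")  # nothing is written to a database file
    try:
        conn.execute(
            "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
        )
        conn.executemany(
            "INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)",
            [(1, "alice", 100), (2, "bob", 50)],
        )
        # Used as a context manager, the connection commits on success and
        # rolls back on an exception, so the transfer is atomic.
        with conn:
            conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
            conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
        return conn.execute(
            "SELECT owner, balance FROM account ORDER BY id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo_in_memory_sql())
```

The application code is the same as it would be for a disk-based database. Only the connection target changes, which is what standards support buys us.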
With so many good things written, the article would not be complete without mentioning a few of the challenges with IMDS that need more research in the coming years. These are:
• IMDS designs have already evolved enough to support durability, but this comes at the cost of synchronizing some data with persistent storage using checkpoints or transaction logging, or of transferring a high volume of data across nodes.
• A future direction could be to improve SSD (solid-state drive) technologies and use them to store data for durability. SSDs are slower than RAM but much faster than mechanical disks.
• The ‘one more bit’ problem
• In-memory data storage is limited by the total memory space the computer architecture can address, which is currently 16 EB. If the data grows even one bit beyond this limit, it will pose a challenge, and that is bound to happen in the future.
• Technologies such as SSD or PCM (NVRAM) can be used to store data and would help to avoid ‘one more bit’ problems in the future.
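The durability cost mentioned in the first challenge can be sketched with a toy key-value store that combines transaction logging with checkpoints. This is a simplified illustration under stated assumptions, not the mechanism of any specific product: `DurableStore`, `snapshot.json`, and `txn.log` are hypothetical names, keys must be strings, and every write is forced to storage, which is exactly the overhead the bullet describes.

```python
import json
import os
import tempfile


class DurableStore:
    """In-memory dict made durable with a transaction log plus checkpoints."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "txn.log")
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Load the last checkpoint, then replay the log written after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    try:
                        key, value = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash mid-write
                    self.data[key] = value

    def put(self, key, value):
        # Log first and force it to storage, then update memory.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write a full snapshot atomically, then start a fresh log.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self.log.close()


def recover_demo():
    """Write, checkpoint, write again, then rebuild the state from storage."""
    with tempfile.TemporaryDirectory() as directory:
        store = DurableStore(directory)
        store.put("a", 1)
        store.checkpoint()
        store.put("b", 2)      # exists only in the log, not in the snapshot
        store.close()
        recovered = DurableStore(directory)
        state = dict(recovered.data)
        recovered.close()
        return state


if __name__ == "__main__":
    print(recover_demo())
```

Reads never touch storage, but each `put` pays for a forced write and each checkpoint rewrites the whole data set. Real systems reduce this cost with techniques such as batching log writes or incremental checkpoints, and faster persistent media shrink the same cost further.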
I hope this article has helped you get started with IMDS by giving an overview. Please refer to a detailed paper published by us here.
Let us close the article by listing a few of the IMDS solutions available in the market, as a reference for further exploration:
• Commercial Solutions
• eXtremeDB by McObject
• TimesTen by Oracle
• SQLFire by VMware
• solidDB by IBM
• HANA by SAP
• BigMemory by Terracotta
• Altibase HDB/XDB by Altibase
• Open Source Solutions