In-Memory Computing and the Future of Machine Learning
We explore how in-memory computing addresses our growing need for speed.
- By Edward Huskin
- August 30, 2019
The last few decades have seen a large jump in the processing power available to the public for day-to-day use, in accordance with Moore's Law. This has largely been achieved by increasing clock speeds and widening architectures, with processors evolving from 8-bit designs to today's far more powerful 64-bit ones.
Although this revolution has been largely beneficial for the average consumer, the jump hasn't been as dramatic on the server side, because the disk speeds underpinning traditional relational databases have remained largely the same.
With artificial intelligence now more than just a buzzword and the amount of data we produce growing every year, in-memory processing might be what we've been waiting for.
Business Data Becomes Big Data
At a 2017 conference held in California, HP research scientist John Paul Strachan explained how Facebook users alone produce a whopping 4 petabytes of data every day. To understand the vastness of the data we produce, consider Twitter. The very first tweet was sent by Jack Dorsey in 2006, and it would take the platform three years to reach the billionth tweet.
Now, it takes just two days for a billion tweets to be sent. The amount of data we produce on Twitter alone has grown by five orders of magnitude -- from 5,000 tweets a day in 2007 to 500,000,000 per day in 2013 and 660,000,000 per day in 2019.
Finally, consider the amount of data we produce that never ends up on a public website or in a search engine such as Google. Cameras and other sensors have become mainstream ways of collecting data about individuals.
The Downside of Relational Databases
All this data has to be processed in data centers that are often not properly equipped to handle such large data volumes. Bandwidth caps aside (a discussion that deserves its own article), processing speed has almost reached its limit. Although processors were previously made faster by increasing clock speeds, material science has reached a bottleneck on how much higher the clock speed can go.
The most prevalent technologies currently in use for storing and retrieving data are SQL-based relational databases, emerging NoSQL solutions, and other disk-based systems.
Relational database management systems (RDBMSs) such as Oracle Database or open-source alternatives (including Postgres and MySQL) are currently the most popular way to deal with structured data. They remain popular because hardly any other technology on the market can match the power of SQL joins and the relational model these databases provide. However powerful and irreplaceable joins may be, their biggest drawback is speed.
One of the drawbacks facing SQL-based solutions is that they were designed to store and retrieve whole rows of data at a time. Modern workloads often need only a few columns from each record, so much of the data read (and the processing power spent reading it) goes to waste; row-oriented storage handles these partial-row access patterns poorly.
Additionally, guarantees such as ACID compliance are part of what makes SQL solutions so powerful. They add much-needed reliability but sacrifice speed. To work around this bottleneck, technologies such as OLAP cubes were invented, but setting up the required data structures is cumbersome.
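To make the row-versus-column distinction concrete, here is a small Python sketch. It is purely illustrative and not tied to any particular database engine: the same records are laid out row-wise and column-wise, and only the columnar layout lets an aggregate touch just the one field it needs.
```python
# Purely illustrative: the same records stored row-wise and column-wise.
# Real databases do this on disk or in memory, but the access pattern is similar.

# Row-oriented layout: every read touches the whole record,
# even though only the "amount" field matters for the aggregate.
rows = [
    {"id": 1, "customer": "alice", "region": "EU", "amount": 12.5},
    {"id": 2, "customer": "bob",   "region": "US", "amount": 7.0},
    {"id": 3, "customer": "carol", "region": "EU", "amount": 3.25},
]
avg_from_rows = sum(r["amount"] for r in rows) / len(rows)

# Column-oriented layout: only the one column involved is touched.
columns = {
    "id":       [1, 2, 3],
    "customer": ["alice", "bob", "carol"],
    "region":   ["EU", "US", "EU"],
    "amount":   [12.5, 7.0, 3.25],
}
avg_from_columns = sum(columns["amount"]) / len(columns["amount"])

print(avg_from_rows, avg_from_columns)  # same answer, very different I/O profile
```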
The Need for Faster Processing
Traditional methods wouldn't come close to being able to handle the kinds of data we produce today, no matter the amount of processing power and memory. As a result, the bulk of the work we do in analyzing and processing this data today falls on the shoulders of artificial neural networks.
Neural networks are complex, but at their core they work by "learning" how to perform tasks after being fed examples rather than being controlled by fixed, programmatic code.
The most commonly used neural networks have a simple three-layer architecture thanks to their relatively low memory requirements, but deeper networks of eight or more layers are known to perform much better in use cases such as image classification. Networks like these form the basis of deep learning and, by extension, much of modern machine learning and artificial intelligence.
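As a rough illustration of that three-layer shape, here is a minimal Keras sketch, assuming TensorFlow is installed; the layer sizes and the random data are placeholders, not a recommendation.
```python
import numpy as np
import tensorflow as tf

# Toy data: 1,000 samples with 20 features each and 3 possible classes.
x = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 3, size=(1000,))

# A classic "three-layer" network: input -> one hidden layer -> output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),  # output layer
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# "Learning from examples": the weights are adjusted to fit (x, y).
model.fit(x, y, epochs=5, batch_size=32, verbose=0)
```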
How In-Memory Processing Affects Machine Learning
The first and perhaps most significant implication of in-memory processing is real-time processing of data. By leveraging multiple cores and in-memory caching, data doesn't have to be sent anywhere to be processed. Even modern SSDs that can deliver speeds of up to 2.5 Gbps pale woefully in comparison to the roughly 134 Gbps modern DRAM can sustain.
Furthermore, even with the kinds of speeds offered by modern SSDs, a significant bottleneck remains: data must still be moved between storage and RAM (by far the slowest step in the process) rather than being kept in memory and processed in place.
The implications of lower latency and truly real-time applications will reach from the world of IoT to more critical domains such as autonomous vehicles. The latter case, where life-or-death decisions may need to be made in a fraction of a second, highlights just how important these speed gains will be for machine learning.
Fast, real-time processing of data is an important aspect of optimizing a deep learning model for speed. Different kinds of artificial neural networks (ANNs) have shown a significant improvement in performance when dealing with tasks such as sound processing, facial recognition, and image classification. An ANN works by adjusting weights between neurons based on new incoming data in a process referred to as training.
Training a neural network requires significant computational power and time, depending primarily on the accuracy that's needed, the amount of data, and the kind of neural network being trained. Moderately large to huge neural networks take multiple days, at times even weeks, to fully train. This slowness is usually attributed to the need to move weight and training data back and forth between memory and processor -- which, if you recall, is the slowest step in traditional data processing. The reduction of extract, transform, load (ETL) overhead provided by new in-memory processing systems has the potential to greatly speed up neural network training. The adjusted weights are available to the processor sooner, allowing the network to recalculate the model based on those weights faster -- even as new data is being added.
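One everyday way to see the disk-versus-memory effect in a training loop is tf.data's caching. The sketch below (placeholder data and model, assuming TensorFlow is available) keeps the dataset in RAM after the first pass so later epochs never pay the read-and-parse cost again.
```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for records that would normally be read from disk.
features = np.random.rand(10_000, 16).astype("float32")
labels = np.random.randint(0, 2, size=(10_000,)).astype("float32")

# In a real pipeline this dataset would be built from files (CSV, TFRecord, ...);
# .cache() then keeps the parsed records in RAM after the first epoch so
# later epochs never pay the disk-read / parse (ETL-style) cost again.
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .cache()
           .shuffle(10_000)
           .batch(256)
           .prefetch(tf.data.AUTOTUNE))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(dataset, epochs=10, verbose=0)
```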
Each kind of neural network is best suited to different tasks, much like how traditional algorithms perform better for certain problems than others. Convolutional neural networks, for instance, excel at image recognition and classification. Long short-term memory (LSTM) networks, on the other hand, were invented to deal with the vanishing gradient problem. Put simply, LSTMs retain previous data for longer than other models. An ANN asked to predict the next word in the sentence "I live in France. I speak ..." would need to draw on earlier context (in this case, the word "France") to correctly predict "French."
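A minimal Keras sketch of that next-word idea might look like the following; the vocabulary size, context window, and random training pairs are all placeholders. The LSTM layer carries earlier context (such as "France") forward so the final layer can score "French" highly.
```python
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 5000, 8   # placeholder vocabulary and context window

# Placeholder training pairs: each row is a sequence of word IDs,
# and the target is the ID of the word that follows it.
contexts = np.random.randint(0, vocab_size, size=(1000, seq_len))
next_words = np.random.randint(0, vocab_size, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),       # word IDs -> vectors
    tf.keras.layers.LSTM(128),                       # carries earlier context forward
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(contexts, next_words, epochs=3, verbose=0)

# Given the IDs for "I live in France . I speak", the output is a probability
# distribution over the vocabulary, ideally peaking at the ID for "French".
```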
Unfortunately, LSTMs suffer the same bottlenecks in training and in operation as other ANNs. Because data has to be continuously passed from storage to the processor, ETL once again slows down the system. In-memory processing eliminates this bottleneck. The major concern is that RAM is volatile, but modern data centers are remarkably reliable. Major cloud and CDN providers such as AWS, DigitalOcean, and Cloudflare very rarely see entire data grids go offline through any fault of their own. (Any well-built in-memory data grid is designed to be fault tolerant: the rest of the cluster should keep running smoothly if one node goes down.) It never hurts to have backups, but training data held in a replicated in-memory store remains readily available.
Enabling New Use Cases
With the kind of speed we've come to enjoy thanks to improved in-memory performance, new use cases are bound to open up to us. For example, hybrid transactional/analytical processing (HTAP) would not be possible without these improvements.
HTAP addresses a common bottleneck in the computing world: transactional databases and analytics databases are usually kept separate because of memory constraints. HTAP removes this barrier by handling both kinds of workload within the same architecture, allowing large volumes of data to be processed on the server and results returned to the client in very short periods of time.
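As a toy illustration of the idea, the sketch below uses Python's built-in sqlite3 with an in-memory database as a stand-in; real HTAP platforms distribute this across a cluster, but the point is the same: transactional writes and analytical queries hit one live, in-memory store with no ETL step in between.
```python
import sqlite3

# ":memory:" keeps the whole database in RAM -- a stand-in for an
# in-memory store serving both workloads at once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")

# Transactional side: small, frequent writes.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "EU", 12.5), (2, "US", 7.0), (3, "EU", 3.25)])
conn.commit()

# Analytical side: aggregate queries over the same live data,
# with no separate warehouse or ETL pipeline in between.
for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"):
    print(region, total)
```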
Use cases that would greatly benefit from such speeds include fraud detection, vehicle routing, and patient monitoring.
Ad Personalization
Google is the perfect example of how machine learning is bound to change the future of computing.
Through their numerous data collection schemes, Google knows the tastes, preferences, and buying patterns of anyone who relies on its services. More than two billion people use Google daily, so you can imagine the amount of data collected.
Processing this data takes a huge amount of resources. One of the tools Google relies on is TensorFlow, a library it created and maintains. Paired with in-memory processing, it helps marketers target the right kinds of ads to the right users.
Autonomous Vehicles
One of the more significant ways modern ML has affected the industry is by enabling experiments with autonomous cars, which must make split-second decisions that could mean the difference between life and death.
The data collected by sensors all over an autonomous car is both shared with other cars and sent to the closest server to optimize routing and avoid traffic as well as other road hazards that occur suddenly and without warning.
Built-In Support for Machine Learning
Platform vendors have acknowledged the growing popularity of machine learning by integrating some of the most popular machine learning libraries right into their in-memory platforms. This removes the need to bolt on third-party libraries and shuttle data between systems to get applications up to speed, enabling a real-time operational model from the ground up.
Popular frameworks such as Apache Spark, Apache Kafka, and TensorFlow have also joined the in-memory computing wave. Native integrations between these frameworks and in-memory platforms take advantage of the full power of deep learning and artificial intelligence, which means a separate database maintained purely for ETL can be eliminated. As a result, the cost and complexity of analyzing streams of data from different sources is greatly reduced.
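As one hedged example of what this looks like in practice, a Spark job can pin its training data in cluster memory before fitting an MLlib model; the sketch below assumes pyspark is installed, and the tiny DataFrame is a placeholder for data that might otherwise stream in from Kafka or an in-memory data grid.
```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("in-memory-ml-sketch").getOrCreate()

# Placeholder training data; in practice this might arrive from Kafka
# or be loaded from an in-memory data grid.
df = spark.createDataFrame(
    [(0.0, 1.2, 0.7), (1.0, 3.1, 2.2), (0.0, 0.4, 0.1), (1.0, 2.8, 1.9)],
    ["label", "f1", "f2"],
)

# .cache() pins the assembled DataFrame in cluster memory so the iterative
# training passes below don't re-read the source on every iteration.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df).cache()

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
print(model.coefficients)
```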
Improving Scalability
Aside from speed, in-memory computing also scales more readily than traditional disk-based solutions. Most modern databases can be distributed among multiple servers via sharding, but in-memory platforms are built around parallelized, distributed processing from the start.
Rather than relying on a central server to manage all the connections in a system, a distributed system enables computers spread out across different data centers to share their processing abilities, speeding up analytics.
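On a single machine, the same divide-and-process pattern can be sketched with Python's multiprocessing module; this is only a local stand-in for work that an in-memory grid would spread across many nodes, each operating on the data already resident in its own memory.
```python
from multiprocessing import Pool

def process_partition(partition):
    # Stand-in for per-node analytics: each worker handles only its own
    # slice of the data.
    return sum(x * x for x in partition)

if __name__ == "__main__":
    data = list(range(1_000_000))
    partitions = [data[i::4] for i in range(4)]  # shard the data four ways

    with Pool(processes=4) as pool:
        partial_results = pool.map(process_partition, partitions)

    print(sum(partial_results))  # combine the per-partition results
```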
A Final Word
In-memory computing has evolved in response to the drawbacks of traditional database management systems, the most significant of which is speed. It may be the technology we've been waiting for to meet our increasingly complex, real-time, low-latency requirements.
In-memory computing will continue to evolve and is bound to offer huge scalability improvements, letting IoT applications scale beyond anything we could manage with traditional architectures. Bringing everything together in a single, unified platform also helps reduce operational and development costs.