Hybrid Cloud Analytics: Plan for These Challenges
Moving to a hybrid cloud analytics model can help you remove analytics roadblocks, but today's new data environment can introduce new problems because it is so complex. Being aware of these issues is the first step to overcoming them.
- By James E. Powell
- December 14, 2018
As big data gets bigger, more diverse, and more dispersed, enterprises large and small are turning to hybrid cloud analytics -- where data from on-premises and cloud sources is analyzed seamlessly.
In a recent Webinar, "Achieving Business Value Using Hybrid Analytics," Fern Halper, vice president and senior director of TDWI Research for advanced analytics, discussed what hybrid cloud analytics is all about (and how it can quickly become complex) and the data and analytics challenges of this kind of environment. By taking into account all these dimensions, your enterprise will be better able to plan for -- and reap the benefits of -- a hybrid cloud analytics environment.
The Data Dimension
When you think of "hybrid," you may think first about the type of cloud configuration (public, private, or a combination). However, hybrid can also describe the type of data your enterprise is working with -- structured, unstructured, and multi-structured data. Hybrid can also describe the data's location. Data once only on premises is moving to the cloud (or is being created there), or it's both on premises and in the cloud if it's replicated.
Sometimes data is offloaded from the on-premises data warehouse to the cloud, for example, to relieve burdened systems. It's not necessarily a lift and shift but more of a gradual, value-based approach. Sometimes data landing and staging are moved to the cloud, and some organizations are very gutsy and have moved all their data to the cloud. More often, however, we see a mixed hybrid model in which organizations pull data from one source to another.
Add new storage platforms to this mix, as Halper explains. "In a survey we asked, 'What infrastructure do you have or plan to have in place for predictive analytics?' Enterprises have data warehouses, of course, but its use in the public cloud is now being used by over a quarter (28 percent) of respondents. That figure is set to more than double if users stick to their plan." Just as popular are new data storage platforms, such as the data lake, which can handle a variety of data types. Data lakes -- along with Hadoop on the public cloud and analytics platforms on the public cloud -- are also set to double in use within a few years.
Changing Analytics
The analytics performed on this dispersed data is also changing to a hybrid model. Organizations are expanding their analytics toolkits, adopting machine learning, natural language processing (NLP), and AI technologies. In a recent survey conducted by TDWI, Halper asked respondents what they were doing with these technologies. Over half were deploying them in operations for use cases as diverse as predictive maintenance and supply chain optimization. Marketing, noted as a top user, puts predictive analytics and machine learning to use analyzing customer behaviors and, in particular, churn. Text analysis, a sort of offshoot of NLP, is being used to analyze social media data and perform sentiment analysis. IT is also using these technologies to predict machine failures.
"Different types of analytics are being used together and they're being used separately to gain business value. For example, someone might use -- in the same analysis -- text analytics to determine the sentiment of customers then integrate that information with other data about the customer to create a big data set for customers in order to create a model for retention. Likewise with geospatial data: you might marry it with customer data to predict risk for an insurance company," Halper points out.
Enterprises aren't just expanding the type of analytics performed. They're also changing where analytics is performed. Halper explains that hybrid cloud analytics means being able to access analysis that was done either on premises or in the cloud. "If I perform one kind of analysis on premises, perhaps profiling customers using a visual analytics tool, and then I want to build a churn model in the cloud, I should have an easy way to get at the on-premises analysis. It should sort of seem seamless to me as an end user," she explains.
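The combined analysis Halper describes, in which customer sentiment from text is integrated with other customer data to feed a retention model, might be sketched roughly as follows. This is only an illustration under stated assumptions: the word lists, field names, and sample records are hypothetical, and the word-counting score is a crude stand-in for a real text analytics tool.

```python
# Hypothetical word lists; a real project would use a trained sentiment model.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "cancel", "terrible", "broken"}


def sentiment_score(text):
    """Return positive minus negative word count (a crude sentiment proxy)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def build_retention_rows(customers, comments):
    """Join each customer's average sentiment onto structured customer data."""
    scores = {}
    for customer_id, text in comments:
        scores.setdefault(customer_id, []).append(sentiment_score(text))
    rows = []
    for customer in customers:
        s = scores.get(customer["id"], [])
        # Customers with no comments get a neutral score of 0.0.
        rows.append({**customer, "avg_sentiment": sum(s) / len(s) if s else 0.0})
    return rows


# Hypothetical structured data (for example, from an on-premises warehouse).
customers = [
    {"id": 1, "tenure_months": 30, "monthly_spend": 80.0},
    {"id": 2, "tenure_months": 4, "monthly_spend": 25.0},
]
# Hypothetical unstructured data (for example, social media comments).
comments = [
    (1, "Love the new dashboard, support was helpful!"),
    (2, "App is slow and broken. I may cancel."),
    (2, "Still terrible."),
]

rows = build_retention_rows(customers, comments)
for row in rows:
    print(row)
```

Each resulting row pairs structured attributes with a sentiment feature, which is the kind of combined data set a retention or churn model could then be trained on.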
What's Behind the Move to the Cloud
The cloud is important for analytics because the platform can provide several key benefits. The top three are scalability, flexibility, and cost.
"Clearly, the cloud scales up and down according to your needs. When you need to perform analytics processing on large datasets, the cloud enables you to procure as much storage and compute services as necessary. When you're done with the analysis, you can release these services so you're no longer paying for more than you need." With the virtual resources of the cloud, onboarding new data sources and setting up analytics sandboxes is easier than doing the same tasks on premises.
A cloud platform automatically provides the latest version of its software, and that can be helpful with analytics. When it comes to cost, more than a third (37 percent) of survey respondents said it was a big driver for the cloud. The cloud may not always be the most economical option, but compared with on-premises data platforms, both startup costs and ongoing maintenance costs can be lower.
Other benefits include the ability to deal with data diversity (the traditional data warehouse often wasn't meant to deal with disparate data types such as unstructured data), putting computing power where the data is (minimizing data movement), and the ability to separate computing and storage resources (so you can spin them up and shut them down independently).
Additional Considerations
Take the data types, locations, and storage platforms, combine them with new and dispersed analytics, and you see how quickly complexity can grow. You may have multiple on-premises installations, and there can be multiple clouds or even different cloud providers. Data can come from internal sources, external sources, apps in the cloud, and apps on premises.
Organizations are developing strategies to deal with this, such as integrating these systems into some sort of unified architecture or federating or virtualizing the data. Most modern data warehouses now have a hybrid, multiplatform systems architecture that is unified by a logical architecture.
Another coping mechanism is the semantic layer, which uses metadata to give a better view across disparate data sources and to integrate them.
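As a rough illustration of how a semantic layer and data virtualization can work together, the sketch below maps logical field names onto two differently named sources and reads both at query time rather than copying the data. Everything in it is an assumption made for the example: the in-memory SQLite database stands in for an on-premises warehouse, the list of dictionaries stands in for records from a cloud application, and the table and field names are hypothetical.

```python
import sqlite3

# Stand-in for an on-premises warehouse (hypothetical schema).
on_prem = sqlite3.connect(":memory:")
on_prem.execute("CREATE TABLE cust (cust_id INTEGER, rev REAL)")
on_prem.executemany("INSERT INTO cust VALUES (?, ?)", [(1, 120.0), (2, 75.5)])

# Stand-in for records returned by a cloud application (hypothetical).
cloud_records = [{"customerId": 3, "revenueUsd": 42.0}]

# Semantic layer: logical field name -> physical field name in each source.
SEMANTIC_MAP = {
    "on_prem": {"customer_id": "cust_id", "revenue": "rev"},
    "cloud": {"customer_id": "customerId", "revenue": "revenueUsd"},
}


def unified_customers():
    """Yield rows from both sources under one logical schema, read at query time."""
    m = SEMANTIC_MAP["on_prem"]
    query = (
        f'SELECT {m["customer_id"]}, {m["revenue"]} '
        f'FROM cust ORDER BY {m["customer_id"]}'
    )
    for customer_id, revenue in on_prem.execute(query):
        yield {"customer_id": customer_id, "revenue": revenue, "source": "on_prem"}
    m = SEMANTIC_MAP["cloud"]
    for rec in cloud_records:
        yield {
            "customer_id": rec[m["customer_id"]],
            "revenue": rec[m["revenue"]],
            "source": "cloud",
        }


view = list(unified_customers())
total_revenue = sum(row["revenue"] for row in view)
print(view)
print(total_revenue)
```

The end user works only with the logical names `customer_id` and `revenue`; the mapping hides where each record lives and what its fields are called there.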
Splitting "what goes where" is another approach. Some organizations use their data warehouse for some types of analytics and other platforms for newer and more sophisticated analytics such as machine learning. The data goes to a platform on the cloud where it can be explored and then models are built, which means you need to migrate the data. There are, of course, different ways to migrate the data in terms of the initial migration and then offloading data if needed.
A hybrid cloud can exacerbate data issues you're already dealing with, such as governance. Your organization will need visibility across multiple platforms, so lineage becomes a metadata issue. Likewise, security issues are complicated by the fact that data can be in more than one place (sometimes at the same time if you've replicated data).
A final challenge Halper mentions is multiple locations. "You may be developing reports and visualizations in one place but building models in another place. If you want to see them in one place, you need more of a unified view. Users often have a hard time wrapping their heads around this complexity, so they select a vendor who can deal with data consolidation and aggregation at a scale complemented by tools that provide a unified and easy-to-use view of this extended distributed platform."
To Learn More
You can view the complete Webinar in our archive here.