The Six Principles of Modern Data Architecture
One of the things that I love to be able to do in my role as a Product Manager is spend time with customers and prospects understanding what is important to them as they make their journey to a modern data architecture. Based on these discussions I’ve noticed a consistent set of themes emerging; themes that span industries, use cases, and geographies. I’ve come to think of these themes as principles of an Enterprise Data Architecture.
Whether you are responsible for data, systems, analysis, strategy, or results, you can use these principles as a guide to help you navigate the fast-paced modern world of data and decisions. With new technologies you can create the right architecture to enable your business to run at an optimized level. But before you jump into technology, consider first the foundational tenets.
#1 Data is a Shared Asset: Enterprises that start with the vision of data as a shared asset ultimately outperform their competition, as CIO explains. By starting with the idea of data as a shared asset, instead of allowing departmental silos to persist, modern enterprises are able to ensure that all stakeholders have a complete view of the company. A complete view means a 360-degree view of customer insights, along with the ability to correlate valuable data signals from manufacturing to logistics that can drive improved corporate efficiency.
#2 Provide the Right Interfaces for Consumption: Putting data in one place is not sufficient to achieve the vision of a data-driven organization. In order for people (and systems) to benefit from a shared data asset, the proper interfaces need to be made available for ease of consumption. This might be in the form of an OLAP interface for business intelligence, a SQL interface for data analysts, a real-time API for targeting systems, or the R language for data scientists.
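To make the idea concrete, here is a minimal sketch of one shared data set exposed through several interfaces. It uses Python's built-in sqlite3 module as a stand-in for a real data platform, and the table, column, and function names are all hypothetical, chosen only for illustration.

```python
import sqlite3


def build_shared_store():
    """Create one in-memory store that every interface below reads from.

    Assumption: sqlite3 stands in for the shared data platform, and the
    'sales' table with its sample rows is invented for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)],
    )
    conn.commit()
    return conn


def sql_interface(conn, query, params=()):
    """Free-form SQL access, the kind a data analyst might use."""
    return conn.execute(query, params).fetchall()


def totals_by_region(conn):
    """A pre-defined rollup, roughly what a BI tool would request."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


def region_total_api(conn, region):
    """A narrow point lookup, the shape a real-time API call might take."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]


store = build_shared_store()
```

All three functions read the same copy of the data; only the shape of the access differs for each kind of consumer.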
#3 Ensure Security and Access Controls: The emergence of Hadoop as the unified data platform for enterprises has enabled, and necessitated, the enforcement of data policies and access controls directly on the raw data, instead of in a web of downstream data stores and applications. Data security projects like Apache Sentry make this approach to unified data security a reality.
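As a rough illustration of enforcing policy once, at the source, rather than in each downstream application, consider the sketch below. It is not Apache Sentry's API; the Policy class, the roles, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical policy model for illustration only; this is not how
# Apache Sentry or any specific product represents its policies.
@dataclass(frozen=True)
class Policy:
    role: str
    table: str
    columns: frozenset


POLICIES = [
    Policy("analyst", "customers", frozenset({"customer_id", "region"})),
    Policy("steward", "customers", frozenset({"customer_id", "region", "email"})),
]

# Invented sample of "raw" data held in the shared platform.
RAW_CUSTOMERS = [
    {"customer_id": 1, "region": "EMEA", "email": "a@example.com"},
    {"customer_id": 2, "region": "APAC", "email": "b@example.com"},
]


def allowed_columns(role, table):
    """Collect every column the role may read on the table."""
    cols = set()
    for policy in POLICIES:
        if policy.role == role and policy.table == table:
            cols |= policy.columns
    return cols


def read_rows(role, table, rows):
    """Apply the policy at read time, so every consumer sees the same rules."""
    allowed = allowed_columns(role, table)
    if not allowed:
        raise PermissionError(f"role {role!r} may not read {table!r}")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]
```

Because the check lives next to the raw data, an analyst gets the same filtered columns whether the request arrives through SQL, a BI tool, or an API.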
#4 Ensure a Common Vocabulary: By investing in an Enterprise Data Hub, enterprises can now create a shared data asset for multiple consumers across the business. However, it’s critical to ensure that users of this data are analyzing and understanding this data using a common vocabulary. Product catalogs, fiscal calendar dimensions, provider hierarchies, and KPI definitions need to be common, regardless of how data is consumed or analyzed. Without this shared vocabulary, companies spend more time disputing or reconciling results than driving improved performance through a shared understanding.
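One way to picture a common vocabulary is a single module of shared definitions that every report imports instead of re-deriving. The sketch below is hypothetical: the February fiscal year start, the naming of a fiscal year after the calendar year in which it begins, and the gross margin formula are assumptions made for illustration.

```python
from datetime import date

# Assumption: the fiscal year begins in February. A real company would
# set this once, here, rather than in each team's spreadsheet.
FISCAL_YEAR_START_MONTH = 2


def fiscal_quarter(d: date) -> str:
    """Map a calendar date to a fiscal quarter label.

    Assumption: a fiscal year is named after the calendar year in which
    it starts.
    """
    months_into_year = (d.month - FISCAL_YEAR_START_MONTH) % 12
    quarter = months_into_year // 3 + 1
    fiscal_year = d.year if d.month >= FISCAL_YEAR_START_MONTH else d.year - 1
    return f"FY{fiscal_year}-Q{quarter}"


def gross_margin(revenue: float, cost: float) -> float:
    """Shared KPI definition: (revenue - cost) / revenue."""
    if revenue == 0:
        raise ValueError("gross margin is undefined when revenue is zero")
    return (revenue - cost) / revenue
```

If finance and sales both call fiscal_quarter and gross_margin from the same place, their numbers can differ only because the underlying data differs, not because the definitions do.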
#5 Information Through Data Stewardship: Time and time again I’ve seen enterprises that have invested in a Hadoop data lake start to suffer when they allow self-serve data access to the raw data stored in these clusters. Without proper data curation (modeling of important relationships, cleansing of raw data, and curation of key dimensions and measures), end users can have a frustrating experience, vastly reducing the perceived and realized value of the underlying data. By investing in core functions that perform data curation, the value of the shared data asset can ultimately be realized.
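A small sketch of what a curation step might look like in practice follows. The field names and cleansing rules are hypothetical; real curation rules come from the data stewards who know the business.

```python
def curate(raw_rows):
    """Turn raw order records into a cleaned set that end users can trust.

    Hypothetical rules, for illustration only:
      - drop rows with no order_id or an unparseable amount
      - keep the first valid row for each order_id
      - normalize region to upper case, defaulting to "UNKNOWN"
    """
    curated = []
    seen = set()
    for row in raw_rows:
        order_id = row.get("order_id")
        if order_id is None or order_id in seen:
            continue
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # amount missing or not numeric
        region = (row.get("region") or "").strip().upper() or "UNKNOWN"
        seen.add(order_id)
        curated.append({"order_id": order_id, "region": region, "amount": amount})
    return curated
```

Running these rules once, centrally, spares every analyst from rediscovering the same duplicates and malformed values on their own.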
#6 Eliminate Data Copies & Movement: Inherent in the value proposition of Hadoop is that it is a multi-structure, multi-workload environment for parallel processing of massive data sets. Hadoop scales linearly as workloads and data volumes grow. By eliminating the need for data movement, a modern enterprise data architecture reduces cost, increases “data freshness”, and optimizes enterprise data agility.
The principles above cover key elements important to every business in today’s fast-paced world of exploding data, self-service insights, and increased need for security. So regardless of the industry you are in, the role you play in the organization, or where you are in your big data journey, I encourage you to adopt and share these principles as a means of establishing a sound foundation on which to build a modern big data architecture.
AtScale and Cloudera believe in these principles as the basis for beginning your big data journey. This is apparent in our desire to deliver unified, highly performant data management systems based on Hadoop. Depending on where you are in your data journey, you might find value in our recent Better Together webinar here.
While the path to a modern enterprise data architecture can seem long and challenging, with the right framework and principles, you can successfully make this transformation sooner than you think.
Josh Klahr is the VP of Product at AtScale. You can read more by Josh on the AtScale blog at blog.atscale.com.