Data Engineering is Critical to Big Data Success
In an earlier blog post, “Staffing your big data team,” I mentioned that data engineers are critical to a successful data journey. That said, most companies that are early in their journey lack a dedicated engineering group. And the longer it takes to put a team in place, the likelier it is that your big data project will stall.
The data engineering team is responsible for collecting and ingesting batch and stream-oriented data, inventorying the data, working through ingest bottlenecks, and developing and streamlining ETL processes.
Data engineering is a discipline that you can re-skill people into; the diagram below illustrates the specific skill sets your team will need to develop to become a successful data engineering team. However, it’s imperative to find people who have an intense interest in the data they are working with. Almost anyone can produce a result, but true data engineers will closely scrutinize and check it. For example, a report leveraging worldwide geolocation data on points of interest might suggest that the most-searched-for category was fortresses and castles. Common sense, however, suggests that castles should rank below restaurants, gas stations, hotels, and so on. Good data engineers put a closer lens on the data, the tools, and the code used to develop reports so that those reports produce the right results.
Image 1: Data Engineering Skillsets
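The castle example above is the kind of sanity check a data engineer can automate. The sketch below is a minimal illustration, not a prescribed method: it assumes search events have already been reduced to a list of category names, and the set of "expected common" categories (`EXPECTED_COMMON`) and both helper functions are hypothetical names chosen for this example. It ranks categories by search count and flags the top result when it is not one of the categories we would expect to dominate.

```python
from collections import Counter

# Hypothetical: categories we expect to dominate any worldwide
# points-of-interest search log. This is an assumption, not real data.
EXPECTED_COMMON = {"restaurant", "gas station", "hotel"}


def rank_categories(searches: list[str]) -> list[tuple[str, int]]:
    """Count searches per category and return them, most searched first."""
    return Counter(searches).most_common()


def suspicious_top_category(ranking, expected_common=EXPECTED_COMMON):
    """Return the top-ranked category if it is not one we expect to lead.

    Returns None when the ranking is empty or the top category looks plausible.
    """
    if not ranking:
        return None
    top_category, _count = ranking[0]
    return None if top_category in expected_common else top_category


if __name__ == "__main__":
    searches = ["castle"] * 5 + ["restaurant"] * 3 + ["hotel"] * 2
    ranking = rank_categories(searches)
    flagged = suspicious_top_category(ranking)
    if flagged:
        print(f"Check the pipeline: '{flagged}' outranks expected categories")
```

A flag like this does not prove the report is wrong; it simply tells the engineer where to look first, whether that turns out to be a join that duplicated rows, a mislabeled category code, or a genuine (if surprising) result.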
The data engineering team must also be accountable for the inventory, acquisition, governance, and security of a business’s data. And, in more mature businesses, the data engineering team is tasked with creating and maintaining common profiles, such as one that tracks a customer’s interactions across channels and throughout the customer lifecycle. Those centrally managed profiles can be leveraged by multiple teams within an organization to provide more customized service, support, and cross-sell/upsell opportunities. Other common profiles managed by the data engineering team may cover the product lifecycle and the data lifecycle.
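To make the idea of a centrally managed customer profile more concrete, here is a minimal sketch that folds raw interaction events from several channels into one profile per customer. Everything here is illustrative: the `Interaction` and `CustomerProfile` types, the channel names, and the chosen profile fields (first/last contact and per-channel counts) are assumptions for this example, not a Cloudera schema. A production profile would typically be built at scale in a distributed engine, but the aggregation logic is the same.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Interaction:
    """One customer touchpoint (hypothetical event shape)."""
    customer_id: str
    channel: str  # illustrative values: "web", "store", "call_center"
    timestamp: datetime


@dataclass
class CustomerProfile:
    """A shared, cross-channel view of one customer (hypothetical fields)."""
    customer_id: str
    first_seen: datetime
    last_seen: datetime
    interactions_by_channel: dict[str, int] = field(default_factory=dict)


def build_profiles(interactions) -> dict[str, CustomerProfile]:
    """Aggregate interaction events into one profile per customer."""
    profiles: dict[str, CustomerProfile] = {}
    for event in interactions:
        profile = profiles.get(event.customer_id)
        if profile is None:
            profile = CustomerProfile(event.customer_id, event.timestamp, event.timestamp)
            profiles[event.customer_id] = profile
        # Track where the customer is in their lifecycle: earliest and latest contact.
        profile.first_seen = min(profile.first_seen, event.timestamp)
        profile.last_seen = max(profile.last_seen, event.timestamp)
        # Count touchpoints per channel so other teams can see how the customer engages.
        counts = profile.interactions_by_channel
        counts[event.channel] = counts.get(event.channel, 0) + 1
    return profiles
```

Because every team reads the same profile rather than rebuilding its own, support, marketing, and sales all see a consistent picture of the customer.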
In addition to these responsibilities, the data engineering team is also the hub of all things data-related. Its members must constantly advocate for new and better data and work with others across both the centralized and decentralized parts of the organization to ensure that data assets are protected and accessible.
For example, your security and privacy experts are likely responsible for Cloudera Enterprise security models. This includes perimeter security, authentication, authorization models, encryption, etc. They must work closely with the data governance program to understand the business’s privacy policies and ensure that only the appropriate people have access to data for the appropriate use cases. The data engineers are then responsible for putting those policies into practice.
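The rule "only the appropriate people have access to data for the appropriate use cases" can be modeled as a lookup on role, dataset, and purpose together. The sketch below is a hypothetical illustration only: the `Grant` type, the `POLICY` entries, and `is_allowed` are invented for this example and are not part of any Cloudera API. In practice, enforcement belongs in the platform’s authorization layer; this just shows the shape of a purpose-aware policy that the governance program and data engineers might agree on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in a set
class Grant:
    """Who may use which dataset, and for what purpose (hypothetical model)."""
    role: str
    dataset: str
    purpose: str


# Hypothetical policy table agreed with the data governance program.
POLICY = {
    Grant("analyst", "customer_profiles", "churn_analysis"),
    Grant("support_agent", "customer_profiles", "customer_support"),
}


def is_allowed(role: str, dataset: str, purpose: str, policy=POLICY) -> bool:
    """Allow access only when role, dataset, AND purpose all match a grant."""
    return Grant(role, dataset, purpose) in policy
```

Checking the purpose as well as the role is the key design choice: an analyst who may read customer profiles for churn analysis is still denied if the stated use is something the privacy policy does not cover.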
Finally, a company’s data architecture must be aligned with the data engineering team. That architecture exists to store, serve, and process data. Data engineers must work closely with the architects to drive requirements and ensure that the underlying technology meets the data’s storage, serving, and processing needs.
As you can see, the scope of the data engineer’s role is both broad and deep, and it’s often ranked among the best jobs in the United States. Whether you’re a practicing or aspiring data engineer, we at Cloudera would love to hear from you. Leave us a comment and let us know what you think of this critical, data-journey-defining role.