Best Practices Gap
As you build out your first multi-tenant Hadoop cluster, it is easy to focus on getting things working without planning the structure and processes you will need to support your users operationally.
A few items to document for your cluster include the following:
- Access Control – How do your users get set up with credentials to access the cluster?
- Directory Structures – What is the appropriate directory structure for user directories in Unix and HDFS?
- Retention and Archiving – How do you clean up unnecessary files to avoid wasting disk space while retaining what you require?
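
One way to make the directory-structure and retention questions concrete is to write the policy down as code before wiring it to the cluster. The sketch below is illustrative only: the `FileEntry` type, the `user_home` and `split_by_retention` helpers, and the idea of "protected" prefixes are assumptions introduced here, not part of Hadoop. It assumes user directories follow the common `/user/<name>` layout and that file listings (path plus modification time) have already been gathered by some other means.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class FileEntry:
    """A file path and its last-modified time (hypothetical helper type)."""
    path: str
    modified: datetime


def user_home(user: str) -> str:
    # Assumed convention: one home directory per user under /user.
    return f"/user/{user}"


def split_by_retention(
    entries: Iterable[FileEntry],
    now: datetime,
    retention_days: int,
    protected_prefixes: Sequence[str] = (),
) -> Tuple[List[FileEntry], List[FileEntry]]:
    """Return (keep, expire) lists under a simple age-based policy.

    A file is kept if it sits under a protected prefix or was modified
    within the last `retention_days` days; otherwise it is a cleanup
    candidate. Nothing is deleted here; this only classifies entries.
    """
    cutoff = now - timedelta(days=retention_days)
    prefixes = tuple(protected_prefixes)
    keep: List[FileEntry] = []
    expire: List[FileEntry] = []
    for entry in entries:
        is_protected = entry.path.startswith(prefixes)
        if is_protected or entry.modified >= cutoff:
            keep.append(entry)
        else:
            expire.append(entry)
    return keep, expire
```

Keeping the classification separate from the actual delete or archive step means the policy can be reviewed and dry-run against a listing before anything is removed.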