Latest Posts
How to configure clients to connect to Apache Kafka Clusters securely – Part 3: PAM authentication
In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The…
Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models
In this last installment, we’ll discuss a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. Afterwards, this model is then…
Cloudera Flow Management Continuous Delivery while Minimizing Downtime
Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world to facilitate an easy-to-use, powerful, and reliable way to distribute and process data at high…
Fostering community to help drive cultural change
2020 put on full display how humanity shows up in times of hardship. We saw everything from street celebrations ushering weary medical personnel home after long days fighting to save lives, to places like food banks receiving more donations…
Finding digital transformation in high places – how a ski resort improved operational agility and customer experiences
Most of my previous blogs focus on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate more than $11 trillion in economic value as connected…
Optimized joins & filtering with Bloom filter predicate in Kudu
In database systems, one of the most effective ways to improve performance is to avoid doing unnecessary work, such as network transfers and reading data from disk. One of the ways Apache Kudu achieves this is by supporting column…
Cloudera Data Warehouse Demonstrates Best-in-Class Cloud-Native Price-Performance
Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. With the ability to quickly provision on-demand and the lower fixed and administrative costs, the costs…
Brick and Mortar Stores are Now Built Brick by Brick with Digital Insights
This blog is the final post of a 4-part series. You can read the first blog posts here: 1. Get to Know Your Retail Customer; 2. Accelerating Customer Insight and Relevance; Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising; and…
Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 2: Querying/ Loading Data
In this installment, we’ll discuss how to perform Get/Scan operations and use PySpark SQL. Afterward, we’ll talk about Bulk Operations and then troubleshoot some errors you may come across while trying this yourself. Read the first blog here. Get/Scan Operations…
Apache NiFi – the data movement enabler in a hybrid cloud environment
Cloudera provides its customers with a set of consistent solutions running on-premises and in the cloud to ensure customers are successful in their data journey for all of their use cases, regardless of where they are deployed. Cloudera DataFlow provides…