
Easy Understanding of Cloudera Data Platform (CDP) Components


Initially, the Big Data market had three widely used players: Cloudera, Hortonworks and MapR. Of these, MapR was acquired by HPE (Hewlett Packard Enterprise) and the other two merged in 2019.
After the merger of Hortonworks and Cloudera, the combined platform is now known as CDP (Cloudera Data Platform) or Enterprise Data Cloud.

Cloudera Data Platform is a complete Edge-to-AI solution for any kind of workload: data ingestion for collecting data from edge devices, data engineering for data enrichment, data warehouse for reporting, operational database for building applications, and ML/AI for predictive or prescriptive use cases, all with unified, consistent security and governance.

I have seen a lot of confusion among engineers who previously worked with either Hortonworks or Cloudera components, because many components were renamed after the merger was completed. Some components have been removed, and some have been replaced by other components. So here I will try to give an overview of which CDP product offers which components in the newer versions of Cloudera Data Platform, and to simplify the naming of the different CDP products and their components.

CDP (Cloudera Data Platform) now provides services for almost every use case, whatever the deployment environment: on-premises or data center, public cloud, multi-cloud or hybrid. A hybrid solution is a combination of CDP Private Cloud and CDP Public Cloud.

  • For the cloud offering, a new product called CDP Public Cloud is available. It is a data analytics and management platform deployed on any of the major public cloud providers (AWS, Azure or Google Cloud), with capabilities such as artificial intelligence, unified security and data governance. The major benefit of this kind of solution is that storage and compute are separated from each other. For storage, we can use AWS S3, Azure ADLS Gen2 or Google Cloud Storage (GCS) as the Data Lake, and for compute we can use VMs on any of these cloud providers without requiring storage disks for the data itself.
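To make the storage side of this separation concrete, here is a minimal sketch of how a Data Lake location might be written for each provider. The URI schemes (`s3a://`, `abfs://`, `gs://`) are the ones used by the Hadoop cloud storage connectors; the helper name `data_lake_uri` and the bucket, container and account names are hypothetical and only for illustration.

```python
def data_lake_uri(provider: str, container: str, path: str = "", account: str = "") -> str:
    """Build a Data Lake base URI for a cloud provider (illustrative helper, not a CDP API)."""
    path = path.strip("/")
    if provider == "aws":
        # AWS S3 bucket, accessed through the Hadoop S3A connector
        base = f"s3a://{container}"
    elif provider == "azure":
        # ADLS Gen2 container inside a storage account, accessed through ABFS
        if not account:
            raise ValueError("ADLS Gen2 needs a storage account name")
        base = f"abfs://{container}@{account}.dfs.core.windows.net"
    elif provider == "gcp":
        # Google Cloud Storage bucket
        base = f"gs://{container}"
    else:
        raise ValueError(f"unsupported provider: {provider}")
    return f"{base}/{path}" if path else base


# Example with made-up names: the same logical path on three providers
for args in [("aws", "my-bucket"), ("gcp", "my-bucket")]:
    print(data_lake_uri(*args, path="warehouse"))
print(data_lake_uri("azure", "data", path="warehouse", account="myaccount"))
```

Because the data lives at such a location rather than on the VM disks, compute clusters can be created and removed without touching the data.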

To access CDP Public Cloud, the Cloudera account team provides the credentials for the URL https://console.cdp.cloudera.com. Once logged in, you land on the Cloudera Data Platform home page.

The CDP home page has two parts:
  • Control Plane: It consists of the services (Data Catalog, Replication Manager, Workload Manager, Management Console) that can be used across multiple clusters in a CDP Public Cloud or hybrid environment to manage data, infrastructure and analytics workloads.

  • Service Plane: It consists of all the workload Experiences/Services (Data Flow, Data Engineering, Data Warehouse, Operational Database, Machine Learning), which can be created whenever required and suspended once the workload has completed, to control public cloud usage.
    Data Hub Clusters is also a service for creating and managing different workload clusters, based on built-in templates for common workloads (powered by Cloudera Runtime), on any of the public clouds. It provides workload isolation and elasticity: each workload or application can run with a different software version and a different configuration, deployed on different cloud infrastructure on demand.
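The grouping above can be written down as a small lookup table. This is only a sketch of the list as described in this post; the names `CDP_PUBLIC_CLOUD_PLANES` and `plane_of` are hypothetical and not part of any Cloudera API.

```python
from typing import Optional

# Services per plane, exactly as listed in this post
CDP_PUBLIC_CLOUD_PLANES = {
    "Control Plane": [
        "Data Catalog",
        "Replication Manager",
        "Workload Manager",
        "Management Console",
    ],
    "Service Plane": [
        "Data Flow",
        "Data Engineering",
        "Data Warehouse",
        "Operational Database",
        "Machine Learning",
        "Data Hub Clusters",
    ],
}


def plane_of(service: str) -> Optional[str]:
    """Return the plane a service belongs to, or None if it is not in the table."""
    wanted = service.strip().lower()
    for plane, services in CDP_PUBLIC_CLOUD_PLANES.items():
        if wanted in (name.lower() for name in services):
            return plane
    return None


print(plane_of("Data Warehouse"))      # Service Plane
print(plane_of("Management Console"))  # Control Plane
```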


If you select Management Console, the Management Console page opens.

Management Console is generally used by CDP admins to manage and monitor all of the CDP services from a single pane of glass across all the environments created in different public clouds. It is also used to provision and destroy CDP services deployed in a data center or in multiple or hybrid clouds.


    • CDP Public Cloud on AWS Cloud



  • Difference between Public Cloud Services used by CDP Public Cloud


  • For on-premises environments, CDP now provides a solution known as CDP Private Cloud, which comes in two main forms: CDP Private Cloud Base (earlier known as CDP Data Center) and CDP Private Cloud Plus (also known as CDP Experience).

    • CDP Private Cloud Base is similar to CDH which you might have already used in the past but with some additional components like Hive-on-Tez (Hive3), Phoenix, Zeppelin, SMM (Streams Messaging Manager), SRM (Streams Replication Manager), CC (Cruise Control), Ranger, Knox, Atlas, NiFi, Flink and many others.

    • CDP Private Cloud Plus is an offering similar to CDP Public Cloud, but on existing on-premises or data center hardware. We can use a CDP Private Cloud Base cluster as the Data Lake with the SDX components (Knox, Ranger, Atlas, Hive Metastore), and for compute we can create experiences on a RedHat OpenShift cluster (v4.5 or later) or a Kubernetes cluster (created and managed by Cloudera) that coexists with the Base cluster. This solution is not yet as mature as CDP Public Cloud, because fewer experiences are available on CDP Private Cloud Plus than in the CDP Public Cloud offering. In the future it might reach parity with CDP Public Cloud, but for that we need to wait some more time.
            
