DR. ETL

Dr.ETL is a data flow orchestration platform that handles batch, streaming, and on-demand data pipelines and monitors them in real time. It helps maintain data quality and error handling while streamlining data processing through AI-based audit rules, data integrity checks, SLAs, and a rich control action framework.

Benefits

  • Hundreds of functions for defining complex data flows.
  • Loading heterogeneous data sources and analyzing them for anomalies.
  • Exploring and conditioning data and identifying relationships between them.
  • Streaming data from various sources in real time.
  • The AI-enabled Spark orchestration engine of Dr.ETL provides an interactive drag-and-drop interface for defining distributed Spark processing pipelines.
  • Detailed real-time monitoring of Spark pipelines running on YARN and cloud-native resource managers such as Kubernetes (K8s), with historical data available for later analysis.
  • Support for cloud-native observability frameworks such as ELK/EFK and Prometheus.
  • The ability to see in detail what happens between pipeline data processing steps.
  • A notification service reports system events such as errors or delays.
  • Auto-healing and auto-retrigger support helps eliminate the need to manually reprocess missing data.
  • The Dr.ETL anomaly detection framework identifies data and process anomalies.
  • Batch pipelines can be triggered using cron/timer schedules, and on-demand pipelines can be triggered manually or via an external API call.
  • Support for multiple Kubernetes clusters in an active-active topology provides redundancy for Spark pipelines and aids disaster recovery and geo-redundancy.
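
To make the auto-retrigger and notification items above more concrete, here is a minimal Python sketch of the general idea: a failed pipeline run is re-triggered a bounded number of times, and every failure or success is reported as an event. This is not Dr.ETL's API. The names Notifier, run_with_retrigger, and make_flaky_pipeline are hypothetical and exist only for illustration, and the "pipeline" is a plain Python callable standing in for a real job.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Notifier:
    """Hypothetical stand-in for a notification service; it only records events."""
    events: List[str] = field(default_factory=list)

    def notify(self, message: str) -> None:
        self.events.append(message)


def run_with_retrigger(
    pipeline: Callable[[], str],
    notifier: Notifier,
    max_attempts: int = 3,
) -> str:
    """Run `pipeline`, re-triggering it on failure up to `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            result = pipeline()
        except Exception as exc:
            # Report every failure so an operator can see what happened.
            notifier.notify(f"attempt {attempt} failed: {exc}")
            if attempt == max_attempts:
                notifier.notify("giving up; manual intervention needed")
                raise
        else:
            notifier.notify(f"attempt {attempt} succeeded")
            return result
    raise AssertionError("unreachable")


def make_flaky_pipeline(failures: int) -> Callable[[], str]:
    """Build a demo pipeline that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def pipeline() -> str:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("source unavailable")
        return "loaded"

    return pipeline
```

Calling run_with_retrigger(make_flaky_pipeline(2), Notifier()) fails twice, succeeds on the third attempt, and leaves three events in the notifier. A production system would typically add a delay or backoff between attempts and persist the events; both are omitted here to keep the sketch short.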