Comprehensive Data Quality for DevOps Implementation (Johnson & Johnson)
The Need
Johnson & Johnson’s self-service analytics platform, consisting of SAP and the Primary Access Data Lake (PADL), serves J&J divisions such as Surgical Vision, Vision Care, and Tear Science. The platform needed a DevOps implementation for ETL, audit, validation, and transformation testing. The initiative called for a Python/Robot Framework approach, along with validation against SAP sources, wherever test automation could be accomplished. The manual version of the framework was confusing for the PADL SQA team because the process lacked orchestration.
The Solution
DataTrust (formerly called RDT) provided a self-service, no-code solution that automated the DataOps process and integrated with the PADL enterprise systems (e.g. Jira, X-Ray, Control-M). The solution required a minimal learning curve and handled terabytes of data flowing into the system from multiple sources.
The Impact
By using the RDt platform, J&J’s PADL SQA team automated all of its data quality assurance and data quality control processes. The team also put in place an automation framework covering upcoming releases, regression testing, functional testing, smoke testing, and ongoing data controls for the PADL landscape.
Beyond PADL for J&J enterprise use: J&J’s enterprise architecture board reviewed and approved the PADL data quality framework and adopted it as a model DataOps implementation for upcoming initiatives by other divisions, regions, and functions within J&J.
The RightData Edge
As a software product, DataTrust gave J&J a baseline framework that established both a technical and a process-oriented approach to managing data in the data lake. The simple advantage was replacing manual processes with higher-quality automation. J&J went on to say, “We use our RightData instance for multiple uses such as Bulk Data Comparisons between a source and target database, Data Profiling, Testing Scenarios, and most importantly Data Integrity Controls. RightData provides a single tool option for many data comparison and data auditing requirements. The vendor is very responsive to our needs and has incorporated several suggestions into the RightData tool for extension and ease of use.”
Learn more about DataTrust
RDt is a comprehensive platform for data quality, risk, or compliance needs. Learn more or contact us to chat about your needs.
RDT Data Quality: A no-code data quality suite that improves the quality, reliability, consistency, and completeness of data. Data quality is a complex journey, and teams validate their work through metrics and reporting, using powerful features such as:
Database Analyzer: Using Query Builder and Data Profiling, stakeholders analyze the data before using corresponding datasets in the validation and reconciliation scenarios.
Data Reconciliation: Comparing Row Counts. Compares the number of rows between source and target dataset pairs and identifies tables whose row counts do not match.
Data Validation: A rules-based engine provides an easy interface for creating validation scenarios that define validation rules against target datasets and capture exceptions.
Connectors for All Types of Data Sources: More than 150 connectors for databases, applications, events, flat file data sources, cloud platforms, SAP sources, REST APIs, and social media platforms.
Data Quality: Ongoing discovery that requires a quality-oriented culture to improve the data and a commitment to continuous process improvement.
Database Profiling: Digging deep into the data source to understand the content and the structure.
Data Reconciliation: An automated data reconciliation and validation process that checks the completeness and accuracy of your data.
Data Health Reporting: A process that measures the health and accuracy of your data with dashboards built on metrics and business rules, usually with specific visualizations.
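To make the row-count reconciliation and rules-based validation described above more concrete, here is a minimal Python sketch. It is not DataTrust's implementation, which is a no-code product. It only illustrates the general idea using in-memory SQLite databases. The table names (orders, customers), the rule name qty_positive, and the helper functions are all hypothetical and chosen for illustration.

```python
import sqlite3


def make_db(orders, customers):
    """Build a hypothetical in-memory database standing in for a source or target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, qty INTEGER)")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    return conn


def row_counts(conn, tables):
    """Return {table: row count}. Table names are trusted here (illustration only)."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


def reconcile_row_counts(source, target, tables):
    """Return the tables whose row counts differ, as {table: (source, target)}."""
    src = row_counts(source, tables)
    tgt = row_counts(target, tables)
    return {t: (src[t], tgt[t]) for t in tables if src[t] != tgt[t]}


def validate(conn, table, rules):
    """Apply rules of the form {name: SQL predicate a valid row satisfies}.

    Rows where the predicate is false or NULL are captured as exceptions.
    """
    exceptions = {}
    for name, predicate in rules.items():
        query = f"SELECT * FROM {table} WHERE COALESCE(({predicate}), 0) = 0"
        rows = conn.execute(query).fetchall()
        if rows:
            exceptions[name] = rows
    return exceptions


# Hypothetical data: the target is missing one order and holds a bad quantity.
source = make_db([(1, 5), (2, 3), (3, 7)], [(1, "Ann"), (2, "Bo")])
target = make_db([(1, 5), (2, -3)], [(1, "Ann"), (2, "Bo")])

mismatches = reconcile_row_counts(source, target, ["orders", "customers"])
exceptions = validate(target, "orders", {"qty_positive": "qty > 0"})

print(mismatches)   # {'orders': (3, 2)}
print(exceptions)   # {'qty_positive': [(2, -3)]}
```

In this sketch the orders table is flagged because the source holds three rows and the target two, while the customers table reconciles cleanly. The validation rule treats a NULL result as a failure, so missing values are reported as exceptions rather than silently passing.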