March 2021 - August 2021

ML Model Deployment and Monitoring with C3 AI ML Studio

Summary

In March 2021, I was assigned to a visionary redesign effort to create a comprehensive, end-to-end tool for training, evaluating, deploying, and monitoring complex machine learning use cases. This was a cross-functional effort involving data science, engineering, and product management.

Collaborators

Murphy McQuet (Project Manager Lead), Alex Chao (Project Manager), Romain Juban (Data Science Advisor), Jake Whitcomb (Project Manager)

Role and Responsibilities

Research and Design Lead
Primary and Secondary Research, User/Usability Testing, Heuristic Analysis, Affinity Diagramming, Concept Mapping, Low- to High-Fidelity Design

Product Background

C3’s mission is to provide enterprises from a variety of industries (energy, finance, oil and gas, healthcare, etc.) with robust, scalable applications that deliver AI-driven insights. Common AI use cases within these applications involve inventory optimization, anomaly detection, predictive maintenance, and more. While C3’s ultimate vision is to provide out-of-the-box applications for these various industries and use cases, customers typically request complex custom configurations, which makes the application development process quite challenging for internal C3 developers.

As a result, C3 developed its own internal data management, ML model evaluation, and deployment pipeline product called IDS (Integrated Development Studio). This tool was intended to provide internal developers and data scientists with a GUI to help evaluate and monitor the various artifacts (data models, ML models, deployment pipelines) in their respective workflows.

The Challenge

While initially well-received, internal IDS usage quickly dropped off due to gaps in both functionality and overall usability. This was a serious concern since IDS was poised to eventually go to market publicly, with its first controlled beta release set for mid-2022. Confusing product navigation, outdated UI components, and slow performance made the usability issues fairly obvious.

The functionality gaps, however, presented a larger opportunity to fundamentally redesign the tools within IDS from the ground up. Within the various sub-tools, my team focused on data scientist workflows for preparing data, training ML models, evaluating performance, and deploying/monitoring models in production environments.

Initial Research Methodologies

The research phase involved assessing the functionality gaps from multiple angles (user interviews, domain understanding, and third-party competitive analysis). I led our team’s research effort to document our assumptions and define the goals of what we were trying to understand about the ML development process.

Primary Research - Understanding User Workflows and Tools

Since our team had an arsenal of internal data scientists to gain insights from, our first step was to identify the exact gaps within the IDS data science experience. We spoke with 7 internal data scientists to better understand how they configure and track ML experiments, evaluate training results, and monitor deployed models over time.

Secondary Research - Competitive Analysis

We also surveyed a variety of tools for both data monitoring/ingestion and machine learning. The tools we observed ranged from large enterprise products like Microsoft Azure and Amazon SageMaker to smaller-scale products focused on delivering quick ML predictions like Akkio and ObviouslyAI.

Primary + Secondary Research - Understanding Technical Domain

Driven by the discovery that the ML development workflow is not as straightforward as simply training a model and viewing results, our team spent a substantial amount of time understanding sub-workflows like data segmentation, hyperparameter optimization, and feature engineering.

Research and Design Process

Mapping Ideas and Information Architecture

Once our research had given us an initial footing in the core user needs, we embarked on the ideation and concept phase of the project. Our team brainstormed ideas born from the user needs and frustrations, which were then clustered and categorized into functionality ideas. Throughout this process, we began to establish an information architecture to map the various ML artifacts (models, experiments, notebooks, deployments, segments, environments) for the re-envisioned ML Studio based on our conceptual understanding from research.

Left: Mapping the relationship of ML Artifacts | Right: Notes and ideas for core workflow functionality

Left: Mapping ML Artifacts and Entities in Relation to IDS Technical Paradigms | Right: Internal Glossary for Documenting our ML Artifact Concepts
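To make the artifact relationships we mapped a bit more concrete, here is a minimal sketch of that information architecture expressed as Python dataclasses. The class and field names are purely illustrative assumptions for this write-up and are not actual C3 IDS or ML Studio types.

```python
# Illustrative sketch only: hypothetical dataclasses approximating the artifact
# relationships we mapped (not actual C3 IDS/ML Studio types or fields).
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """A slice of the training population; may be a child of another segment."""
    name: str
    parent: Optional["Segment"] = None
    filter_expression: Optional[str] = None  # e.g. "country == 'DE'"


@dataclass
class Experiment:
    """A single training run: its configuration plus the segment it trained on."""
    name: str
    segment: Segment
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


@dataclass
class Deployment:
    """A trained model promoted to a target environment for live predictions."""
    experiment: Experiment
    environment: str  # e.g. "staging", "production"


@dataclass
class MLProject:
    """Top-level container tying segments, experiments, and deployments together."""
    name: str
    segments: List[Segment] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)
    deployments: List[Deployment] = field(default_factory=list)
```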

Finalizing the Target User

Since the majority of the intended users of ML Studio would eventually be data scientists, we created a user persona named Dana to capture the key goals, needs, and frustrations we learned about from internal and external data scientists. Her primary scenario (training, evaluating, deploying, and monitoring complex machine learning models in a production environment) was based on common enterprise-facing machine learning use cases internally.

Concept Validation

Due to the complexity of the ML workflows we were surveying, our team decided to break up functionality areas into different phases spread over multiple sprints. Each sprint, we would attempt to tackle a specific part of the core workflow (setting up details for an ML project, model performance evaluation, model deployment, etc.) through low-fidelity wireframe flows and user validation. Upon covering most of the core functionality, we moved on to sub-flows like data segmentation, model auditing/approvals, and prediction rules and conditions.

Addressing Unique Challenges

Throughout this research effort, we stumbled upon several unique workflow challenges that required novel UI solutions. Below are a few examples of the solutions we designed to address some of these challenges.

Challenge 1: Bucketing Training Data During Training and Experimentation

During the experimentation process, data scientists may find that large sets of training data require further modeling nuance across certain dimensions (e.g., the country of a client or the manufacturer of an asset). While most data scientists are comfortable slicing their data as they see fit in their environment of choice, such as a Jupyter notebook, it can be cumbersome to keep track of these segments in notebooks alone.

Solution: Data Segmentation

A data scientist can create segments from their data in ML Studio with a visually guided segment creator. Starting from the "parent" segment (the original data target), they can create sub-segments through attribute filters or groups, or through an arbitrary percentage-based split (typically useful for A/B testing). In this view, users can rename individual sub-segments, view the count of subjects in each segment, and see the percentage of subjects or rows that segment contains with respect to the total population.

Models can be trained and deployed on any sub-segments created in this view.
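For readers less familiar with segmentation, here is a minimal sketch of the kind of logic the segment creator wraps, assuming the parent segment is a pandas DataFrame. The column names and split ratios are illustrative, not taken from the product.

```python
# Minimal sketch of segmentation logic, assuming a pandas DataFrame as the
# "parent" segment; column names and values here are purely illustrative.
import pandas as pd

# Parent segment: the original data target.
parent = pd.DataFrame({
    "asset_id": range(1, 11),
    "manufacturer": ["Acme", "Globex"] * 5,
    "country": ["US", "DE", "US", "FR", "DE", "US", "FR", "DE", "US", "US"],
})

# Sub-segment via an attribute filter (analogous to a filter-based split in the UI).
us_assets = parent[parent["country"] == "US"]

# Sub-segments via an arbitrary percentage-based split (e.g. 70/30 for A/B testing).
split_a = parent.sample(frac=0.7, random_state=42)
split_b = parent.drop(split_a.index)

# The counts and population percentages surfaced per sub-segment in the view.
for name, segment in [("US assets", us_assets), ("Split A", split_a), ("Split B", split_b)]:
    pct = 100 * len(segment) / len(parent)
    print(f"{name}: {len(segment)} subjects ({pct:.0f}% of parent)")
```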

Challenge 2: Understanding Key Model Prediction Contributors

To continuously refine an ML model, it is important to understand which key contributors (features) are impacting the model's performance. Currently, data scientists manually create data visualizations like SHAP plots directly in their notebook(s) to understand feature contribution values of their model predictions.
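For context, a sketch of what that manual notebook workflow might look like using the open-source shap package is shown below; the dataset and model are illustrative stand-ins, not the customer data or models our data scientists worked with.

```python
# Sketch of the manual notebook workflow described above, using the open-source
# shap package; the dataset and model are illustrative stand-ins.
import shap
import xgboost
from sklearn.datasets import fetch_california_housing

# Train a simple model on a public regression dataset.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

# Compute per-prediction (local) feature contributions.
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Global view: which features drive predictions across the dataset.
shap.plots.beeswarm(shap_values)
```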

Solution: Prediction Analyzer

Once a model has been trained, a data scientist can view rich details about the model's predictions and feature contributions. The graph at the top shows the distribution of all prediction output values, with the purple sliders serving as a filter for the list of prediction subjects below.

Each row of the subjects list is expandable, with the option to view the five most influential features that contributed to the output value in a positive or negative direction.

Top: Prediction Value Histogram Visualization with sliders to filter subjects list below | Bottom Left: Expanded view of single subject with local SHAP value showing top feature contributions for the output | Bottom Right: Unexpanded list of subjects from the predictions, including name, prediction date, prediction output value, and true value (assuming supervised learning and label provided)
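Continuing the illustrative SHAP sketch above, the "top 5 feature contributions" shown in an expanded row could be derived from local SHAP values along these lines. The helper name and usage are hypothetical; `shap_values` is assumed to be the Explanation object from the earlier sketch.

```python
# Continuing the illustrative example above: deriving an expanded row's top-5
# feature contributions from local SHAP values (helper name is hypothetical).
import numpy as np

def top_contributions(shap_values, subject_index, k=5):
    """Return the k features with the largest absolute contribution for one subject,
    keeping the sign so positive vs. negative influence stays visible."""
    row = shap_values[subject_index]               # local explanation for one subject
    order = np.argsort(-np.abs(row.values))[:k]    # indices of the k largest |contributions|
    return [(row.feature_names[i], float(row.values[i])) for i in order]

# Example: inspect the first prediction subject.
for feature, contribution in top_contributions(shap_values, subject_index=0):
    direction = "increases" if contribution > 0 else "decreases"
    print(f"{feature}: {contribution:+.3f} ({direction} the prediction)")
```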