Nov 2020 - Present

Visually Guided Machine Learning with Ex Machina

Summary

Ex Machina is a low-code data exploration and analytics tool for non-technical users (Data Analysts, Business Analysts, Strategy Associates, etc.). Its primary goal is to help businesses with limited technical resources or expertise understand their data and leverage lightweight machine learning functionality in a visually guided way.

Collaborators

Josh P. (Project Manager), Matt Connor (Project Manager), Lisa Xu (Designer)

Role and Responsibilities

Research and Design Lead
Primary and Secondary Research, User/Usability Testing, Heuristic Analysis, Affinity Diagramming, Concept Mapping, Low- to High-Fidelity Design

Product Background

With the world’s data increasing exponentially every year, the ability to extract actionable insights from data is becoming increasingly important. As traditional data analytics grows outdated, the data science field has skyrocketed in popularity in recent years, because data scientists can provide more powerful insights by cleaning and preparing data and by identifying more granular patterns or trends through machine learning. However, data scientists remain in limited supply, and many companies struggle to afford a typical data scientist’s salary.

The Challenge

If not every company can afford a highly technical team of data scientists, how might we enable less technical data analysts to extract valuable insights from their data with machine learning?

What is Ex Machina?

C3 AI’s Ex Machina is a product designed to address this need. Ex Machina is a no-code tool for ad hoc data exploration, preparation, and analysis. The product supports a variety of powerful functions, from detailed data profiling, to transformations (joining datasets, filtering values, dropping columns, etc.), to statistical summaries (hypothesis testing with one or two sample means, standard deviation, etc.). Insights from Ex Machina (presented in data tables or a variety of visualizations) can be shared within teams, and the product ultimately aims to make the workflow for users like Business Analysts and Strategy Associates faster and more intuitive.

While compelling in its use cases, Ex Machina is still a relatively young and rapidly growing product with plenty of room to improve. In particular, machine learning is challenging to represent in a UI because of its technical nature and the nuances of training configuration.

Research Overview

With the focus on making machine learning more accessible and intuitive for non-technical users, my team and I created a research plan to document our goals and assumptions. We investigated the domain from multiple angles: primary research interviews, competitive analysis of similar products, and a review of customer pain points with the initial Ex Machina experience.

Primary Research - Understanding User Workflows and Tools

I conducted a series of 30-minute remote interviews with both internal and external users who roughly fit the job description of a non-technical Analyst persona. These interviews focused on understanding their current workflows and tools, their challenges, and their general knowledge of the value of machine learning.

Primary + Secondary Research - Understanding Technical Domain

Along with the general interviews, I set up multiple user feedback sessions with early beta Ex Machina customers to better understand their existing experience with the product. These sessions focused on understanding their specific use cases in Ex Machina and their general user experience.

Secondary Research - Competitive Analysis

We also surveyed a variety of competitor products to understand where Ex Machina’s product experience was successful and where it was lacking by comparison. This process allowed us to investigate different frameworks and paradigms for simple and complex machine learning training configurations.

Research Summary

Our team distilled the various findings and feedback from the research phase into three core guiding principles for the design of Ex Machina's new machine learning functionality.

Process

Ideation

Based on the guiding insights from research, our team began brainstorming new concepts for how we could successfully abstract technically complex information while still maintaining user understanding of core actions. Before jumping directly into designs, we made sure to validate that our understanding of the simple ML user workflow made sense to the non-technical analysts. We continued to review these concepts with internal and external users (both technical and non-technical) to iterate on functionality and concept feedback.

These concepts were based around AutoML, a technology that automatically trains and tests multiple models and surfaces the highest-performing one (as measured by the validation metric for that modeling problem), so that users do not need to manually experiment with different models themselves. For the non-technical analyst users we were designing for, AutoML provided powerful functionality that our team could build a UI on top of.
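To make the AutoML idea concrete, here is a minimal, illustrative sketch of the loop described above: fit several candidate models, score each on held-out validation data, and rank them. It is not Ex Machina's implementation. The candidate models, the `automl_run` function, and the toy data are all assumptions chosen only to keep the example small and dependency-free.

```python
from statistics import mean, median

# Hypothetical candidate "models": each fit function returns a predictor.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_median(xs, ys):
    m = median(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Ordinary least squares for a single feature.
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / var
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

CANDIDATES = {
    "mean_baseline": fit_mean,
    "median_baseline": fit_median,
    "linear_fit": fit_line,
}

def mse(model, xs, ys):
    # Validation metric for this sketch: mean squared error (lower is better).
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def automl_run(train, valid):
    """Train every candidate and return (name, validation MSE) rows, best first."""
    train_x, train_y = train
    valid_x, valid_y = valid
    leaderboard = []
    for name, fit in CANDIDATES.items():
        model = fit(train_x, train_y)
        leaderboard.append((name, mse(model, valid_x, valid_y)))
    leaderboard.sort(key=lambda row: row[1])
    return leaderboard

# Toy data (made up for illustration): y is roughly 2 * x.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
valid = ([7, 8], [14.1, 15.9])
leaderboard = automl_run(train, valid)
best_name, best_score = leaderboard[0]
```

The user never chooses among the candidates. They only see the ranked result, which is what makes AutoML a natural foundation for a simplified UI.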

Progressive Disclosure UI Framework

Model Training Configuration

After several iterations and user review sessions, we proposed a general UI framework that uses progressive disclosure to keep the experience simple while still giving users access to technical information.

The configuration for training a machine learning model in this framework always focuses on the required inputs that determine the actual prediction output. For common business use cases like classification and regression, the only truly required input is the training target column. All other inputs, such as feature selection, imbalanced/missing data strategy, training data splits, and hyperparameter settings, are hidden from view.
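One way to picture this configuration model is as a settings object with a single required field and defaults for everything else. The sketch below is purely illustrative: `TrainingConfig`, its field names, and its default values are assumptions, not Ex Machina's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TrainingConfig:
    # The only required input: the column the model learns to predict.
    target_column: str
    # Everything below is an advanced setting with an assumed default.
    feature_columns: Optional[List[str]] = None  # None means "use all other columns"
    missing_data_strategy: str = "auto"
    imbalanced_data_strategy: str = "auto"
    validation_fraction: float = 0.2
    hyperparameters: dict = field(default_factory=dict)

    def resolved_features(self, all_columns: List[str]) -> List[str]:
        """Return the features to train on, falling back to every non-target column."""
        if self.feature_columns is not None:
            return list(self.feature_columns)
        return [c for c in all_columns if c != self.target_column]

# A non-technical user supplies only the target column.
basic = TrainingConfig(target_column="churned")

# A more technical user can override the hidden settings.
advanced = TrainingConfig(
    target_column="churned",
    feature_columns=["tenure"],
    validation_fraction=0.3,
)

columns = ["tenure", "plan", "churned"]
basic_features = basic.resolved_features(columns)
advanced_features = advanced.resolved_features(columns)
```

Because every advanced field has a default, the simple path and the advanced path share the same structure, and the UI only decides how much of it to show.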

During and after an AutoML training run, the top component always shows a leaderboard of the models trained in that run, and the bottom component displays whichever model the user has selected from the leaderboard (defaulting to the top-performing model).
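The default-selection behavior can be sketched in a few lines. This is a hypothetical illustration: the `selected_model` function, the row shape, and the model names and scores are made up for the example, and the leaderboard is assumed to be sorted best-first.

```python
from typing import List, Optional

def selected_model(leaderboard: List[dict], user_choice: Optional[str] = None) -> dict:
    """Return the row to show in the details panel.

    Assumes `leaderboard` is already sorted best-first. With no explicit
    user choice, the top-performing model is shown.
    """
    if not leaderboard:
        raise ValueError("leaderboard is empty; no models have finished training")
    if user_choice is None:
        return leaderboard[0]
    for row in leaderboard:
        if row["name"] == user_choice:
            return row
    raise KeyError(user_choice)

# Made-up leaderboard rows for illustration only.
rows = [
    {"name": "random_forest", "score": 0.91},
    {"name": "logistic_regression", "score": 0.87},
]
default_row = selected_model(rows)
chosen_row = selected_model(rows, user_choice="logistic_regression")
```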

Model Predictions

The first piece of information the user sees in the Selected Model details section is a snapshot of the predictions output by the model. The training target column is highlighted in green, and the predictions column to its right is shown in blue. We decided to default to this view after receiving feedback that performance metrics can be confusing and hard to interpret for users who are unfamiliar with machine learning.

Model Performance

While the performance metrics of the model aren’t what most target users are interested in, this information still lives in the second tab of the Selected Model details. All information at this level contains tooltips to help the user understand how to interpret various scores, definitions, and visualizations.

Adaptive to Different Models

Every machine learning challenge is unique. The specific scores and visualizations that appear in the Selected Model Performance Results are determined by the type of modeling approach and algorithm (e.g. for a Regression AutoML node, the validation metric may be Mean Squared Error [MSE], while Classification might prioritize F1 Score).
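As a rough sketch of how the validation metric can depend on the modeling approach, the example below maps a task type to a metric function. The two metrics, MSE and F1, are the ones named above. The `VALIDATION_METRICS` table and `score_model` function are hypothetical names for illustration and do not reflect Ex Machina's internals.

```python
def mean_squared_error(y_true, y_pred):
    # Average squared difference between actual and predicted values.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    # Harmonic mean of precision and recall for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Task type -> (display label, metric function, whether higher is better).
VALIDATION_METRICS = {
    "regression": ("Mean Squared Error", mean_squared_error, False),
    "classification": ("F1 Score", f1_score, True),
}

def score_model(task_type, y_true, y_pred):
    """Pick the metric for the task type and evaluate it."""
    label, metric, higher_is_better = VALIDATION_METRICS[task_type]
    return {
        "metric": label,
        "value": metric(y_true, y_pred),
        "higher_is_better": higher_is_better,
    }

regression_result = score_model("regression", [1, 2, 3], [1, 2, 5])
classification_result = score_model("classification", [1, 0, 1, 1], [1, 0, 0, 1])
```

Carrying the `higher_is_better` flag alongside the value lets a leaderboard sort models correctly whether the metric is an error (lower is better) or a score (higher is better).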

Top Row, left to right: Model Performance Results for a Classification Model with ROC/PR Curves (e.g. Logistic Regression Model), Classification Model with Confusion Matrix (e.g. Random Forest Model), Clustering Model with Cluster Centroids and Silhouette Plot (e.g. K-Means Model)

Bottom Row, left to right: Model Performance Results for a Regression Model with R Squared visualization (e.g. ARIMA model), Regression Model with AIC information (e.g. ARIMA model), Clustering Model with histogram plot of mean length (e.g. Isolation Forest Model)

Promoting User Guidance

The importance of guidance and explainability in the machine learning functionality sparked a broader effort to enhance user guidance throughout multiple stages of the product. This guidance ranges from simple tooltips to full in-product documentation for nodes. Below are some examples of different levels in the product where Ex Machina attempts to explain various concepts to the user.