C3’s mission is to provide enterprises across a variety of industries (energy, finance, oil and gas, healthcare, etc.) with robust, scalable applications that deliver AI-driven insights. Common AI use cases within these applications include inventory optimization, anomaly detection, predictive maintenance, and more. While C3’s ultimate vision is to provide out-of-the-box applications for these industries and use cases, customers typically request complex custom configurations, which makes application development difficult for internal C3 developers.
As a result, C3 developed its own internal data management, ML model evaluation, and deployment pipeline product called IDS (Integrated Development Studio). This tool was intended to provide internal developers and data scientists with a GUI to help evaluate and monitor the various artifacts (data models, ML models, deployment pipelines) in their respective workflows.
The research phase involved assessing the functionality gaps from multiple angles: user interviews, domain understanding, and third-party competitive analysis. I led our team’s research effort to document our assumptions and define the goals of what we wanted to learn about the ML development process.
Since our team had ready access to internal data scientists, our first step was to identify the exact gaps in the IDS data science experience. We spoke with seven internal data scientists to better understand how they configure and track ML experiments, evaluate training results, and monitor deployed models over time.
We also surveyed a variety of tools for both data monitoring/ingestion and machine learning. The tools we observed ranged from large enterprise products like Microsoft Azure and Amazon SageMaker to smaller-scale products focused on delivering quick ML predictions, like Akkio and ObviouslyAI.
Having discovered that the ML development workflow involves far more than training a model and viewing results, our team spent substantial time understanding sub-workflows like data segmentation, hyperparameter optimization, and feature engineering.
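To illustrate why a sub-workflow like hyperparameter optimization complicates the tooling picture, the following is a minimal sketch (not C3's internal code) using scikit-learn's `GridSearchCV`: even a toy search over one parameter produces many fitted models, each with results a tool like IDS would need to track and surface.

```python
# Hypothetical illustration of a hyperparameter-optimization sub-workflow.
# Assumes scikit-learn; the dataset and parameter grid are invented for the sketch.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a customer dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3 candidate values x 5 cross-validation folds = 15 separate model fits,
# each generating metrics and artifacts a practitioner wants to compare.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

print(search.best_params_)          # the winning configuration
print(len(search.cv_results_["params"]))  # number of configurations tried
```

The point of the sketch is scale: multiply this across feature-engineering variants and data segments and the number of experiment artifacts grows quickly, which is exactly the tracking burden the interviewed data scientists described.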