
Technology Transfer for Defense is a cross-campus effort of the Precourt Institute for Energy

Creating Machine Learning Models with Labeled Data


PI: Chris Re

Department: Computer Science

Sponsor: United States Navy (USN) ONR NEPTUNE Program

United States Navy (USN) intelligence analysts require automated tools to improve the quality and quantity of their output. Machine learning (ML) can help achieve these goals, but it requires labeled data, which is expensive and time-consuming to collect. The problem we address is developing and deploying methods to rapidly create operationally useful ML models with limited labeled data and human resources. We propose a technical solution to this problem based on our research group’s work developing data programming, a technique that combines knowledge from domain experts, existing knowledge bases, and other models to create weakly labeled training sets. In practice, data programming has enabled analysts and researchers to create ML models using person-days rather than person-years of labeling resources. We use the H4X process to define a clear plan for translating data programming, and the award-winning Snorkel software that supports it, into practice for USN users. We describe the problem curation and customer discovery processes, provide several technical objectives formulated from end user feedback, and describe a series of proposed Minimum Viable Products (MVPs) we plan to test under the NEPTUNE program. We anticipate outcomes that include successful identification of USN use cases where data programming can improve analysis outcomes via rapid training set creation; development of Snorkel software applications to support these use cases; extension of the underlying data programming techniques to analysis contexts that require the use of multi-modal data; and integration of passively collected observational signals from analysts into the process of supervising ML models.
If successful, we expect that the outputs of this work will enable USN analysts to rapidly create training datasets for ML models using a combination of existing unlabeled data, their own domain expertise, and observational signals passively collected during routine analysis. Integrating such ML models into existing analysis and decision support processes would enable USN analysts to make operationally important assessments.
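
To make the idea of data programming described above more concrete, the following is a minimal sketch in plain Python. It does not use the Snorkel API. The labeling functions, the `KNOWN_VESSELS` knowledge base, and the example texts are all hypothetical, and the simple majority vote stands in for the statistical label model that a real data programming system would fit to estimate how reliable each source is.

```python
from collections import Counter

# Label values; ABSTAIN means a source has no opinion on an example.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical knowledge base of entity names (an assumption for illustration).
KNOWN_VESSELS = {"aurora", "meridian"}


def lf_keyword(text):
    """Domain-expert heuristic: the word 'vessel' suggests the positive class."""
    return POSITIVE if "vessel" in text.lower() else ABSTAIN


def lf_knowledge_base(text):
    """Knowledge-base lookup: mentioning a known vessel name suggests positive."""
    words = set(text.lower().split())
    return POSITIVE if words & KNOWN_VESSELS else ABSTAIN


def lf_forecast(text):
    """Domain-expert heuristic: forecast reports suggest the negative class."""
    return NEGATIVE if "forecast" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_keyword, lf_knowledge_base, lf_forecast]


def weak_label(text):
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting sources with equal support
    return ranked[0][0]


def build_training_set(texts):
    """Keep only examples that received a non-abstain weak label."""
    labeled = [(t, weak_label(t)) for t in texts]
    return [(t, y) for t, y in labeled if y != ABSTAIN]
```

Calling `build_training_set` on a list of unlabeled texts returns `(text, label)` pairs that could be used to train a downstream model. Examples on which every source abstains, or on which the sources disagree evenly, are dropped rather than guessed.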

H4D Focus Areas: AI/ML, Big Data, Technology Transition
