Leveraging machine learning for deep-sea video and image analysis

MBARI’s Video Lab has made significant advances in incorporating machine learning (ML) into the annotation and analysis of our 38-year archive of deep-sea video. Central to these efforts are the integration of the Video Annotation and Reference System (VARS) with advanced ML tools and the development of a new machine-assisted annotation workflow (VARS-ML).


Quick VARS-ML statistics

  • Over 28,000 hours of deep-sea video footage recorded since 1987.

  • More than 500,000 frame grabs available for quick review, reference, and revision.

  • Over 10 million annotations of marine life, habitats, science experiments, and phenomena.

  • VARS-ML has generated over 690,000 localizations spanning 1,600 unique classes; these localizations are crucial to model training.

VARS-ML workflow

We retrain our models using an iterative approach. With each iteration, the models become more accurate as we add more and/or better-localized training data. Our goals are twofold: to improve model performance and to decrease the time required for human validation. We expect this dynamic process to let us handle the immense scale of our expanding video archive while maintaining or enhancing its scientific value. Machine learning will also open opportunities to mine the older footage in the archive, enabling us to refine and update past observations and to surface findings that were previously overlooked.

  1. Localizing imagery: We identify and localize targets of interest within our extensive deep-sea archive using custom-built applications, allowing Video Lab staff to quickly and accurately create the labeled datasets used in model training (see the label-format sketch after this list).
  2. Model training: We train object detection models on these datasets using cutting-edge YOLO models from Ultralytics (detailed under “Model training in the cloud” below).
  3. Generating ML proposals: Once trained, the models generate proposals, i.e., machine-generated candidate annotations of objects in video or image datasets.
  4. Human validation of ML proposals: Human experts review and validate the ML proposals, ensuring accuracy and refining the dataset. This step provides critical feedback, enabling further model improvement and ensuring annotation quality and consistency.
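
To make step 1 concrete, the sketch below shows what a single localization becomes in a YOLO-format training dataset: one line per bounding box, holding a class index and a normalized, center-based box. The function and the example values are illustrative only and are not part of VARS.

```python
# Illustrative sketch (not VARS code): convert one localization from
# pixel coordinates to the YOLO label-file format used for training.
def to_yolo_line(class_id: int, x: float, y: float, w: float, h: float,
                 img_w: int, img_h: int) -> str:
    """Pixel-space box (top-left x, y, width, height) -> YOLO label line."""
    cx = (x + w / 2) / img_w          # normalized box-center x
    cy = (y + h / 2) / img_h          # normalized box-center y
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# e.g. a localization of class 3 in a 1920x1080 frame grab:
print(to_yolo_line(3, 812, 430, 220, 160, 1920, 1080))
# -> 3 0.480208 0.472222 0.114583 0.148148
```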

Model training in the cloud | Ultralytics YOLO models:

MBARI trains its machine learning models using Ultralytics YOLO. These models are optimized for object detection and identification, allowing them to recognize marine species, equipment, and other features with high precision. Cloud computing resources are used to train large models more quickly and efficiently.
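
A minimal training sketch using the Ultralytics Python API is shown below. The starting weights, dataset config, and hyperparameters ("yolov8m.pt", "vars_dataset.yaml", image size, epochs) are illustrative assumptions, not MBARI’s actual configuration.

```python
# Minimal Ultralytics training sketch; paths and settings are assumptions.
from ultralytics import YOLO

model = YOLO("yolov8m.pt")        # start from pretrained detector weights
model.train(
    data="vars_dataset.yaml",     # dataset config: image paths + class names
    epochs=100,
    imgsz=1280,                   # higher resolution helps small targets
    device=0,                     # GPU index; device=[0, 1, ...] for multi-GPU cloud runs
)
metrics = model.val()             # mAP and per-class metrics on the validation split
```

The same script runs unchanged on a cloud GPU instance or a local workstation; only the device argument and the dataset paths need to change.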

Model inferencing | Generating ML proposals using cloud compute, local GPU, or desktop computers:

Once trained, the models generate proposals (model inferencing): machine-generated annotations of objects in new video or image datasets. We can run inferencing on cloud compute, a local GPU, or desktop computers, depending on the size of the dataset and the speed required, giving us flexibility when processing new data. In the near future, we will be able to run inferencing during ROV dives in real time.
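
Below is a minimal inferencing sketch using the same Ultralytics API; the weights path and video file are assumptions for illustration. Streaming the results frame by frame keeps memory bounded on modest desktop hardware, and pointing the source at a live camera feed is how real-time use during ROV dives could work.

```python
# Minimal inferencing sketch; weights and source paths are assumptions.
from ultralytics import YOLO

model = YOLO("best.pt")                        # weights from a finished training run
results = model.predict(
    source="dive_video.mp4",                   # video file, image folder, or live stream
    conf=0.25,                                 # minimum confidence for a proposal
    stream=True,                               # yield results frame by frame
    device="cpu",                              # or a GPU index on capable hardware
)

for frame_result in results:
    for box in frame_result.boxes:             # each proposed localization
        label = model.names[int(box.cls)]      # class label
        score = float(box.conf)                # model confidence
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # pixel coordinates
        print(f"{label} ({score:.2f}) at ({x1:.0f}, {y1:.0f}) ({x2:.0f}, {y2:.0f})")
```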

VARS-ML tools

The VARS team has developed custom applications to streamline the localization and validation processes, making it more efficient to generate training datasets and validate results. These innovations are central to our success in applying machine learning to marine science.

VARS Annotation:

The VARS Annotation GUI, with its custom Sharktopoda video player, allows us to localize directly on video during our normal annotation process. To our knowledge, Sharktopoda was the first standalone video player built with this capability. An integrated model (VARS-oracle) enables us to generate ML proposals on the fly while using VARS.

VARS-localize:

VARS-localize allows us to search existing frame grabs and rapidly localize objects, generating additional model training data.

VARS-gridview:

VARS-gridview is used to review, edit, and validate ML and human-generated localized regions of interest (ROIs). We can sort ROIs by class label, observer, dive number, and visual similarity.
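
As a generic illustration of how a sort-by-visual-similarity feature can work (this is not necessarily how VARS-gridview implements it), the sketch below embeds each ROI crop with an off-the-shelf backbone and ranks crops by cosine similarity to a reference crop; all file names are hypothetical.

```python
# Generic similarity-sort sketch (not VARS-gridview internals):
# embed ROI crops with a pretrained backbone, rank by cosine similarity.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained backbone with the classifier head removed -> feature vectors.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Map one ROI crop to a unit-length feature vector."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    v = backbone(x).squeeze(0)
    return v / v.norm()

# Rank ROI crops by similarity to a chosen reference (hypothetical files).
reference = embed("rois/reference_jelly.png")
crops = ["rois/roi_001.png", "rois/roi_002.png", "rois/roi_003.png"]
ranked = sorted(crops, key=lambda p: float(reference @ embed(p)), reverse=True)
```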


Applications of MBARI’s ML work

[Figure: example of MBARI452k model object detection and tracking; labels show ID number + class label + confidence level.]

  • Automated identification of marine species: VARS-ML models are already used to identify numerous species observed in deep-sea video. As the ML-integrated workflow matures, it will reduce the manual labor needed to annotate footage and should improve the consistency of data analysis.
  • Object detection and tracking: Machine learning models classify, detect, and track objects (such as jellies, squid, or other marine life) in video streams; a minimal tracking sketch follows this list. In the future these models will help MBARI study the distribution, movement, and interactions of deep-sea organisms.
  • Data labeling and archive workflow: VARS-ML tools automate the annotation of visual data, a process that has been done painstakingly by hand for the past 38 years. These tools will allow us to rapidly process new or existing video and image data, enhancing our ability to understand species distribution, temporal population changes, and interactions with other animals and the environment.
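
As referenced in the list above, here is a minimal detection-and-tracking sketch using Ultralytics’ built-in tracker support; the weights, video path, and tracker choice (ByteTrack) are illustrative assumptions, not MBARI’s actual configuration.

```python
# Minimal detection-and-tracking sketch; paths and tracker are assumptions.
from ultralytics import YOLO

model = YOLO("best.pt")
results = model.track(
    source="midwater_transect.mp4",   # assumed dive video
    tracker="bytetrack.yaml",         # Ultralytics' built-in ByteTrack config
    persist=True,                     # keep track IDs across frames
    stream=True,
)

for frame_result in results:
    for box in frame_result.boxes:
        if box.id is None:            # detection not yet assigned a track
            continue
        # ID number + class label + confidence, as in the figure caption above
        print(int(box.id), model.names[int(box.cls)], f"{float(box.conf):.2f}")
```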


Driving research forward

MBARI is committed to utilizing cutting-edge technology to advance marine research, ultimately providing critical insights into ocean health and ecosystems while fostering collaboration among scientists. The VARS-ML project combines the expertise of marine scientists, engineers, and data scientists to unlock the potential of machine learning in deep-sea research. This approach not only enhances our understanding of ocean ecosystems but also sets the stage for groundbreaking discoveries in the near future.