
Unicamp NAMSS 2D Migrated Dataset

This repository contains code to download, process, and analyze the Unicamp NAMSS 2D Migrated Dataset. This dataset consists of seismic data collected during the North Atlantic Margin Seismic Survey (NAMSS) project.

Installation

We use conda to manage the project environment. To create and activate it, run:

conda env create -f environment.yml
conda activate namss

Quick Start

The repository includes scripts to download and process the dataset, as well as Jupyter notebooks for analysis and visualization. Dataset processing can be customized in several ways; see the NAMSS_Metadata and NAMSS_2DMigrated_Dataset README files for details. Sensible default values and configurations are already provided to make getting started easy.

Download Original Data

To download the dataset, run the following command:

cd NAMSS_2DMigrated_Dataset
./download_migrated_data.sh 
# Survey W-29-80-WO direct file links are broken, so we download manually:
./download_w-29-80-wo.sh

This script reads every CSV file in the NAMSS_2DMigrated_Dataset/Migrated_Balanced folder, downloads each file listed in them, and saves the results into the NAMSS_2DMigrated_Dataset/Migrated_files folder. That folder contains one subfolder per survey, and each survey folder holds the downloaded SEGY files.
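The download step above amounts to iterating the survey CSVs and fetching each listed file into a per-survey folder. A minimal sketch of that flow, assuming hypothetical CSV columns `survey` and `url` (the actual CSV schema may differ):

```python
import csv
import pathlib
import urllib.request

CSV_DIR = pathlib.Path("NAMSS_2DMigrated_Dataset/Migrated_Balanced")
OUT_DIR = pathlib.Path("NAMSS_2DMigrated_Dataset/Migrated_files")

def dest_path(survey: str, url: str) -> pathlib.Path:
    """Place each downloaded SEGY under a subfolder named after its survey."""
    return OUT_DIR / survey / url.rsplit("/", 1)[-1]

def download_all() -> None:
    for csv_file in sorted(CSV_DIR.glob("*.csv")):
        with open(csv_file, newline="") as f:
            for row in csv.DictReader(f):  # assumed columns: survey, url
                target = dest_path(row["survey"], row["url"])
                target.parent.mkdir(parents=True, exist_ok=True)
                urllib.request.urlretrieve(row["url"], target)
```

The shipped shell scripts are the authoritative downloaders; this sketch only illustrates the CSV-to-folder layout they produce.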

Creating the dataset

Once the dataset is downloaded, you can convert the SEGY files to TIFF images and create dataset splits using the provided scripts.

cd NAMSS_2DMigrated_Dataset
python create_dataset.py

The script reads all SEGY files in NAMSS_2DMigrated_Dataset/Migrated_files, converts them to TIFF images, normalizes them to the range [-1, 1], and saves them into the train, validation, and test folders under Data/unicamp-namss-dataset. It also performs cleanup based on NAMSS_2DMigrated_Dataset/curated_survey_list.csv, which lists the surveys that passed our quality control (manual inspection of the images).
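The normalization step can be sketched as a per-image min-max rescale to [-1, 1] (an assumption about the script's exact method, which may instead normalize per survey or apply clipping):

```python
import numpy as np

def normalize_to_unit_range(img: np.ndarray) -> np.ndarray:
    """Min-max rescale a 2D amplitude array to [-1, 1]."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:  # constant image: map to zeros to avoid division by zero
        return np.zeros_like(img, dtype=np.float32)
    scaled = (img - lo) / (hi - lo)                  # -> [0, 1]
    return (2.0 * scaled - 1.0).astype(np.float32)   # -> [-1, 1]
```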

Analysis Scripts

The Analysis folder contains scripts for visualizing the embeddings extracted from the dataset. These scripts use libraries such as Plotly, UMAP, and PyTorch to extract embeddings, reduce their dimensionality, and visualize them. The pipeline includes:

  1. Embedding Extraction: Extract embeddings from the dataset using pre-trained models, including ResNet50 (pretrained on COCO) and DINOv2 (pretrained on LVD-142M). For each image, a fixed-size embedding is extracted and saved for further analysis.
  2. Dimensionality Reduction: Apply UMAP to reduce the dimensionality of the extracted embeddings to 2D space for visualization.
  3. Visualization: Generate scatter plots of the 2D embeddings, colored by various metadata attributes such as year, survey, and acquisition parameters.

NOTE: Step 1 can be a computationally intensive process. Please ensure you have access to a machine with a compatible GPU and sufficient memory to handle the dataset.
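The three stages above can be sketched end to end. Here random linear projections stand in for both the pretrained encoder and UMAP, purely to show the data flow and stage boundaries; the real scripts use ResNet50/DINOv2 and umap-learn, and the embedding size 768 is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_embeddings(images: np.ndarray, dim: int = 768) -> np.ndarray:
    """Stage 1 stand-in: map each flattened image to a fixed-size embedding."""
    flat = images.reshape(len(images), -1)
    proj = rng.standard_normal((flat.shape[1], dim))
    return flat @ proj

def reduce_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Stage 2 stand-in: linear projection to 2D (the real script uses UMAP)."""
    proj = rng.standard_normal((embeddings.shape[1], 2))
    return embeddings @ proj

# Stage 3 would scatter-plot the 2D points, colored by metadata
# attributes such as year, survey, or acquisition parameters.
images = rng.standard_normal((16, 32, 32))  # 16 fake 32x32 tiles
emb = extract_embeddings(images)            # one fixed-size vector per image
pts = reduce_to_2d(emb)                     # one 2D point per image
```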

Embedding-Space Variability Figures

The Analysis/plots_namss.py script is used to generate the embedding-space variability figures (Figure 5 in the paper). It will create UMAP visualizations of the embeddings extracted from the NAMSS dataset. You may change METADATA_ALL_PATH, DATA_PATH, and UMAP_FIGURES_PATH variables to point to your dataset and desired output locations, if needed. Then, run the script using:

cd Analysis
python plots_namss.py

This will generate and save the UMAP plots in the specified output directory, colored by different metadata attributes (year, macro-region, and survey).

Relationship of Unicamp-NAMSS to other seismic datasets

We also provide scripts that compare the Unicamp-NAMSS dataset to other seismic datasets, such as the Netherlands F3 dataset and AI-Crowd's SEAM-AI dataset (Figure 6 in the paper). To run these scripts, first download and extract the Netherlands F3 and SEAM-AI datasets. They can be found at:

Once downloaded and extracted, you may update the paths in the Analysis/plots_compare_datasets.py script to point to the respective dataset files. Then, run the script using:

cd Analysis
python plots_compare_datasets.py

This will generate UMAP visualizations comparing the embeddings from the Unicamp-NAMSS dataset with those from the Netherlands F3 and SEAM-AI datasets, and save the plots in the specified output directory.
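For the points from the three datasets to share one 2D space, their embeddings are presumably pooled into a single matrix, with a dataset label per row, before dimensionality reduction (an assumption about how plots_compare_datasets.py works internally). A minimal sketch of that pooling, with illustrative shapes:

```python
import numpy as np

def pool_datasets(named: dict[str, np.ndarray]) -> tuple[np.ndarray, list[str]]:
    """Stack per-dataset embedding matrices; keep a parallel list of labels."""
    labels: list[str] = []
    blocks = []
    for name, emb in named.items():
        blocks.append(emb)
        labels.extend([name] * len(emb))
    return np.vstack(blocks), labels

# Fake embeddings: 5 + 3 + 2 rows, all with the same feature width.
namss = np.zeros((5, 8))
f3 = np.ones((3, 8))
seam_ai = np.full((2, 8), 2.0)
pooled, labels = pool_datasets({"NAMSS": namss, "F3": f3, "SEAM-AI": seam_ai})
```

UMAP would then run once on `pooled`, and the label list would drive the point colors in the comparison figure.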
