Skip to content

jimbojumbo/face-dataset-cleaner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

face-dataset-cleaner

An implementation to clean large scale public face dataset

This is an unofficial implementation based on the paper A Community Detection Approach to Cleaning Extremely Large Face Database

To do the experiment, first prepare your face-dataset and LFW embedding files using a pre-trained face recognition network.

Use the lfw_far_thresholding.py to determine the similarity threshold between different face images.

Then run the dataset_adjacency_build.py to save the image pair similarity information in csv files, which will then be used in dataset_cleaner.py to build the graphs and do small community cleaning.

A small tool is provided to move original images to a separate folder according to the clean data list.

A first version of cleaned VGGFace2 training and testing image lists can be downloaded at Google Drive

About

An implementation to clean large scale public face dataset

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages