IDEAL-Lab/shapley-value-simple-game

Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple Games

Install dependencies

  • OS: Ubuntu 20.04 LTS.
  • Rust: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Build

cargo build --release

Generate source data

We use two data sets in our experiment.

  • TPC-H: a benchmark data set that lacks data owner information
     git submodule update --init --recursive
     ./scripts/compile-tpch.sh
     ./scripts/tpch-dbgen.sh -s 1.0
    After this step, we can find the source data in the folder "./data/tpch/data".
  • ESD (European Soccer Database): a real-world data set that contains data owner information.
    • We can manually export a CSV version of the data set from the SQLite database or download a CSV version directly here.
    • Save the csv files to the folder "./data/soccer/data"
    • Run command: ./scripts/soccer-dbgen.sh
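
The manual export mentioned above can be scripted. The following is a minimal sketch that uses only the Python standard library and writes one CSV file per table. The function name `export_tables_to_csv`, the one-file-per-table naming, and the header row are assumptions made for illustration; check what `./scripts/soccer-dbgen.sh` expects before relying on this layout.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write every user table of a SQLite file to <out_dir>/<table>.csv.

    Hypothetical helper: the file naming and the header row are assumptions,
    not a format confirmed by this repository.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        tables = [name for name in names if not name.startswith("sqlite_")]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            header = [column[0] for column in cursor.description]
            with open(out / f"{table}.csv", "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)   # column names
                writer.writerows(cursor)  # all rows of the table
    finally:
        conn.close()
    return tables
```

Calling `export_tables_to_csv("database.sqlite", "./data/soccer/data")` would place the CSV files in the folder used above; `database.sqlite` is a placeholder for wherever the downloaded database lives.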

Generate assignment data

We skip this step for the ESD data set since it already contains data owner information. For the TPC-H data set, assign source data to data owners and store the assignment via:

python3 ./scripts/assign_data.py -d <dataset> -a <alpha> -b <beta> -k <number_of_data_owner> -m <max_copy> -o <equal owners> -r <equal records> -f <output dir>

Example:

python3 ./scripts/assign_data.py -d tpch -a 3.0 -b 3.0 -k 500 -m 4 -o 1 -r 1 -f ./data/tpch/assignment

After this step, we can find the assignment data in the folder "./data/tpch/assignment".
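
When several assignments are needed (for example, one per number of data owners), it can help to build the command line programmatically. The sketch below only assembles the argument list from the flags documented above; the helper name `assign_data_command` and its parameter names are hypothetical, and it does not interpret what the flags mean.

```python
def assign_data_command(dataset, alpha, beta, num_owners, max_copy,
                        equal_owners, equal_records, out_dir):
    """Build the argv list for scripts/assign_data.py (hypothetical helper).

    Only the flags shown in this README are used; values are passed through
    unchanged as strings.
    """
    return [
        "python3", "./scripts/assign_data.py",
        "-d", str(dataset),
        "-a", str(alpha),
        "-b", str(beta),
        "-k", str(num_owners),
        "-m", str(max_copy),
        "-o", str(equal_owners),
        "-r", str(equal_records),
        "-f", str(out_dir),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Running it for several values of `-k` would need a separate output directory per run, and any directory naming scheme for that is up to the user.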

Compute Shapley value

cal_sv -d <dataset> -c <source data dir> -a <data assignment dir> -o <output file> -m <method>

Example:

  • TPC-H:
./target/release/cal_sv -d tpch -c data/tpch/data -a data/tpch/assignment -o rdsv.json -m rdsv
  • ESD:
./target/release/cal_sv -d soccer -c data/soccer/data -o rdsv.json -m rdsv
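
For readers new to the terminology, the toy sketch below shows what a Shapley value in a cooperative simple game is. It is a brute-force enumeration over coalitions, exponential in the number of owners, and is not the `rdsv` method implemented in this repository. The game itself is an assumption for illustration: a coalition "wins" (value 1) when its members together hold every required record, and otherwise has value 0. The names `shapley_simple_game`, `holdings`, and `required` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_simple_game(holdings, required):
    """Exact Shapley values of a toy simple game by coalition enumeration.

    holdings: dict mapping owner -> set of record ids the owner holds.
    required: set of record ids that a winning coalition must cover.
    """
    owners = sorted(holdings)
    n = len(owners)

    def wins(coalition):
        # A coalition wins if the union of its holdings covers `required`.
        covered = set()
        for owner in coalition:
            covered |= holdings[owner]
        return required <= covered

    values = {}
    for i in owners:
        others = [o for o in owners if o != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # In a simple game the marginal contribution is 1 exactly
                # when adding owner i turns a losing coalition into a winner.
                if wins(subset + (i,)) and not wins(subset):
                    total += weight
        values[i] = total
    return values
```

With `holdings = {"A": {1, 2}, "B": {2, 3}, "C": {3}}` and `required = {1, 2, 3}`, owner A is needed by every winning coalition and receives 2/3, while B and C receive 1/6 each; the values sum to 1, the value of the grand coalition.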

Compute Shapley value with ablation

Calculate Shapley value for all data owners by ablating one type of decomposition via:

cal_sv_ablation -d <dataset> -c <source data dir> -a <data assignment dir> -o <output file> -m <method> --ablation <ablation_type>

Example:

  • TPC-H:
./target/release/cal_sv_ablation -d tpch -c data/tpch/data -a data/tpch/assignment -o rdsv.json -m rdsv --ablation no-horizontal
  • ESD:
./target/release/cal_sv_ablation -d soccer -c data/soccer/data -o rdsv.json -m rdsv --ablation no-horizontal
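
To check whether an ablated run yields the same values as a full run, the two results can be compared owner by owner. Note that the example commands in this README all write to rdsv.json, so keeping both results means passing a different -o value to one of the runs. The sketch below assumes each result has already been loaded into a Python dict mapping owner id to Shapley value; the layout of the output file is not documented here, so the loading step is left out and the helper name `compare_shapley_results` is hypothetical.

```python
def compare_shapley_results(full, ablated, tol=1e-9):
    """Return owners whose values differ by more than tol between two runs.

    full, ablated: dicts mapping owner id -> Shapley value (assumed layout).
    Owners present in only one of the two results are reported as well,
    with None standing in for the missing value.
    """
    mismatches = {}
    for owner in sorted(set(full) | set(ablated)):
        a = full.get(owner)
        b = ablated.get(owner)
        if a is None or b is None or abs(a - b) > tol:
            mismatches[owner] = (a, b)
    return mismatches
```

An empty result means the two runs agree within the tolerance for every owner.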

About

Source code for the SIGMOD24 paper "Fast Shapley Value Computation in Data Assemblage Tasks as Cooperative Simple Games"
