# abachman-dsac/npd-data-quality


> [!WARNING]
> DO NOT COMMIT DATA FILES TO THIS REPOSITORY

## setup

setup assumes you are on a Mac with `mise` and Homebrew installed:

```sh
# update project tooling
mise install

# update python dependencies
uv sync

# (optional) install a sqlite browser
# brew install db-browser-for-sqlite

# set up the local .env file, then update DATA_ROOT according to where you
# put the .csv dump files
cp sample.env .env
```

put the `.csv` files in `raw_csv/`, or change `DATA_ROOT` in `.env` to the path, relative to this project directory, where the files are stored.
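for reference, resolving a relative `DATA_ROOT` might look like this (a sketch assuming a default of `raw_csv`; the actual script may differ):

```python
import os
from pathlib import Path

# DATA_ROOT comes from the environment (populated from .env) and is
# interpreted relative to the project directory; "raw_csv" is the
# default suggested above
data_root = Path.cwd() / os.environ.get("DATA_ROOT", "raw_csv")
print(data_root)
```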

## usage

```sh
bin/load-sqlite
```

rerun it whenever the `.csv` files change.

open `data/dump.db` using whatever SQLite tooling you like.
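if you'd rather not install a browser, a quick way to check what got loaded is to list the tables from Python (a sketch; `list_tables` is a hypothetical helper, not part of this repo):

```python
import sqlite3


def list_tables(database_filename):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(database_filename)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

e.g. `list_tables("data/dump.db")` should show one table per loaded `.csv` file.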

## what it does

`bin/load-sqlite` loops through every `.csv` file in the `DATA_ROOT` folder, generates a table name from the file's basename (`location.csv` becomes `location`), and loads it into `data/dump.db` using Pandas' default settings.

here's a simple example of what that looks like for a single file:

```python
import sqlite3

import pandas as pd

csv_filename = "./raw_csv/location.csv"
table_name = "location"
database_filename = "./data/dump.db"

# read the CSV using Pandas' default parsing and type inference
df = pd.read_csv(csv_filename)

# write the DataFrame to SQLite, replacing the table if it already exists
conn = sqlite3.connect(database_filename)
df.to_sql(table_name, conn, if_exists="replace", index=False)
conn.close()
```

If `data/dump.db` does not exist, it will be created.

If the table does not exist, it will be created. If it does exist, it will be replaced with the data present in the `.csv` file.
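the full loop over `DATA_ROOT` can be sketched as a small function (an illustration of the behavior described above; `load_all_csvs` is a hypothetical name, and the actual script may differ in details):

```python
import sqlite3
from pathlib import Path

import pandas as pd


def load_all_csvs(data_root, database_filename):
    """Load every .csv under data_root into SQLite, one table per file."""
    conn = sqlite3.connect(database_filename)
    try:
        for csv_path in sorted(Path(data_root).glob("*.csv")):
            table_name = csv_path.stem  # location.csv -> location
            df = pd.read_csv(csv_path)
            df.to_sql(table_name, conn, if_exists="replace", index=False)
    finally:
        conn.close()
```

e.g. `load_all_csvs(os.environ.get("DATA_ROOT", "raw_csv"), "./data/dump.db")` mirrors a run of `bin/load-sqlite`.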

## About

Exploring NPD project data