# SQL Tables

## Template

### {ModelName}
{Description}

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| | | ✓ | ✓ | ✓ | | |

#### Other indices
* `{column_name}`, `{column_name}`, ... [(unique)]
* ...

## Data

### SourceUniqueIdentifier (SUID)
Identifier for a specific document from a specific source.

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `identifier` | text | | | | | Identifier given to the document by the source |
| `ingest_config_id` | int | | | ✓ | | IngestConfig used to ingest the document |

#### Other indices
* `identifier`, `ingest_config_id` (unique)

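The composite unique index means a document is identified by the pair (`identifier`, `ingest_config_id`), so the same source identifier can appear once per IngestConfig. Below is a minimal get-or-create sketch, assuming SQLite as a stand-in for the real database, a hypothetical table name `suid`, and no enforced FK to IngestConfig.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the SUID table (hypothetical name `suid`)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE suid (
            id INTEGER PRIMARY KEY,
            identifier TEXT NOT NULL,
            ingest_config_id INTEGER NOT NULL,  -- FK to IngestConfig omitted in this sketch
            UNIQUE (identifier, ingest_config_id)
        )
        """
    )
    return conn


def get_or_create_suid(conn: sqlite3.Connection, identifier: str, ingest_config_id: int) -> int:
    """Return the id of the SUID row for this pair, inserting it if it is missing."""
    # INSERT OR IGNORE relies on the composite unique index to skip duplicates.
    conn.execute(
        "INSERT OR IGNORE INTO suid (identifier, ingest_config_id) VALUES (?, ?)",
        (identifier, ingest_config_id),
    )
    row = conn.execute(
        "SELECT id FROM suid WHERE identifier = ? AND ingest_config_id = ?",
        (identifier, ingest_config_id),
    ).fetchone()
    return row[0]


conn = make_db()
a = get_or_create_suid(conn, "oai:example:123", 1)
b = get_or_create_suid(conn, "oai:example:123", 1)  # same pair: same row
c = get_or_create_suid(conn, "oai:example:123", 2)  # different IngestConfig: new row
print(a == b, a == c)  # True False
```

The identifier string `oai:example:123` is illustrative only.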
### RawData
Raw data, exactly as it was given to SHARE.

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `suid_id` | int | | | ✓ | | SUID for this datum |
| `data` | text | | | | | The raw data itself (typically a JSON or XML string) |
| `sha256` | text | unique | | | | SHA-256 hash of `data` |
| `harvest_logs` | m2m | | | | | HarvestLogs for the harvester runs that found this exact datum |

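Because `sha256` is unique, storing a datum that has already been seen can reuse the existing row and only record the new HarvestLog in `harvest_logs`. A sketch under stated assumptions: SQLite stands in for the real database, `rawdata` and `rawdata_harvest_logs` are hypothetical table names (the latter backing the m2m column), the hash is taken over the UTF-8 encoding of `data`, and the `suid_id` FK is not enforced.

```python
import hashlib
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rawdata (
            id INTEGER PRIMARY KEY,
            suid_id INTEGER NOT NULL,
            data TEXT NOT NULL,
            sha256 TEXT NOT NULL UNIQUE
        );
        -- Hypothetical join table backing the `harvest_logs` m2m column.
        CREATE TABLE rawdata_harvest_logs (
            rawdata_id INTEGER NOT NULL,
            harvestlog_id INTEGER NOT NULL,
            UNIQUE (rawdata_id, harvestlog_id)
        );
        """
    )
    return conn


def store_raw(conn: sqlite3.Connection, suid_id: int, data: str, harvest_log_id: int) -> int:
    """Store `data` once per distinct SHA-256 and link it to the given harvest log."""
    digest = hashlib.sha256(data.encode("utf-8")).hexdigest()
    # A second copy of identical data is ignored thanks to the unique `sha256`.
    conn.execute(
        "INSERT OR IGNORE INTO rawdata (suid_id, data, sha256) VALUES (?, ?, ?)",
        (suid_id, data, digest),
    )
    (raw_id,) = conn.execute("SELECT id FROM rawdata WHERE sha256 = ?", (digest,)).fetchone()
    # Record that this harvester run found the datum.
    conn.execute(
        "INSERT OR IGNORE INTO rawdata_harvest_logs (rawdata_id, harvestlog_id) VALUES (?, ?)",
        (raw_id, harvest_log_id),
    )
    return raw_id


conn = make_db()
first = store_raw(conn, suid_id=1, data='{"title": "x"}', harvest_log_id=10)
again = store_raw(conn, suid_id=1, data='{"title": "x"}', harvest_log_id=11)
links = conn.execute("SELECT COUNT(*) FROM rawdata_harvest_logs").fetchone()[0]
print(first == again, links)  # True 2
```

One consequence of a unique hash is that identical bytes arriving under a different SUID keep the first row's `suid_id`; whether that is the desired behavior is a design question the table definition alone does not settle.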
## Ingest Configuration

### IngestConfig
Describes one way to harvest metadata from a Source, and how to transform the result.

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `source_id` | int | | | ✓ | | Source to harvest from |
| `base_url` | text | | | | | URL of the API or endpoint where the metadata is available |
| `earliest_date` | date | | ✓ | | | Earliest date with available data |
| `rate_limit_allowance` | int | | | | 5 | Number of requests allowed every `rate_limit_period` seconds |
| `rate_limit_period` | int | | | | 1 | Number of seconds for every `rate_limit_allowance` requests |
| `harvester_id` | int | | | ✓ | | Harvester to use |
| `harvester_kwargs` | jsonb | | ✓ | | | JSON object passed to the harvester as kwargs |
| `transformer_id` | int | | | ✓ | | Transformer to use |
| `transformer_kwargs` | jsonb | | ✓ | | | JSON object passed to the transformer as kwargs, along with the harvested raw data |
| `disabled` | bool | | | | False | True if this ingest config should not be run automatically |

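`rate_limit_allowance` and `rate_limit_period` together define a request budget: with the defaults, at most 5 requests per 1 second. One way a harvester could honor this is a sliding-window limiter. The sketch below is an assumption about how the columns might be used, not SHARE's implementation; `RateLimiter` and `FakeClock` are hypothetical, and the fake clock only exists so the demo runs instantly.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `allowance` calls in any `period`-second sliding window."""

    def __init__(self, allowance=5, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.allowance = allowance  # e.g. IngestConfig.rate_limit_allowance
        self.period = period        # e.g. IngestConfig.rate_limit_period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()        # timestamps of recent calls, oldest first

    def wait(self) -> None:
        """Block until another request fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.allowance:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))


class FakeClock:
    """Deterministic clock for the demo: sleeping just advances time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
limiter = RateLimiter(allowance=5, period=1.0, clock=clock.now, sleep=clock.sleep)
for _ in range(6):
    limiter.wait()  # call this before each request to the source
print(clock.t)  # 1.0: the sixth request waited for the first five to expire
```

A sliding window was chosen here because it maps directly onto "N requests every P seconds"; a token bucket would also satisfy the same budget while allowing smoother bursts.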
### Source
A Source is a place that metadata comes from.

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `name` | text | unique | | | | Short name |
| `long_title` | text | unique | | | | Full, human-friendly name |
| `home_page` | text | | ✓ | | | URL |
| `icon` | image | | ✓ | | | Recognizable icon for the source |
| `user_id` | int | | | ✓ | | User with permission to submit data as this source (TODO: replace with Django permissions stuff) |

### Harvester
Each row corresponds to a Harvester implementation in Python. (TODO: describe those somewhere)

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `key` | text | unique | | | | Key that can be used to get the corresponding Harvester subclass |
| `date_created` | datetime | | | | now | When the row was created |
| `date_modified` | datetime | | | | now (on update) | When the row was last modified |

### Transformer
Each row corresponds to a Transformer implementation in Python. (TODO: describe those somewhere)

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `key` | text | unique | | | | Key that can be used to get the corresponding Transformer subclass |
| `date_created` | datetime | | | | now | When the row was created |
| `date_modified` | datetime | | | | now (on update) | When the row was last modified |

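Both tables store only a `key`; the Python class is found by looking that key up at runtime. A minimal registry sketch, where `Harvester`, `HARVESTERS`, `register`, `get_class`, `ExampleHarvester`, and the key `"org.example"` are all hypothetical names; the real lookup may work differently. The same pattern would apply unchanged to Transformer subclasses.

```python
from typing import Dict, Type


class Harvester:
    """Hypothetical base class for harvester implementations."""

    key: str = ""

    def __init__(self, **kwargs):
        # IngestConfig.harvester_kwargs would be passed in here.
        self.kwargs = kwargs


# Maps the `key` column value to the implementing subclass.
HARVESTERS: Dict[str, Type[Harvester]] = {}


def register(cls: Type[Harvester]) -> Type[Harvester]:
    """Class decorator that records a subclass under its `key` (keys are unique)."""
    if cls.key in HARVESTERS:
        raise ValueError(f"duplicate harvester key: {cls.key!r}")
    HARVESTERS[cls.key] = cls
    return cls


def get_class(key: str) -> Type[Harvester]:
    """Resolve a `key` from the Harvester table to its Python class."""
    try:
        return HARVESTERS[key]
    except KeyError:
        raise LookupError(f"no harvester registered for key {key!r}") from None


@register
class ExampleHarvester(Harvester):
    key = "org.example"


harvester = get_class("org.example")(set_spec="physics")
print(type(harvester).__name__, harvester.kwargs)  # ExampleHarvester {'set_spec': 'physics'}
```

Rejecting duplicate keys in `register` mirrors the unique index on `key`, so a row can never be ambiguous about which class it names.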
## Logs

### HarvestLog
Log entries to track the status of a specific harvester run.

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `ingest_config_id` | int | | | ✓ | | IngestConfig for this harvester run |
| `harvester_version` | text | | | | | Semantic version of the harvester, with each segment padded to 3 digits (e.g. '1.2.10' => '001.002.010') |
| `start_date` | datetime | | | | | Beginning of the date range to harvest |
| `end_date` | datetime | | | | | End of the date range to harvest |
| `started` | datetime | | | | | Time `status` was set to STARTED |
| `status` | text | | | | INITIAL | Status of the harvester run, one of {INITIAL, STARTED, SPLIT, SUCCEEDED, FAILED} |

#### Other indices
* `ingest_config_id`, `harvester_version`, `start_date`, `end_date` (unique)

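Padding each segment to three digits makes versions sort correctly as plain text, which matters because the column is `text`: unpadded, '1.10.0' sorts before '1.2.0'. A sketch of the conversion, assuming versions of the form `MAJOR.MINOR.PATCH` with no pre-release suffix and no segment above 999; `pad_version` is a hypothetical helper. The same scheme applies to `transformer_version` in TransformLog.

```python
def pad_version(version: str) -> str:
    """Convert '1.2.10' to '001.002.010' so string order matches version order."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    if any(int(p) > 999 for p in parts):
        raise ValueError(f"segment too large to pad to 3 digits: {version!r}")
    return ".".join(f"{int(p):03d}" for p in parts)


print(pad_version("1.2.10"))  # 001.002.010

versions = ["1.10.0", "1.2.0", "1.9.3"]
print(sorted(versions))                    # ['1.10.0', '1.2.0', '1.9.3'] (wrong order)
print(sorted(map(pad_version, versions)))  # ['001.002.000', '001.009.003', '001.010.000']
```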
### TransformLog
Log entries to track the status of a transform task.

| Column | Type | Indexed | Nullable | FK | Default | Description |
|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------|
| `raw_id` | int | | | ✓ | | RawData to be transformed |
| `ingest_config_id` | int | | | ✓ | | IngestConfig used |
| `transformer_version` | text | | | | | Semantic version of the transformer, with each segment padded to 3 digits (e.g. '1.2.10' => '001.002.010') |
| `started` | datetime | | | | | Time `status` was set to STARTED |
| `status` | text | | | | INITIAL | Status of the transform task, one of {INITIAL, STARTED, RESCHEDULED, SUCCEEDED, FAILED} |

#### Other indices
* `raw_id`, `transformer_version` (unique)