|
5 | 5 | ### {ModelName} |
6 | 6 | {Description} |
7 | 7 |
|
8 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
9 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
10 | | -| | | ✓ | ✓ | ✓ | | | |
| 8 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 9 | +| :----- | :--: | :-----: | :------: | :-: | :-----: | :---------- | |
| 10 | +| | | ✓ | ✓ | ✓ | | | |
11 | 11 |
|
12 | 12 | #### Other indices |
13 | 13 | * `{column_name}`, `{column_name}`, ... [(unique)] |
|
18 | 18 | ### SourceUniqueIdentifier (SUID) |
19 | 19 | Identifier for a specific document from a specific source. |
20 | 20 |
|
21 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
22 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
23 | | -| `identifier` | text | | | | | Identifier given to the document by the source | |
24 | | -| `ingest_config_id` | int | | | ✓ | | IngestConfig used to ingest the document | |
| 21 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 22 | +| :----------------- | :--: | :-----: | :------: | :-: | :-----: | :--------------------------------------------- | |
| 23 | +| `identifier` | text | | | | | Identifier given to the document by the source | |
| 24 | +| `ingest_config_id` | int | | | ✓ | | IngestConfig used to ingest the document | |
25 | 25 |
|
26 | 26 | #### Other indices |
27 | 27 | * `source_doc_id`, `ingest_config_id` (unique) |
28 | 28 |
|
29 | 29 | ### RawData |
30 | 30 | Raw data, exactly as it was given to SHARE. |
31 | 31 |
|
32 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
33 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
34 | | -| `suid_id` | int | | | ✓ | | SUID for this datum | |
35 | | -| `data` | text | | | | | The raw data itself (typically JSON or XML string) | |
36 | | -| `sha256` | text | unique | | | | SHA-256 hash of `data` | |
37 | | -| `harvest_logs` | m2m | | | | | List of HarvestLogs for harvester runs that found this exact datum | |
| 32 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 33 | +| :------------- | :--: | :-----: | :------: | :-: | :-----: | :----------------------------------------------------------------- | |
| 34 | +| `suid_id` | int | | | ✓ | | SUID for this datum | |
| 35 | +| `data` | text | | | | | The raw data itself (typically JSON or XML string) | |
| 36 | +| `sha256` | text | unique | | | | SHA-256 hash of `data` | |
| 37 | +| `harvest_logs` | m2m | | | | | List of HarvestLogs for harvester runs that found this exact datum | |
38 | 38 |
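As a rough illustration of the `sha256` column above, the sketch below computes a hex digest of a `data` string. The helper name `sha256_hex` is hypothetical, and hashing the UTF-8 encoding of the text is an assumption; the document does not say how `data` is encoded before hashing.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the hex SHA-256 digest of a raw datum.

    Assumption: the text is hashed as UTF-8 bytes. The table only says
    that `sha256` is the SHA-256 hash of `data`.
    """
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Identical raw data always yields the same digest, which is what lets a
# unique index on `sha256` detect a datum that was already stored.
digest = sha256_hex('{"title": "example"}')
same_digest = sha256_hex('{"title": "example"}')
```

Because the digest has a fixed length of 64 hex characters, it is far cheaper to index than the full `data` text.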
|
39 | 39 | ## Ingest Configuration |
40 | 40 |
|
41 | 41 | ### IngestConfig |
42 | 42 | Describes one way to harvest metadata from a Source, and how to transform the result. |
43 | 43 |
|
44 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
45 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
46 | | -| `source_id` | int | | | ✓ | | Source to harvest from | |
47 | | -| `base_url` | text | | | | | URL of the API or endpoint where the metadata is available | |
48 | | -| `earliest_date` | date | | ✓ | | | Earliest date with available data | |
49 | | -| `rate_limit_allowance` | int | | | | 5 | Number of requests allowed every `rate_limit_period` seconds | |
50 | | -| `rate_limit_period` | int | | | | 1 | Number of seconds for every `rate_limit_allowance` requests | |
51 | | -| `harvester_id` | int | | | ✓ | | Harvester to use | |
52 | | -| `harvester_kwargs` | jsonb | | ✓ | | | JSON object passed to the harvester as kwargs | |
53 | | -| `transformer_id` | int | | | ✓ | | Transformer to use | |
54 | | -| `transformer_kwargs` | jsonb | | ✓ | | | JSON object passed to the transformer as kwargs, along with the harvested raw data | |
55 | | -| `disabled` | bool | | | | False | True if this ingest config should not be run automatically | |
| 44 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 45 | +| :--------------------- | :---: | :-----: | :------: | :-: | :-----: | :--------------------------------------------------------------------------------- | |
| 46 | +| `source_id` | int | | | ✓ | | Source to harvest from | |
| 47 | +| `base_url` | text | | | | | URL of the API or endpoint where the metadata is available | |
| 48 | +| `earliest_date` | date | | ✓ | | | Earliest date with available data | |
| 49 | +| `rate_limit_allowance` | int | | | | 5 | Number of requests allowed every `rate_limit_period` seconds | |
| 50 | +| `rate_limit_period` | int | | | | 1 | Number of seconds for every `rate_limit_allowance` requests | |
| 51 | +| `harvester_id` | int | | | ✓ | | Harvester to use | |
| 52 | +| `harvester_kwargs` | jsonb | | ✓ | | | JSON object passed to the harvester as kwargs | |
| 53 | +| `transformer_id` | int | | | ✓ | | Transformer to use | |
| 54 | +| `transformer_kwargs` | jsonb | | ✓ | | | JSON object passed to the transformer as kwargs, along with the harvested raw data | |
| 55 | +| `disabled` | bool | | | | False | True if this ingest config should not be run automatically | |
56 | 56 |
|
57 | 57 | ### Source |
58 | 58 | A Source is a place metadata comes from. |
59 | 59 |
|
60 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
61 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
62 | | -| `name` | text | unique | | | | Short name | |
63 | | -| `long_title` | text | unique | | | | Full, human-friendly name | |
64 | | -| `home_page` | text | | ✓ | | | URL | |
65 | | -| `icon` | image | | ✓ | | | Recognizable icon for the source | |
66 | | -| `user_id` | int | | | ✓ | | User with permission to submit data as this source (TODO: replace with django permissions stuff) | |
| 60 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 61 | +| :----------- | :---: | :-----: | :------: | :-: | :-----: | :----------------------------------------------------------------------------------------------- | |
| 62 | +| `name` | text | unique | | | | Short name | |
| 63 | +| `long_title` | text | unique | | | | Full, human-friendly name | |
| 64 | +| `home_page` | text | | ✓ | | | URL | |
| 65 | +| `icon` | image | | ✓ | | | Recognizable icon for the source | |
| 66 | +| `user_id` | int | | | ✓ | | User with permission to submit data as this source (TODO: replace with django permissions stuff) | |
67 | 67 |
|
68 | 68 | ### Harvester |
69 | 69 | Each row corresponds to a Harvester implementation in python. (TODO: describe those somewhere) |
70 | 70 |
|
71 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
72 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
73 | | -| `key` | text | unique | | | | Key that can be used to get the corresponding Harvester subclass | |
74 | | -| `date_created` | datetime | | | | now | | |
75 | | -| `date_modified` | datetime | | | | now (on update) | | |
| 71 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 72 | +| :-------------- | :------: | :-----: | :------: | :-: | :-------------: | :--------------------------------------------------------------- | |
| 73 | +| `key` | text | unique | | | | Key that can be used to get the corresponding Harvester subclass | |
| 74 | +| `date_created` | datetime | | | | now | | |
| 75 | +| `date_modified` | datetime | | | | now (on update) | | |
76 | 76 |
|
77 | 77 | ### Transformer |
78 | 78 | Each row corresponds to a Transformer implementation in python. (TODO: describe those somewhere) |
79 | 79 |
|
80 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
81 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
82 | | -| `key` | text | unique | | | | Key that can be used to get the corresponding Transformer subclass | |
83 | | -| `date_created` | datetime | | | | now | | |
84 | | -| `date_modified` | datetime | | | | now (on update) | | |
| 80 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 81 | +| :-------------- | :------: | :-----: | :------: | :-: | :-------------: | :----------------------------------------------------------------- | |
| 82 | +| `key` | text | unique | | | | Key that can be used to get the corresponding Transformer subclass | |
| 83 | +| `date_created` | datetime | | | | now | | |
| 84 | +| `date_modified` | datetime | | | | now (on update) | | |
85 | 85 |
|
86 | 86 | ## Logs |
87 | 87 |
|
88 | 88 | ### HarvestLog |
89 | 89 | Log entries to track the status of a specific harvester run. |
90 | 90 |
|
91 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
92 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
93 | | -| `ingest_config_id` | int | | | ✓ | | IngestConfig for this harvester run | |
94 | | -| `harvester_version` | text | | | | | Semantic version of the harvester, with each segment padded to 3 digits (e.g. '1.2.10' => '001.002.010') |
95 | | -| `start_date` | datetime | | | | | Beginning of the date range to harvest | |
96 | | -| `end_date` | datetime | | | | | End of the date range to harvest | |
97 | | -| `started` | datetime | | | | | Time `status` was set to STARTED | |
98 | | -| `status` | text | | | | INITIAL | Status of the harvester run, one of {INITIAL, STARTED, SPLIT, SUCCEEDED, FAILED} | |
| 91 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 92 | +| :------------------ | :------: | :-----: | :------: | :-: | :-----: | :------------------------------------------------------------------------------------------------------- | |
| 93 | +| `ingest_config_id` | int | | | ✓ | | IngestConfig for this harvester run | |
| 94 | +| `harvester_version` | text | | | | | Semantic version of the harvester, with each segment padded to 3 digits (e.g. '1.2.10' => '001.002.010') | |
| 95 | +| `start_date` | datetime | | | | | Beginning of the date range to harvest | |
| 96 | +| `end_date` | datetime | | | | | End of the date range to harvest | |
| 97 | +| `started` | datetime | | | | | Time `status` was set to STARTED | |
| 98 | +| `status` | text | | | | INITIAL | Status of the harvester run, one of {INITIAL, STARTED, SPLIT, SUCCEEDED, FAILED} | |
99 | 99 |
|
100 | 100 | #### Other indices |
101 | 101 | * `ingest_config_id`, `harvester_version`, `start_date`, `end_date` (unique) |
102 | 102 |
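The padded format described for `harvester_version` can be sketched as follows. The function name `pad_version` is hypothetical, and the motivation given in the comments (making plain string comparison agree with version order) is an assumption; the table only specifies the format itself.

```python
def pad_version(version: str, width: int = 3) -> str:
    """Zero-pad each dot-separated segment of a version string.

    Follows the example in the table: '1.2.10' => '001.002.010'.
    """
    return ".".join(segment.zfill(width) for segment in version.split("."))


padded = pad_version("1.2.10")

# Likely motivation (an assumption): unpadded versions sort incorrectly
# as text, because "1.2.10" < "1.2.9" in a string comparison, whereas
# the padded forms compare in true version order.
unpadded_sorts_wrong = "1.2.10" < "1.2.9"
padded_sorts_right = pad_version("1.2.9") < pad_version("1.2.10")
```

This sketch assumes purely numeric segments of at most three digits; a longer segment such as `1000` would be left unpadded and break the ordering.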
|
103 | 103 | ### TransformLog |
104 | 104 | Log entries to track the status of a transform task |
105 | 105 |
|
106 | | -| Column | Type | Indexed | Nullable | FK | Default | Description | |
107 | | -|:-------|:----:|:-------:|:--------:|:--:|:-------:|:------------| |
108 | | -| `raw_id` | int | | | ✓ | | RawData to be transformed | |
109 | | -| `ingest_config_id` | int | | | ✓ | | IngestConfig used | |
110 | | -| `transformer_version` | text | | | | | Semantic version of the transformer, with each segment padded to 3 digits (e.g. '1.2.10' => '001.002.010') |
111 | | -| `started` | datetime | | | | | Time `status` was set to STARTED | |
112 | | -| `status` | text | | | | INITIAL | Status of the transform task, one of {INITIAL, STARTED, RESCHEDULED, SUCCEEDED, FAILED} | |
| 106 | +| Column | Type | Indexed | Nullable | FK | Default | Description | |
| 107 | +| :-------------------- | :------: | :-----: | :------: | :-: | :-----: | :--------------------------------------------------------------------------------------------------------- | |
| 108 | +| `raw_id` | int | | | ✓ | | RawData to be transformed | |
| 109 | +| `ingest_config_id` | int | | | ✓ | | IngestConfig used | |
| 110 | +| `transformer_version` | text | | | | | Semantic version of the transformer, with each segment padded to 3 digits (e.g. '1.2.10' => '001.002.010') | |
| 111 | +| `started` | datetime | | | | | Time `status` was set to STARTED | |
| 112 | +| `status` | text | | | | INITIAL | Status of the transform task, one of {INITIAL, STARTED, RESCHEDULED, SUCCEEDED, FAILED} | |
113 | 113 |
|
114 | 114 | #### Other indices |
115 | 115 | * `raw_id`, `transformer_version` (unique) |