Commit a0319ed

Merge branch 'feat/depot-bigtable-sink' of https://github.com/odpf/depot into feat/depot-bigtable-sink
2 parents 99994de + eea0592 commit a0319ed

7 files changed

Lines changed: 123 additions & 4 deletions

build.gradle

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ plugins {
 }
 
 group 'io.odpf'
-version '0.3.3'
+version '0.3.4-beta.1'
 
 repositories {
     mavenCentral()

docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ GRPC)
 * Log Sink
 * Bigquery Sink
 * Redis Sink
+* Bigtable Sink
 
 Depot is a sink connector, which acts as a bridge between data processing systems and real sink. The APIs in this
 library can be used to push data to various sinks. Common sinks implementations will be added in this repo.

docs/reference/configuration/README.md

Lines changed: 2 additions & 0 deletions
@@ -7,4 +7,6 @@ This page contains reference for all the configurations for sink connectors.
 * [Generic](generic.md)
 * [Stencil Client](stencil-client.md)
 * [Bigquery Sink](bigquery-sink.md)
+* [Redis Sink](redis.md)
+* [Bigtable Sink](bigtable.md)

docs/reference/configuration/bigtable.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
# Bigtable Sink

A Bigtable sink requires the following variables to be set, along with the Generic ones.

## `SINK_BIGTABLE_GOOGLE_CLOUD_PROJECT_ID`

The ID of the Google Cloud project that contains the Bigtable table where the records need to be inserted/updated. Further documentation on Google Cloud [project id](https://cloud.google.com/resource-manager/docs/creating-managing-projects).

* Example value: `gcp-project-id`
* Type: `required`

## `SINK_BIGTABLE_INSTANCE_ID`

A Bigtable instance is a container for your data. An instance contains clusters that your applications can connect to. Each cluster contains nodes, the compute units that manage your data and perform maintenance tasks.

A table belongs to an instance, not to a cluster or node. Here you provide the ID of the Bigtable instance your table belongs to. Further documentation on [Bigtable instances, clusters, and nodes](https://cloud.google.com/bigtable/docs/instances-clusters-nodes).

* Example value: `cloud-bigtable-instance-id`
* Type: `required`

## `SINK_BIGTABLE_CREDENTIAL_PATH`

Full path of the Google Cloud credentials file. Further documentation on Google Cloud authentication and [credentials](https://cloud.google.com/docs/authentication/getting-started).

* Example value: `/.secret/google-cloud-credentials.json`
* Type: `required`

## `SINK_BIGTABLE_TABLE_ID`

Bigtable stores data in massively scalable tables, each of which is a sorted key/value map.

Here you provide the name of the table where the records need to be inserted/updated. Further documentation on [Bigtable tables](https://cloud.google.com/bigtable/docs/managing-tables).

* Example value: `depot-sample-table`
* Type: `required`

## `SINK_BIGTABLE_ROW_KEY_TEMPLATE`

Bigtable tables are composed of rows, each of which typically describes a single entity. Each row is indexed by a single row key.

Here you provide a string template that is used to create row keys from one or more fields of your input data. Further documentation on the [Bigtable storage model](https://cloud.google.com/bigtable/docs/overview#storage-model).

In the example below, if `field_1` and `field_2` are of `String` and `Integer` data types respectively, with values `alpha` and `10` for a specific record, the row key generated for this record will be `key-alpha-10`.

* Example value: `key-%s-%d, field_1, field_2`
* Type: `required`

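The template expansion described for `SINK_BIGTABLE_ROW_KEY_TEMPLATE` can be pictured with the minimal sketch below. It is not Depot's actual parser: the class `RowKeyTemplateSketch` and the method `buildRowKey` are hypothetical, and the sketch assumes that the first comma-separated token is a `String.format` pattern and that the remaining tokens are field names in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Depot's implementation: expands a template such as
// "key-%s-%d, field_1, field_2" into a row key for one record.
class RowKeyTemplateSketch {

    static String buildRowKey(String template, Map<String, Object> record) {
        // Assumption: token 0 is the format pattern, the rest are field names.
        String[] parts = template.split(",");
        String pattern = parts[0].trim();
        List<Object> values = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            String field = parts[i].trim();
            if (!record.containsKey(field)) {
                throw new IllegalArgumentException("Missing field: " + field);
            }
            values.add(record.get(field));
        }
        return String.format(pattern, values.toArray());
    }

    public static void main(String[] args) {
        Map<String, Object> record = Map.of("field_1", "alpha", "field_2", 10);
        // prints key-alpha-10
        System.out.println(buildRowKey("key-%s-%d, field_1, field_2", record));
    }
}
```

Because the sketch splits on commas, a pattern that itself contains a comma would not work here; how Depot handles that case is not stated in this document.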
## `SINK_BIGTABLE_COLUMN_FAMILY_MAPPING`

Bigtable columns that are related to one another are typically grouped into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.

Here you provide the mapping of the table's column families and qualifiers to the field names from the input data that we intend to insert into the table. Further documentation on the [Bigtable storage model](https://cloud.google.com/bigtable/docs/overview#storage-model).

Please note that the column families provided in this configuration need to exist in the table beforehand, while column qualifiers will be created if they don't exist.

* Example value: `{ "depot-sample-family" : { "depot-sample-qualifier-1" : "field_1", "depot-sample-qualifier-2" : "field_7", "depot-sample-qualifier-3" : "field_5"} }`
* Type: `required`
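
To make the family/qualifier/field relationship concrete, here is a minimal sketch that resolves a mapping of the same shape as the example value against one input record. It is illustrative only: `ColumnFamilyMappingSketch` and `toCells` are hypothetical names, the mapping is written directly as Java maps instead of being parsed from JSON, and the `family:qualifier=value` output format is chosen just for display.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Depot's implementation: resolves a
// family -> qualifier -> field-name mapping against one input record.
class ColumnFamilyMappingSketch {

    // Returns one "family:qualifier=value" string per mapped field,
    // in the iteration order of the mapping.
    static List<String> toCells(Map<String, Map<String, String>> mapping,
                                Map<String, Object> record) {
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> family : mapping.entrySet()) {
            for (Map.Entry<String, String> qualifier : family.getValue().entrySet()) {
                Object value = record.get(qualifier.getValue());
                cells.add(family.getKey() + ":" + qualifier.getKey() + "=" + value);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Same shape as the example value, written as Java maps.
        Map<String, String> qualifiers = new LinkedHashMap<>();
        qualifiers.put("depot-sample-qualifier-1", "field_1");
        qualifiers.put("depot-sample-qualifier-2", "field_7");
        qualifiers.put("depot-sample-qualifier-3", "field_5");
        Map<String, Map<String, String>> mapping = new LinkedHashMap<>();
        mapping.put("depot-sample-family", qualifiers);

        // Sample record values are made up for the illustration.
        Map<String, Object> record = Map.of("field_1", "alpha", "field_5", true, "field_7", 10);
        toCells(mapping, record).forEach(System.out::println);
    }
}
```

In this sketch the qualifier is only a label attached to the value, which mirrors the note that qualifiers need not exist beforehand, whereas the family name must match a column family already present in the table.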

docs/reference/metrics.md

Lines changed: 16 additions & 2 deletions
@@ -6,10 +6,11 @@ Sinks can have their own metrics, and they will be emmited while using sink conn
 ## Table of Contents
 
 * [Bigquery Sink](metrics.md#bigquery-sink)
+* [Bigtable Sink](metrics.md#bigtable-sink)
 
 ## Bigquery Sink
 
-### `Biquery Operation Total`
+### `Bigquery Operation Total`
 
 Total number of bigquery API operation performed
 
@@ -19,7 +20,20 @@ Time taken for bigquery API operation performed
 
 ### `Bigquery Errors Total`
 
-Total numbers of error occurred on bigquery insert operation.
+Total numbers of error occurred on bigquery insert operation
 
+## Bigtable Sink
+
+### `Bigtable Operation Total`
+
+Total number of Bigtable insert/update operations performed
+
+### `Bigtable Operation Latency`
+
+Time taken for the Bigtable insert/update operations performed
+
+### `Bigtable Errors Total`
+
+Total number of errors that occurred on Bigtable insert/update operations
 

docs/reference/odpf_sink_response.md

Lines changed: 2 additions & 1 deletion
@@ -17,7 +17,8 @@ These errors are returned by sinks in the OdpfSinkResponse object. The error typ
 * UNKNOWN_FIELDS_ERROR
 * SINK_4XX_ERROR
 * SINK_5XX_ERROR
+* SINK_RETRYABLE_ERROR
 * SINK_UNKNOWN_ERROR
 * DEFAULT_ERROR
-  * If no error is specified
+  * If no error is specified (To be deprecated soon)

docs/sinks/bigtable.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Bigtable Sink

## Overview
Depot Bigtable Sink translates protobuf messages to Bigtable records and inserts them into a Bigtable table. Its other responsibilities include validating the provided [column-family-schema](../reference/configuration/bigtable.md#sink_bigtable_column_family_mapping) and checking whether the configured table exists in the [Bigtable instance](../reference/configuration/bigtable.md#sink_bigtable_instance_id).

Depot uses the [Java Client Library for the Cloud Bigtable API](https://cloud.google.com/bigtable/docs/reference/libraries) to perform all operations on Bigtable.

## Setup Required
To be able to insert/update records in Bigtable, one must have the following setup in place:

* [Bigtable Instance](../reference/configuration/bigtable.md#sink_bigtable_instance_id) belonging to the [GCP project](../reference/configuration/bigtable.md#sink_bigtable_google_cloud_project_id) provided in configuration
* Bigtable [Table](../reference/configuration/bigtable.md#sink_bigtable_table_id) where the records are supposed to be inserted/updated
* Column families that are provided as part of [column-family-mapping](../reference/configuration/bigtable.md#sink_bigtable_column_family_mapping)
* Google Cloud [Bigtable IAM permission](https://cloud.google.com/bigtable/docs/access-control) required to access and modify the configured Bigtable instance and table

## Metrics

Check out the list of [metrics](../reference/metrics.md#bigtable-sink) captured under Bigtable Sink.

## Error Handling

The [BigtableResponse](../../src/main/java/io/odpf/depot/bigtable/response/BigTableResponse.java) class holds the list of failed [mutations](https://cloud.google.com/bigtable/docs/writes#write-types). [BigtableResponseParser](../../src/main/java/io/odpf/depot/bigtable/parser/BigTableResponseParser.java) looks for errors in each failed mutation and creates [ErrorInfo](../../src/main/java/io/odpf/depot/error/ErrorInfo.java) objects based on the type/HttpStatusCode of the underlying error. This error info is then sent to the application.

| Error From Bigtable | Error Type Captured |
| --------------- | -------------------- |
| Retryable Error | SINK_RETRYABLE_ERROR |
| Having status code in range 400-499 | SINK_4XX_ERROR |
| Having status code in range 500-599 | SINK_5XX_ERROR |
| Any other Error | SINK_UNKNOWN_ERROR |

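The table above can be read as a simple decision function. The sketch below is an assumption-laden illustration and not the code in `BigTableResponseParser`: the class, enum and method names are hypothetical, and it assumes the rows are evaluated top to bottom, so a retryable error is reported as `SINK_RETRYABLE_ERROR` even if it also carries a 4xx or 5xx status code.

```java
// Hypothetical sketch of the mapping in the table above. The class, enum and
// method names are illustrative and are not taken from Depot's source.
class BigtableErrorTypeSketch {

    enum ErrorType {
        SINK_RETRYABLE_ERROR, SINK_4XX_ERROR, SINK_5XX_ERROR, SINK_UNKNOWN_ERROR
    }

    // Assumption: rows of the table are checked in order, retryable first.
    static ErrorType classify(boolean retryable, int httpStatusCode) {
        if (retryable) {
            return ErrorType.SINK_RETRYABLE_ERROR;
        }
        if (httpStatusCode >= 400 && httpStatusCode <= 499) {
            return ErrorType.SINK_4XX_ERROR;
        }
        if (httpStatusCode >= 500 && httpStatusCode <= 599) {
            return ErrorType.SINK_5XX_ERROR;
        }
        return ErrorType.SINK_UNKNOWN_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 404)); // prints SINK_4XX_ERROR
        System.out.println(classify(true, 503));  // prints SINK_RETRYABLE_ERROR
    }
}
```
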
### Error Telemetry

[BigtableResponseParser](../../src/main/java/io/odpf/depot/bigtable/parser/BigTableResponseParser.java) looks for any specific error types sent from Bigtable and captures them under [BigtableTotalErrorMetrics](../reference/metrics.md#bigtable-sink) with suitable error tags.

| Error Type | Error Tag Assigned |
| --------------- | -------------------- |
| Bad Request | BAD_REQUEST |
| Quota Failure | QUOTA_FAILURE |
| Precondition Failure | PRECONDITION_FAILURE |
| Any other Error | RPC_FAILURE |
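
As a rough illustration of the tag assignment above, the sketch below maps an error-detail type name to its tag. Everything except the four tag values is an assumption: `BigtableErrorTagSketch` and `tagFor` are hypothetical names, and the input strings `BadRequest`, `QuotaFailure` and `PreconditionFailure` are assumed identifiers for the three error types, not values confirmed by this document.

```java
// Hypothetical sketch, not Depot's implementation: picks the metric tag for an
// error based on an assumed error-detail type name.
class BigtableErrorTagSketch {

    static String tagFor(String errorDetailType) {
        if (errorDetailType == null) {
            return "RPC_FAILURE"; // no specific detail available
        }
        switch (errorDetailType) {
            case "BadRequest":
                return "BAD_REQUEST";
            case "QuotaFailure":
                return "QUOTA_FAILURE";
            case "PreconditionFailure":
                return "PRECONDITION_FAILURE";
            default:
                return "RPC_FAILURE"; // any other error
        }
    }

    public static void main(String[] args) {
        System.out.println(tagFor("QuotaFailure")); // prints QUOTA_FAILURE
        System.out.println(tagFor("Aborted"));      // prints RPC_FAILURE
    }
}
```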
