This guide describes how to sink data from RisingWave to Elasticsearch using the Elasticsearch sink connector in RisingWave.
Parameter | Description |
---|---|
sink_name | Name of the sink to be created. |
sink_from | A clause that specifies the direct source from which data will be output. sink_from can be a materialized view or a table. Either this clause or a SELECT query must be specified. |
AS select_query | A SELECT query that specifies the data to be output to the sink. Either this query or a FROM clause must be specified. See SELECT for the syntax and examples of the SELECT command. |
primary_key | Optional. The primary keys of the sink. If the primary key has multiple columns, set a delimiter in the delimiter parameter below to join them. |
index | Required if index_column is not set. Name of the Elasticsearch index that you want to write data to. |
index_column | Allows writing to multiple indexes dynamically based on the column's value. This parameter is mutually exclusive with index. When index is set, data is written to that index; when index_column is set, the target index is the value of this column, which must be of string type. Avoid setting this column as the first column, because the sink defaults to using the first column as the key. |
routing_column | Optional. Allows setting a column as a routing key, so that it can direct writes to specific shards based on its value. |
retry_on_conflict | Optional. Number of retry attempts after an optimistic locking conflict occurs. |
batch_size_kb | Optional. Maximum size (in kilobytes) for each request batch sent to Elasticsearch. If the accumulated data exceeds this size, the current batch is sent. |
batch_num_messages | Optional. Maximum number of messages per request batch sent to Elasticsearch. If the number of messages exceeds this limit, the current batch is sent. |
concurrent_requests | Optional. Maximum number of concurrent threads for sending requests. |
url | Required. URL of the Elasticsearch REST API endpoint. |
username | Optional. Username for accessing the Elasticsearch endpoint. It must be used together with password. |
password | Optional. Password for accessing the Elasticsearch endpoint. It must be used with username. |
delimiter | Optional. Delimiter for Elasticsearch ID when the sink’s primary key has multiple columns. |
type | Optional. In Elasticsearch 6.x, users could directly set the type, but starting from 7.x it is deprecated and the default value is unified to _doc; in 8.x, the type has been completely removed. See Elasticsearch's official documentation for more details. If you are using Elasticsearch 7.x, RisingWave sets it to the officially recommended value, _doc. If you are using Elasticsearch 8.x, this parameter has been removed by Elasticsearch, so no setting is required. |
The Elasticsearch sink only supports the `upsert` sink type. It does not support the `append-only` sink type.
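For reference, below is a minimal sketch of a `CREATE SINK` statement. The source table `t1`, the index name `test_index`, the endpoint URL, and the credentials are placeholders for illustration, not values from your deployment.

```sql
CREATE SINK es_sink FROM t1 WITH (
    connector = 'elasticsearch',
    index = 'test_index',
    url = 'http://localhost:9200',
    -- Optional: only needed when the cluster requires authentication.
    username = 'elastic',
    password = 'changeme'
);
```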
If you want to customize your Elasticsearch ID, specify it via the `primary_key` parameter. RisingWave will combine multiple primary key values into a single string with the delimiter you set and use it as the Elasticsearch ID.
If you don't want to customize your Elasticsearch ID, RisingWave will use the first column in the sink definition as the Elasticsearch ID.
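For example, here is a sketch of a composite Elasticsearch ID, assuming a hypothetical table `orders` with key columns `seller_id` and `order_id`:

```sql
CREATE SINK orders_es_sink FROM orders WITH (
    connector = 'elasticsearch',
    index = 'orders_index',
    url = 'http://localhost:9200',
    -- The two key columns are joined with '_' to form the Elasticsearch ID,
    -- e.g. a row with seller_id = 1 and order_id = 42 gets the ID '1_42'.
    primary_key = 'seller_id,order_id',
    delimiter = '_'
);
```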
RisingWave Data Type | ElasticSearch Field Type |
---|---|
boolean | boolean |
smallint | long |
integer | long |
bigint | long |
numeric | text |
real | float |
double precision | float |
character varying | text |
bytea | text |
date | date |
time without time zone | text |
timestamp without time zone | text |
timestamp with time zone | text |
interval | text |
struct | object |
array | array |
JSONB | object (RisingWave’s Elasticsearch sink will send JSONB as a JSON string, and Elasticsearch will convert it into an object) |
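To illustrate the JSONB row above, here is a minimal sketch with a hypothetical table `events` whose `payload` column is JSONB; the sink sends `payload` as a JSON string and Elasticsearch stores it as a nested object:

```sql
-- Hypothetical source table with a JSONB column.
CREATE TABLE events (id INT PRIMARY KEY, payload JSONB);

CREATE SINK events_es_sink FROM events WITH (
    connector = 'elasticsearch',
    index = 'events_index',
    url = 'http://localhost:9200'
);
```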
Elasticsearch does not require users to define a schema in advance with a statement like `CREATE TABLE`. Instead, it infers the schema on the fly based on the first record ingested. For example, if a record contains a jsonb `{v1: 100}`, `v1` will be inferred as a long type. However, if the next record is `{v1: "abc"}`, the ingestion will fail because `"abc"` is inferred as a string and the two types are incompatible.
Be aware of this behavior; otherwise, some writes may fail and your data in Elasticsearch may be incomplete. For monitoring, you can check Grafana, where there is a panel for all sink write errors.