Accessing Amazon Redshift with Built-in Data API

Amazon Redshift is a fully managed data warehouse solution designed to handle petabyte-scale data. Amazon Redshift offers better performance up to 3x compared to other cloud data warehouses. Since the service provides solutions for organizations of all sizes, you can simply scale up and down your clusters in minutes. While data is growing exponentially, Amazon Redshift offers stability and flexibility to handle business intelligence operations and helps businesses to keep the costs low compared to the alternatives.

In September 2020, AWS announced Data API for Amazon Redshift. With the built-in Data API, accessing Amazon Redshift becomes simpler and more secure. Let's look at some advantages and use cases of Data API for Amazon Redshift:

There is no need to configure the database driver or manage connections.
Data API uses IAM or AWS Secrets Manager therefore there is no need to use credentials inside API calls.
Data API is asynchronous and it caches the query results for 24 hours. You can retrieve your results later or multiple times without running the query again.
Lambda functions can now access Amazon Redshift API without depending on 3^rd party drivers.

Before using Data API it is necessary to mention that there are few limitations:

The size limit for a query is 100 MB.
The size limit for a query statement is 100 KB.
The size limit is 64 KB per row in the result set.

Let's take a look at how to use Data API from AWS CLI and AWS SDK.

Accessing Data API from the AWS CLI

AWS CLI now enables calling Data API with redshift-data command. To use --secret-arn parameter as an authentication credential, you must configure the AWS Secrets Manager first.

Running SQL commands

You use the aws redshift-data execute-statement to run an SQL statement.

    aws redshift-data execute-statement \
    --database  \
    --cluster-identifier  \
    --secret-arn   \
    --region  \
    --sql "select * from test_table limit 1;"

Example response for the execute statement:

    {
        "Id": "0541583f-ffb2-4023-bf9e-9db50c95851e",
        "ClusterIdentifier": "",
        "CreatedAt": 1598306924.632,
        "Database": "",
        "SecretArn": ""
    }

You can later fetch the result of your SQL statement with redshift-data get-statement-result:

    aws redshift-data get-statement-result 
        --id 0541583f-ffb2-4023-bf9e-9db50c95851e
        --region

Accessing Data API from AWS SDK

Data API is also available in AWS SDK. Below you can find an example of how to call Data API using boto3 and Python.

    import boto3
    client = boto3.client('redshift-data')

Running SQL commands

After generating the client, it is possible to run SQL commands through it. You can use the SecretArn parameter to access the database after configuring AWS Secrets Manager.

    response = client.execute_statement(
        ClusterIdentifier='’,
        SecretArn='’,
        Sql='select * from test_table limit 1;’,
    )

Example response for the statement above:

    {
        'ClusterIdentifier': ,
        'Id': '88770b8d-b2ca-4311-bd04-0c39da81b479’,
        'CreatedAt': datetime(2020, 10, 1),
        'Database': '’,
        'DbUser': ‘’,
        'SecretArn': ‘’
    }

To fetch the results of the query, you need to use the ID generated by the execute statement:

    response = client.get_statement_result(
        Id=’88770b8d-b2ca-4311-bd04-0c39da81b479’
    )

Done! As shown above, it is pretty easy to access your Amazon Redshift resources through Data API with AWS CLI or AWS SDK, depending on your preference.

Accessing Amazon Redshift with Built-in DataAPI

About the Author:

Start Your Digital Transformation

Accessing Amazon Redshift with Built-in DataAPI

Accessing Data API from the AWS CLI

Running SQL commands

Accessing Data API from AWS SDK

Running SQL commands

About the Author:

Burak Balta

Start Your Digital Transformation

What are cookies