Amazon Redshift is a fully managed data warehouse solution designed to handle petabyte-scale data. Amazon Redshift offers better performance up to 3x compared to other cloud data warehouses. Since the service provides solutions for organizations of all sizes, you can simply scale up and down your clusters in minutes. While data is growing exponentially, Amazon Redshift offers stability and flexibility to handle business intelligence operations and helps businesses to keep the costs low compared to the alternatives.
In September 2020, AWS announced Data API for Amazon Redshift. With the built-in Data API, accessing Amazon Redshift becomes simpler and more secure. Let's look at some advantages and use cases of Data API for Amazon Redshift:
- There is no need to configure the database driver or manage connections.
- Data API uses IAM or AWS Secrets Manager therefore there is no need to use credentials inside API calls.
- Data API is asynchronous and it caches the query results for 24 hours. You can retrieve your results later or multiple times without running the query again.
- Lambda functions can now access Amazon Redshift API without depending on 3rd party drivers.
Before using Data API it is necessary to mention that there are few limitations:
- The size limit for a query is 100 MB.
- The size limit for a query statement is 100 KB.
- The size limit is 64 KB per row in the result set.
Let's take a look at how to use Data API from AWS CLI and AWS SDK.
Accessing Data API from the AWS CLI
AWS CLI now enables calling Data API with redshift-data
command. To use --secret-arn
parameter as an authentication credential, you must configure the AWS Secrets Manager first.
Running SQL commands
You use the aws redshift-data execute-statement
to run an SQL statement.
aws redshift-data execute-statement \
--database <db-name> \
--cluster-identifier <cluster-id> \
--secret-arn <secret-arn> \
--region <region> \
--sql "select * from test_table limit 1;"
Example response for the execute statement:
{
"Id": "0541583f-ffb2-4023-bf9e-9db50c95851e",
"ClusterIdentifier": "<cluster-id>",
"CreatedAt": 1598306924.632,
"Database": "<db-name>",
"SecretArn": "<secret-arn>"
}
You can later fetch the result of your SQL statement with redshift-data get-statement-result
:
aws redshift-data get-statement-result
--id 0541583f-ffb2-4023-bf9e-9db50c95851e
--region <region>
Accessing Data API from AWS SDK
Data API is also available in AWS SDK. Below you can find an example of how to call Data API using boto3
and Python
.
import boto3
client = boto3.client('redshift-data')
Running SQL commands
After generating the client, it is possible to run SQL commands through it. You can use the SecretArn
parameter to access the database after configuring AWS Secrets Manager.
response = client.execute_statement(
ClusterIdentifier='<cluster-id>’,
SecretArn='<secret-arn>’,
Sql='select * from test_table limit 1;’,
)
Example response for the statement above:
{
'ClusterIdentifier': <cluster-id>,
'Id': '88770b8d-b2ca-4311-bd04-0c39da81b479’,
'CreatedAt': datetime(2020, 10, 1),
'Database': '<db-name>’,
'DbUser': ‘<db-user>’,
'SecretArn': ‘<secret-arn>’
}
To fetch the results of the query, you need to use the ID generated by the execute statement:
response = client.get_statement_result(
Id=’88770b8d-b2ca-4311-bd04-0c39da81b479’
)
Done! As shown above, it is pretty easy to access your Amazon Redshift resources through Data API with AWS CLI or AWS SDK, depending on your preference.
About the Author:
Burak Balta
Lead Software EngineerAn AWS Certified Developer Associate, Burak is an experienced software engineer. With his experience in various industries including global technology companies, he follows his passion for going beyond the limits to build excellent products with collaboration and knowledge sharing.