When designing and operating databases, there are some important points to consider to avoid bottlenecks and performance issues. Whichever database you use, on whatever platform, you need to optimize your queries, table schemas, and data, both while designing the system and after it is in operation. If you are not careful at these steps, you will end up with higher costs and more operational problems at the end of the day.
Item size is one of those optimization points: if you store unnecessarily large items in your table, write speed suffers, your database grows too fast, and performance degrades, which in turn slows your subsequent reads and writes. In this blog, I will focus on item sizes for DynamoDB.
DynamoDB is a NoSQL database, and with the help of AWS it is fast and flexible. If you are not familiar with DynamoDB, you can read our blog post DynamoDB Basics. Here we will focus on the features of DynamoDB that make item sizes important for performance and cost. If you want to optimize your DynamoDB costs beyond this, check our DynamoDB Pricing Optimization blog post.
The first feature is Capacity Modes in DynamoDB. There are two read/write capacity modes; On-demand and Provisioned.
In on-demand capacity mode you don't have to estimate or set read and write capacity for your application; you simply pay per request for data reads and writes.
In provisioned capacity mode you set the read and write capacity that your application requires. You need to be careful here, because you manually configure the minimum and maximum number of capacity units. If you set these limits below what your application needs, requests can be throttled: read and write operations against your table are rejected until capacity is available again. If you worry about exceeding the capacity you set, you can enable auto scaling, which adjusts your table's capacity automatically in response to traffic changes. Additionally, this mode has a free tier option.
The second feature I’d like to discuss is the consistency models in DynamoDB. There are two: eventually consistent and strongly consistent reads.
By its nature, DynamoDB stores data as multiple copies to ensure durability, and changes are applied to those copies in the background. Eventual consistency is the default when you use DynamoDB without changing anything. This model uses less compute power, but a read may not reflect recently completed operations. For example, if you delete an item and then query shortly afterwards, you may still see the item before the deletion propagates. This is one of the most important features that makes DynamoDB fast.
Strongly consistent reads are what the name suggests. When you choose this model over eventual consistency, you use more capacity but get results that reflect all writes that received a successful response prior to the read. That means you can be sure you will get the latest version of the data you successfully wrote before reading. You request this behavior with an optional parameter (ConsistentRead) on your read requests.
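As a quick illustration, here is the shape of a strongly consistent GetItem request in low-level (boto3-style) parameter form; the table and key names are hypothetical:

```python
# Hypothetical table and key; ConsistentRead=True requests a strongly
# consistent read (the default, False, is eventually consistent).
request = {
    "TableName": "Music",
    "Key": {"Artist": {"S": "No One You Know"}},
    "ConsistentRead": True,
}
print(request["ConsistentRead"])  # True
```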
The relationship between their capacity usage is: an eventually consistent read uses 0.5 RRU/RCU per 4 KB, a strongly consistent read uses 1, and a transactional read uses 2.
From this point on, I will explain examples and terms based on the parameters above.
Transactional read and write requests in DynamoDB are different from standard read and write operations: all operations in a single transaction are guaranteed to succeed or fail together.
First, I want to talk about item sizes for read requests in DynamoDB. A read request is actually a GetItem API call that reads a single item from a table. Definitions and terms change slightly depending on which capacity mode is used. Let's explain this.
Items can be up to 400 KB, so reads can range from 0.5 to 100 RRU/RCUs or from 1 to 200 RRU/RCUs for transactional read requests. DynamoDB uses 1 RRU or RCU even if the requested item doesn't exist.
DynamoDB rounds item sizes up to the next 4 KB multiple. For example, if you read an item that is 2.5 KB, DynamoDB rounds the item size to 4 KB. If you read an item of 9 KB, DynamoDB rounds the item size to 12 KB. DynamoDB uses 0.5 - 1 RRU/RCU for the first example read, and 1.5 - 3 RRU/RCU for the second example item size depending on the capacity mode and consistency models. Let's look at the other read request types.
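The rounding rules above can be sketched with a small helper (my own approximation of the documented behavior, not an AWS API):

```python
import math

def read_capacity(item_size_kb, consistency="eventual"):
    """Approximate RRU/RCU cost of reading one item: the size is rounded
    up to the next 4 KB multiple, then multiplied by the per-model factor
    (0.5 eventual, 1 strong, 2 transactional)."""
    blocks = max(1, math.ceil(item_size_kb / 4))
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}
    return blocks * factor[consistency]

print(read_capacity(2.5))                 # 0.5  (2.5 KB rounds to 4 KB)
print(read_capacity(9, "strong"))         # 3    (9 KB rounds to 12 KB)
print(read_capacity(9, "transactional"))  # 6
```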
BatchGetItem helps you read up to 100 items from one or more tables. Each request inside a BatchGetItem is actually an individual GetItem request. In this process, each item's size is rounded up to the next 4 KB multiple, and then the total is calculated.
For example, say you have three items to read with sizes of 1 KB, 5 KB, and 9.5 KB. When you use BatchGetItem, DynamoDB first rounds each item size up to a multiple of 4 KB and then aggregates. For this example, the calculation would be: 4 KB + 8 KB + 12 KB = 24 KB.
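The BatchGetItem rule (round each item first, then sum) can be checked with a short sketch:

```python
import math

def batch_get_size_kb(item_sizes_kb):
    """Each item is rounded up to a 4 KB multiple before the sizes are summed."""
    return sum(math.ceil(size / 4) * 4 for size in item_sizes_kb)

print(batch_get_size_kb([1, 5, 9.5]))  # 4 + 8 + 12 = 24
```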
You can use Query to read multiple items that share the same partition key. With this operation, consumed capacity depends on the combined size of all items accessed, rounded as a whole. For example, if a query accesses 20 items with a combined size of 49 KB, DynamoDB rounds that up to 52 KB.
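Query rounds the combined size once, which is cheaper than the per-item rounding of BatchGetItem; a hypothetical helper makes the difference visible:

```python
import math

def query_size_kb(accessed_sizes_kb):
    """Query rounds the total size of all accessed items, not each item."""
    return math.ceil(sum(accessed_sizes_kb) / 4) * 4

print(query_size_kb([24.5, 24.5]))  # 49 KB total rounds up to 52
print(query_size_kb([1, 5, 9.5]))   # 16, vs. 24 if rounded per item
```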
With Scan you can read all items in a table. Consumed capacity is calculated from the size of all items accessed, not just the items returned.
Let’s take a look at item sizes for write requests, which behave differently from the read requests above. For write requests there is no difference in cost between on-demand WRUs (write request units) and provisioned WCUs (write capacity units).
Items can be up to 400 KB, so writes can range from 1 to 400 WRU/WCUs, or from 2 to 800 WRU/WCUs for transactional write requests. DynamoDB rounds item sizes up to the next 1 KB multiple. There are other write operations as well. Let's explain them too.
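As with reads, the write cost can be sketched (again my approximation of the documented rules, not an AWS API):

```python
import math

def write_capacity(item_size_kb, transactional=False):
    """WRU/WCU cost of writing one item: size rounded up to the next
    1 KB multiple, doubled for transactional writes."""
    units = max(1, math.ceil(item_size_kb))
    return units * (2 if transactional else 1)

print(write_capacity(0.3))                      # 1 (minimum one unit)
print(write_capacity(9.5))                      # 10
print(write_capacity(9.5, transactional=True))  # 20
```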
With PutItem you write a single item to a table. If an item with the same key already exists, PutItem overwrites it, and the larger of the two item sizes determines consumption. For example, replacing a 3 KB item with a 5 KB item consumes 5 WRU/WCUs; subsequent writes of the 5 KB item also consume 5 WRU/WCUs each.
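The overwrite rule reduces to taking the larger of the two sizes (my own sketch of the documented behavior):

```python
import math

def overwrite_capacity(old_size_kb, new_size_kb):
    """Overwriting an existing item consumes capacity for the larger of
    the old and new sizes, rounded up to the next 1 KB."""
    return max(1, math.ceil(max(old_size_kb, new_size_kb)))

print(overwrite_capacity(3, 5))  # 5
```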
BatchWriteItem helps you put or delete up to 25 items, and a single BatchWriteItem call can transmit up to 16 MB of data. DynamoDB processes each request as an individual PutItem or DeleteItem, but with some differences: for example, you cannot specify condition expressions on the individual put and delete requests as you can with normal put and delete operations. You can put or delete items across multiple tables in one call.
If you want to modify a single item in a table, you can use UpdateItem. For consumption, DynamoDB looks at the larger of the item's sizes before and after the update. Even if you update just one attribute, UpdateItem consumes capacity based on the full item size, including all of the item's pre-existing attributes.
DeleteItem is simple: you can remove a single item from a table using this. The size will be the size of the deleted item for consumption. If the item doesn't exist, the request will use 1 WRU/WCU.
In DynamoDB, items are composed of attributes, so an item's size is the total of all attribute name and value sizes. I will explain the data types and how their sizes are calculated.
Strings are Unicode with UTF-8 binary encoding, which means each character uses 1 to 4 bytes. In the English alphabet each character is 1 byte; the $ symbol is also 1 byte, but currency symbols like £ take 2 bytes and ₺ takes 3. Letters in some other alphabets take 2 or 3 bytes as well.
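You can verify these byte counts with any UTF-8 encoder, for example in Python:

```python
# UTF-8 byte length per character; DynamoDB charges strings by these bytes.
for ch in ["A", "$", "£", "₺", "ş"]:
    print(ch, len(ch.encode("utf-8")))
# A 1, $ 1, £ 2, ₺ 3, ş 2
```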
Numbers are variable in length, with up to 38 significant digits; leading and trailing zeros are trimmed. The size of a number is approximate and, according to the AWS documentation, can be calculated with the formula below. In my tests, the number 4 measured 2 bytes and -4 measured 3 bytes. With more digits, 112 is 3 bytes and -112 is 4 bytes, yet 1123 is also 3 bytes. This may look confusing, but DynamoDB counts significant digits in pairs, rounding an odd digit count up, and adds an extra 1 byte for negative numbers.
(length of attribute name) + (1 byte per two significant digits) + (1 byte)
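The value portion of this formula (everything except the attribute-name length) can be sketched and checked against the measurements above:

```python
import math

def number_value_size(n):
    """Approximate byte size of a DynamoDB number value (attribute name
    excluded): 1 byte per two significant digits, rounded up, plus 1 byte,
    plus 1 extra byte if the number is negative."""
    digits = str(abs(n)).strip("0").replace(".", "")  # significant digits only
    size = math.ceil(len(digits) / 2) + 1
    return size + 1 if n < 0 else size

print(number_value_size(4))     # 2
print(number_value_size(-4))    # 3
print(number_value_size(112))   # 3
print(number_value_size(-112))  # 4
print(number_value_size(1123))  # 3
```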
Binary type is easy to calculate because each byte uses 1 byte. The size is the total number of bytes in the attribute. A binary value must be encoded in base64 format before it can be sent to DynamoDB, but the value's raw byte length is used for calculating size.
The Boolean type is also easy to understand; it can take only two values, true or false and uses 1 byte for each one.
The Null type is not literally nothing: it indicates the absence of data, uses 1 byte, and is displayed with a value of true.
List and Map size calculations are similar. Both require 3 bytes of overhead, so even an empty map or list uses 3 bytes. The total size is that overhead plus the sum of the element sizes according to their data types, plus 1 extra byte for each key-value pair in a map or each element in a list.
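Putting the type rules together, a rough item-value size estimator might look like this (an illustrative sketch, not an official tool; it reuses the number approximation above):

```python
import math

def value_size(v):
    """Rough size in bytes of one DynamoDB attribute value."""
    if isinstance(v, bool):
        return 1                              # Boolean: 1 byte
    if v is None:
        return 1                              # Null: 1 byte
    if isinstance(v, str):
        return len(v.encode("utf-8"))         # UTF-8 bytes
    if isinstance(v, bytes):
        return len(v)                         # raw binary bytes
    if isinstance(v, (int, float)):           # approximate number rule
        digits = str(abs(v)).strip("0").replace(".", "")
        return math.ceil(len(digits) / 2) + 1 + (1 if v < 0 else 0)
    if isinstance(v, list):                   # 3 bytes overhead + 1 per element
        return 3 + sum(1 + value_size(e) for e in v)
    if isinstance(v, dict):                   # 3 bytes overhead + 1 per pair
        return 3 + sum(1 + len(k.encode("utf-8")) + value_size(e)
                       for k, e in v.items())
    raise TypeError(f"unsupported type: {type(v)}")

print(value_size([]))             # 3 (empty list still costs its overhead)
print(value_size({"a": "test"}))  # 3 + (1 + 1 + 4) = 9
```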
Let me give an example for a better understanding of item sizes. The example item contains all of the data types explained above, and we will arrive at its total size by calculation.
First let's calculate attribute names size:
pk + sk + str + bool +null + bin + num + list + map = 28 bytes
Second, let's calculate the string and binary values:
testid + test + example + 1 byte (raw size of the base64 value QQ==) = 18 bytes
Lastly, let's calculate the other data types:
bool: 1 byte, null: 1 byte, number: 3 bytes, list: 5 bytes, map: 12 bytes. Total: 22 bytes
Result = 28 + 18 + 22 = 68 bytes
Let's wrap up what we learned. At this point, you can choose which DynamoDB capacity modes and consistency models suit your application, and you can estimate how much capacity your DynamoDB operations will consume. If you need a tool, check out our DynamoDB Calculator. Thank you for taking the time to read!
As a passionate electronics engineer who chose the path to be a software developer, Çetin is thrilled to follow and learn new technologies and different fields for modern application development. His interest in trying new things and making projects follows his passion for learning and sharing skills and emerging technologies.
Copyright © 2018-2024 Sufle