
Estimating Database and Table Size in Vertica

Thursday, 14 November 2013

Hello All,

Today we will look at how to calculate the size of a database and its tables in a Vertica cluster.
So how is Vertica different in terms of storage? It is columnar.
Because Vertica stores data in a columnar format, its compression ratio is much better than that of most row-oriented databases. I will not go deep into Vertica's columnar storage in this article (that surely deserves a completely different article of its own).
Below are some snippets that show how much raw data resides in your database and how much space that data actually consumes on disk.




We will be calculating two different sizes in this article, looking at both the raw data size and the actual database size:
1. Database size
2. Table size

Let's start with the database size.
The function used for calculating the database size is AUDIT.
AUDIT: AUDIT performs a full database audit and returns the database size in terms of raw data. The audit considers all the data in the database, including what is still sitting in the WOS (Write Optimized Store, the in-memory store of your cluster).
Audit results are given in bytes. The function accepts two optional parameters, error_tolerance and confidence_level.

Lowering the error_tolerance to zero (0) and raising the confidence_level to 100 provides the most accurate size estimate, but it also increases the performance impact of calling the AUDIT function. During a detailed, low error-tolerance audit, all of the data in the database is dumped to a raw format to calculate its size. Since such a stringent audit can significantly impact database performance, avoid performing a full audit of a production database.
The AUDIT function estimates your database size in terms of raw data. What is raw data? Raw data means the uncompressed, unfederated data stored in a single Vertica database.

What is excluded while calculating the raw database size:
1. The same data stored in multiple projections
2. Data stored in temporary tables
3. Data marked for deletion (delete vectors; see the sketch after this list)
4. Data stored in monitoring, Data Collector, and other system tables
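
Point 3 above deserves a quick note: deleted rows are not counted by AUDIT, but they still occupy disk space until they are purged. Below is a minimal sketch to see how much space delete vectors are holding, assuming your version exposes the delete_vectors system table with deleted_row_count and used_bytes columns (check the documentation for your release):

-- space still held by rows that are marked for deletion
select schema_name, projection_name,
       sum(deleted_row_count) as deleted_rows,
       sum(used_bytes)        as delete_vector_bytes
from delete_vectors
group by 1, 2
order by delete_vector_bytes desc;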

Usage of Audit Function

For calculating the complete database size:
NNANI=> select audit('');
   audit
-----------
 283448493

(1 row)
For calculating the complete database size with the highest accuracy (and the highest performance impact):
NNANI=> select audit('',0,100);
   audit
-----------
 282084688

(1 row)

Here the second argument (0) is error_tolerance and the third (100) is confidence_level, and you can see the small difference between the two estimates as the accuracy changes.
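
You can also move in the other direction: if a rough number is good enough and you want the audit to finish faster, relax the parameters instead of tightening them. A hedged example (the exact defaults vary by version, so treat these values as illustrative):

-- faster, less precise audit: 10% error tolerance at 90% confidence
select audit('', 10, 90);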
To calculate the size of a single table:
NNANI=> select audit('public.bank_holidays');
 audit
-------
 319
(1 row)
The raw size of the bank_holidays table is 319 bytes.
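
AUDIT can also be pointed at a whole schema instead of a single table, which is handy when you want the raw size of everything one team owns. A small sketch, using the public schema from the example above:

-- raw data size of every object in the public schema
select audit('public');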

Calculating the Actual Database Size:
The actual database size is the on-disk footprint of your database, i.e. it includes all the projections for the same table, data in the WOS, and data in system tables. In simple words, it is the space your data is actually consuming in your database.
For calculating this we can use two monitoring tables:
1. column_storage
2. projection_storage



Using the column_storage table to calculate the database size
NNANI=> select round((sum(column_used_bytes)/1024/1024/1024), 2.0) || ' GB' as database_size_in_GB from
NNANI-> (select anchor_table_schema, anchor_table_name,node_name, column_name, sum(used_bytes) as column_used_bytes
NNANI(> from column_storage where
NNANI(> node_name IN (select node_name from nodes)
NNANI(> group by 1,2,3,4) sub;
 database_size_in_GB
---------------------
 0.1 GB

(1 row)
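
If you want to see how that space is distributed across the cluster, the same column_storage data can simply be grouped by node instead. A sketch along the lines of the query above:

-- per-node storage footprint from column_storage
select node_name,
       round(sum(used_bytes)/1024/1024/1024, 2.0) || ' GB' as size_per_node_in_GB
from column_storage
group by node_name
order by node_name;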

Using the projection_storage table to calculate the database size

NNANI=> select round(sum(used_bytes)/1024/1024/1024, 2.0) || ' GB' as database_size_in_GB  from
NNANI-> (select anchor_table_name,anchor_table_schema,node_name,sum(ros_used_bytes) as used_bytes
NNANI(> from projection_storage
NNANI(> group by 1,2,3
NNANI(> order by 3) sub;
 database_size_in_GB
---------------------
 0.1 GB

(1 row)
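
The same projection_storage table can also give a per-schema breakdown, which helps to spot which schemas consume the most space. A sketch:

-- per-schema storage footprint from projection_storage
select anchor_table_schema,
       round(sum(used_bytes)/1024/1024/1024, 2.0) || ' GB' as schema_size_in_GB
from projection_storage
group by anchor_table_schema
order by sum(used_bytes) desc;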


Using the column_storage table to calculate the table size

NNANI=> select CASE WHEN round((sum(column_used_bytes)), 2.0) > 0 THEN round((sum(column_used_bytes)), 2.0) || ' Bytes' END as table_size_in_Bytes,
NNANI-> CASE WHEN round((sum(column_used_bytes)/1024), 2.0) > 0 THEN round((sum(column_used_bytes)/1024), 2.0) || ' KB' END as table_size_in_KB,
NNANI-> CASE WHEN round((sum(column_used_bytes)/1024/1024), 2.0) > 0 THEN round((sum(column_used_bytes)/1024/1024), 2.0) || ' MB' END as table_size_in_MB,
NNANI-> CASE WHEN round((sum(column_used_bytes)/1024/1024/1024), 2.0) > 0 THEN round((sum(column_used_bytes)/1024/1024/1024), 2.0) || ' GB' END as table_size_in_GB
NNANI-> from
NNANI-> (select anchor_table_schema, anchor_table_name,node_name, column_name, sum(used_bytes) as column_used_bytes
NNANI(> from column_storage where anchor_table_schema='public' and anchor_table_name='bank_holidays' and
NNANI(> node_name IN (select node_name from nodes)
NNANI(> group by 1,2,3,4) sub;

 table_size_in_Bytes | table_size_in_KB | table_size_in_MB | table_size_in_GB
---------------------+------------------+------------------+------------------
 828 Bytes           | 0.81 KB          |                  |
(1 row)

You can customize the above query according to your needs. You can even use the projection_storage table in the same manner as column_storage, as shown in the sketch below.
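
For example, a sketch of the same table-size calculation done against projection_storage (using the public.bank_holidays table from earlier):

-- compressed storage footprint of one table from projection_storage
select anchor_table_schema, anchor_table_name,
       sum(used_bytes) as table_size_in_bytes
from projection_storage
where anchor_table_schema = 'public'
  and anchor_table_name = 'bank_holidays'
group by 1, 2;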
This was a short article on calculating table size and database size in Vertica.

Please suggest any corrections needed to this article.
Your feedback is appreciated.


