Hello All,
Today we will be looking at how to calculate size of
database / tables in a Vertica Cluster.
So how is Vertica different in terms of storage – It is
columnar
So Vertica stores data in Columnar format and hence the
compression ratio is much better than other databases. I will not emphasize on
Columnar storage of Vertica in this article (Will surely have a complete
different article for this).
Below are some snippets through which you can get to know
how much data is residing in your database or how much space is consumed by
your data in database.
We will be calculating two different sizes in this article
with respect to Raw data size and Actual Database Size:
1.
Database Size
2.
Table Size
Let’s start with Database size
The function used for calculating the database size is AUDIT
AUDIT:Audit performs a full database audit and gives the actual
database size in terms of raw data. Audit considers all the data in the
database including the one is your WOS (Write optimized Store / Memory of your
cluster)
Audit results are given in terms of bytes. It can be supplied
with two optional parameters error_tolerance and confidence_level
Lowering the error_tolerance to zero (0) and raising the confidence_level
to 100, provides the most accurate size estimate, and increases the performance
impact of calling the AUDIT function. During a detailed, low error-tolerant
audit, all of the data in the database is dumped to a raw format to calculate
its size. Since performing a stringent audit can significantly impact database
performance, never perform a full audit of a production database.
Audit function estimates the database size of your database considering
the raw database size. What is raw data - raw data means the
uncompressed, un-federated data stored in a single Vertica database.
What is excluded in while calculating raw database size
1.
Same data in multiple projections
2.
Data stored in temp tables
3.
Data which is marked for deletion ( Delete
Vectors )
4.
Data stored in Monitoring, Data Collector and
system tables
Usage of Audit Function
For calculating complete database size
NNANI=> select audit('');
audit
-----------
283448493
(1 row)
For calculating complete database size with high accuracy
and high performance impact
NNANI=> select audit('',0,100);
audit
-----------
282084688
(1 row)
Here the 1st parameter is error_tolerance and
second is confidence_level ….so the difference can be seen in both the
scenarios regarding accuracy level.
To calculate size of a table
NNANI=> select
audit('public.bank_holidays');
audit
-------
319
(1 row
The size of table bank holidays table is 319 bytes
Calculating Actual
Database Size:
Actual database size will be the footprint of your database
.i.e. it will be including all the projections for same table, data in WOS,
Data in system tables. In simple words the space your data is consuming in your
database.
For calculating this we can use two monitoring tables
1.
Column_storage
2.
Projection_storage
Using Column_storage
table to calculate the database size
NNANI=> select
round((sum(column_used_bytes)/1024/1024/1024), 2.0) || ' GB' as
database_size_in_GB from
NNANI-> (select anchor_table_schema,
anchor_table_name,node_name, column_name, sum(used_bytes) as column_used_bytes
NNANI(> from column_storage where
NNANI(> node_name IN (select node_name
from nodes)
NNANI(> group by 1,2,3,4) sub;
database_size_in_GB
---------------------
0.1 GB
(1 row)
Using projection_storage table to calculate the database size
NNANI=> select
round(sum(used_bytes)/1024/1024/1024, 2.0) || ' GB' as database_size_in_GB from
NNANI-> (select
anchor_table_name,anchor_table_schema,node_name,sum(ros_used_bytes) as
used_bytes
NNANI(> from projection_storage
NNANI(> group by 1,2,3
NNANI(> order by 3) sub;
database_size_in_GB
---------------------
0.1 GB
(1 row)
Using Column_storage table to calculate the table size
NNANI=> select CASE WHEN
round((sum(column_used_bytes)), 2.0) > 0 THEN round((sum(column_used_bytes)),
2.0) || ' MB' END as table_size_in_Bytes,
NNANI-> CASE WHEN
round((sum(column_used_bytes)/1024), 2.0) > 0 THEN
round((sum(column_used_bytes)/1024), 2.0) || ' KB' END as table_size_in_KB,
NNANI-> CASE WHEN
round((sum(column_used_bytes)/1024/1024), 2.0) > 0 THEN
round((sum(column_used_bytes)/1024/1024), 2.0) || ' MB' END as
table_size_in_MB,
NNANI-> CASE WHEN
round((sum(column_used_bytes)/1024/1024/1024), 2.0) > 0 THEN
round((sum(column_used_bytes)/1024/1024/1024), 2.0) || ' GB' END as table_size_in_GB
NNANI-> from
NNANI-> (select anchor_table_schema,
anchor_table_name,node_name, column_name, sum(used_bytes) as column_used_bytes
NNANI(> from column_storage where
anchor_table_schema=’public’ and anchor_table_name='bank_holidays' and
NNANI(> node_name IN (select node_name
from nodes)
NNANI(> group by 1,2,3,4) sub;
table_size_in_Bytes | table_size_in_KB |
table_size_in_MB | table_size_in_GB
---------------------+------------------+------------------+------------------
828
MB | 0.81 KB | |
(1 row)
You can customize the
above query according to your needs. You can even use the projection_storage
table in same manner as column_storage is used.
This was a short article
on calculating table size and database size in Vertica.
Please suggest any
corrections if needed to this article
Your feedback is
appreciated
No comments:
Post a Comment