Sunday, December 3, 2023
HomeBig DataSimplify Amazon Redshift monitoring utilizing the brand new unified SYS views

Simplify Amazon Redshift monitoring utilizing the brand new unified SYS views


Amazon Redshift is a completely managed, petabyte-scale knowledge warehouse service within the cloud, offering as much as 5 instances higher price-performance than another cloud knowledge warehouse, with efficiency innovation out of the field at no extra value to you. Tens of 1000’s of consumers use Amazon Redshift to course of exabytes of information daily to energy their analytics workloads.

On this put up, we focus on Amazon Redshift SYS monitoring views and the way they simplify the monitoring of your Amazon Redshift workloads and useful resource utilization.

Overview of SYS monitoring views

SYS monitoring views are system views in Amazon Redshift that can be utilized to watch question and workload useful resource utilization for provisioned clusters in addition to for serverless workgroups. They provide the next advantages:

  • They’re categorized primarily based on purposeful alignment, contemplating question state, efficiency metrics, and question sorts
  • Now we have launched new efficiency metrics like planning_time, lock_wait_time, remote_read_io, and local_read_io to help in efficiency troubleshooting
  • It improves the usability of monitoring views by logging the user-submitted question as an alternative of the Redshift optimizer-rewritten question
  • It offers extra troubleshooting metrics utilizing fewer views
  • It permits unified Amazon Redshift monitoring by enabling you to make use of the identical question throughout provisioned clusters or serverless workgroups

Let’s take a look at among the options of SYS monitoring views and the way they can be utilized for monitoring.

Unify numerous query-level monitoring metrics

The next desk exhibits how one can unify numerous metrics and knowledge for a question from a number of system tables & views into one SYS monitoring view.

STL/SVL/STV Info aspect SYS Monitoring View View columns
STL_QUERY elapsed time, question label, consumer ID, transaction, session, label, stopped queries, database title SYS_QUERY_HISTORY

user_id

query_id

query_label

transaction_id

session_id

database_name

query_type

standing

result_cache_hit

start_time

end_time

elapsed_time

queue_time

execution_time

error_message

returned_rows

returned_bytes

query_text

redshift_version

usage_limit

compute_type

compile_time

planning_time

lock_wait_time

STL_WLM_QUERY queue time, runtime
SVL_QLOG end result cache
STL_ERROR error code, error message
STL_UTILITYTEXT non-SELECT SQL
STL_DDLTEXT DDL statements
SVL_STATEMENTEXT all forms of SQL statements
STL_RETURN return rows and bytes
STL_USAGE_CONTROL utilization restrict
STV_WLM_QUERY_STATE present state of WLM
STV_RECENTS latest and in-flight queries
STV_INFLIGHT in-flight queries
SVL_COMPILE compilation

For added data on SYS to STL/SVL/STV mapping, check with Migrating to SYS monitoring views.

Consumer query-level logging

To boost question efficiency, the Redshift question engine can rewrite user-submitted queries. The user-submitted question identifier is totally different than the rewritten question identifier. We check with the user-submitted question because the mum or dad question and the rewritten question because the youngster question on this put up.

The next diagram illustrates logging on the mum or dad question degree and youngster question degree. The mum or dad question identifier is 1000, and the kid question identifiers are 1001, 1002, and 1003.

Question lifecycle timings

SYS_QUERY_HISTORY has an enhanced record of columns to offer granular time metrics referring to the totally different question lifecycle phases. Be aware all instances are recorded in microseconds. The next desk summarizes these metrics.

Time metrics Description
planning_time The time the question spent previous to working the question, which generally contains question lifecycle phases like parse, analyze, planning and rewriting.
lock_wait_time The time the question spent on buying the locks on the required database objects referenced.
queue_time The time the question spent within the queue ready for assets to be accessible to run.
compile_time The time the question spent compiling.
execution_time The time the question spent working. Within the case of a SELECT question, this additionally contains the return time.
elapsed_time The top-to-end time of the question run.

Resolution overview

We focus on the next eventualities to assist achieve familiarity with the SYS monitoring views:

  • Workload and question lifecycle monitoring
  • Information ingestion monitoring
  • Exterior question monitoring
  • Sluggish question efficiency troubleshooting

Stipulations

It’s best to have the next stipulations to observe together with the examples on this put up:

Moreover, obtain all of the SQL queries which are referenced on this put up as Redshift Question Editor v2 SQL notebooks.

Workload and question lifecycle monitoring

On this part, we focus on how you can monitor the workload and question lifecycle.

Determine in-flight queries

SYS_QUERY_HISTORY offers a singular view to have a look at all of the in-flight queries in addition to historic runs. See the next instance question:

SELECT  
  *
FROM    
  sys_query_history
WHERE    standing IN ('planning', 'queued', 'working', 'returning')
ORDER BY
  start_time;

We get the next output.

Determine high long-running queries

The next question helps retrieve the highest 100 queries which are taking the longest to run. Analyzing (and, if possible, optimizing) these queries may help enhance general efficiency. These metrics are accrued statistics throughout all runs of the question. Be aware that on a regular basis values are in microseconds.

--top lengthy working question by elapsed_time
SELECT  
  user_id
  , transaction_id
  , query_id
  , database_name
  , query_type
  , query_text::VARCHAR(100)
  , lock_wait_time
  , planning_time
  , compile_time
  , execution_time
  , elapsed_time
FROM    
  sys_query_history
ORDER BY
  elapsed_time DESC
LIMIT 100;

We get the next output.

Collect each day counts of queries by question sorts, interval, and standing

The next question offers perception into the distribution of various kinds of queries throughout totally different days and helps consider and observe any modifications within the workload:

--daily breakdown of workload by question sorts and standing
SELECT  
  DATE_TRUNC('day', start_time) period_daily
  , query_type
  , standing
  , COUNT(*)
FROM    
  sys_query_history
GROUP BY
  period_daily
  , query_type
  , standing
ORDER BY
  period_daily
  , query_type
  , standing;

We get the next output.

Collect run particulars of an in-flight question

To find out the run-level particulars of a question that’s in-flight, you should utilize the is_active = ‘t’ filter when querying the SYS_QUERY_DETAIL desk. See the next instance:

SELECT  
  query_id
  , child_query_sequence
  , stream_id
  , segment_id
  , step_id
  , step_name
  , table_id
  , coalesce(table_name,'')|| coalesce(supply,'') as table_name
  , start_time
  , end_time
  , length
  , blocks_read
  , local_read_io
  , remote_read_io
FROM    
  sys_query_detail
WHERE is_active="t"
ORDER BY
  query_id
  , child_query_sequence
  , stream_id
  , segment_id
  , step_id;

To view the most recent 100 COPY queries run, use the next code:

SELECT  
  session_id
  , transaction_id
  , query_id
  , database_name
  , table_name
  , data_source
  , loaded_rows
  , loaded_bytes
  , length / 1000.00 duration_ms
FROM    
  sys_load_history
ORDER BY
  start_time DESC LIMIT 100;

We get the next output.

Collect transaction-level particulars for commits and undo

SYS_TRANSACTION_HISTORY offers transaction-level logging by offering insights into dedicated transactions with particulars like blocks dedicated, standing, and isolation degree (serializable or snapshot used). It additionally logs particulars concerning the rolled again or undo transactions.

The next screenshots illustrate fetching particulars a couple of transaction that was dedicated efficiently.

The next screenshots illustrate fetching particulars a couple of transaction that was rolled again.

Stats and vacuum

The SYS_ANALYZE_HISTORY monitoring view offers particulars just like the final timestamp of analyze queries, the length for which a specific analyze question ran, the variety of rows within the desk, and the variety of rows modified. The next instance question offers an inventory of the most recent analyze queries that ran for all of the everlasting tables:

SELECT  
  TRIM(schema_name) schema_name
  , TRIM(table_name) table_name
  , table_id
  , standing
  , COUNT(*) times_analyze_was_triggered
  , MAX(last_analyze_time) last_analyze_time
  , MAX(end_time) end_time
  , AVG(ROWS) "rows"
  , AVG(modified_rows) modified_rows
FROM    
  sys_analyze_history
WHERE
   standing != 'Skipped'
GROUP BY
  schema_name
  , table_name
  , table_id
  , standing
ORDER BY
  schema_name
  , table_name
  , table_id
  , standing
  , end_time;

We get the next output.

The SYS_VACUUM_HISTORY monitoring view offers an entire set of particulars on VACUUM in a single view. For instance, see the next code:

SELECT  
  user_id
  , transaction_id
  , query_id
  , TRIM(database_name) as database_name
  , TRIM(schema_name) as schema_name
  , TRIM(table_name) table_name
  , table_id
  , vacuum_type
  , is_automatic as is_auto
  , length
  , rows_before_vacuum
  , size_before_vacuum
  , reclaimable_rows
  , reclaimed_rows
  , reclaimed_blocks
  , sortedrows_before_vacuum
  , sortedrows_after_vacuum
FROM    
  sys_vacuum_history
WHERE    standing LIKE '%Completed%'
ORDER BY
  start_time;

We get the next output.

Information ingestion monitoring

On this part, we focus on how you can monitor knowledge ingestion.

Abstract of ingestion

SYS_LOAD_HISTORY offers particulars into the statistics of COPY instructions. Use this view for summarized insights into your ingestion workload. The next instance question offers an hourly abstract of ingestion damaged down by tables by which knowledge was ingested:

SELECT  
  date_trunc('hour', start_time) period_hourly
  , database_name
  , table_name
  , standing
  , file_format
  , SUM(loaded_rows) total_rows_ingested
  , SUM(loaded_bytes) total_bytes_ingested
  , SUM(source_file_count) num_of_files_to_process
  , SUM(file_count_scanned) num_of_files_processed
  , SUM(error_count) total_errors
FROM    
  sys_load_history
GROUP BY
  period_hourly
  , database_name
  , table_name
  , standing
  , file_format
ORDER BY
  table_name
  , period_hourly
  , standing;

We get the next output.

File-level ingress logging

SYS_LOAD_DETAIL offers extra granular insights into how ingestion is carried out on the file degree. For instance, see the next question utilizing sys_load_history:

SELECT  
  *
FROM    
  sys_load_history
WHERE table_name="catalog_sales"
ORDER BY
  start_time;

We get the next output.

The next instance exhibits what detailed file-level monitoring appears like:

 SELECT  
  user_id
  , query_id
  , TRIM(file_name) file_name
  , bytes_scanned
  , lines_scanned
  , splits_scanned
  , record_time
  , start_time
  , end_time
FROM    
  sys_load_detail
WHERE query_id = 1824870
ORDER BY
  start_time;

Test for errors throughout ingress course of

SYS_LOAD_ERROR_DETAIL lets you observe and troubleshoot errors that will have occurred in the course of the ingestion course of. This view logs particulars for the file that encountered the error in the course of the ingestion course of together with the road quantity at which the error occurred and column particulars inside that line. See the next code:

choose * from sys_load_error_detail order by start_time restrict 100;

We get the next output.

Exterior question monitoring

SYS_EXTERNAL_QUERY_DETAIL offers run particulars for exterior queries, which incorporates Amazon Redshift Spectrum and federated queries. This view logs particulars on the section degree and offers helpful insights to troubleshoot and monitor efficiency of exterior queries in a single monitoring view. The next are a couple of helpful metrics and knowledge factors this monitoring view offers:

  • Variety of exterior recordsdata scanned (scanned_files) and format of exterior recordsdata (file_format) reminiscent of Parquet, textual content file, and so forth
  • Information scanned when it comes to rows (returned_rows) and bytes (returned_bytes)
  • Utilization of partitioning (total_partitions and qualified_partitions) by exterior queries and tables
  • Granular insights into time taken in itemizing (s3list_time) and qualifying partitions (get_partition_time) for a given exterior object
  • Exterior file location (file_location) and exterior desk title (table_name)
  • Sort of exterior supply (source_type), reminiscent of Amazon Easy Storage Service (Amazon S3) for Redshift Spectrum, or federated
  • Recursive scan for subdirectories (is_recursive) or entry of nested column knowledge sort (is_nested)

For instance, the next question exhibits the each day abstract of the variety of exterior queries run and knowledge scanned:

SELECT  
  DATE_TRUNC('hour', start_time) period_hourly
  , user_id
  , TRIM(source_type) source_type
  , COUNT (DISTINCT query_id) query_counts
  , SUM(returned_rows) returned_rows
  , ROUND(SUM(returned_bytes) / 1024^3,2) returned_gb
FROM    
  sys_external_query_detail
GROUP BY
  period_hourly
  , user_id
  , source_type
ORDER BY
  period_hourly
  , user_id
  , source_type;

We get the next output.

Utilization of partitions

You may confirm whether or not the exterior queries scanning giant sums of information and recordsdata are partitioned or not. Whenever you use partitions, you possibly can prohibit the quantity of information that your exterior question has to scan by pruning primarily based on the partition key. See the next code:

SELECT  
  file_location
  , CASE
      WHEN NVL(total_partitions,0) = 0
      THEN 'No'
      ELSE 'Sure'
    END is_partitioned
  , SUM(scanned_files) total_scanned_files
  , COUNT(DISTINCT query_id) query_count
FROM    
  sys_external_query_detail
GROUP BY
  file_location
  , is_partitioned
ORDER BY
  total_scanned_files DESC;

We get the next output.

For any errors encountered with exterior queries, look into SYS_EXTERNAL_QUERY_ERROR, which logs particulars on the granularity of file_location, column, and rowid inside that file.

Sluggish question efficiency troubleshooting

Seek advice from the sysview_slow_query_performance_troubleshooting SQL pocket book downloaded as a part of the stipulations for a step-by-step information on how you can carry out query-level troubleshooting utilizing SYS monitoring views and discover solutions to the next questions:

  • Do the queries being in contrast have comparable question textual content?
  • Did the question use the end result cache?
  • Which components of the question lifecycle (queuing, compilation, planning, lock wait) are contributing probably the most to question runtimes?
  • Has the question plan modified?
  • Is the question studying extra knowledge blocks?
  • Is the question spilling to disk? In that case, is it spilling to native or distant storage?
  • Is the question extremely skewed with respect to knowledge (distribution) and time (runtime)?
  • Do you see extra rows processed in be part of steps or nested loops?
  • Are there any alerts indicating staleness in statistics?
  • When was the final vacuum and analyze carried out for the tables concerned within the question?

Clear up

If you happen to created any Redshift provisioned clusters or Redshift Serverless workgroups as a part of this put up and not want them to your workloads, you possibly can delete them to keep away from incurring extra prices.

Conclusion

On this put up, we defined how you should utilize the Redshift SYS monitoring views to watch workloads of provisioned clusters and serverless workgroups. The SYS monitoring views present simplified monitoring of the workloads, entry to varied query-level monitoring metrics from a unified view, and the flexibility to make use of the identical SYS monitoring view question to run throughout each provisioned clusters and serverless workgroups. We additionally coated some key monitoring and troubleshooting eventualities utilizing SYS monitoring views.

We encourage you to begin utilizing the brand new SYS monitoring views to your Redshift workloads. If in case you have any suggestions or questions, please go away them within the feedback.


In regards to the authors

Urvish Shah is a Senior Database Engineer at Amazon Redshift. He has greater than a decade of expertise engaged on databases, knowledge warehousing and in analytics area. Exterior of labor, he enjoys cooking, travelling and spending time together with his daughter.

Ranjan Burman is a Analytics Specialist Options Architect at AWS. He makes a speciality of Amazon Redshift and helps clients construct scalable analytical options. He has greater than 15 years of expertise in numerous database and knowledge warehousing applied sciences. He’s obsessed with automating and fixing buyer issues with the usage of cloud options.

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments