Difference between revisions of "IT-SDK-SRE"

From wiki.samerhijazi.net
(NEW-Work)
(Dynatrace)
Line 22: Line 22:
 
* https://www.dynatrace.com/news/blog/openstack-monitoring-beyond-the-elastic-stack-part-2/
 
 +
=Init-Definitions=
 +
https://www.leanix.net/en/wiki/vsm/site-reliability-engineering-sre
 +
* SREs monitor systems in production and analyze their performance to identify areas for improvement.
 +
* Their observations help them calculate the potential cost of outages and plan for contingencies.
 +
* SREs usually split their time between operations and the development of systems and software.
 +
* SREs spend time building and deploying services that optimize workflows for IT and support departments.
 +
* SREs determine which new features can be implemented, and when, with the help of SLAs, SLIs, and SLOs.
 +
* These are Service Level Agreements (SLAs), Service Level Indicators (SLIs), and Service Level Objectives (SLOs).
 
=Dynatrace=
 
*https://www.dynatrace.com/support/help/
 
Line 27: Line 35:
 
*https://www.dynatrace.com/support/help/
 
*https://community.dynatrace.com/
 
 +
 
=SRE Toolchain=
 
* https://www.dynatrace.com/news/blog/sre-vs-devops/
 

Revision as of 11:02, 25 November 2021

Init-Ref

Init-Notes

  • SRE focuses on improving software system reliability across key categories including availability, performance, latency, efficiency, capacity, and incident response.
  • SRE relies on service-level indicators (SLIs) and service-level objectives (SLOs).
  • Uptime: "five nines", or 99.999%, allows just over five minutes of downtime per year (about 5.26 minutes).
  • Uptime: "four nines", or 99.99%, allows nearly an hour of downtime per year (about 52.6 minutes).
  • Dynatrace is both an application performance monitoring and an application management tool. It can be used as a cloud-based SaaS offering or installed on-premises, among other options.
  • APM: application performance management
  • ELK Stack: an acronym for three open-source projects: Elasticsearch, Logstash, and Kibana
  • ELK Stack/Elastic & New Relic & Datadog & Dynatrace
  • Azure, Terraform, Ansible, Concourse CI, Elasticsearch/Kibana, Dynatrace, Prometheus, Graylog, StoreBox
  • NEW-Work: AWS, Azure, Concourse, Jenkins, Aurora DB, Dynatrace, New Relic, Elasticsearch, Kibana
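
The downtime figures quoted for the "nines" in these notes follow from simple arithmetic: allowed downtime = (1 - availability) × length of the period. The following is a minimal sketch, assuming a 365-day year; the function name `downtime_minutes_per_year` is hypothetical and used for illustration only.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime, in minutes per year, for an uptime percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    # The unavailable fraction of the year, converted to minutes.
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


for label, pct in (("four nines", 99.99), ("five nines", 99.999)):
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.2f} minutes of downtime per year")
```

Under this assumption, four nines works out to roughly 52.6 minutes per year and five nines to roughly 5.3 minutes per year.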

Init-Youtube

Init-Definitions

https://www.leanix.net/en/wiki/vsm/site-reliability-engineering-sre

  • SREs monitor systems in production and analyze their performance to identify areas for improvement.
  • Their observations help them calculate the potential cost of outages and plan for contingencies.
  • SREs usually split their time between operations and the development of systems and software.
  • SREs spend time building and deploying services that optimize workflows for IT and support departments.
  • SREs determine which new features can be implemented, and when, with the help of SLAs, SLIs, and SLOs.
  • These are Service Level Agreements (SLAs), Service Level Indicators (SLIs), and Service Level Objectives (SLOs).
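
To make the relationship between an SLI and an SLO concrete, here is a minimal sketch that is not taken from the linked article. It assumes an availability SLI defined as the fraction of successful requests. It also uses the common "error budget" idea (the share of failures the SLO still permits) as one possible way to decide whether new features can ship. The names `SloReport` and `evaluate_slo` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of good requests (0.0 to 1.0)
    slo_target: float        # required fraction of good requests
    budget_remaining: float  # unused share of the error budget (negative if overspent)


def evaluate_slo(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compare a request-based availability SLI against an SLO target."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    sli = good_requests / total_requests
    # Failures the SLO tolerates over this window versus failures observed.
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    budget_remaining = 1.0 - actual_failures / allowed_failures
    return SloReport(sli, slo_target, budget_remaining)


# Hypothetical sample window: one million requests, 500 of them failed.
report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%}, target={report.slo_target:.1%}, "
      f"budget remaining={report.budget_remaining:.0%}")
print("room for new features" if report.budget_remaining > 0 else "prioritize reliability work")
```

In this sketch an SLA would sit on top of the SLO as the contractual commitment. The code only models the internal target.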

Dynatrace

SRE Toolchain

Containers for Microservices

  • Docker
  • Kubernetes
  • Docker Swarm
  • Apache Mesos
  • Podman

Source Control Tools

  • Git

CI/CD Tools

Data Storage Tools

  • MySQL
  • PostgreSQL
  • MongoDB
  • Apache Hadoop
  • Apache Hive
  • Amazon Aurora (MySQL and PostgreSQL-compatible)
  • MariaDB (fork from MySQL)

Configuration Management Tools

  • Ansible
  • Chef
  • Puppet
  • SaltStack

Metrics Collection Tools

  • Prometheus
  • Stackdriver (Google Cloud Operations)
  • InfluxDB
  • Sensu Go

Log Aggregation Tools

  • Fluentd
  • Sentry
  • Logstash

Distributed Tracing Tools

  • OpenTelemetry
  • Jaeger

Application Performance Monitoring Tools

  • AppDynamics
  • New Relic
  • Dynatrace

Dashboarding Tools

  • Grafana
  • Stashboard
  • Redash
  • Metabase

Incident Management

  • PagerDuty
  • Opsgenie
  • Squadcast