Guide to AWS Monitoring with Prometheus and Logz.io

Jan 18, 2023 by iHash


Prometheus is a widely utilized time-series database for monitoring the health and performance of AWS infrastructure. With its ecosystem of data collection, storage, alerting, and analysis capabilities, among others, the open source tool set offers a complete package of monitoring solutions. Prometheus is ideal for scraping metrics from cloud-native services, storing the data for analysis, and monitoring the data with alerts.

In this article, we’ll look at the Prometheus ecosystem, offer some key considerations for setting up Prometheus to monitor AWS, highlight some of its shortcomings, and show how to address them with Logz.io.

Table of Contents

  • Prometheus Ecosystem
  • Prometheus Challenges
  • Key AWS Metrics to Monitor
    • Usage
      • CPU
      • Disk
      • Memory
      • Bandwidth
      • Request Count
    • AWS Errors
      • ELB Status Code
      • S3 Access Errors
      • Unhealthy Hosts
    • AWS Performance Metrics
      • Latency Increase
      • Surge Queue Length
  • Integrating Prometheus with your AWS services
    • Integration of EC2 with Prometheus with the CloudWatch Exporter
    • Integration of CloudWatch Metrics with Prometheus
  • Solving Prometheus Issues with Logz.io
    • Send Prometheus Metrics to Logz.io
  • Conclusion

Prometheus Ecosystem

Prometheus has three core components: a scraper that pulls metrics from the endpoints that exporters expose, a time-series database, and an alerting system called Alertmanager.

Using this system, an exporter reads metrics from AWS infrastructure and exposes the data for Prometheus to scrape. For example, you can run a node exporter on EC2 and then configure Prometheus to pull metrics from your machines. A node exporter collects your system information and then runs a small HTTP server to expose these metrics.
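To make the exporter idea concrete, here is a minimal sketch in Python of what an exporter does: it gathers a few values and serves them as plain text on a /metrics endpoint. The metric names (demo_disk_used_ratio, demo_load_average_1m), the helper names, and the default port are assumptions for illustration only; the real node exporter is a separate binary that exposes far more metrics.

```python
import os
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_metrics(path="/"):
    """Gather a couple of host values (hypothetical metric names)."""
    disk = shutil.disk_usage(path)
    metrics = {"demo_disk_used_ratio": disk.used / disk.total}
    if hasattr(os, "getloadavg"):  # not available on every platform
        metrics["demo_load_average_1m"] = os.getloadavg()[0]
    return metrics


def render_metrics(metrics):
    """Render a dict as Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metrics on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Block forever, serving /metrics on the given port (assumed value)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; Prometheus would then be pointed at that host and port as a scrape target.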

While Prometheus scraping can be used to collect metrics from all kinds of infrastructure, it’s hugely popular based on its comparative ease-of-use for Kubernetes-based environments. Its auto discovery for new Kubernetes services has dramatically simplified Kubernetes monitoring. And we all know how popular Kubernetes is among today’s cloud developers.

Once data is scraped by Prometheus, its time-series database stores these metrics. Prometheus evaluates alerting rules against them, and Alertmanager then routes the resulting notifications to your desired endpoint.

Other tools in this ecosystem of course include Grafana, Trickster, Thanos, M3DB, Cortex, Pushgateway, and a number of other Prometheus exporters.

Trickster is a caching layer on top of Prometheus that can cache queries that are very frequent and/or large in scale; this can prove extremely useful in lowering the pressure on Prometheus itself.

Thanos, Cortex, and M3DB can be used to extend Prometheus with features including high availability, horizontal scaling, and historical backup. While Prometheus is a single-node solution, you can write its data to these time-series databases to consolidate data from multiple servers for analysis.

Pushgateway enables push-based metrics in your Prometheus setup. Prometheus is pull-based, so by default it can only read metrics from the targets it is configured to scrape. With Pushgateway, a job pushes its metrics to the gateway, and Prometheus then pulls them from there.
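As a rough illustration of the push side, the sketch below builds an HTTP request that sends metrics to a Pushgateway under a job name. The gateway address, the job name, and the metric name in the usage note are assumptions; the helper only handles simple job names and unlabelled metrics.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics):
    """Build (but do not send) a push of `metrics` to a Pushgateway."""
    job_path = urllib.parse.quote(job, safe="")
    url = f"{gateway.rstrip('/')}/metrics/job/{job_path}"
    # One "name value" line per metric; the body must end with a newline.
    body = "".join(f"{name} {float(value)}\n" for name, value in metrics.items())
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


def push(gateway, job, metrics, timeout=5):
    """Send the push and return the HTTP status code."""
    request = build_push_request(gateway, job, metrics)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, push("http://localhost:9091", "nightly_backup", {"backup_duration_seconds": 42}) would send one sample to a gateway assumed to be running locally; Prometheus then scrapes the gateway like any other target.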

And while Prometheus is a powerful solution for collecting and storing metrics from cloud-native environments, its visualization capabilities are lacking. 

As a result most Prometheus users visualize their data with Grafana – an open source data visualization tool that easily connects to Prometheus. It has great support for Prometheus’ query language and is a highly capable and flexible metric visualization solution. 

Prometheus Challenges

As mentioned, Prometheus runs on a single node, so it is not inherently designed for high availability. Because Prometheus stores metrics on disk on a single machine, many users end up reducing the granularity or number of metrics they keep as their data grows. In some cases, this comes at the expense of monitoring critical information.

To scale your system without reducing the cardinality of your metrics, you can implement tools like Thanos and Trickster to centralize your Prometheus metrics for storage and analysis.

But of course, adding additional components means invoking additional installations, adding infrastructure, creating more configurations, undertaking more upgrades, and increasing other maintenance tasks – all of which requires time. As a result, high availability Prometheus deployments can become increasingly difficult to manage as data volumes grow. 

Finally, metrics are only one piece of the observability puzzle, and Prometheus isn’t purpose-built to collect and store logs or traces. For this reason, Prometheus users will inevitably end up isolating their metrics from their log and trace data, which can prove a recipe for observability tool sprawl. Those who want to unify their logs, metrics, and traces in one solution will need a different approach.

Key AWS Metrics to Monitor

Usage

Usage defines the percentage of consumption of any resource. For example, if you’re saving 10 GB of data on a 100 GB disk, the usage percentage is 10%. There are different ways to monitor usage.

CPU

CPU usage is important to monitor because it reveals sustained high consumption or other CPU-related issues. This metric is available for AWS services like EC2 machines, load balancers, RDS, etc. One possible alert threshold, for example, is when all your CPU cores hit 100% utilization.

Disk

Disk is the permanent storage (secondary storage) available to be consumed. This can be a critical metric to keep an eye on since if there is no disk left, all your software could stop working. Generally, the threshold for this is 90%. If you see 90% consumption, you should quickly extend the disk size. Services like RDS and EC2 have these metrics available.

Memory

Memory is the RAM used during processing. At 100% memory utilization, the OOM (out-of-memory) killer may be triggered and terminate your process. The threshold here can be 80% utilization. Services like RDS, ElastiCache, EC2, and ECS have these metrics.
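The usage percentage and the example thresholds given above for CPU, disk, and memory can be sketched as a small check. The function and dictionary names are hypothetical, and the thresholds are only the illustrative values from the text, so tune them for your own workload.

```python
# Thresholds (percent) taken from the examples in the text above.
THRESHOLDS = {"cpu": 100.0, "disk": 90.0, "memory": 80.0}


def usage_percent(used, total):
    """Return used/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def breached(usages, thresholds=THRESHOLDS):
    """Return the sorted resource names whose usage meets or exceeds its threshold."""
    return sorted(
        name
        for name, percent in usages.items()
        if name in thresholds and percent >= thresholds[name]
    )
```

With 10 GB used on a 100 GB disk, usage_percent(10, 100) gives 10.0, matching the example in the Usage section.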

Bandwidth

Bandwidth is the network I/O being consumed by your services. You have to make sure that your network I/O doesn’t reach the networking limit defined by AWS, which depends on the instance type and is 10 Gbps in many cases. You can monitor this in services like Managed NAT, EC2, ElastiCache, and RDS.

Request Count

Request count helps you identify the usage of a given resource: it is the number of times the resource was requested. Watch for any anomaly here. Most AWS services have this metric, with the most important ones being load balancers, ElastiCache, RDS, and EC2.

AWS Errors

Error metrics show whether errors are increasing or decreasing. Below are a few important error metrics that you should watch.

ELB Status Code

You should keep an eye on Elastic Load Balancer status codes as well. An increase in error status codes means that your application may not be performing well.
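One simple way to turn status-code counts into a single number to alert on is the share of backend responses that were server errors. The sketch below assumes the counts are keyed by the Classic Load Balancer metric names HTTPCode_Backend_2XX through HTTPCode_Backend_5XX; the function name and the sample counts in the usage note are hypothetical.

```python
def error_ratio(counts):
    """Share of backend responses that were 5XX, from per-class counts."""
    total = sum(
        counts.get(f"HTTPCode_Backend_{status_class}XX", 0)
        for status_class in (2, 3, 4, 5)
    )
    if total == 0:
        return 0.0  # no traffic in the window, so nothing to report
    return counts.get("HTTPCode_Backend_5XX", 0) / total
```

For instance, 20 server errors out of 1,000 responses in a window gives a ratio of 0.02, which you could compare against a threshold of your choosing.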

S3 Access Errors

This metric gives the number of requests that resulted in failed states either due to a permission error or “not found” error.

Unhealthy Hosts

ELB and ALB generally have this metric. It is one of the most important metrics to monitor since it tells you how many backends are failing health checks, and therefore how many healthy backends remain to serve requests. Any rise in unhealthy hosts can be a problem, so make sure to configure an alert for it.

AWS Performance Metrics

In the modern era of cloud computing, where latency can also be treated as an error, it is important to keep a watch on performance metrics. These will help let you know if any scaling is required to run your application properly. Below are a few metrics that you should monitor in this space.

Latency Increase

Latency numbers are very important. These can tell you a lot about your application saturation and how it can scale for further requests. If you see latency increase, there may be some problem with your application or you may need to increase the number of instances of your application.

Surge Queue Length

Surge queue length is the number of requests waiting to be served. This metric is reported by the Classic Load Balancer (ELB). You don’t want your requests to be in a queue, as this can dramatically increase response time.

Integrating Prometheus with your AWS services

Using the CloudWatch Exporter to expose AWS metrics for Prometheus scraping is a popular way to monitor AWS. Let’s go through an example of implementing this exporter to collect EC2 metric data.

Integration of EC2 with Prometheus with the CloudWatch Exporter

To integrate your EC2 machines with Prometheus, first run the CloudWatch exporter using the following command, which starts it on port 9106 with the configuration in example.yml:

java -jar target/cloudwatch_exporter-*-SNAPSHOT-jar-with-dependencies.jar 9106 example.yml 

Next, configure your Prometheus server to start scraping metrics from these machines:

- job_name: cloudwatch
  metrics_path: /metrics
  static_configs:
    - targets: ['ip_of_ec2_machine:port']

Now, set up the CloudWatch agent on the machines and tell Prometheus where to find the new metric sources.

  1. Install the CloudWatch agent. You can follow the AWS documentation or use the command below:
sudo yum install amazon-cloudwatch-agent
  2. Update the Prometheus scrape config to identify the new metrics sources:
global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: MY_JOB
    sample_limit: 10000
    ec2_sd_configs:
      - region: us-east-1
        port: 9404
        filters:
          - name: instance-id
            values:
              - i-98765432109876543
              - i-12345678901234567

You can get the detailed instructions for the above steps in the AWS documentation.

Integration of CloudWatch Metrics with Prometheus

The easiest way to gather all of your metrics is to take them directly from CloudWatch, as most events are logged there. Simply install a CloudWatch exporter on one of your machines and run it:

java -jar target/cloudwatch_exporter-*-SNAPSHOT-jar-with-dependencies.jar 9106 example.yml 

Supply the exporter configuration along with AWS credentials; the credentials can go in environment variables:

	export AWS_ACCESS_KEY_ID="aws_key"
	export AWS_SECRET_ACCESS_KEY="aws_secret"

Now, configure your Prometheus server to start scraping metrics from the CloudWatch exporter metric endpoints:

- job_name: cloudwatch
  metrics_path: /metrics
  static_configs:
    - targets: ['ip_of_cloud_watch_exporter_vm:port']
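Once Prometheus is scraping the exporter, a quick sanity check is to ask Prometheus whether the target is up. The sketch below uses the Prometheus HTTP query API with the built-in up metric; the server address http://localhost:9090 and the helper names are assumptions, and the job label matches the cloudwatch job name used in the scrape config.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus, promql):
    """Return the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{prometheus.rstrip('/')}/api/v1/query?{query}"


def targets_up(payload):
    """Map each instance label to True/False from an `up` query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return {
        series["metric"].get("instance", "unknown"): series["value"][1] == "1"
        for series in payload["data"]["result"]
    }


def check(prometheus="http://localhost:9090", job="cloudwatch"):
    """Query a live Prometheus server (the address is an assumption)."""
    url = build_query_url(prometheus, f'up{{job="{job}"}}')
    with urllib.request.urlopen(url, timeout=5) as response:
        return targets_up(json.load(response))
```

A target that reports False here is configured but failing to scrape, which usually points to a wrong address, port, or firewall rule.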

Further documentation on this is available from Logz.io, and you can also read about AWS Lambda integration with Prometheus.

Solving Prometheus Issues with Logz.io

As we’ve seen in the above discussion, scaling Prometheus can be a significant challenge and you may end up managing multiple components including Thanos, Trickster, Grafana, and underlying infrastructure. As an alternative, Logz.io can solve this problem for you, and very easily at that.

Using Logz.io, you can configure your existing Prometheus server to forward the metrics and thus offload the management complexity to the Logz.io Open 360™ observability platform. 

To illustrate this process let’s quickly walk through how this is done.

Send Prometheus Metrics to Logz.io

To get started, you can easily configure Prometheus to perform a remote write to Logz.io servers. Using this approach, your Prometheus servers will act as a scraper and then write those metrics to Logz.io for storage and analysis. After taking this step, you can easily build dashboards on top of these metrics within Logz.io.

To start, simply create a Logz.io account, and select the correct region and listener configuration. Next, get your metrics account token from Settings > Manage tokens > Data shipping tokens > Metrics.

Then add the remote write URL in the Prometheus configuration:

global:
  external_labels:
    p8s_logzio_name: <labelvalue>
remote_write:
  - url: https://<<LISTENER-HOST>>:8053
    bearer_token: <<PROMETHEUS-METRICS-SHIPPING-TOKEN>> 
    remote_timeout: 30s
    queue_config:
      batch_send_deadline: 5s  #default = 5s
      max_shards: 10  #default = 1000
      min_shards: 1
      max_samples_per_send: 500 #default = 100
      capacity: 10000  #default = 500

Now, simply restart Prometheus and your metrics will begin streaming to Logz.io, so you can begin building dashboards or exploring metrics using the Logz.io metrics explorer.

Logz.io brings metrics, traces, and logs together in a unified platform, so it’s easy to correlate across all your data and to detect and solve issues quickly. When logs, distributed traces, and stack traces are presented with metrics, it becomes much easier to pinpoint the location and time of an issue, decreasing mean time to resolution and increasing your team’s overall efficiency.

Conclusion

Prometheus is a great tool to utilize as you begin your monitoring journey, but as your usage and scale inevitably grow, related complexity can become a significant hurdle.

For many teams, an easier alternative approach is to employ Prometheus but also ship the metrics to a managed SaaS platform such as Logz.io. This way, you can save engineering costs and spend more time building new features – all while retaining the powerful innovation of the open source community. 

Logz.io is designed to be simple to integrate and use, and it also importantly provides PromQL support to build custom dashboards and alerting on top of any metrics that you ship. You can also use AWS Kinesis to send the metrics to Logz.io or use Logz.io’s Telemetry Collector without requiring intermediate Prometheus setup. Get started with a free 14-day trial of Logz.io, and monitor your AWS applications with a modern cloud-native solution based on Prometheus!


