• Skip to main content
  • Skip to secondary menu
  • Skip to primary sidebar
  • Home
  • Contact Us

iHash

News and How to's

  • HP ProDesk 600G5 Desktop Hexa Core Intel i5 (3.2GHz) 16GB DDR4 RAM 500GB SSD Windows 11 Pro (Refurbished) for $584

    HP ProDesk 600G5 Desktop Hexa Core Intel i5 (3.2GHz) 16GB DDR4 RAM 500GB SSD Windows 11 Pro (Refurbished) for $584
  • Dell OptiPlex 7010 RGB Desktop Quad Core Intel i5 (3.2GHz) 8GB DDR3 RAM 250GB SSD Windows 10 Pro (Refurbished) for $162

    Dell OptiPlex 7010 RGB Desktop Quad Core Intel i5 (3.2GHz) 8GB DDR3 RAM 250GB SSD Windows 10 Pro (Refurbished) for $162
  • Dell OptiPlex 5040 (RGB) Desktop Quad Core Intel i5 (3.2GHz) 16GB DDR3 RAM 500GB SSD Windows 10 Pro (Refurbished) for $249

    Dell OptiPlex 5040 (RGB) Desktop Quad Core Intel i5 (3.2GHz) 16GB DDR3 RAM 500GB SSD Windows 10 Pro (Refurbished) for $249
  • Zerrio: The Ultimate All-In-One Business Management Toolkit (Lifetime Subscription) for $59

    Zerrio: The Ultimate All-In-One Business Management Toolkit (Lifetime Subscription) for $59
  • DNS FireWall: Lifetime Subscription for $59

    DNS FireWall: Lifetime Subscription for $59
  • News
    • Rumor
    • Design
    • Concept
    • WWDC
    • Security
    • BigData
  • Apps
    • Free Apps
    • OS X
    • iOS
    • iTunes
      • Music
      • Movie
      • Books
  • How to
    • OS X
      • OS X Mavericks
      • OS X Yosemite
      • Where Download OS X 10.9 Mavericks
    • iOS
      • iOS 7
      • iOS 8
      • iPhone Firmware
      • iPad Firmware
      • iPod touch
      • AppleTV Firmware
      • Where Download iOS 7 Beta
      • Jailbreak News
      • iOS 8 Beta/GM Download Links (mega links) and How to Upgrade
      • iPhone Recovery Mode
      • iPhone DFU Mode
      • How to Upgrade iOS 6 to iOS 7
      • How To Downgrade From iOS 7 Beta to iOS 6
    • Other
      • Disable Apple Remote Control
      • Pair Apple Remote Control
      • Unpair Apple Remote Control
  • Special Offers
  • Contact us

Gain insights into Kubernetes errors with Elastic Observability logs and OpenAI

May 18, 2023 by iHash Leave a Comment


blog-elastic-configuration.png

As we’ve shown in previous blogs, Elastic® provides a way to ingest and manage telemetry from the Kubernetes cluster and the application running on it. Elastic provides out-of-the-box dashboards to help with tracking metrics, log management and analytics, APM functionality (which also supports native OpenTelemetry), and the ability to analyze everything with AIOps features and machine learning (ML). While you can use pre-existing ML models in Elastic, out-of-the-box AIOps features, or your own ML models, there is a need to dig deeper into the root cause of an issue. 

Elastic helps reduce the operational work to support more efficient operations, but users still need a way to investigate and understand everything from the cause of an issue to the meaning of specific error messages. As an operations user, if you haven’t run into a particular error before or it’s part of some runbook, you will likely go to Google and start searching for information. 

OpenAI’s ChatGPT is becoming an interesting generative AI tool that helps provide more information using the models behind it. What if you could use OpenAI to obtain deeper insights (even simple semantics) for an error in your production or development environment? You can easily tie Elastic to OpenAI’s API to achieve this. 

Kubernetes, a mainstay in most deployments (on-prem or in a cloud service provider) requires a significant amount of expertise — even if that expertise is to manage a service like GKE, EKS, or AKS. 

In this blog, I will cover how you can use Elastic’s watcher capability to connect Elastic to OpenAI and ask it for more information about the error logs Elastic is ingesting from a Kubernetes cluster(s). More specifically, we will use Azure’s OpenAI Service. Azure OpenAI is a partnership between Microsoft and OpenAI, so the same models from OpenAI are available in the Microsoft version.

elastic azure openai

While this blog goes over a specific example, it can be modified for other types of errors Elastic receives in logs. Whether it’s from AWS, the application, databases, etc., the configuration and script described in this blog can be modified easily.

If you plan on following this blog, here are some of the components and details we used to set up the configuration:

  • Ensure you have an account on Elastic Cloud and a deployed stack (see instructions here).
  • We used a GCP GKE Kubernetes cluster, but you can use any Kubernetes cluster service (on-prem or cloud based) of your choice.
  • We’re also running with a version of the OpenTelemetry Demo. Directions for using Elastic with OpenTelemetry Demo are here. 
  • We also have an Azure account and Azure OpenAI service configured. You will need to get the appropriate tokens from Azure and the proper URL endpoint from Azure’s OpenAI service.
  • We will use Elastic’s dev tools, the console to be specific, to load up and run the script, which is an Elastic watcher.
  • We will also add a new index to store the results from the OpenAI query.

Here is the configuration we will set up in this blog:

elastic openai configuration
Configuration to analyze Kubernetes cluster errors

As we walk through the setup, we’ll also provide the alternative setup with OpenAI versus Azure OpenAI Service.

Over the next few steps, I’ll walk through:

  • Getting an account on Elastic Cloud and setting up your K8S cluster and application
  • Gaining Azure OpenAI authorization (alternative option with OpenAI)
  • Identifying Kubernetes error logs
  • Configuring the watcher with the right script
  • Comparing the output from Azure OpenAI/OpenAI versus ChatGPT UI
elastic start cloud trial

Once you have the Elastic Cloud login, set up your Kubernetes cluster and application. A complete step-by-step instructions blog is available here. This also provides an overview of how to see Kubernetes cluster metrics in Elastic and how to monitor them with dashboards.

Table of Contents

    • Step 1: Azure OpenAI Service and authorization
    • Step 2: Identifying Kubernetes errors in Elastic logs
    • Step 3: Configuring the watcher with the right script
    • Step 4. Output from Azure OpenAI/OpenAI vs. ChatGPT UI
  • Gaining other insights beyond Kubernetes logs
  • Conclusion
  • Appendix

Step 1: Azure OpenAI Service and authorization

When you log in to your Azure subscription and set up an instance of Azure OpenAI Service, you will be able to get your keys under Manage Keys.

microsoft azure manage keys

There are two keys for your OpenAI instance, but you only need KEY 1 .

pme openai keys and endpoint
Used with permission from Microsoft.

Additionally, you will need to get the service URL. See the image above with our service URL blanked out to understand where to get the KEY 1 and URL. 

If you are not using Azure OpenAI Service and the standard OpenAI service, then you can get your keys at:

https://platform.openai.com/account/api-keys
api keys

You will need to create a key and save it. Once you have the key, you can go to Step 2.

Step 2: Identifying Kubernetes errors in Elastic logs

As your Kubernetes cluster is running, Elastic’s Kubernetes integration running on the Elastic agent daemon set on your cluster is sending logs and metrics to Elastic. The telemetry is ingested, processed, and indexed. Kubernetes logs are stored in an index called .ds-logs-kubernetes.container_logs-default-* (* is for the date), and an automatic data stream logs-kubernetes.container_logs is also pre-loaded. So while you can use some of the out-of-the-box dashboards to investigate the metrics, you can also look at all the logs in Elastic Discover. 

While any error from Kubernetes can be daunting, the more nuanced issues occur with errors from the pods running in the kube-system namespace. Take the pod konnectivity agent, which is essentially a network proxy agent running on the node to help establish tunnels and is a vital component in Kubernetes. Any error will cause the cluster to have connectivity issues and lead to a cascade of issues, so it’s important to understand and troubleshoot these errors.

When we filter out for error logs from the konnectivity agent, we see a good number of errors.

expanded document

But unfortunately, we still can’t understand what these errors mean. 

Enter OpenAI to help us understand the issue better. Generally, you would take the error message from Discover and paste it with a question in ChatGPT (or run a Google search on the message).

One error in particular that we’ve run into but do not understand is:

E0510 02:51:47.138292       1 client.go:388] could not read stream err=rpc error: code = Unavailable desc = error reading from server: read tcp 10.120.0.8:46156->35.230.74.219:8132: read: connection timed out serverID=632d489f-9306-4851-b96b-9204b48f5587 agentID=e305f823-5b03-47d3-a898-70031d9f4768

The OpenAI output is as follows:

openai output

ChatGPT has given us a fairly nice set of ideas on why this rpc error is occurring against our konnectivity-agent.

So how can we get this output automatically for any error when those errors occur?

Step 3: Configuring the watcher with the right script

What is an Elastic watcher? Watcher is an Elasticsearch feature that you can use to create actions based on conditions, which are periodically evaluated using queries on your data. Watchers are helpful for analyzing mission-critical and business-critical streaming data. For example, you might watch application logs for errors causing larger operational issues.

Once a watcher is configured, it can be:

  1. Manually triggered
  2. Run periodically 
  3. Created using a UI or a script

In this scenario, we will use a script, as we can modify it easily and run it as needed. 

We’re using the DevTools Console to enter the script and test it out:

test script

The script is listed at the end of the blog in the appendix. It can also be downloaded here. 

The script does the following:

1. It runs continuously every five minutes.
2. It will search the logs for errors from the container konnectivity-agent
. 
3. It will take the first error’s message, transform it (re-format and clean up), and place it into a variable first_hit
.

"script": "return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\"', \"\")]"

4. The error message is sent into OpenAI with a query:

 What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}

5. If the search yielded an error, it will proceed to then create an index and place the error message, pod.name (which is konnectivity-agent-6676d5695b-ccsmx in our setup), and OpenAI output into a new index called chatgpt_k8_analyzed.

To see the results, we created a new data view called chatgpt_k8_analyzed against the newly created index:

edit data view

In Discover, the output on the data view provides us with the analysis of the errors.

analysis of errors

For every error the script sees in the five minute interval, it will get an analysis of the error. We could alternatively also use a range as needed to analyze during a specific time frame. The script would just need to be modified accordingly.

Step 4. Output from Azure OpenAI/OpenAI vs. ChatGPT UI

As you noticed above, we got relatively the same result from the Azure OpenAI API call as we did by testing out our query in the ChatGPT UI. This is because we configured the API call to run the same/similar model as what was selected in the UI.

For the API call, we used the following parameters:

"request": {
             "method" : "POST",
             "Url": "https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview",
             "headers": {"api-key" : "XXXXXXX",
                         "content-type" : "application/json"
                        },
             "body" : "{ \"messages\": [ { \"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, { \"role\": \"user\", \"content\": \"What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\"}], \"temperature\": 0.5, \"max_tokens\": 2048}" ,
              "connection_timeout": "60s",
               "read_timeout": "60s"
                            }

By setting the role: system with You are a helpful assistant and using the gpt-35-turbo url portion, we are essentially setting the API to use the davinci model, which is the same as the ChatGPT UI model set by default.

Additionally, for Azure OpenAI Service, you will need to set the URL to something similar the following:

https://YOURSERVICENAME.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview

If you use OpenAI (versus Azure OpenAI Service), the request call (against https://api.openai.com/v1/completions) would be as such:

"request": {
            "scheme": "https",
            "host": "api.openai.com",
            "port": 443,
            "method": "post",
            "path": "\/v1\/completions",
            "params": {},
            "headers": {
               "content-type": "application\/json",
               "authorization": "Bearer YOUR_ACCESS_TOKEN"
                        },
            "body": "{ \"model\": \"text-davinci-003\",  \"prompt\": \"What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\",  \"temperature\": 1,  \"max_tokens\": 512,     \"top_p\": 1.0,      \"frequency_penalty\": 0.0,   \"presence_penalty\": 0.0 }",
            "connection_timeout_in_millis": 60000,
            "read_timeout_millis": 60000
          }Read more

Gaining other insights beyond Kubernetes logs

Now that the script is up and running, you can modify it using different:

  • Inputs
  • Conditions
  • Actions
  • Transforms

Learn more on how to modify it here. Some examples of modifications could include:

  1. Look for error logs from application components (e.g., cartService, frontEnd, from the OTel demo), cloud service providers (e.g., AWS/Azure/GCP logs), and even logs from components such as Kafka, databases, etc.
  2. Vary the time frame from running continuously to running over a specific range. 
  3. Look for specific errors in the logs.
  4. Query for analysis on a set of errors at once versus just one, which we demonstrated.

The modifications are endless, and of course you can run this with OpenAI rather than Azure OpenAI Service.

Conclusion

I hope you’ve gotten an appreciation for how Elastic Observability can help you connect to OpenAI services (Azure OpenAI, as we showed, or even OpenAI) to better analyze an error log message instead of having to run several Google searches and hunt for possible insights.

Here’s a quick recap of what we covered:

  • Developing an Elastic watcher script that can be used to find and send Kubernetes errors into OpenAI and insert them into a new index
  • Configuring Azure OpenAI Service or OpenAI with the right  authorization and request parameters

Ready to get started? Sign up for Elastic Cloud and try out the features and capabilities I’ve outlined above to get the most value and visibility out of your OpenTelemetry data.

Appendix

Watcher script

PUT _watcher/watch/chatgpt_analysis
{
    "trigger": {
      "schedule": {
        "interval": "5m"
      }
    },
    "input": {
      "chain": {
          "inputs": [
              {
                  "first": {
                      "search": {
                          "request": {
                              "search_type": "query_then_fetch",
                              "indices": [
                                "logs-kubernetes*"
                              ],
                              "rest_total_hits_as_int": true,
                              "body": {
                                "query": {
                                  "bool": {
                                    "must": [
                                      {
                                        "match": {
                                          "kubernetes.container.name": "konnectivity-agent"
                                        }
                                      },
                                      {
                                        "match" : {
                                          "message":"error"
                                        }
                                      }
                                    ]
                                  }
                                },
                                "size": "1"
                              }
                            }
                        }
                    }
                },
                {
                    "second": {   
                        "transform": {
                            "script": "return ['first_hit': ctx.payload.first.hits.hits.0._source.message.replace('\"', \"\")]"
                        }
                    }
                },
                {
                    "third": {
                        "http": {
                            "request": {
                                "method" : "POST",
                                "url": "https://XXX.openai.azure.com/openai/deployments/pme-gpt-35-turbo/chat/completions?api-version=2023-03-15-preview",
                                "headers": {
                                    "api-key" : "XXX",
                                    "content-type" : "application/json"
                                },
                                "body" : "{ \"messages\": [ { \"role\": \"system\", \"content\": \"You are a helpful assistant.\"}, { \"role\": \"user\", \"content\": \"What are the potential reasons for the following kubernetes error: {{ctx.payload.second.first_hit}}\"}], \"temperature\": 0.5, \"max_tokens\": 2048}" ,
                                "connection_timeout": "60s",
                                "read_timeout": "60s"
                            }
                        }
                    }
                }
            ]
        }
    },
    "condition": {
      "compare": {
        "ctx.payload.first.hits.total": {
          "gt": 0
        }
      }
    },
    "actions": {
        "index_payload" : {
            "transform": {
                "script": {
                    "source": """
                        def payload = [:];
                        payload.timestamp = new Date();
                        payload.pod_name = ctx.payload.first.hits.hits[0]._source.kubernetes.pod.name;
                        payload.error_message = ctx.payload.second.first_hit;
                        payload.chatgpt_analysis = ctx.payload.third.choices[0].message.content;
                        return payload;
                    """
                }
            },
            "index" : {
                "index" : "chatgpt_k8s_analyzed"
            }
        }
    }
}Read more

In this blog post, we may have used third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use. 

Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.

Screenshots of Microsoft products used with permission from Microsoft.

icon-toc-16-blue.svg



Source link

Share this:

  • Facebook
  • Twitter
  • Pinterest
  • LinkedIn

Filed Under: News Tagged With: Elastic, errors, Gain, Insights, Kubernetes, logs, Observability, OpenAI

Special Offers

  • HP ProDesk 600G5 Desktop Hexa Core Intel i5 (3.2GHz) 16GB DDR4 RAM 500GB SSD Windows 11 Pro (Refurbished) for $584

    HP ProDesk 600G5 Desktop Hexa Core Intel i5 (3.2GHz) 16GB DDR4 RAM 500GB SSD Windows 11 Pro (Refurbished) for $584
  • Dell OptiPlex 7010 RGB Desktop Quad Core Intel i5 (3.2GHz) 8GB DDR3 RAM 250GB SSD Windows 10 Pro (Refurbished) for $162

    Dell OptiPlex 7010 RGB Desktop Quad Core Intel i5 (3.2GHz) 8GB DDR3 RAM 250GB SSD Windows 10 Pro (Refurbished) for $162
  • Dell OptiPlex 5040 (RGB) Desktop Quad Core Intel i5 (3.2GHz) 16GB DDR3 RAM 500GB SSD Windows 10 Pro (Refurbished) for $249

    Dell OptiPlex 5040 (RGB) Desktop Quad Core Intel i5 (3.2GHz) 16GB DDR3 RAM 500GB SSD Windows 10 Pro (Refurbished) for $249
  • Zerrio: The Ultimate All-In-One Business Management Toolkit (Lifetime Subscription) for $59

    Zerrio: The Ultimate All-In-One Business Management Toolkit (Lifetime Subscription) for $59
  • DNS FireWall: Lifetime Subscription for $59

    DNS FireWall: Lifetime Subscription for $59

Reader Interactions

Leave a Reply Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Primary Sidebar

  • Facebook
  • GitHub
  • Instagram
  • Pinterest
  • Twitter
  • YouTube

More to See

HP ProDesk 600G5 Desktop Hexa Core Intel i5 (3.2GHz) 16GB DDR4 RAM 500GB SSD Windows 11 Pro (Refurbished) for $584

Jun 7, 2023 By iHash

Dell OptiPlex 7010 RGB Desktop Quad Core Intel i5 (3.2GHz) 8GB DDR3 RAM 250GB SSD Windows 10 Pro (Refurbished) for $162

Jun 7, 2023 By iHash

Tags

* Apple Cisco computer security cyber attacks cyber crime cyber news cybersecurity Cyber Security cyber security news cyber security news today cyber security updates cyber threats cyber updates data data breach data breaches google hacker hacker news Hackers hacking hacking news how to hack incident response information security iOS 7 iOS 8 iPhone Malware microsoft network security ransomware ransomware malware risk management Secure security security breaches security vulnerabilities software vulnerability the hacker news Threat update video web applications

Latest

The Role of the Ransomware Negotiator

Get exclusive insights from a real ransomware negotiator who shares authentic stories from network hostage situations and how he managed them. The Ransomware Industry Ransomware is an industry. As such, it has its own business logic: organizations pay money, in crypto-currency, in order to regain control over their systems and data. This industry’s landscape is […]

Dell OptiPlex 5040 (RGB) Desktop Quad Core Intel i5 (3.2GHz) 16GB DDR3 RAM 500GB SSD Windows 10 Pro (Refurbished) for $249

Expires April 11, 2123 22:07 PST Buy now and get 52% off KEY FEATURES The Dell OptiPlex 5040 Desktop is the perfect home or office PC for professional multitasking, high-speed web browsing, and multimedia applications. Powered by a Quad-Core Intel i5 processor running at 3.2GHz, this refurbished desktop boasts 16GB DDR3 RAM and a 500GB […]

Apple announces winners of the 2023 Apple Design Awards

June 5, 2023 UPDATE Apple announces winners of the 2023 Apple Design Awards At WWDC23, winners are recognized for excellence in innovation, ingenuity, and technical achievement in app and game design Today, Apple proudly unveiled the winners of its annual Apple Design Awards, celebrating 12 best-in-class apps and games. This year’s winners, spanning development teams around […]

DNS FireWall: Lifetime Subscription for $59

Expires June 04, 2024 23:59 PST Buy now and get 70% off KEY FEATURES DNS Firewall is a security app developed to protect users from online threats, such as malware, phishing, and botnets. It operates at the DNS level, filtering and blocking malicious websites before they can reach the user’s network. DNS Firewall maintains an […]

KeepSolid SmartDNS: Lifetime Subscription for $59

Expires June 05, 2024 23:59 PST Buy now and get 70% off KEY FEATURES Seamless streaming made possible! SmartDNS is a solution to bypass geo-restrictions and access regionally blocked content on various streaming platforms and video services. By redirecting DNS queries through their SmartDNS servers, users can unlock access to content that is typically unavailable […]

Dotan Horovits

From Spotify to Open Source: The Backstory of Backstage

Technology juggernauts–despite their larger staffs and budgets–still face the “cognitive load” for DevOps that many organizations deal with day-to-day. That’s what led Spotify to build Backstage, which supports DevOps and platform engineering practices for the creation of developer portals. Eventually, Spotify made the decision to open source Backstage and donate it to the Cloud Native […]

Jailbreak

Pangu Releases Updated Jailbreak of iOS 9 Pangu9 v1.2.0

Pangu has updated its jailbreak utility for iOS 9.0 to 9.0.2 with a fix for the manage storage bug and the latest version of Cydia. Change log V1.2.0 (2015-10-27) 1. Bundle latest Cydia with new Patcyh which fixed failure to open url scheme in MobileSafari 2. Fixed the bug that “preferences -> Storage&iCloud Usage -> […]

Apple Blocks Pangu Jailbreak Exploits With Release of iOS 9.1

Apple has blocked exploits used by the Pangu Jailbreak with the release of iOS 9.1. Pangu was able to jailbreak iOS 9.0 to 9.0.2; however, in Apple’s document on the security content of iOS 9.1, PanguTeam is credited with discovering two vulnerabilities that have been patched.

Pangu Releases Updated Jailbreak of iOS 9 Pangu9 v1.1.0

  Pangu has released an update to its jailbreak utility for iOS 9 that improves its reliability and success rate.   Change log V1.1.0 (2015-10-21) 1. Improve the success rate and reliability of jailbreak program for 64bit devices 2. Optimize backup process and improve jailbreak speed, and fix an issue that leads to fail to […]

Activator 1.9.6 Released With Support for iOS 9, 3D Touch

  Ryan Petrich has released Activator 1.9.6, an update to the centralized gesture, button, and shortcut manager, that brings support for iOS 9 and 3D Touch.

Copyright iHash.eu © 2023
We use cookies on this website. By using this site, you agree that we may store and access cookies on your device. Accept Read More
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT