iHash

News and How to's

How to Prevent ChatGPT From Stealing Your Content & Traffic

Aug 30, 2023 by iHash Leave a Comment


ChatGPT and similar large language models (LLMs) have added further complexity to the ever-growing online threat landscape. Cybercriminals no longer need advanced coding skills to execute fraud and other damaging attacks against online businesses and customers, thanks to bots-as-a-service, residential proxies, CAPTCHA farms, and other easily accessible tools.

Now, the latest technology damaging businesses’ bottom line is ChatGPT.

Not only have ChatGPT and other LLMs raised ethical issues because their models were trained on data scraped from across the internet; they are also negatively impacting enterprises’ web traffic, which can be extremely damaging to business.

Table of Contents

  • 3 Risks Presented by LLMs, ChatGPT, & ChatGPT Plugins
  • 3 Most Impacted Industries
  • How ChatGPT Gets Training Data
  • 3 Ways to Block CCBot
  • Scrapers Can Always Find Workarounds
  • Using Plugins to Access Live Data
  • How to Identify ChatGPT Plugin Requests
  • How to Block ChatGPT Plugin Requests
  • Determining Your Next Steps

3 Risks Presented by LLMs, ChatGPT, & ChatGPT Plugins

Among the threats ChatGPT and ChatGPT plugins can pose against online businesses, there are three key risks we will focus on:

  1. Content theft (or republishing data without permission from the original source) can hurt the authority, SEO rankings, and perceived value of your original content.
  2. Reduced traffic to your website or app becomes problematic, as users getting answers directly through ChatGPT and its plugins no longer need to find or visit your pages.
  3. Data breaches, or even the accidental broad distribution of sensitive data, are becoming more likely. Not all “public-facing” data is intended to be redistributed or shared outside of its original context, but scrapers do not know the difference. The results can include anything from a loss of competitive advantage to severe damage to your brand reputation.

Depending on your business model, your company should consider ways to opt out of having your data used to train LLMs.

3 Most Impacted Industries

The most at-risk industries for ChatGPT-driven damage are those in which data privacy is a top concern, unique content and intellectual property are key differentiators, and ads, eyes, and unique visitors are an important source of revenue. These industries include:

  1. E-Commerce: Product descriptions and pricing models can be key differentiators.
  2. Streaming, Media, & Publishing: All about providing the audience with unique, creative, and entertaining content.
  3. Classified Ads: Pay per click (PPC) advertising revenue can be severely impacted by a decrease in website traffic (as well as other bot issues like click fraud or skewed site analytics due to scrapers).

How ChatGPT Gets Training Data

According to a research paper published by OpenAI, GPT-3 (the model family from which ChatGPT was developed) was trained on several datasets:

  • Common Crawl
  • WebText2
  • Books1 and Books2
  • Wikipedia

The largest amount of training data comes from Common Crawl, which provides access to web information through an open repository of web crawl data. The Common Crawl crawler bot, also known as CCBot, is built on Apache Nutch, an open-source framework for building large-scale web crawlers.

The most current version of CCBot crawls from Amazon AWS and identifies itself with a user agent of ‘CCBot/2.0’. But businesses that want to allow CCBot should not rely solely on the user agent to identify it, because many bad bots spoof their user agents to disguise themselves as good bots and avoid being blocked.

To allow CCBot on your website, use attributes such as IP ranges or reverse DNS. To block ChatGPT, your website should, at minimum, block traffic from CCBot.
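As a rough illustration of the reverse DNS approach mentioned above, the sketch below performs a forward-confirmed reverse DNS check: it looks up the hostname for the client IP, checks that the hostname ends with an expected suffix, then resolves that hostname again and confirms it maps back to the same IP. The function name `is_verified_crawler` and the suffix `crawl.commoncrawl.org` are assumptions made for this example; confirm the actual hostname suffix or IP ranges against Common Crawl's own documentation before relying on it.

```python
import socket
from typing import Callable, Iterable, List


def _reverse_lookup(ip: str) -> str:
    """Return the PTR hostname recorded for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses that a hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse_lookup: Callable[[str], str] = _reverse_lookup,
    forward_lookup: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    allowed_suffixes is supplied by the caller, for example
    ["crawl.commoncrawl.org"] (an assumed value, not taken from the article).
    """
    suffixes = [s.lower().lstrip(".") for s in allowed_suffixes]
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, or the lookup failed

    # The hostname must equal a suffix or be a subdomain of it.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False

    try:
        # The forward lookup must map back to the original IP; otherwise
        # anyone controlling their own PTR record could pass the check.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

The lookups are injectable so the logic can be exercised without network access. In production you would likely cache results, since DNS lookups on every request are slow.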

3 Ways to Block CCBot

  1. Robots.txt: Since CCBot respects robots.txt files, you can block it with the following lines:

       User-agent: CCBot
       Disallow: /

  2. Blocking CCBot User Agent: You can safely block an unwanted bot through its user agent. (Note that, in contrast, allowing bot traffic through user agent can be unsafe, as it is easily abused by attackers.)
  3. Bot Management Software: Whether it’s for ChatGPT or a dark web database, the best way to prevent bots from scraping your websites, apps, and APIs is with specialized bot protection that uses machine learning to keep up with evolving threat tactics in real time.
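Before deploying the robots.txt rule from the first option, it can help to confirm that it blocks CCBot without affecting other crawlers. The sketch below uses Python's standard `urllib.robotparser` module for that check; the helper name `is_allowed` and the example URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the list above.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/pricing"  # hypothetical URL
    for agent in ("CCBot/2.0", "Bingbot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

This only shows how a compliant crawler would interpret the file. It says nothing about bots that ignore robots.txt.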

Scrapers Can Always Find Workarounds

LLMs use scraper bots to gather training data. While blocking CCBot might be effective for blocking ChatGPT scrapers today, there is no telling what the future holds for LLM scrapers. Moving forward, if too many websites block OpenAI (for example) from accessing their content, the developers could decide to stop respecting robots.txt and could stop declaring their crawler identity in the user agent.

Another possibility is OpenAI could use its partnership with Microsoft to access Microsoft Bing’s scraper data, making the situation more challenging for website owners. Bing’s bots identify as Bingbot, but blocking them could cause problems by preventing your site from being indexed on the Bing search engine, resulting in fewer human visitors.

You could face similar issues by blocking Google’s LLM Bard (a competitor to ChatGPT). Google is vague about the origin and collection of the public data used to train Bard, but it is possible that Bard is, or will be, trained with data collected by Googlebot scrapers. As with Bingbot, blocking Googlebot would likely be unwise, impacting how your website gets indexed and how the Google search engine drives traffic to your site. The result could mean a serious drop in visitors.

Using Plugins to Access Live Data

One of the main limits of models like ChatGPT is the lack of access to live data. Since it was trained on a dataset that stops in 2021, it is unable to provide the most relevant, up-to-date information. That’s where plugins come in.

Plugins are used to connect LLMs like ChatGPT to external tools and allow the LLMs to access external data available online, which can include private data and real-time news. Plugins also let users complete actions online (e.g. booking a flight or ordering groceries) through API calls.

Some businesses are developing their own plugins to provide a new way for users to interact with their content/services via ChatGPT. But, depending on your industry, letting users interact with your website through third-party ChatGPT plugins can mean fewer ads seen by your users, as well as lower traffic to your website.

You may also notice that users are less willing to pay for your premium features once your features can be replicated through third-party ChatGPT plugins. For example, an unofficial web client interacting with your site could offer premium features through their UI.

How to Identify ChatGPT Plugin Requests

OpenAI documentation states that requests with a specific user agent HTTP header (with token: “ChatGPT-User”) come from ChatGPT plugins. But the documentation does not state that the disclosed user agent is the only user agent that can be used by plugins when making HTTP requests.
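A minimal sketch of that check: given a request's headers, look for the “ChatGPT-User” token in the User-Agent value. The function name and the sample header values are hypothetical. As noted above, a match only identifies requests that declare themselves, so the absence of the token proves nothing.

```python
from typing import Mapping

# Token named in the article; matching is case-insensitive here by choice.
CHATGPT_PLUGIN_TOKEN = "ChatGPT-User"


def declares_chatgpt_plugin(headers: Mapping[str, str]) -> bool:
    """Return True if the User-Agent header contains the ChatGPT-User token."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "user-agent":
            return CHATGPT_PLUGIN_TOKEN.lower() in value.lower()
    return False  # no User-Agent header at all


if __name__ == "__main__":
    # Illustrative header values, not copied from real traffic.
    declared = {"User-Agent": "ExampleClient/1.0 (compatible; ChatGPT-User/1.0)"}
    undeclared = {"User-Agent": "python-requests/2.31.0"}
    print(declares_chatgpt_plugin(declared))
    print(declares_chatgpt_plugin(undeclared))
```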

Moreover, because ChatGPT plugins interact with third-party APIs, those APIs can make any kind of HTTP request from their own infrastructure. The diagram below shows what happens when a fictitious “Live Sport Plugin” is used with ChatGPT to get an update about a sporting event.

ChatGPT Plugins
  1. ChatGPT triggers the Live Sport Plugin, making a request to the API endpoints based on parameters from the user prompt.
  2. The plugin makes an HTTP request to scrape a sports website to get the latest information about the event.
  3. The information is then passed back to the end user through ChatGPT.

A plugin can also make a request to a sports API without having to scrape the sports website. In fact, when requests are made directly from the server hosting the plugin API, there is no constraint on the user agent.

How to Block ChatGPT Plugin Requests

In a process similar to blocking ChatGPT’s web scrapers, you can block requests from plugins that declare their presence with the “ChatGPT-User” substring in the user agent. But blocking the user agent could also block ChatGPT users with the “browsing” mode activated. And, contrary to what OpenAI documentation might indicate, blocking requests from “ChatGPT-User” does not guarantee that ChatGPT and its plugins can’t reach your data under different user agent tokens.

In fact, ChatGPT plugins can make requests directly from the servers hosting their APIs using any user agent, and even using automated (headless) browsers. Detecting plugins that do not declare their identity in the user agent requires advanced bot detection techniques.
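For the declared case, user agent blocking can be sketched as a small WSGI middleware that returns 403 when the User-Agent contains a blocked token. The class name `BlockDeclaredBots`, the token list, and `demo_app` are assumptions for this example rather than part of any particular framework. The middleware does nothing against clients that send a different user agent, which is the limitation described above.

```python
from typing import Callable, Iterable, List, Tuple

# Tokens discussed in the article; extend or trim to suit your own policy.
BLOCKED_TOKENS = ("CCBot", "ChatGPT-User")

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


class BlockDeclaredBots:
    """WSGI middleware that rejects requests whose User-Agent declares a blocked bot."""

    def __init__(self, app: Callable, blocked_tokens: Iterable[str] = BLOCKED_TOKENS):
        self.app = app
        self.blocked = tuple(token.lower() for token in blocked_tokens)

    def __call__(self, environ: dict, start_response: StartResponse):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in self.blocked):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Anything else is passed through to the wrapped application.
        return self.app(environ, start_response)


def demo_app(environ: dict, start_response: StartResponse):
    """Placeholder application standing in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


application = BlockDeclaredBots(demo_app)
```

Because “ChatGPT-User” is in the list, this would also turn away ChatGPT users in browsing mode. Remove that token if you do not want that trade-off.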

Determining Your Next Steps

Obtaining high-quality datasets of human-generated content will remain of critical importance to LLMs. In the long term, companies like OpenAI (funded partially by Microsoft) and Google may be tempted to use Bingbot and Googlebot to build datasets to train their LLMs. That would make it more difficult for websites to simply opt out of having their data collected, since most online businesses rely heavily on Bing and Google to index their content and drive traffic to their site.

Websites with valuable data will either want to look for ways to monetize the use of their data or opt out of AI model training to avoid losing web traffic and ad revenue to ChatGPT and its plugins. If you wish to opt out, you’ll need advanced bot detection techniques, such as fingerprinting, proxy detection, and behavioral analysis, to stop bots before they can access your data.

Advanced solutions for bot and fraud protection leverage AI and machine learning (ML) to detect and stop unfamiliar bots from the first request, keeping your content safe from LLM scrapers, unknown plugins, and other rapidly evolving AI technologies.

Note: This article is expertly written and contributed by Antoine Vastel, PhD, Head of Research at DataDome.


Filed Under: Security Tagged With: ChatGPT, computer security, Content, cyber attacks, cyber news, cyber security news, cyber security news today, cyber security updates, cyber updates, data breach, hacker news, hacking news, how to hack, information security, network security, Prevent, ransomware malware, software vulnerability, stealing, the hacker news, Traffic
