iHash

News and How to's

Tackling Data Distribution Shift – insideBIGDATA

Jul 22, 2023 by iHash

Data-driven decision-making, particularly through machine-learned algorithms, is more prevalent now than ever. Being data-driven only matters if you have the right data, which raises the topic of “distribution shift.” Distribution shift is a mismatch between training and real-world data, and it can arise in several ways. For example, circumstances may evolve over time, resulting in changes to incoming data, or a lack of true data samples can force a company to rely on narrowly constructed artificial datasets. Distribution shift is a real challenge that requires a thoughtful mitigation strategy, but it can be addressed.

Generalizability is a classic and persistent problem in AI

The remarkable capabilities of ChatGPT, especially in zero-shot and few-shot learning, may give the impression that the field of artificial intelligence (AI) has advanced to a stage where training data is merely a formality. But the rules for AI development haven’t changed: AI is still sensitive to spurious characteristics in its training data that can cause it to “miss the big picture”:

  • Correlations in the data that arise by chance
  • Imperceptible artifacts in images from the camera system used to take them
  • Specific phrases disproportionately present in training text

Gradient descent-based training is greedy: it exploits whatever features reduce the training loss, including these spurious characteristics. As a result, training may appear successful, but the model, having latched onto these spurious features, fails to generalize adequately to real-world data that lacks them. In this sense, the model “misses the big picture.”

Fortunately, techniques like dropout, gradient clipping, multi-task learning, data augmentation, and large-scale pretraining can help to overcome these data limitations to improve generalizability. However, distribution shift is a greater challenge as it results from a fundamental information gap between training and real-world data.

What is distribution shift?

Imagine training an algorithm to identify which state issued a given driver’s license. If the model were trained on decades-old drivers’ licenses and applied to contemporary formats, how effective would it be? If the training set were heavily imbalanced, with the majority of licenses coming from high-population states and only a few from low-population states, how well would the model perform on a large collection from the latter?

This example captures the two types of distribution shift: one resulting from changes over time, and another resulting from disparities in data proportions. Despite the difficulties this presents to an AI system, humans effortlessly make sense of varying license formats, including new ones. How can we train our AI systems to adapt to these distribution shifts nearly as effectively?

Do you have a distribution shift problem?

One straightforward way to detect distribution shift is through its deleterious effects on accuracy. Another strategy is to examine the distribution of class labels on real-world data and compare it with the distribution of class labels on the training data. If significant deviations are present, there is likely a distribution shift.
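The label comparison described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the choice of total variation distance, the sample labels, and the `ALERT_THRESHOLD` value are all assumptions made for the example.

```python
from collections import Counter


def label_distribution(labels):
    """Return a dict mapping each label to its relative frequency."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(train_labels, live_labels):
    """Half the L1 distance between two label distributions.

    0.0 means the distributions are identical; 1.0 means they share no labels.
    """
    p = label_distribution(train_labels)
    q = label_distribution(live_labels)
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Hypothetical label samples: training data dominated by high-population
# states, real-world predictions dominated by a low-population state.
train_labels = ["CA"] * 70 + ["TX"] * 25 + ["WY"] * 5
live_labels = ["CA"] * 30 + ["TX"] * 20 + ["WY"] * 50

ALERT_THRESHOLD = 0.2  # assumed value; tune on historical data

tvd = total_variation_distance(train_labels, live_labels)
shift_suspected = tvd > ALERT_THRESHOLD
```

Total variation distance is used here only because it is easy to interpret; a chi-squared test or another divergence measure would serve the same purpose.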

In addition to analyzing class labels, distribution shift can be detected through model confidence levels. Even if the distribution of the class labels does not significantly change, the confidences behind those labels may. If the distribution of confidences changes significantly, it might indicate the presence of distribution shift.
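One way to compare two sets of confidence scores is the two-sample Kolmogorov-Smirnov statistic, the largest gap between their empirical cumulative distributions. The sketch below computes only the statistic, using the standard library; the confidence values and the `ALERT_THRESHOLD` are hypothetical, and the article does not name a specific test.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs
    of the two samples (0.0 = identical, 1.0 = fully separated).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the largest gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical top-class confidences on validation data and on live data.
validation_confidences = [0.95, 0.97, 0.92, 0.99, 0.94, 0.96]
live_confidences = [0.61, 0.72, 0.93, 0.58, 0.66, 0.70]

ALERT_THRESHOLD = 0.3  # assumed value; tune on historical data

ks = ks_statistic(validation_confidences, live_confidences)
confidence_shift_suspected = ks > ALERT_THRESHOLD
```

In practice a statistics library that also reports a p-value (for example `scipy.stats.ks_2samp`) is likely a better fit than a fixed threshold on the raw statistic.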

Statistics of the data independent of the model can be considered as well. This can be done by analyzing manually-developed features or features from an unsupervised model. These can increase the chances of finding even subtle distribution shifts over time.
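As a rough sketch of a model-independent check, the snippet below compares the mean of each feature on live data against a reference sample, measured in reference standard deviations. The feature names, the values, and the threshold of 2.0 are hypothetical, and a mean comparison is only one of many possible statistics.

```python
from statistics import mean, stdev


def standardized_mean_shift(reference, live):
    """Absolute shift in the mean, in units of the reference standard deviation."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference feature has zero variance")
    return abs(mean(live) - mean(reference)) / ref_std


def drifting_features(reference_features, live_features, threshold=2.0):
    """Return the sorted names of features whose mean shifted beyond the threshold."""
    return sorted(
        name
        for name, reference in reference_features.items()
        if standardized_mean_shift(reference, live_features[name]) > threshold
    )


# Hypothetical hand-built features computed per license image.
reference_features = {
    "mean_brightness": [0.50, 0.52, 0.48, 0.51, 0.49],
    "text_length": [120, 118, 125, 122, 119],
}
live_features = {
    "mean_brightness": [0.51, 0.50, 0.49, 0.52, 0.48],
    "text_length": [160, 158, 165, 170, 162],
}

drifted = drifting_features(reference_features, live_features)
```

Here only `text_length` is flagged, which would suggest that the amount of text on incoming licenses has changed even if the model's labels and confidences look stable.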

Mitigating distribution shift

If distribution shifts in time are a concern, retraining can serve as a reliable mitigation strategy. This can be done periodically or when distribution shift is detected.
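The two retraining triggers mentioned above, a fixed schedule and detected drift, can be combined in a single check. This is a minimal sketch: the 90-day maximum age, the drift threshold, and the idea of a single scalar `drift_score` are assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_retrain(
    last_trained,
    now,
    drift_score,
    max_age=timedelta(days=90),  # assumed retraining period
    drift_threshold=0.2,         # assumed drift alert level
):
    """Return True if the model is due for periodic retraining or drift was detected."""
    too_old = (now - last_trained) >= max_age
    drifted = drift_score > drift_threshold
    return too_old or drifted


# Hypothetical usage: a model trained on January 1, 2023.
last_trained = datetime(2023, 1, 1)

recent_and_stable = should_retrain(last_trained, datetime(2023, 2, 1), drift_score=0.05)
recent_but_drifted = should_retrain(last_trained, datetime(2023, 2, 1), drift_score=0.45)
stale_but_stable = should_retrain(last_trained, datetime(2023, 6, 1), drift_score=0.05)
```

The `drift_score` could come from any of the detection methods discussed earlier, such as a distance between label distributions.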

However, if distribution shift results from a lack of sufficient real-world data, different strategies are needed. These strategies require some knowledge about the expected types of distribution shifts to make up for the information gap. We can demonstrate two straightforward strategies using the earlier case of drivers’ licenses:

  1. Apply a two-stage classification algorithm of (1) optical character recognition (OCR) to extract text followed by (2) a classifier to identify the issuing state based on that text.
  2. Augment the training dataset with synthetic data. The synthetic data can be made by moving elements of a driver’s license to various positions, altering foreground and background imagery, and incorporating other variations that appear to be reasonable modifications to a driver’s license.
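The first strategy can be sketched as a pipeline in which the OCR step is passed in as a function. Everything here is illustrative: `fake_ocr` is a stand-in for a real OCR engine, and the `STATE_KEYWORDS` table and keyword matching are a hypothetical placeholder for whatever text classifier would actually be trained.

```python
from typing import Callable, Optional

# Hypothetical keyword table; a real system would likely use a trained
# text classifier rather than substring matching.
STATE_KEYWORDS = {
    "California": ("california",),
    "Texas": ("texas",),
    "New York": ("new york",),
}


def classify_state_from_text(text: str) -> Optional[str]:
    """Stage 2: identify the issuing state from extracted text, or None if unknown."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    for state, keywords in STATE_KEYWORDS.items():
        if any(keyword in normalized for keyword in keywords):
            return state
    return None


def classify_license(image: bytes, ocr: Callable[[bytes], str]) -> Optional[str]:
    """Run stage 1 (OCR) and feed its text output to stage 2 (classification)."""
    return classify_state_from_text(ocr(image))


def fake_ocr(image: bytes) -> str:
    """Stand-in for a real OCR engine; returns fixed text for demonstration."""
    return "TEXAS  DRIVER LICENSE\nDL 12345678"


predicted_state = classify_license(b"placeholder image bytes", fake_ocr)
```

Because the classifier sees only text, a redesigned card layout affects it only to the extent that it degrades the OCR output.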

In the first case, we are using a feature (the license’s text) that we expect to be immune to distribution shifts over time. In the second case, we are leveraging our knowledge of drivers’ license formats to create new formats that are within reason. In both cases, we are explicitly accounting for an information gap by incorporating information from our intuitive understanding of real-world data.
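The second strategy, synthetic augmentation, can be illustrated on a simplified layout description instead of real images. The card dimensions, the element names in `TEMPLATE`, the `BACKGROUNDS` list, and the 15-pixel shift limit are all hypothetical; a real pipeline would render each variant to an image before training.

```python
import random

# Hypothetical card size (pixels) and element positions (top-left corners).
CARD_WIDTH, CARD_HEIGHT = 340, 215
TEMPLATE = {"state_header": (100, 10), "photo": (20, 40), "name": (150, 50)}
BACKGROUNDS = ["plain", "guilloche", "state_seal"]


def make_variant(template, max_shift, rng):
    """Create one synthetic layout by shifting each element and picking a background."""
    elements = {}
    for name, (x, y) in template.items():
        new_x = x + rng.randint(-max_shift, max_shift)
        new_y = y + rng.randint(-max_shift, max_shift)
        # Clamp so that elements never leave the card.
        elements[name] = (
            min(max(new_x, 0), CARD_WIDTH),
            min(max(new_y, 0), CARD_HEIGHT),
        )
    return {"elements": elements, "background": rng.choice(BACKGROUNDS)}


def synthesize(template, n, max_shift=15, seed=0):
    """Generate n synthetic layouts; the seed makes the output reproducible."""
    rng = random.Random(seed)
    return [make_variant(template, max_shift, rng) for _ in range(n)]


variants = synthesize(TEMPLATE, n=5)
```

Keeping `max_shift` small is what keeps the variants "within reason": the goal is plausible new formats, not arbitrary ones.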

We live in a world run by technology where being data-driven is synonymous with success. Data-driven machine-learned algorithms can be essential to accuracy in important situations, and also to routine frontline work, where they allow companies to produce real-time insights that help predict and improve performance. As AI systems become more prevalent in decision making, it is increasingly important for companies to understand both their limitations and the strategies to mitigate those limitations so that they can be effectively leveraged for sustained growth.

About the Author

Michael Rinehart is the VP of Artificial Intelligence at Securiti.ai, a unified data control company that manages security, compliance, and privacy risks. Throughout his career, Michael has deployed machine learning and data science systems in numerous domains, including Internet security, health care, power electronics, automotive, and marketing. Prior to joining Securiti, he led the research and development of a machine learning-based wireless communications jamming technology at BAE Systems. Michael has also held roles in cloud security, big data, and engineering at companies including Elastica (acquired by Symantec) and Verizon. Michael holds a Ph.D. in electrical engineering from MIT.

Filed Under: BigData

Copyright iHash.eu © 2023