Brief History of LLMs – insideBIGDATA

Jul 17, 2023 by iHash

By Matt Casey, Snorkel AI

Large language models (LLMs) have fascinated the public and upended data team priorities since ChatGPT arrived in November 2022. While the roots of the technology stretch further into the past than you might think—all the way to the 1950s, when researchers at IBM and Georgetown University developed a system to automatically translate a collection of phrases from Russian to English—the modern age of large language models began only a few years ago.

The early days of natural language processing saw researchers experiment with many different approaches, including conceptual ontologies and rule-based systems. While some of these methods proved narrowly useful, none yielded robust results. That changed in the 2010s, when NLP research intersected with the then-bustling field of neural networks. The collision laid the groundwork for the first large language models.

This post, adapted and excerpted from one on Snorkel.ai entitled “Large language models: their history, capabilities, and limitations,” follows the history of LLMs from that first intersection to their current state.

BERT, the first breakout large language model

In 2018, a team of researchers at Google introduced BERT (which stands for bidirectional encoder representations from transformers).

Their new model combined several ideas into something surprisingly simple and powerful. By making BERT bidirectional, it allowed the inputs and outputs to take each other's context into account. By using a neural network architecture with a consistent width throughout, the researchers allowed the model to adapt to a variety of tasks. And, by pre-training BERT in a self-supervised manner on a wide variety of unstructured data, the researchers created a model rich with an understanding of the relationships between words.

All of this made it easy for researchers and practitioners to use BERT. As the original researchers explained, “the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.”
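The "one additional output layer" recipe the researchers describe can be illustrated in a few lines. This is a conceptual sketch, not the real BERT code: the pre-trained encoder is stubbed out with random features, and in an actual fine-tuning run only the small classification head (plus, optionally, the encoder) would receive gradient updates.

```python
import numpy as np

# Conceptual sketch of BERT fine-tuning as described above: keep the
# pre-trained encoder and bolt a single output layer on top. The encoder
# here is a stub returning random features, not a real transformer.

rng = np.random.default_rng(0)
HIDDEN = 768        # BERT-base hidden width
NUM_CLASSES = 2     # e.g. a binary sentiment task

def encoder(token_ids):
    """Stand-in for the pre-trained model: returns a pooled [CLS] vector."""
    return rng.standard_normal(HIDDEN)

# The single additional output layer: one linear projection plus softmax.
W = rng.standard_normal((HIDDEN, NUM_CLASSES)) * 0.01
b = np.zeros(NUM_CLASSES)

def classify(token_ids):
    pooled = encoder(token_ids)
    logits = pooled @ W + b
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

probs = classify([101, 2023, 2003, 102])  # dummy token ids
```

The appeal the authors describe is visible even in this toy version: everything task-specific lives in `W` and `b`, a tiny fraction of the full model's parameters.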

At its debut, BERT shattered the records for a suite of NLP benchmark tests. Within a short time, BERT became the standard tool for NLP tasks. Researchers adapted and built upon it in a way that made it one of the original foundation models. Less than 18 months after its debut, BERT powered nearly every English-language query processed by Google Search.

Bigger than BERT

At the time of its debut, BERT's 340 million parameters tied it as the largest language model of its kind. (The tie was a deliberate choice; the researchers wanted it to have the same number of parameters as GPT to simplify performance comparisons.) That size seems quaint by modern standards.

From 2018 to the modern day, NLP researchers have engaged in a steady march toward ever-larger models. Hugging Face’s Julien Simon called this steady increase a “new Moore’s Law.”

Credit: Julien Simon, Hugging Face

As large language models grew, they improved. OpenAI’s GPT-2, finalized in 2019 at 1.5 billion parameters, raised eyebrows by producing convincing prose. GPT-2’s impressive performance gave OpenAI pause; the company announced in February of that year that it wouldn’t release the full-sized version of the model immediately, due to “concerns about large language models being used to generate deceptive, biased, or abusive language at scale.” Instead, they released a smaller, less-compelling version of the model at first, and followed up with several increasingly-large variations.

Next, OpenAI released GPT-3 in June of 2020. At 175 billion parameters, GPT-3 set the new size standard for large language models. It quickly became the focal point for large language model research and served as the original underpinning of ChatGPT.

Most recently, OpenAI debuted GPT-4. At the time of this writing, OpenAI has not publicly stated GPT-4’s parameter count, but one estimate based on conversations with OpenAI employees put it at one trillion parameters—five times the size of GPT-3 and nearly 3,000 times the size of the original large version of BERT. The gargantuan model represented a meaningful improvement over its predecessors, allowing users to process up to 50 pages of text at once and reducing the incidence of “hallucinations” that plagued GPT-3.

The ChatGPT moment

While researchers and practitioners devised and deployed variations of BERT, GPT-2, GPT-3, and T5, the public took little notice. Evidence of the models' impact surfaced on websites in the form of summarized reviews and better search results. The most direct examples of the existing LLMs, as far as the general public was concerned, were a scattering of news stories written in whole or in part by GPT variations.

Then OpenAI released ChatGPT in November 2022. The interactive chat simulator allowed non-technical users to prompt the LLM and quickly receive a response. As the user sent additional prompts, the system would take previous prompts and responses into account, giving the interaction conversation-like continuity.
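The continuity mechanism described above can be sketched as a simple loop that resends the accumulated transcript with every new prompt. The model call is stubbed out here; a real deployment would substitute an actual API call, and the message format below is merely an illustrative convention.

```python
# Minimal sketch of conversational continuity: each new prompt is sent
# together with the full running history, which is what lets a follow-up
# like "How big was it?" resolve against an earlier turn.

def fake_llm(messages):
    # Stub standing in for a real model call; it just reports how many
    # user turns of context it was given.
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"(reply informed by {user_turns} user turn(s) of context)"

class ChatSession:
    def __init__(self):
        self.messages = []                 # full running transcript

    def send(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        reply = fake_llm(self.messages)    # model sees all prior turns
        self.messages.append({"role": "assistant", "content": reply})
        return reply

chat = ChatSession()
chat.send("What is BERT?")
last = chat.send("How big was it?")        # "it" resolves via the history
```

The key design point is that the model itself is stateless; the appearance of memory comes entirely from replaying the transcript.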

The new tool caused a stir. The scattering of LLM-aided news reports became a tidal wave, as local reporters across the U.S. produced stories about ChatGPT, most of which revealed the reporter had used ChatGPT to write a portion of the story.

Following the frenzy, Microsoft, which had partnered with OpenAI in 2019, built a version of its Bing search engine powered by ChatGPT. Meanwhile, business leaders took a sudden interest in how this technology could improve profits.

Chasing ChatGPT

Researchers and tech companies responded to the ChatGPT moment by showing their own capabilities with large language models.

In February 2023, Cohere introduced the beta version of its summarization product. The new endpoint, built on a large language model customized specifically for summarization, allowed users to enter up to 18-20 pages of text to summarize, which was considerably more than users could summarize through ChatGPT or directly through GPT-3.

A week later, Google introduced Bard, its own LLM-backed chatbot. The event announcing Bard pre-empted Microsoft and OpenAI’s first public demonstration of a new, ChatGPT-powered Bing search engine, news of which had reached publications in January.

Meta rounded out the month when it introduced LLaMA (Large Language Model Meta AI). LLaMA wasn't a direct duplication of GPT-3 (Meta AI had introduced its direct GPT-3 competitor, OPT-175B, in May of 2022). Instead, the LLaMA project aimed to equip the research community with powerful large language models of a manageable size. LLaMA came in four size varieties, the largest of which had 65 billion parameters—still barely more than a third of the size of GPT-3.

In April, Databricks released Dolly 2.0. Databricks CEO Ali Ghodsi told Bloomberg that the open-sourced LLM replicated a lot of the functionality of "these existing other models," a not-so-subtle wink to GPT-3. In the same interview, Ghodsi noted that his company chose the name "Dolly" in honor of the cloned sheep, and because it sounded a bit like DALL-E, another prominent model from OpenAI.

Is the age of giant LLMs already over?

Shortly after OpenAI released GPT-4, OpenAI CEO Sam Altman told a crowd at the Massachusetts Institute of Technology that he thought the era of “giant, giant” models was over. The strategy of throwing ever more text at ever more neurons had reached a point of diminishing returns. Among other challenges, he said, OpenAI was bumping up against the physical limits of how many data centers the company owned or could build.

“We’ll make them better in other ways,” he told the crowd.

Altman isn’t alone. In the same piece, Wired cited agreement from Nick Frosst, a cofounder at Cohere.

The future of LLMs

If the future of LLMs isn’t about being bigger, then what is it about? It’s too early to say for certain, but the answer may be specialization, data curation, and distillation.

While the largest LLMs yield results that can look like magic at first glance, the value of their responses falls as the stakes of the task rise. Imagine a bank deploying a GPT model to handle customer inquiries, only to have it hallucinate an account balance.

Machine learning practitioners can minimize the risk of off-target responses by creating a specialized version of the model with targeted pre-training and fine-tuning. That’s where data curation comes in. By building a data set of likely prompts and high-quality responses, practitioners can train the model to answer the right questions in the right way. They can use additional software layers to keep their custom LLM from responding to prompts outside of its core focus.
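The "additional software layers" idea can be sketched as a guard that screens prompts against the custom model's intended scope before the model is ever called. The scope keywords and refusal message below are illustrative assumptions, not any particular product's implementation; production systems typically use a classifier rather than keyword matching.

```python
# Hedged sketch of a scope guard in front of a specialized LLM: prompts
# outside the model's core focus are refused without a model call.
# IN_SCOPE and the refusal text are made-up examples for a banking bot.

IN_SCOPE = {"balance", "transfer", "statement", "account"}
REFUSAL = "I can only help with account-related questions."

def guarded_answer(prompt, model_fn):
    """Call model_fn only when the prompt overlaps the allowed scope."""
    words = {w.strip("?.,!").lower() for w in prompt.split()}
    if not words & IN_SCOPE:
        return REFUSAL          # out of scope: never reaches the model
    return model_fn(prompt)

refusal = guarded_answer("Write me a poem", lambda p: "(model output)")
answered = guarded_answer("What is my account balance?",
                          lambda p: "(model output)")
```

Because the guard sits outside the model, it also caps cost: off-topic traffic is rejected before any expensive inference runs.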

However, LLMs will still be very expensive to run. Executing hundreds of billions, or even trillions, of calculations to answer a single question adds up. That's where distillation comes in. Researchers at Snorkel AI pioneered work showing that smaller large language models, specialized for particular tasks or domains, can be made more effective than their behemoth siblings on those same tasks.

Regardless of what the next chapter of LLMs looks like, there will be a next chapter. While many LLM projects today are novelties, they will soon work their way into enterprise deployments; their value is too obvious to ignore.
