While the media, the general public, and practitioners of Artificial Intelligence are delighting in the newfound possibilities of ChatGPT, most are missing what this application of natural language technologies means for data science.
They have failed to see how far the discipline has come, and what it now means for everyday users of advanced analytics techniques that were once arcane but have since become normalized.
According to Abhishek Gupta, Principal Data Scientist and Engineer at Talentica Software, the underlying language model for ChatGPT is GPT-3.5. This model is more versatile than ChatGPT: it is more proficient at generating software code, and it can be applied to a range of natural language technology tasks beyond question answering and language generation, including document classification, summarization, and analysis of textual organization.
Most of all, this language model is highly amenable to prompt engineering and few-shot learning, techniques that all but eliminate data science’s former constraints around feature engineering and the amount of training data required.
When GPT-3.5 is tailored with prompt engineering and few-shot learning, “Common tasks don’t require a data scientist,” Gupta pointed out. “A common person can do them just by knowing how to create the prompt and, to some extent, knowing some knowledge about GPT-3.5.”
Prompt engineering epitomizes how GPT-3.5 has revolutionized data science by making it accessible to non-technical users. Before users could perform prompt engineering with this language model, organizations predominantly depended on expensive, hard-to-find data scientists to build an individual model for each application of natural language technologies.
But with the availability of GPT-3.5, “We can speed time-to-market now that we have this single model that we can do more intelligent prompt engineering over,” Gupta said. “And, it’s the same model that we can use for different tasks.” Thus, no matter how disparate the tasks (reading emails and writing responses, say, or summarizing a research article in five lines), users need only engineer the prompt well enough to teach the model to perform them.
“A prompt is a certain command we give to the model,” Gupta explained. “And, in modeling the commands, we also give it certain examples which can identify patterns. Based on these commands and patterns, the model can understand what the task is all about.” For instance, a user can simply give the model a passage of text followed by “TL;DR” (Too Long; Didn’t Read), and the model will recognize that the task is text summarization and then perform it.
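To make the TL;DR idea concrete, here is a minimal Python sketch of how such a prompt string might be assembled. The function name `tldr_prompt`, the optional command argument, and the blank-line layout are illustrative assumptions rather than a format GPT-3.5 mandates; the call that sends the string to the model is omitted, since it depends on whichever client library is in use.

```python
from typing import Optional


def tldr_prompt(text: str, command: Optional[str] = None) -> str:
    """Assemble a summarization prompt that ends with a "TL;DR:" cue.

    The trailing cue is what signals the summarization task to the model;
    the optional command (a hypothetical addition) states the instruction
    explicitly as well.
    """
    parts = []
    if command:
        parts.append(command.strip())
    parts.append(text.strip())
    parts.append("TL;DR:")  # the model is expected to continue from here
    return "\n\n".join(parts)


if __name__ == "__main__":
    passage = (
        "GPT-3.5 can be adapted to classification, summarization, and code "
        "generation tasks through prompts rather than task-specific models."
    )
    print(tldr_prompt(passage))
```

The cue alone is enough to frame the task; adding an explicit command is one way to make the intent less ambiguous when the text itself does not suggest summarization.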
Prompt Engineering Stores
Prompt engineering’s chief advantage is that it replaces the need to engineer features for individual models, each trained for a single task. Feature engineering is often time-consuming, arduous, and demanding of specialized statistical and coding knowledge. Conversely, any user can issue a natural language prompt, which makes this aspect of model tuning accessible to a much broader user base, including laypeople. Its effectiveness hinges on creating the right prompt.
“If you give a good prompt, the output will be much better than a casually given prompt,” Gupta advised. “There are certain words that will help the model understand better about the task compared to other words. There are certain automated ways to create these prompts.”
A best practice for prompt engineering is to employ a prompt engineering database, which is roughly equivalent to a feature store, in that it houses prompts that can be reused and modified for different purposes. “People have come up with a database of prompts which can be used for certain tasks which are usually commonly known,” Gupta mentioned.
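A prompt database can start as a simple keyed collection of templates. The sketch below assumes a hypothetical in-memory `PromptStore` class built on Python’s standard `string.Template`; it only illustrates the reuse-and-modify idea, and a production store would more likely sit in a database with versioning, which this sketch does not attempt.

```python
from string import Template


class PromptStore:
    """Hypothetical in-memory store of reusable prompt templates.

    Like a feature store, it keeps vetted assets under stable names so they
    can be reused as-is, or copied and modified for a related purpose.
    """

    def __init__(self) -> None:
        self._templates = {}  # maps prompt name -> string.Template

    def register(self, name: str, template: str) -> None:
        self._templates[name] = Template(template)

    def render(self, name: str, **fields: str) -> str:
        # substitute() raises KeyError when a placeholder is left unfilled,
        # so incomplete prompts fail loudly instead of reaching the model.
        return self._templates[name].substitute(**fields)

    def derive(self, base: str, new_name: str, extra_instruction: str) -> None:
        # Reuse an existing prompt and extend it for a new purpose.
        base_text = self._templates[base].template
        self._templates[new_name] = Template(base_text + extra_instruction)


if __name__ == "__main__":
    store = PromptStore()
    store.register("summarize", "Summarize the following text in $n lines.\n\n$text")
    store.derive("summarize", "summarize_bullets", "\n\nFormat each line as a bullet point.")
    print(store.render("summarize_bullets", n="five", text="<research article text>"))
```

Deriving a variant from a stored prompt, rather than writing a new one from scratch, mirrors how a feature store lets teams build on features that have already been vetted.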
Few-Shot Learning
In addition to giving commands via prompts, organizations can also provide examples in prompts to teach GPT-3.5 a given task. The latter is part of the few-shot learning phenomenon, in which the number of examples needed to teach a model is reduced to a few (few-shot learning), one (single-shot learning), or zero (zero-shot learning). This reduction is remarkable compared with the volumes of training data, and the annotations that data requires, that can otherwise hamper machine learning tasks.
In this case, one “just gives some examples of the patterns to the model and it auto-generates similar kinds of patterns for the solution’s task,” Gupta commented. If the task is for the system to identify the capital of every country, the user could give the example that New Delhi is the capital of India before asking for the capitals of other countries. In this single-shot learning use case, that one example teaches the system the pattern; then, “by giving the pattern to the model you can ask any question based on that pattern,” Gupta concluded.
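The capitals example can be expressed as a small prompt-building sketch in Python. The instruction wording, the `Country:`/`Capital:` layout, the function name `capital_prompt`, and the extra Japan and Kenya pairs are illustrative assumptions; only the India/New Delhi example comes from the article. The parameter `k` selects how many worked examples precede the question, which is what distinguishes zero-, one-, and few-shot prompts.

```python
from typing import List, Sequence, Tuple

# (country, capital) pairs used as in-prompt examples; India/New Delhi is the
# example from the article, the others are added for illustration.
EXAMPLES: List[Tuple[str, str]] = [
    ("India", "New Delhi"),
    ("Japan", "Tokyo"),
    ("Kenya", "Nairobi"),
]


def capital_prompt(query: str, examples: Sequence[Tuple[str, str]], k: int) -> str:
    """Build a zero-, one-, or few-shot prompt from the first k examples.

    k == 0 gives a zero-shot prompt (instruction only), k == 1 a one-shot
    prompt, and k > 1 a few-shot prompt. The examples live inside the
    prompt; the model's weights are not retrained.
    """
    if not 0 <= k <= len(examples):
        raise ValueError("k must be between 0 and len(examples)")
    blocks = ["Give the capital of each country."]
    for country, capital in examples[:k]:
        blocks.append(f"Country: {country}\nCapital: {capital}")
    # The final block repeats the pattern but leaves the answer for the model.
    blocks.append(f"Country: {query}\nCapital:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(capital_prompt("France", EXAMPLES, k=1))
```

With `k=1` the prompt contains only the India example, matching the single-shot case described above; raising `k` turns the same function into a few-shot prompt without any other change.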
Although such an example may seem trivial, it attests to how easy it is, and how little specialized knowledge or technical skill is required, to tune GPT-3.5 for almost any natural language technology task. Ultimately, the versatility of GPT-3.5 demonstrates the effectiveness of multitask learning and the expanding accessibility of advanced machine learning models.
About the Author
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.
Sign up for the free insideBIGDATA newsletter.