Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Vector data will be foundational for next-generation AI applications. Commentary by Avthar Sewrathan, AI Lead at Timescale
“The rise of large language models (LLMs) like GPT-4 and Llama 2 is driving explosive growth in AI applications powered by vector data. Developers need to efficiently store and query vectors to power next-generation AI systems. Vector data powers tasks like text and image generation, provides LLMs with long-term memory beyond the current conversation, and supplies LLMs with relevant context from company-specific or private datasets (via retrieval-augmented generation).
There are a myriad of vector databases in the market, with new ones popping up seemingly every week. This leaves developers facing a paradox of choice. Do they adopt new, niche databases built specifically for vector data? Or do they use familiar, general-purpose databases with extensions for vector support?
I have seen this dynamic before in other markets, namely time-series. Despite the existence of niche time-series databases, millions of developers opted for a general-purpose database, PostgreSQL, with extensions for time-series. The robustness, familiarity, and ecosystem of PostgreSQL outweighed the benefits of switching to a completely new database.
With vector data, developers face a similar choice, and it’s currently an open question which option they will prefer. Developer needs for LLM applications are still emerging, and many LLM applications are still experimental or only beginning to scale. The best option likely depends on the application’s needs beyond vectors, as well as its scale and performance requirements.
But one thing is clear: vector data will be foundational for next-generation AI applications. And the rapid pace of innovation, both from open-source projects like pgvector and from database development teams, means the general-purpose databases of today have a strong likelihood of being a foundational part of the AI applications of the future, as they have been in applications over the past decades.”
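The retrieval step this commentary alludes to, finding the stored vectors most similar to a query embedding, can be sketched in plain Python. This is a toy brute-force version, not Timescale’s or pgvector’s actual implementation; the documents and vectors below are illustrative stand-ins for real model embeddings:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    # store: list of (document_text, embedding) pairs.
    # A real vector database replaces this linear scan with an index.
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Toy embedding store standing in for a vector database table.
store = [
    ("refund policy",  [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("privacy notice", [0.0, 0.1, 0.9]),
]
query = [0.85, 0.15, 0.05]  # embedding of a refund-related question
print(top_k(query, store, k=1))  # -> ['refund policy']
```

In a retrieval-augmented generation pipeline, the retrieved documents would then be placed into the LLM prompt as context for answering the user’s question.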
Generative AI: Excitement or Fear? Commentary by Lucas Bonatto, Director of Engineering at Semantix
“With the rise of generative AI, jobs are going to change. OpenAI’s research estimates that 80% of jobs can incorporate generative AI into today’s work activities. But allowing this prediction to fuel anxiety will only limit what humans are capable of. Many human skills remain irreplaceable and are often more pleasurable when given the freedom to explore them: creativity, critical thinking, interpersonal communication, and leadership, to name a few. Those who embrace the technology and learn how to reap the benefits will find their lives easier, and their value to companies will grow.
McKinsey estimates that jobs across various sectors could gain up to 30% of their time back through generative AI tools. Rather than disappearing, roles will begin to adapt and expand. Take the applied AI engineer: it’s an example of a job the revolution has created, yet a year ago no one would have known what that was. It’s an indication that although there is an impact on workers’ roles, what’s really happening is that generative AI is opening doors for people to take more strategic, analytical positions, and new jobs will keep appearing, as they already are.
Still, you don’t need to be a computer scientist to leverage the technology—it can support many job types. For instance, generative technology can help HR managers pull the skills required for positions and write job descriptions. That doesn’t mean recruiters will lose their role because someone needs to check the end product. With that human in the loop, making sure the generative AI created a job description that’s fair and inclusive, the technology and human collaboration can dramatically improve speed and quality.
So the robotic revolution is a matter of leaving our comfort zones and reskilling people to use AI effectively. AI doesn’t work by itself; those who adapt will be the key to companies using the technology successfully.”
AI Enhancing SEO: Unveiling the Truth. Commentary by Matthew Goulart, Founder of Ignite Digital
“Forget the fear-mongering, because AI is not here to kill SEO, it’s here to revolutionize it! In fact, AI is all set to boost the power of White Hat SEO, taking it to new heights of effectiveness. While there’s chatter about AI, let’s set the record straight: most AI offerings in the market are just fancy algorithms with a sparkle. But true AI is a game-changer, helping both consumers and companies seamlessly discover the precise information they crave. Say hello to the future of search!
Discover the power of AI in revolutionizing keyword selection for search engine experts. AI can identify the ideal keywords with intent, aligning corporate goals with consumer desires. Unleash the potential of AI in predicting innovative trends and identifying patterns faster than ever before. While SEO may become more complex, when harnessed correctly, it can lead to remarkable effectiveness.
AI in the SEO world is just the beginning. As true AI technology becomes more accessible to smaller SEO agencies, the competition to reach Google’s Top 10 will become tougher than ever. This means that smaller organizations might struggle to keep up. AI is not going away, so let’s embrace it and discover the thrilling possibilities it offers.”
Incorporating AI? Involve Your Security Team from the Start. Commentary by Tony Liau, VP, Product at Object First
“The rise of AI’s accessibility in 2023 has spurred departments across organizations, ranging from marketing to finance, to eagerly integrate it into their operations. This eagerness, while promising innovation and efficiency, simultaneously ushers in distinct challenges in data protection and security. Each department’s adoption of AI tools, tailored to their specific functions, can inadvertently create intricate risk profiles, potentially opening doors to unforeseen vulnerabilities.
Recognizing these challenges, the proactive involvement of security teams becomes not just beneficial but indispensable. These teams bring to the table expertise in establishing secure frameworks, ensuring robust data protection, and conducting targeted threat assessments for each AI application. To harness the transformative power of AI while safeguarding an organization’s assets, a seamless collaboration between individual departments and their security counterparts is essential.”
Embracing AI for Supply Chain Optimization. Commentary by Padhu Raman, CPO of Osa Commerce
“The rapid advancements of generative AI have left no industry untouched – and the supply chain sector is no exception. Generative AI’s ability to analyze vast datasets has many use cases in the supply chain, including identifying order patterns, predicting demand and optimizing inventory levels. With these capabilities, companies can better manage inventory levels to avoid stockouts. AI-backed insights uncover bottlenecks, recommend efficient routes and predict maintenance needs, helping companies make informed business decisions and improve supply chain operations.
While there are concerns about generative AI leading to job displacement, it is more likely to augment human capabilities rather than replace workers entirely. As a result, professionals can thrive through supply chain disruptions and achieve new levels of efficiency and success.”
Fake News Just Got an Upgrade — Generative AI’s Influence on the 2024 Election. Commentary by Ralph Pisani, President, Exabeam
“In the lead-up to the 2024 election, conditions are ripe for people to fall for something that is not real – and that’s not to fearmonger. It should go without saying, but we must use caution when evaluating information shared via social media or across the Internet, and make a habit of verifying its origin and legitimacy. This is especially important when discerning deep fakes at a time when AI has become very powerful. Since 2016, we have seen just how powerful social media is for spreading falsehoods and rumors, and that leaves me wondering whether the average person will verify what they see or read online before resharing it. Nation-states will undoubtedly seize this as an opportunity to continue influencing Americans and widening their divisions.
If they succeed again, the danger is that our country becomes more susceptible than ever to cyber threats. We need strong leaders to say that while we may not agree on every issue, we should all want to preserve our American way of life. Unfortunately, I don’t foresee an achievable consensus on regulating AI in such a divided political environment. But maybe, if we create a bipartisan task force to keep a watchful eye on the progress of AI, we can pre-empt deep fakes and generative AI from getting out of control and posing genuine dangers.”
Using generative AI in conjunction with time-series, spatial, and graph analytics. Commentary by Nima Negahban, Cofounder and CEO, Kinetica
“Generative AI tools like ChatGPT are not limited to working with rich media like language, images, audio, and video; they can also be effectively applied to structured enterprise data. By integrating advanced analytic techniques with vector similarity search, organizations are beginning to ask new questions of their data and get answers immediately. In particular, opportunities for using generative AI in conjunction with time-series, spatial, and graph analytics are rapidly emerging. For example, generative AI techniques can identify anomalies within noisy sensor and machine data, helping teams spot irregularities promptly. By integrating generative models with time-series, spatial, and graph analysis, predictive maintenance approaches can be enhanced. Additionally, these models can contribute to the creation of synthetic fault scenarios and the simulation of system responses under diverse circumstances, thereby improving the accuracy of accident recreation.”
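To make the sensor-anomaly idea concrete, here is a deliberately simple statistical baseline, a z-score check, rather than the generative-model approach the commentary describes; the sensor readings are invented for illustration:

```python
import statistics

def flag_anomalies(readings, threshold=3.0):
    # Flag indices of readings more than `threshold` sample standard
    # deviations away from the mean of the series.
    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)
    return [i for i, r in enumerate(readings)
            if abs(r - mean) > threshold * stdev]

# A mostly steady sensor trace with one spike at index 5.
sensor = [20.1, 20.3, 19.9, 20.0, 20.2, 35.0, 20.1, 19.8]
print(flag_anomalies(sensor, threshold=2.0))  # -> [5]
```

A generative approach would instead learn what “normal” looks like and flag readings the model assigns low likelihood, which handles seasonal and multi-variate patterns that a simple z-score misses.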
The AI / ML / Big Data Gold Rush can Overwhelm Scarce Tech Resources – Apply MVP Product / Program Management to Reduce Duplication and Wasted Efforts. Commentary by Valerie Hart, Senior Managing Director at Ankura Consulting
“Whether it’s security concerns, competitive risk or growth opportunities, AI / ML / Big Data technology can be a unifying or dividing force in any organization. No function or rank is now immune to the “gold rush” of perceived opportunity, efficiency and risks promised and inherent in these technologies.
However, many technology functions are not equipped to handle the onslaught of requests related to these tools and their application. Almost uniformly, companies report high expectations for true transformation of how their work will be performed leveraging these technologies. Yet 60–70% of companies struggle to implement large transformational changes.
What’s an overburdened, overextended CIO / CTO to do? Socializing and implementing a Minimum Viable Product / Solution mindset can help prioritize where to apply AI / ML / Big Data for potential internal and external uses. Here are some tactical tips to achieve that mindset among business colleagues: (i) Start with business needs and benefits, not with describing the how of the technology; most business colleagues just expect tech to work. (ii) Seek out partners in the business units who have an experimental mindset and query them on their priorities. (iii) Once you have their priorities, you can supply useful context and conditions with which business partners can then suitably modify their needs and goals. Help them think about the smallest piece of functionality they believe can help their firm leap forward in productivity and thus value creation. You can also frame the prototype as “low-hanging fruit” (aka the most value for the smallest level of effort). (iv) Once that small piece of prioritized and valuable functionality is defined, work as fast as you can to prototype something your business partner can see. Use Agile, or at least iterative development, and check with your partner repeatedly to confirm: “Is this what you want?”
By breaking their needs into small chunks of functionality, you can demonstrate what it will take to build real applications of value and help close the gap between business stakeholders’ appetites and tech’s capacities and capabilities.
Such small proofs of concept can also begin the work of governance, as not all ideas are created equal nor supported by all parts of the business. Tying limited technology resources to the ideas that support value creation and address real needs, internally or externally, can help reduce noise and wasted effort, and build a coalition of consensus to help drive adoption when the tools go live.”
ChatGPT’s behavior has gotten worse over time, the result of failing to monitor how the data and feedback users feed it affect its behavior. Commentary by Ramon Chen, CPO of Acceldata
“I personally use ChatGPT 4 Plus every day in some form, whether it’s for research, creating blog content, or summarizing articles. Though recently, I have noticed progressively less efficiency and responsiveness to my prompts. Coincidentally, recent research from Stanford University and the University of California, Berkeley suggests that the performance of GPT-4, particularly in coding and compositional tasks, has declined over the past few months, sparking discussions about the potential drift in the performance of large language models (LLM) over time.
Although the study itself has been criticized as speculation since it was published, it still highlights one of the leading challenges for every modern data team – establishing data reliability as early as possible in the data journey. Enterprise leaders have long understood the importance of using reliable data to train their AI and machine learning models, but generative AI is taking things a step further. Data teams need to start shifting data reliability and observability left in their process to create optimized data pipelines, capable of performing and scaling to meet the business and technical needs of the enterprise.”
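“Shifting data reliability left” means validating data at ingestion, before it reaches pipelines and model training, rather than discovering bad data downstream. A minimal sketch of such a check, with hypothetical field names chosen for illustration:

```python
def validate_record(record, required=("id", "timestamp", "value")):
    # Return a list of problems found in a single ingested record.
    # An empty list means the record may proceed down the pipeline.
    errors = []
    for field in required:
        if field not in record or record[field] is None:
            errors.append(f"missing or null field: {field}")
    return errors

good = {"id": 1, "timestamp": "2023-08-01T00:00:00Z", "value": 42.0}
bad = {"id": 2, "timestamp": None}

print(validate_record(good))  # -> []
print(validate_record(bad))   # -> two errors: timestamp, value
```

Production data-observability tools extend this idea with schema, freshness, volume, and distribution checks, but the principle is the same: catch unreliable data as early in the journey as possible.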
Optimizing IT budgets to prepare for a data-driven future. Commentary by Joerg Tewes, CEO, Exasol
“In today’s data-driven world, leaders want to base their business decisions on the freshest, most valuable data in their possession, but this isn’t possible if they cannot reliably and quickly access that data. Simply put, employees won’t use data if dashboards take too long to load, ultimately stifling insights and innovation. The volume of business data today is outpacing what legacy analytics databases can handle, forcing CIOs to make major compromises when it comes to their tech stack.
The current market is forcing CIOs to do more with less, amid headcount constraints and the need to reduce operational costs across the enterprise. The trickle-down effect then hits data teams, as they seek new ways to optimize their databases with fewer resources, without moving or duplicating data. As if that’s not enough to keep them up at night, CIOs are also evaluating AI: where does it securely fit into the tech stack? How can it help with efficiency, productivity, and cost effectiveness? It becomes quite a tall task for the average CIO, but they should never have to compromise on their tech stack.
As actionable advice, CIOs should question costs with existing tech providers to determine if they’re getting the best possible price-performance ratio. They shouldn’t compromise between cost, performance, and flexibility. For instance, as the system gets overloaded with more users and data, queries should still take seconds to run, not minutes or hours. Any type of lag is simply unacceptable. Growing businesses need performance scalability, but it shouldn’t come with sticker shock or vendor lock-in, often hidden in the fine print. CIOs should work closely with finance leaders not only to identify current costs but, most importantly, to project what the cost of their legacy technology is going to be down the road, given the scalability required for business growth.
Overall, it’s important to recognize that data-driven decision-making is an imperative for the enterprise. CIOs must align with the C-suite, especially when it comes to emerging technology like AI, and plan for the future by evaluating their current tech stack and not compromising on performance, flexibility, and cost. Exasol is designed to address all of these needs, working with major brands like T-Mobile, Verizon, and Dell to ensure they get the freshest data in near real time, breaking down any operational barriers to become a data-driven organization.”