Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Securing the future leveraging AI & ML: How to respond to malicious actors taking advantage of AI. Commentary by Ramprakash Ramamoorthy, Director of AI research at ManageEngine
“Malicious actors have always embraced newer technologies to con vulnerable targets, so it’s no surprise that they’ve started using generative AI to launch more effective attacks. To defend against AI-powered cybercrime, all relevant stakeholders (employees, organizations, institutions, and agencies) must adopt an integrated approach in which all parties are involved in the risk-response framework. As the intelligence of malware has grown with the introduction of AI, so has AI-enabled defense technology. To match the scope and sophistication of upcoming threats, organizations must employ AI tools in addition to their traditional cybersecurity measures.
AI can currently assist IT teams in reducing the harm of AI-driven attacks by identifying anomalies and detecting early intrusion, and eventually, it might even be capable of independently preventing breaches in real time. Just like the hackers using it, AI can improve with experience and feedback on its choices, learning over time how to better identify a significant threat. Through the collaboration of stakeholders, emerging security risks associated with AI can be identified and frameworks can be built to enhance capabilities of managing these risks. To pave the way to ethical AI, all stakeholders must engage with the broader AI community and encourage responsible societal uses of the technology. As AI advances and organizations develop a deeper understanding of the technology, they can become better prepared to manage its security risks.”
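As a rough illustration of the anomaly-identification idea mentioned in the commentary above, here is a minimal sketch that flags unusual hourly login counts using a modified z-score (median and median absolute deviation). The function name `flag_anomalies`, the sample data, and the 3.5 threshold are assumptions made for illustration; this is a simple statistical stand-in, not the method used by ManageEngine or any particular security product.

```python
from statistics import median


def flag_anomalies(counts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper: uses the median and the median absolute deviation
    (MAD), which are less distorted by the outliers we are trying to find
    than the mean and standard deviation would be.
    """
    med = median(counts)
    abs_dev = [abs(c - med) for c in counts]
    mad = median(abs_dev)
    if mad == 0:
        # No typical spread to scale by: flag anything that differs from the median.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Invented sample data: logins per hour, with one suspicious spike.
hourly_logins = [12, 15, 11, 14, 13, 12, 16, 240, 14, 13]
print(flag_anomalies(hourly_logins))
```

In practice a flagged hour would be a prompt for human review rather than an automatic verdict, which matches the commentary's framing of AI as assisting IT teams.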
What It Takes to Spot Synthetic Content. Commentary by Stuart Wells, CTO of Jumio
“Synthetic content, such as deepfake images and synthetically generated identity documents, has rapidly increased recently, making it much more difficult to verify valid identities and discern between legitimate and fake content. The quality of synthetic content has also reached a level where human detection can be nearly impossible. Synthetic content can provide an easy way for bad actors to wreak havoc by spinning false narratives, conducting identity theft, generating fake adult content and more—all at scale. Unfortunately, most people exhibit overconfidence when it comes to their ability to spot synthetic content, with 52% of consumers believing they could detect a deepfake video.
Solely relying on consumers to detect fraudulent synthetic content isn’t enough. It’s up to businesses to take the lead by deploying technologies that can keep fraudsters off their platforms and protect consumers’ sensitive data. For instance, digital identity verification tools, like biometric authentication, confirm every user is who they say they are. This technology leverages biometrics and AI to compare a real-time selfie to a government-issued identification document. As the quality of synthetic documents and images improves, organizations should specifically seek tools that use multi-modal biometrics, such as voice and face, to make it much more difficult for fraudsters to spoof their systems. This approach can help determine whether the person’s voice matches their facial movements, such as lower chin and ear movement, in order to differentiate between a legitimate user and a deepfake or face morph. Using deepfake detection models as part of a layered defense is also advised.”
Enable Better Generative AI with Knowledge Graphs. Commentary by Eliud Polanco, President of Fluree
“One of the biggest concerns with generative AI is the quality of the data that each tool is built on. The complexity of heterogeneous structured and semi-structured data sources can lead to quality issues in any kind of data consumption pattern, including LLMs. In the early internet age, companies bought software that would enable their business processes and functions to execute as efficiently and profitably as possible. Companies with many diverse lines of business soon ended up with siloed data that can include many different records for the same customer. A 2018 Gartner report found organizations believe that unstructured data causes an average of $15 million a year in losses. As generative AI becomes further entrenched in business processes, these losses will balloon without cleaned, high-quality, and interoperable data to feed large language models.
Fortunately, these companies can access knowledge graphs that remove data-silo isolation by integrating customer data more effectively through the use of smart, machine-learning based systems. Knowledge graphs find patterns in data across an organization and eliminate the need for multiple data copies with one reliable and secure internal data resource. This eliminates the need to track down or compare multiple datasets to create a comprehensive picture of customers and their needs. Knowledge graphs create an easily accessible data language that a large language model can be trained to understand and subsequently generate actionable information. A company can increase its capacity for profitability by truly bringing the customer to the forefront of all business operations with these insights.”
How CTOs should be thinking about implementing Gen AI. Commentary by Michael Schmidt, Chief Technology Officer, DataRobot
“After talking with hundreds of our customers, it’s clear generative AI opens the door to an enormous amount of new use-cases for AI within organizations. There are new data sources that can now be used, new people that can create these applications, and new people that can consume them due to their natural language data, input, and output content.
However, it is not always as easy to do as it appears. The most mature companies can create generative AI apps overnight, but getting results that are actually valuable, safe, and maintainable in production is much harder. Some companies think it’s all about fine-tuning the model, but the reality is it’s mostly about the quality of your prompting strategy, moderation, and governance of the generative process in production. Most generative use-cases require iteration to get high-quality results, monitoring of the model’s performance in production, and integration of predictive models to enrich or audit the generative outputs. CTOs need to be aware of the real needs, and leverage technologies that will help them realize value from the new generative AI application opportunities without having to build everything from scratch in a rapidly changing environment.”
Generative AI will change the world, but not if data infrastructure can’t keep up. Commentary by Ami Gal, CEO and co-founder of SQream
“While the buzz around OpenAI and ChatGPT is still swirling, there is an elephant in the room that people aren’t talking about when it comes to generative AI: what are the infrastructures and platforms that will support this technology in the long term? As of now, generative AI tools simply don’t have the scalability element to truly change the world without the right infrastructure supporting them. The astronomical costs are already telling this story, but there are other factors at play here.
In order to truly allow the AI revolution to advance, enterprises across the board need to focus on creating and connecting the right tools to leverage it, such as using Large Language Models (LLMs). In these cases, the cost-effectiveness of your data platforms is critical. These models have to be scalable, allowing multiple training cycles that are both convenient and fast. Another important factor is the ability of your infrastructure to assist with orchestration, model tuning, and resilience of the processes against faults and resource crashes. When training an LLM on organizational data, consolidating and harmonizing the data in a unified way is critical, because if the data is left in its raw structure, the LLM will be leaning or biased.
Therefore, the ability of your data infrastructure to keep all of the data accessible and in the same place, without the need for designated pipelines to “feed” the LLM, is crucial. It gives enterprises flexibility and agility in the learning process. Generative AI can only be as strong as the data processing platform supporting it, so it is critical to find a platform that can process petabytes of data (the quantities needed to run these huge AI platforms) quickly, efficiently, and in a cost-effective manner. Simply put, this revolution is only sustainable if we lay the right foundation now to support it.”
Dirty data can have profound consequences for companies. Commentary by Matthew Furneaux, Director of Location Intelligence at Loqate, a GBG company
“In today’s economy, all businesses must find ways to reduce costs, improve efficiency and strengthen customer relationships. Yet, nothing eats into a business’s bottom line more than inaccurate data.
Dirty data (inaccurate or flawed information) can have profound consequences for companies. At a macro level, IBM estimates that poor data quality costs the US economy $3.1 trillion. It also has implications for individual businesses. For instance, one report found that data engineers spend 40 percent of their workweek evaluating data quality, negatively impacting 26 percent of their companies’ revenue.
Additionally, erroneous data results in an 8 percent failure rate for domestic first-time deliveries, costing retailers an average of $17.20 per order, or about $197,730 annually. More than half of consumers say they have stopped shopping with a company because of data concerns. Taken together, it’s clear that dirty data can cause significant operational and financial challenges for businesses, underscoring the need for accurate data collection and verification.”
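To make the delivery figures quoted above easier to relate to one another, the short sketch below back-calculates the order volumes they imply. It assumes that the annual figure is simply the number of failed deliveries multiplied by the per-order cost, and that the failure rate applies uniformly to all first-time deliveries. The commentary does not state how the annual figure was derived, so treat this as an illustrative reading rather than the source's methodology. The function name `implied_volumes` is hypothetical.

```python
# Figures quoted in the commentary.
FAILURE_RATE = 0.08            # share of first-time deliveries that fail
COST_PER_FAILED_ORDER = 17.20  # average cost in dollars per failed order
ANNUAL_COST = 197_730          # approximate annual cost in dollars


def implied_volumes(annual_cost, cost_per_failure, failure_rate):
    """Back out the failed and total delivery counts implied by the figures.

    Assumption (not stated in the source):
    annual_cost = failed_deliveries * cost_per_failure
    failed_deliveries = total_deliveries * failure_rate
    """
    failed = annual_cost / cost_per_failure
    total = failed / failure_rate
    return failed, total


failed, total = implied_volumes(ANNUAL_COST, COST_PER_FAILED_ORDER, FAILURE_RATE)
print(f"Implied failed deliveries per year: {failed:,.0f}")
print(f"Implied first-time deliveries per year: {total:,.0f}")
```

Under these assumptions, the quoted numbers correspond to roughly 11,500 failed deliveries out of about 144,000 first-time deliveries per year.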
Agile teams are critical to business success. Commentary by Cindi Howson, Chief Data Strategy Officer, ThoughtSpot, Host of the Data Chief Podcast
“As we enter a new era of technology, agility is essential to the success and growth of any organization. Where there is change, there is opportunity for organizations who adapt – and risks for those that don’t. We are living in the decade of data and the beginning of the generative AI era, where a company’s data can be two or three times more valuable than the company itself. Global Enterprise Agility Month is an opportunity to spotlight how companies can create the most value from their data and achieve business agility, through real-time, actionable insights.
To be truly agile, data teams must be empowered to confidently transform business intelligence and deliver value to the business quickly – which the availability of Large Language Models (LLMs) and cloud data and analytics platforms has allowed us to do.
We are entering the third wave of business intelligence, defined by these developments in generative AI. Businesses can now intuitively search existing analytical content, or automatically create new insights, charts and visualizations based on natural language search. This allows any organization to react to changes in seconds, instead of days and weeks, which is vital to remain agile and competitive in today’s digital economy. This is all without sacrificing accuracy, reliability, governance or security in the process – ensuring businesses are leveraging their data to take them to the next level.”
Open-source leaders petition EU AI lawmakers. Commentary by Victor Botev, CTO and Co-Founder at Iris.ai
“Europe has some of the greatest open-source credentials in the world – and EU regulators must take steps to keep it there. With companies like Meta releasing commercial open-source licenses for AI models like LLaMA 2, even US industry giants have pivoted to harness the power of the open-source community.
We are at a crossroads in AI development. One way points towards closed models and black box development. Another road leads to an ascendant European tech scene, with the open-source movement iterating and experimenting with AI far faster and more effectively than any one company – no matter how large – can do alone.
If we can foster awareness amongst the wider population about how AI systems actually operate, we’ll spawn more fruitful discussions on the right way to regulate them and apply these regulations – rather than resorting to reactionary or hyperbolic dialogue. In this sense, the community itself can act as an ally to regulators, ensuring that we have the right regulations and that they’re actionable, sensible, and fair to all.”
How conversation AI can help champion data democratization. Commentary by Clay McNaught, COO at Gryphon.ai
“To be effective, conversation AI must occur in real-time to provide organizations with comprehensive, relevant insights leveraging a more holistic, shared process stemming from voice technology. As organizations look to democratize data, enabling conversational AI to deliver key consumer insights empowers teams with the necessary content to positively influence every customer interaction on a daily basis.
A form of conversation AI, also known as conversation intelligence, is essential in helping customer facing teams democratize data. Conversation intelligence analyzes conversations identifying real-time insights within an interaction, storing these insights in a secure, cloud-based solution. This enables the AI to evolve and deliver more comprehensive engagements in future interactions. Historical conversations by experienced representatives provide a tailored blueprint for enablement and future communications, enhancing the overall customer experience and journey.
Data is only valuable when it is structured so that it is relevant and easily understood. Within most organizations, data is unstructured and oftentimes siloed. These issues make it difficult, if not impossible, to provide instrumental, relevant, and sometimes crucial information needed to complete a successful interaction.
Conversation AI leverages Natural Language Models to structure and deliver data in a more complete and consumable fashion. For example, customer service reps speak with hundreds of customers each week. Without relevant and key insights, it becomes difficult for these representatives to deliver a complete customized experience, leaving their customers frustrated and dissatisfied, resulting in a negative impact on customer satisfaction and retention.
Conversation AI delivers real-time data and insights, including customer sentiment, to drive a guided, personalized experience as well as key metrics associated with the overall success and quality of the conversation. Reps no longer have to search for the information they need during calls. Instead, AI technology presents the necessary data and analytics in real-time at the right time. Ultimately, by promoting transparency and seamless communication, organizations can enhance the customer experience through the accessibility and timeliness of shared data.”
Seamless tech integrations in today’s dynamic business landscape. Commentary by Gary Sabin, VP of Product at Impartner
“Before adopting the ‘integrate with ChatGPT roadmap strategy,’ product teams and company leaders need to ask themselves, ‘are we unlocking 1 + 1 = 3?’ In today’s competitive SaaS landscape, integrations that fuel growth through efficiency are focused on data syncing and data movement; it’s about integrated processes spanning end-to-end across multiple platforms, and it’s not an SSO connection. The race to integration without the ‘why’ will cost many.
If the integration ‘why’ provides value, then the next priority is security. Integration starts with authentication. Credential storage is no longer palatable with modern security postures, which demand smart solutions for the management of tokens to ensure seamless, low-friction and low-disruption user experiences. Successful integration – first and foremost – needs robust authentication practices. Outdated credential storage methods and high-friction experiences simply won’t cut it with the security and CX demands of today. Look to implement smart solutions where possible to alleviate pain points and allow for a frictionless experience. Integration shortcuts create security loopholes that can cause major vulnerabilities. No integration, AI tool, or technology is worth a data breach or security threat.”