Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Measuring the Inflation Reduction Act’s Impact with Accelerated Big Data Analytics. Commentary by Ray Falcione, VP and GM at HEAVY.AI
The $760 billion Inflation Reduction Act passed earlier this year aims to lower costs for families, combat the climate crisis, reduce the budget deficit and more. While inflation has slightly eased, there are many other critical variables that must be measured to monitor the impact of this legislation (e.g., healthcare costs, energy costs, levels of harmful pollution and the growth of the clean energy economy). Government officials need an up-to-date, accurate view of the nationwide outcomes of this investment, yet it’s extraordinarily complex to collect and correlate the massive amounts of related data needed to answer those questions with confidence. With massive data sets, fast and powerful data visualization and analysis are possible only by leveraging AI and accelerated data analytics powered by graphics processing units (GPUs) that can process millions of data points in minutes. Whether it’s analyzing financial reports, geospatial data or infrastructure data, accelerated analytics can provide policymakers with the visibility into their data that is necessary to make high-impact decisions and ensure taxpayers’ money is being used effectively for the Inflation Reduction Act.
Interoperability is key for medical data. Commentary by Oleg Bess, MD, CEO of 4medica
A typical patient medical record is full of test results, measurements, diagnoses and prescriptions. Much of the information is standard, no matter where the patient is treated or where a lab test is performed: cholesterol, blood pressure, A1C, etc. But there is little standardization in how that information is recorded. The Wall Street Journal reported that there were more than 60 versions of how white blood cell counts were recorded. Those discrepancies make it difficult for medical data to be shared between organizations with different systems, a hindrance that hurts patient care and medical research. While much of the news about machine learning in healthcare has centered on improvements in diagnoses, image scanning and the like, ML is also being used to normalize medical records and make data sharing much easier. Medical data is growing explosively, and it’s essential that healthcare organizations be able to share information easily and with confidence that records match.
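To make the normalization problem concrete, here is a minimal sketch of mapping differently recorded lab names onto one canonical label. It is a rule-based stand-in rather than the machine learning approach the commentary describes, and the alias table, the canonical names and the `normalize_lab_name` function are all illustrative assumptions, not part of any vendor's product.

```python
import re

# Hypothetical alias table: each canonical name lists spellings seen in the wild.
# A production system would curate or learn far more variants than these.
CANONICAL_LAB_NAMES = {
    "white_blood_cell_count": {
        "wbc", "wbc count", "white blood cell count",
        "white blood cells", "leukocyte count", "leukocytes",
    },
    "hemoglobin_a1c": {"a1c", "hba1c", "hemoglobin a1c", "glycated hemoglobin"},
}

# Invert the table once so each lookup is a single dictionary access.
_ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in CANONICAL_LAB_NAMES.items()
    for alias in aliases
}


def normalize_lab_name(raw):
    """Return the canonical lab name for a free-text label, or None if unknown."""
    # Lowercase, turn punctuation into spaces, and collapse repeated whitespace.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    return _ALIAS_TO_CANONICAL.get(cleaned)
```

For instance, `normalize_lab_name("WBC-Count")` and `normalize_lab_name("white blood cells")` both resolve to `white_blood_cell_count`, while an unrecognized label returns `None` so it can be routed for review instead of being guessed at.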
Dismantling data silos in the pharma industry. Commentary by Kelly Doering, Senior Director, Pharma Industry Marketing at AspenTech
The COVID-19 pandemic has acted as an accelerator for pharmaceutical manufacturers’ digital transformation initiatives. The industry has since adopted smarter manufacturing processes to improve operational efficiency and agility, and to meet current and future patient needs. Despite the progress, recent research suggests that pharma manufacturers are still navigating internal obstacles that hinder their ability to fully embrace an integrated digital strategy. Nearly half of pharma companies today admit that data silos derail cross-functional collaboration, and this is even more evident for larger pharma organizations with annual revenues greater than $1bn. A lack of connectivity between data sources and departments is especially challenging when moving from drug design through tech transfer and scale-up, to commercial manufacture and quality assurance release. Time is wasted, communication errors abound, and data integrity and quality can be put at risk – not to mention the costs associated with failed batches. Forward-thinking pharma manufacturers are overcoming this challenge by centralizing data management and connecting IT and OT infrastructures, which improves data accessibility and visibility. With better collaboration across the value chain, pharma manufacturers can improve their operational agility, which will ultimately help bring drugs to market faster globally.
How should we be thinking about and analyzing online content? Commentary by Gideon Blocq, CEO and Co-Founder of VineSight
The threat of online misinformation is soaring, harming reputations, influencing elections, and even costing a growing number of human lives through the spread of misleading medical advice. Disinformation spreads far faster online than the truth. It is estimated that a quarter of a billion dollars goes toward the funding of disinformation websites each year, with detection becoming ever more difficult due to shifting online ‘languages’ such as memes, GIFs, and sophisticated deepfakes. As brands are under greater pressure than ever to connect with consumers, gaining their trust is crucial. With six in ten consumers leaving a brand if it’s advertised next to ‘misinformation’, let alone the subject of it, toxicity is exactly what its name suggests. The only possible way to analyze the rapidly evolving online landscape and confront misinformation is with big data and sophisticated AI, ML and deep learning that analyze not the content itself but data about how it is shared. Acting faster, before ill-intentioned content goes viral, is a battle that every industry is fighting today.
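As a rough illustration of looking at share patterns rather than content, the sketch below summarizes how a post spread using only the timing of shares and the age of the sharing accounts. The input format, the `share_pattern_features` function, the 60-second window and the 30-day "new account" threshold are all assumptions made for this example; the commentary does not specify which signals are actually used.

```python
def share_pattern_features(events, window_seconds=60.0, new_account_days=30):
    """Summarize how a post spread, without looking at what the post says.

    `events` is a list of (timestamp_seconds, account_age_days) tuples,
    a hypothetical format chosen for this sketch.
    """
    if not events:
        return {
            "total_shares": 0,
            "peak_shares_per_window": 0,
            "new_account_fraction": 0.0,
        }

    times = sorted(timestamp for timestamp, _ in events)

    # Sliding window: the largest number of shares inside any window_seconds span.
    peak, start = 0, 0
    for end, current in enumerate(times):
        while current - times[start] > window_seconds:
            start += 1
        peak = max(peak, end - start + 1)

    # Share of the spread that came from recently created accounts.
    new_accounts = sum(1 for _, age in events if age < new_account_days)

    return {
        "total_shares": len(events),
        "peak_shares_per_window": peak,
        "new_account_fraction": new_accounts / len(events),
    }
```

Features like these could feed a downstream classifier; a sudden burst of shares driven largely by new accounts is the kind of pattern such a model might weigh, though which features matter in practice is an open question this sketch does not answer.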
Future-proof for recession fears? How about ML? Commentary by Shashank Dubey, Co-founder and Chief Revenue Officer, Tredence Inc.
A quote from Littlefinger in Game of Thrones inspires me as recession fears rise: “Chaos isn’t a pit; chaos is a ladder.” As the economy spirals downward, we can expect a fundamental shift in how organizations approach AI investments. In order to be future-proof, forward-looking companies will invest in AI solutions and machine learning models contextually. The shift will focus on building business endurance, non-linear revenue models, cost restructuring, optimization capabilities, and faster value realization. In addition, providers of AI solutions should embrace the crisis and focus on developing AI capabilities to fulfill the functional and domain expectations of enterprises.
The Three Branches of AI Drug Discovery. Commentary by Jinhan Kim, CEO & Co-founder at Standigm Inc.
Three trends are unfolding simultaneously in the current and future state of AI drug discovery. First, AI drug discovery has defined specific tasks, each solved with AI drug discovery technology built for that problem. Second, new materials centered on AI technology have moved into clinical practice. Lastly, the industry is grafting in new technologies more actively; notably, AI is being used more extensively to understand biology and to accelerate drug discovery through experimental automation. In the future, these three developments will become more common, and the pharmaceutical industry will use them more closely depending on how widely each technology has been introduced and how mature it is.
McGraw Hill’s S3 buckets exposed 100K students’ data. Commentary by Amit Shaked, co-founder & CEO, Laminar
One in five publicly facing cloud storage buckets contains sensitive data. This means that legacy security infrastructure is no longer sufficient to defend such sensitive data. Often these exposure incidents are blamed on ‘misconfiguration,’ but more often than not the real problem is misplaced data that should never have been stored in an open bucket. The rapid shift to the cloud has enabled organizations to quickly spin up data stores, especially in buckets or blob storage. Unfortunately, however, many companies don’t have full visibility into where their sensitive data resides. This shadow data is growing, and is a top concern for 82% of data security professionals. Organizations must have complete observability of their data. With monitoring and control of valuable data, enterprises will have the clarity they need to keep up with today’s fast-paced cloud environment and avoid similar exposures.
McGraw Hill’s S3 buckets exposed 100K students’ data. Commentary by Neil Jones, director of cybersecurity evangelism, Egnyte
The S3 bucket misconfiguration that was recently revealed at McGraw Hill is a classic example of the need to isolate data based on “business need to know.” Although details of the misconfiguration are still emerging, it is surprising that the company’s source code and digital keys were made available in the same location as students’ names, addresses, performance progress reports and grades. On the positive side, this is also a solid example of responsible disclosure by vpnMentor, who notified McGraw Hill of the misconfigurations. Best practices to reduce the impact of vulnerabilities like this one include the following: (i) Implement an effective incident response plan, and practice it via “tabletop exercises” before an actual event occurs; (ii) Restrict access to data based on a user’s need to access the information. That approach makes access to sensitive content more time-consuming for potential attackers, and gives the organization additional time to identify potential intrusions; (iii) Respond immediately to communications that you receive from responsible disclosure sources like vpnMentor, especially when your data breach may violate legislative mandates such as the US Family Educational Rights and Privacy Act (FERPA). According to published reports, McGraw Hill was notified of the misconfigurations up to nine times during a three-week timeframe in June and July 2022, without responding to vpnMentor’s communications.
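One simple way to audit for the kind of open access discussed here is to scan a bucket policy document for `Allow` statements whose principal is a wildcard. The sketch below works on the policy JSON alone and makes no AWS calls; the `public_allow_statements` helper is a hypothetical name, and the check is deliberately narrow. It does not evaluate `Condition` blocks, `NotPrincipal`, ACLs or account-level Block Public Access settings, so its output is a list of candidates for review, not a verdict.

```python
import json


def _as_list(value):
    """Policy fields may hold a single item or a list; always return a list."""
    return value if isinstance(value, list) else [value]


def _principal_is_public(principal):
    """True if the Principal element matches everyone ("*" or {"AWS": "*"})."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(value) for value in principal.values())
    return False


def public_allow_statements(policy_json):
    """Return the Allow statements in a bucket policy that name a wildcard principal."""
    policy = json.loads(policy_json)
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") == "Allow" and _principal_is_public(
            statement.get("Principal")
        ):
            # A Condition block may still narrow access, so treat this as a
            # candidate for review rather than a confirmed exposure.
            flagged.append(statement)
    return flagged
```

Run against a policy that mixes a public read statement with one scoped to a single role, the function returns only the public statement, which gives a reviewer a short list to check against the "need to know" principle.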
McGraw Hill’s S3 buckets exposed 100K students’ data. Commentary by Arti Raman, CEO and founder, Titaniam
Data is the lifeblood of the modern enterprise, and as we continue our move towards processing and storing enormous amounts of data across hundreds of platforms, thousands of applications, and millions of users, data exposure is inevitable. It takes a single exploitable vulnerability or a single vulnerable user to render multiple layers of security ineffective. AWS S3 is one of the most useful and heavily utilized cloud object stores and consequently one that attackers continuously probe for misconfigurations and exposure. These days we find that even the best defended enterprises and ones with massive investments in data security are falling prey to cyber attacks. So what can a company do to mitigate the risk of losing sensitive data inside S3 to external attackers, malicious insiders, or simply to human error? There are three sets of controls that can be used to combat AWS S3 data compromise in increasing order of effectiveness: native encryption-at-rest, access control, and app level encryption/encryption-in-use. An obvious place to start is native encryption that comes with AWS S3. This helps to ensure that valuable data cannot be stolen from your S3 buckets via platform compromise or via AWS employees. Given that this is not how attacks typically take place, let us look at the next level, which would be access control. This helps to ensure that only authorized users have access to S3 buckets. Again, modern attackers easily bypass this by stealing access credentials. The final and most effective recommendation is to utilize app-level granular (object level) encryption and/or encryption-in-use where any direct access to S3 buckets never yields unencrypted data. This eliminates large scale data exposure and exfiltration, reduces ransomware and extortion risk, and also enforces strong privacy compliance.
Data Privacy Week insights. Commentary by Eve Maler, CTO, ForgeRock
Data Privacy Week insights. Commentary by Theresa Lanowitz, Head of Evangelism, AT&T Business
Edge computing is all about data – collecting, using, and enriching. In 2023, we should expect more emphasis and focus placed on this data including its collection, management, use, and governance. This means that from a security perspective, we can expect to see solutions that focus on the data lifecycle to help ensure data governance policies are automated and enforced. As more edge applications are deployed, the sheer amount of data will multiply at a rapid scale. Data, at the heart of the edge app, needs to be protected, intact/trusted, and usable. All of an organization’s edges and edge use cases by design will connect across an increasingly distributed network architecture. Gone are the days in which enterprise network architecture included two distinct places in the network: the campus and the data center. Today’s enterprise has an expanded geographic footprint, along with an increasingly global dispersion of applications, workloads, and employees. This reality requires a reexamination of network architectures and how network architectures align with current business dynamics, which includes planning for extraordinary volume, velocity, and variety of data, while determining what a data life cycle means for the organization. By placing IT resources on the edge, closer to where data is generated and consumed, organizations can more effectively drive business, technology, and operational outcomes. In response, it is critical to make sure that this data lifecycle is managed with the proper data governance policies.
Data Privacy Week insights. Commentary by Carl D’Halluin, CTO, Datadobi
A staggering amount of unstructured data has been and continues to be created. In response, a variety of innovative new tools and techniques have been developed so that IT professionals can better get their arms around it. Savvy IT professionals know that effective and efficient management of unstructured data is critical in order to maximize revenue potential, control costs, and minimize risk across today’s heterogeneous, hybrid-cloud environments. However, savvy IT professionals also know this can be easier said than done without the right unstructured data management solutions in place. And, on Data Privacy Day we are reminded that data privacy is among the many business-critical objectives facing those trying to rein in their unstructured data. The ideal unstructured data management platform is one that enables companies to assess, organize, and act on their data, regardless of the platform or cloud environment in which it is being stored. From the second it is installed, users should be able to garner insights into their unstructured data. From there, users should be able to quickly and easily organize the data in a way that makes sense and enables them to achieve their highest priorities, whether it is controlling costs, CO2, or risk – or ensuring end-to-end data privacy.
Data Privacy Week insights. Commentary by Tilo Weigandt, COO and co-founder of Vaultree
It is important to note that data privacy is a complex issue and there is no one-size-fits-all solution. For example, a zero-trust framework powered by AI and machine learning is not the only solution to best protect your data. Other approaches include using encryption, implementing strict access controls, and regularly monitoring and auditing systems. Organizations should consult experts to determine the best approach for their specific needs and requirements, especially with data privacy rules certain to get more strict. State-level momentum for privacy bills that regulate how consumer data is shared is at an all-time high. Recent developments such as the California Privacy Rights Act, the quantum computing security legislation, and the Virginia Consumer Data Protection Act clearly show that protecting consumer privacy is a growing priority in the U.S. Compliance with relevant data privacy regulations such as GDPR or HIPAA is also crucial. One tactic able to support all of the above and the essential basis of all cybersecurity practices is data-in-use encryption because working with data in a fully encrypted format opens up numerous possibilities for companies. Data privacy is a complex and ongoing process, but it is worth it. Protecting your data properly will mitigate a data breach’s financial, cyber, legal, reputational, and business risk.