In the hybrid cloud, AI-enhanced enterprise, unstructured data is everywhere … and growing exponentially. Unstructured data mobility is not a one-time event, but an opportunity to continually right-place data to meet organizational needs.
Many enterprise IT leaders are storing petabytes of data spread across silos in their data centers, at edge locations and in the cloud. Most of this data is unstructured and stored as files of many types and sizes, such as documents, images, video, genomics, IoT and research data.
Unstructured data is expensive to store, protect and manage due to its sheer volume and pace of growth. IT organizations are realizing that since 80% of unstructured data typically gets cold within months of creation, by treating cold data differently, they can cut significant costs without compromising user access. New threats such as ransomware are adding to the urgency of addressing unstructured data management efficiently.
The result is that unstructured data is increasingly in motion through its lifecycle to less expensive storage and backup options and to data lakes and analytics applications. You need a strategy to manage that ongoing mobility.
First, here’s why data mobility is so critical:
- Data growth: Unstructured data spans both large files and vast numbers of small files, and it’s growing exponentially every year. The days when you could buy one or two storage appliances and set them in the data center without worry are over. Enterprises regularly need to add capacity to their NAS, SAN or other storage devices, and supply chain disruptions since the pandemic have made this process much slower. Therefore, it’s imperative to take a nuanced approach to data rather than treating it all the same, which is unsustainable, expensive and wasteful.
- Cutting overall costs: Most enterprises are spending at least 30% of their IT budget on data storage, according to the 2022 State of Unstructured Data Management Report. Storing all your data on Tier 1 storage drives up not just the primary storage bill but also the cost of backups and disaster recovery. In fact, backups are the larger part of your bill since active data typically has three copies. Moving cold data off primary storage therefore shrinks the active data footprint and can dramatically lower your overall storage costs.
- Data lifecycles: Most organizations keep all or most of their data indefinitely, but as data ages, its value changes. Some data becomes “cold” (infrequently accessed or no longer needed) after 30 days yet must be retained for a period of time for regulatory or compliance reasons; some data should be deleted; and some data may be required for research or analytics purposes later. A seemingly easy answer is to move that data to secure storage in the cloud, but choosing the wrong cloud storage class is risky: cloud file storage is often 10x to 50x more expensive than cheaper cloud tiers. Ensuring easy mobility for the data as it ages and understanding the best options for different data segments is paramount.
- Data reuse: Growing AI and machine learning adoption is another reason unstructured data mobility is imperative. Once data is no longer in active use, it has the potential for a second or third life in big data analytics programs. You might migrate some data to a low-cost cloud tier for archival purposes, but IT or other departments with the right permissions should be able to easily discover it later and move it to a cloud data lake or AI tool when needed for many different use cases.
- Technology refresh: Storage architectures typically become obsolete every three to five years and new options are on the horizon. Cloud vendors typically offer new price-performance options every year. Taking advantage of the latest options can make a significant improvement in price-performance, availability and usability of data. Yet doing so requires data migrations and data lifecycle management across vendors and storage architectures.
- New business strategies: When an organization is undergoing a merger, acquisition or divestiture, it must meet new governance and compliance requirements for data. Similarly, the enterprise may be embarking on a new cloud strategy or adopting a new data architecture. In all these examples, data mobility needs will change. You need a flexible unstructured data management architecture to meet new requirements as they come up so you can find, segment and move data to new locations without undue hassle or cost.
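To make the cost argument above concrete, here is a minimal back-of-the-envelope sketch. It assumes the 80% cold fraction and three backup copies mentioned in this article; every price in it is a made-up placeholder, not vendor pricing, and the names `StorageCosts` and `monthly_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCosts:
    """Hypothetical monthly prices in dollars per TB; replace with real quotes."""
    primary_per_tb: float
    backup_copy_per_tb: float
    archive_per_tb: float


def monthly_cost(total_tb: float, cold_fraction: float, tiered: bool,
                 costs: StorageCosts, backup_copies: int = 3) -> float:
    """Estimate the monthly bill with or without cold data tiering.

    Data left on primary storage pays for primary capacity plus every
    backup copy. Tiered cold data pays only the archive price (this sketch
    assumes the archive tier provides its own durability).
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    cold_tb = total_tb * cold_fraction if tiered else 0.0
    active_tb = total_tb - cold_tb
    per_active_tb = costs.primary_per_tb + backup_copies * costs.backup_copy_per_tb
    return active_tb * per_active_tb + cold_tb * costs.archive_per_tb


# Illustrative numbers only: 1 PB of data, 80% of it cold.
costs = StorageCosts(primary_per_tb=100.0, backup_copy_per_tb=40.0, archive_per_tb=5.0)
before = monthly_cost(1000, 0.8, tiered=False, costs=costs)
after = monthly_cost(1000, 0.8, tiered=True, costs=costs)
print(f"before: ${before:,.0f}  after: ${after:,.0f}  saved: {1 - after / before:.0%}")
```

The point of the model is the structure, not the numbers: because each terabyte on primary storage also carries its backup copies, every terabyte tiered away removes several line items at once.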
What new requirements does ongoing data mobility bring?
Ad hoc strategies to address data mobility no longer work in this complex data environment when requirements and needs are in constant flux. IT leaders need a systematic way to manage data movement and meet new requirements, cut costs, be sustainable and support new projects for unstructured data analytics. Here’s what’s involved:
- Visibility of data: The ability to look at data across storage silos to see trends, patterns and anomalies and to do cost modeling is critical to making smart decisions. Similarly, having a unified way to search for data across silos is important to find specific data sets and move them to new locations as needed.
- Analysis of data: IT organizations need to understand data across various characteristics to make the right decisions for its management. Age of data and time of last access, file size and type, top data owners, costs, volume of data and data growth rates are some of the top metrics to track.
- Cold data tiering: Segment and tier inactive or cold data before you migrate. Too often, organizations will send large data sets to the cloud to save money but will miss out on significant savings because they are lifting and shifting data from one expensive storage location to another. Move the rarely accessed data to low-cost object storage such as Amazon S3 Glacier or Azure Blob Storage. Migrate the hot or warm data to a high-performing tier until it has aged out according to your policies.
- Understand cloud storage classes: Cloud storage options are always changing and maturing for customers and choice can be overwhelming. Partner with a cloud data storage expert to help guide these decisions so that you can efficiently map the right data sets to the right cloud storage service and create a plan for cloud data lifecycle management.
- Departmental collaboration: IT organizations today are focusing on managing data, not storage. To that end, working directly with data owners on strategies is essential to avoid conflicts and to ensure that decisions for data mobility and management are sound.
- Policy automation: In large-scale data environments, especially a large enterprise with many different stakeholders, shares and directories, you can’t support data lifecycle management manually. Use an unstructured data management solution that allows you to easily create and automate policies to copy, tier, migrate and confine/delete distinct data sets. Ultimately, policy automation will result in more savings, better compliance and the assurance that data is always living in the right place at the right time.
- Native access to data: Data is your corporate asset. Regardless of where you want to migrate or tier data, you need to ensure that it’s easily accessible and usable in its target destination. The notion of native access to data simply means that if you move data to a new storage location, such as object storage in the cloud, you can access it there and move it somewhere else without needing to go through your file storage layer, which incurs licensing fees and requires adequate capacity. Cloud-native access is required for using cloud-based AI and ML services. Otherwise, your data is locked and not available for additional value-added activities.
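As a rough illustration of the analysis and cold data tiering steps above, the sketch below walks a directory tree and totals bytes by time of last access. It is not how any particular data management product works. The 30-day threshold is borrowed from the example earlier in this article, the function names are hypothetical, and last-access times can be unreliable on file systems mounted with access-time updates disabled.

```python
import os
import time
from pathlib import Path
from typing import Dict, Optional

SECONDS_PER_DAY = 86_400


def classify(age_days: float, cold_after_days: int = 30) -> str:
    """Label data 'cold' once it has gone unaccessed for the threshold period."""
    return "cold" if age_days >= cold_after_days else "hot"


def scan(root: str, cold_after_days: int = 30,
         now: Optional[float] = None) -> Dict[str, int]:
    """Walk a directory tree and total bytes per label, by last-access time."""
    now = time.time() if now is None else now
    totals = {"hot": 0, "cold": 0}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - st.st_atime) / SECONDS_PER_DAY
            totals[classify(age_days, cold_after_days)] += st.st_size
    return totals
```

Calling `scan("/path/to/share")` on a share of your choosing returns byte totals for hot and cold data, which is the kind of number needed to decide how much could move to a cheaper tier. A real policy engine would also weigh file type, owner and compliance rules.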
Unstructured data is both a liability and an asset. Managing it properly with a plan for long-term data mobility should be one of the top initiatives for enterprise IT today. By doing so, you can get more value from massive unstructured data volumes, be as cost-effective as possible and enable new ways of finding and using data to better serve the broader organization.
About the Author
Krishna Subramanian is COO, president, and co-founder of Komprise. In her career, Subramanian has built three successful venture-backed IT businesses and was named a “2021 Top 100 Women of Influence” by Silicon Valley Business Journal.