Together We Can!

Today at #NTAPInsightUS, NTAP announced their new Hybrid Cloud offering built on Cloud ONTAP. This announcement is a very strong move for NTAP. Today, cloud isn’t as much about technology as it is about business agility. The ability to keep IT costs low (the part of the business that doesn’t generate revenue) while building agility and flexibility into the data center helps businesses drive new revenue models, which is critical in today’s competitive world.

Also today, Catalogic Software announced that we work seamlessly with NTAP’s Cloud ONTAP to give clients visibility, insight, and control over their environments locally, remotely, and now in the cloud.

See this short video demonstration of Catalogic Software and NetApp’s Cloud ONTAP.

For more information, check out ECX to get control of your hybrid cloud data.


Copy Data Management For Business Advantage

George Crump from Storage Switzerland recently posted an insightful article entitled “What Is Copy Data?”. In this piece, George provides a detailed outline of how various applications and processes within the data center can create multiple, redundant copies of the same information. From backup and archive copies on tape and disk, to snapshots of data taken on primary storage for test and development and data analytics, copy data is largely responsible for the uncontrollable growth of data storage infrastructure costs in the enterprise. While George talked about ways of corralling this information so that it can be more efficiently managed, one thing he didn’t discuss is how this information can be used to significantly reduce business application development cycles and speed up time to market. Perhaps this is where the greatest opportunity lies for copy data management.

 Traditionally, test and development and IT operations have existed as separate functional groups within most organizations. IT operations groups have typically served as an infrastructure support and delivery arm for application development teams, while developers have largely worked independently from IT “ops” on designing, upgrading and enhancing business systems. While this model served us well in the past, there is an increasing need to merge development and operations groups (dev/ops) to enable organizations to shorten the time it takes to bring new services and products to market.

As the “caretakers” of the data infrastructure, operations holds the key to providing development teams with the access they need to real-time business data. Data analytics systems, for example, can enable businesses to capitalize on opportunities in the market; however, timely access to this information is critical. Think of the data mining opportunities within social media channels. Think about trends during the Christmas buying season. Retailers like Walmart, for example, may want to mount market basket information from last year’s Christmas purchases and merge it with data streams from Facebook and Twitter, where their customers are talking about the types of items they plan to purchase for their families. This can allow them to make purchase decisions that are much more targeted towards actual customer demand.

The key to making this all happen, however, is feeding business data analytics systems with the most current data available. If IT operations doesn’t have sufficient tools to provide development teams with easily searchable access to data, then these market opportunities could go untapped. And as data continues to proliferate, the need for speedy access to the right information at the right time will only intensify.

Beyond the data analytics use case, application development cycles can also be significantly reduced if real-time or near-real-time business data can be easily accessed and quickly presented to test and development platforms. Many database administrators continuously make changes to their systems in an effort to improve application response time and enhance the end-user experience. But these iterative changes often need to be tested frequently before they can be safely rolled out into production.

An effective copy data management platform can deliver these capabilities by providing, for example, a searchable catalog of the primary storage snapshot data that exists across the data center. This can dramatically shorten the development/test cycle and allow organizations to release updates and changes to applications as they are needed, in real time. Compare this to the traditional methodology of incorporating numerous application changes into a single large roll-out that may only occur once a quarter or a few times a year. End users are now conditioned to expect frequent upgrades and enhancements to their applications. If businesses can’t respond to these demands because they are held back by the limitations of their data infrastructure, they could become far less competitive.

Here at Catalogic, we believe that copy data management is more than just “getting your house in order”. We see copy data management as a strategic tool for enabling businesses to increase their agility and to seize market opportunities while they are ripe. Interestingly, by some estimates, organizations that successfully adopt the new “dev/ops” functional paradigm can potentially shorten their application development cycles by up to 30x while driving higher code quality. And the right copy data management platform can surely play an integral role in enabling this transition to drive business agility.


Easy to Ignore

Our tolerance for what we ignore varies greatly across environments and situations. Take my household as an example. While every male in my household can ignore the full trash can and even go so far as to try to put just one more item in it, I on the other hand must empty it as soon as it gets even close to the top. I wish I could ignore it, but for me, it is easier to just deal with it.

Yet, when it comes to other things, I can ignore some items on a daily basis. Looking at my phone, I realize every other icon has a small number in the upper right corner signaling an alert. Just a few years ago, this number caught my attention and I immediately opened the app. Once most of my apps started doing it, it became easy to ignore.

What is easy to ignore in your data center?

Can you remember when it was easy to ignore the growing stacks of disks (dare I say floppies), then CDs, and of course tapes? Back in the ’90s, I remember working in a data center at a medical school. We had ventured into digital photography, and there was simply no room on the network at the time to store the growing volume of images. However, we could not afford a storage array. Therefore, we tried to create a library out of CDs, complete with a database and indexing system. But it was hard to manage, and over time it was just easier to ignore the problem. Eventually, disk prices came down, so the files all moved to a storage array.

As more users added even more files, the problem of finding the image files on the storage array only grew. And as storage prices came down and more storage space was added, this problem only continued to grow. Does this sound familiar?

Today, an increasing number of data centers use snapshot technology for everything from backup to test/dev environments. While snapshots are great in terms of using far less disk space, their proliferation can adversely affect the performance of your network. It is the latest growing problem in our data centers that is oh so very easy to ignore.

If you use NetApp FAS arrays or VMware, there is really no reason to ignore this problem any longer. I started using ECX from Catalogic Software to clean up my orphaned snapshots. In a few minutes, I easily found my problems, regained valuable disk space, and improved performance. With ECX, I monitor my environment and regularly take action in a wide range of situations. Like orphaned snapshots, VMs that are no longer in use waste disk space and cause performance issues. ECX makes it easy to identify these VMs, and much like the trash, I immediately want to clean them up.

We all need to understand how critical data is. Pictures, for example, cannot be recreated. However, it is very easy to inadvertently drag and drop a file into the wrong folder and not realize it. Thanks to ECX, I was able to locate and then restore some valuable images.

Ignoring your copy data management problems will not make them go away. Using a tool that can help solve them in just a few minutes takes away the pain and more importantly, it can save your company real dollars. In my case, I have a small company and I don’t want to keep adding more disk space. You can deliver a solid ROI to your company by no longer ignoring your data copies or snapshots. Now, if I can just figure out the ROI of taking out the trash, I can educate a household of men.


Managing the Modern Day Economics of Data

The separation from Syncsort and the launch of Catalogic Software has been nothing but exciting for everyone involved. While we operated as separate companies for what seems like forever, the unflinching focus of everyone in the new company, along with the incredible flexibility to do what makes the most sense for our line of business, promises to be a wonderful combination. In a short time frame, we have already seen our systems finally transition away from the world of the mainframe to all things mobile and into the cloud. Marketing has been better targeted to our user base than ever before.

Often the excitement is simply a result of facing and overcoming the unique challenges that come with these transitions. Every time we’ve met as a team to discuss those challenges, the word “No” and the phrase “Figure it out” have been the most popular. In one such conversation, our chairman observed that “scarcity of resources is the best enforcer of discipline”. It was a profound thought, yet something that made total sense to me.

In the IT industry, I wonder whether we are proving the converse of his statement. I wonder whether we are becoming the rich, spoiled kids who never knew the value of what they had access to. I wonder if Gordon Moore could have foreseen this indiscipline when he predicted that the number of transistors on a chip would approximately double every 18 months. Moore’s law has been so popular that it has become the basis around which the semiconductor industry plans its long-term strategy.

But it is arguable what the growth in CPU capacity would have really meant to the world if storage hadn’t kept pace. Luckily, storage density has grown even more steeply than transistor density (see the chart, courtesy of Wikipedia). In fact, even Mr. Moore called this rate “absurd” and “flabbergasting”! While this growth may have given us the unlimited movie streaming of Netflix and the unlimited email storage of today, I wonder whether it has taken away the discipline in using that storage. How much of our big data is real data? And how much of our big data is copy data?


I’ve heard my business school professors talk about economics as the science of scarcity and choice. The economics of data today deals with its lack of scarcity and the choices we make because of that. At Catalogic Software, we believe in addressing the unique problems created by these choices by both measuring and managing the data.

On one end, we provide visibility into data that has sprawled across your enterprise, whether in the form of snapshots, VMs, orphaned LUNs, or VMDKs. We can provide visibility and analytics on files that haven’t been accessed or modified in years. For example, an ecommerce giant recently trialed our solutions and found that over 95% of the data in some of its business divisions had not been accessed in over a year. That’s $$$ you can save in terms of not just storage but also power, cooling, software, space, redundancy, and more.

On the other end, we integrate with NetApp FlexClone technology to create instant read-write copies of data with zero footprint on your storage. You can create as many copies of data as you need from a single console and, more importantly, manage them so that they don’t take on a life of their own when you don’t intend them to. This problem is so big that IDC predicts companies are spending $44 billion on storage hardware and software for unnecessary copies of data. That is a problem created purely by our choice not to manage our data. That is a problem created by letting many small instances of indiscipline add up to an explosion. Storage may not be precious anymore, but many a little certainly makes a mickle if you don’t manage it!


Copy Data – Old Problem, New Solutions

In 1959 Xerox introduced the photocopier, and so it was that in 1974 I found myself standing in a small room with a fluorescent light, bored out of my mind, feeding papers into a photocopy machine.

I was 13 years old and my father worked in marketing at a pharmaceutical company. During the school holidays he had secured me a part-time job at his company copying bits of paper. It was a soul-destroying job, but somebody had to do it and I was willing to do it for $10/day. I made two copies of everything. The original went to the company information library, one copy went into my father’s filing system, and another was mailed to a colleague. I dreamed of the day when all of these copies were obsolete and we all worked off a single copy of data held in a distributed cloud of technological marvel. Well, I might be exaggerating there a bit. I was probably dreaming about soccer or a girl in my biology class.

Fast forward 40 years and we have the problem of copies solved. Computing introduced centralized repositories for electronic data. Everybody can work from one copy of an original document, and changes are simply pointers back to the original. No more paper to move around, no more copies wasting space! It is always nice when a plan comes together. The only problem is that what I just wrote is not true. We now have more copy data than ever, and unlike paper copies, which were eventually shredded, today’s copy data hardly ever goes away. IDC has predicted that by 2016, spending on storage for copy data will approach $50 billion and copy data capacity will exceed 300 exabytes, or 300 million terabytes.

So how did we get here? It is all due to improvements in distributed access and the need to control that access. Because the data is important to many people and cannot be lost, it has to be protected, so we create copies in case the original is lost. We also have retention policies that ensure we keep data for future retrieval. When we give more people access to data, they are more likely to retain copies of it themselves. This server-centric world gave us centralized repositories, but there was no control of data. Copies of data were everywhere, and no one knew where they were or who had them. In addition, virtual machines were guilty of sprawl, creating additional management headaches.

Centralized storage devices like SAN and NAS, along with the high-speed networks to connect them, should have improved things, but did not. In this infrastructure-centric world, we had the ability to better manipulate the data, use embedded data management services like snapshots, deduplication, and compression to limit the impact of copy data, and better control access to data. Recovery speeds improved as we were able to restore more efficiently from disk and not just tape. However, this still did not solve the problems of copy data. Storage devices and servers are independent. Efficient catalogs of data across devices rarely exist, and the embedded copy data services of individual devices are often limited and do not span the infrastructure.

So what about the ‘cloud of technological marvel’ that I dreamt of as a boy?

The good news is that it is now here. As server virtualization, faster networking, and cheaper storage make disk a more cost-effective medium for data storage and retrieval, we are rapidly moving into a phase where we can have a more expansive view of data and its value to the business. New tools


Seeing is Believing: Catalog Your Data

I recently consulted with a customer with 8 NetApp FAS 2000/3000 storage systems holding 500,000+ files spread across 4 locations. They wanted to improve their storage efficiency and data protection operations across the company. With data growing at 40-50% per year, it was becoming too difficult to determine what data was current and how many copies were kept across the company. Their NetApp administrators used NetApp SnapMirror and SnapVault tools to automate data replication and backup, but meeting backup SLA times was becoming a significant challenge. In addition, they needed to better understand their storage consumption.

We recommended that they deploy Catalogic ECX, enterprise catalog software. It searched across their NetApp filers and cataloged all of their files and VM objects. Several departments across the company had copies of the data, some with more than 20 automatically scheduled snapshot copies. ECX was able to identify them quickly, showing what types of files they were, when the files were copied, how large the file sizes were, and how old the copies were. Having a summary report allowed the administrators to simplify their backup, archive, and DR policies to save time and meet required SLAs. They were able to identify that 68% of their files were important and required meeting the backup SLA, while 32% were deemed “old” and no longer required snapshot copies. Those files were archived.

This customer is well on their way to seeing storage efficiency improvements of 30-40%. More and more customers should deploy ECX to improve their data protection management.


The Top Five Reasons for Creating an Enterprise Data Catalog

A data catalog is an invaluable asset to any organization. Let’s take a look at what an effective data catalog achieves before identifying the top five reasons to implement one.

Like any organization, you have growing amounts of data that are constantly changing. New data is generated every day and is spread across multiple heterogeneous servers, virtual machines and storage systems. Depending on the size and reach of your organization, this data may be housed across multiple physical locations. How much do you know about the nature of your data?

How many copies do you have? What are the top 100 files in terms of size? How long has it been since these files have been accessed? Are the critical files and applications protected? What files should be archived? Answering these and many more questions is imperative for implementing effective data management and controlling storage costs.

“A catalog should provide you with an accurate and comprehensive view of your data across the organization. It should provide you with a searchable repository of metadata for improved data management which leads to increased business efficiency.”

Here are the top five reasons you should be deploying a data catalog:

Visibility – You cannot manage what you cannot measure! A global catalog of data enables a better understanding of how data flows within your business, which facilitates better decision making. You should be able to BROWSE, SEARCH, and QUERY data across your entire organization, much as you would manage a database. Achieving visibility across your data enables you to build a dashboard of elements to monitor and improve data management. There are many reasons to query your catalog; I have provided a few examples below:

Do I have data that is not being backed up?

Do I have data I am managing that has not been accessed for years and is not part of my data retention policy?

How many VMs do I have across my IT environment, and how fast are they expanding?

How many copies of a particular file do I have and where are they located?

Who has access to my secure files, and is there any data leakage across the company?
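Questions like these map naturally onto queries against a metadata catalog. Here is a minimal Python sketch using an in-memory SQLite table; the schema, file paths, and sample rows are invented for illustration and are not ECX’s actual data model.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical catalog: one row of metadata per discovered file copy.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE catalog (
    path TEXT, checksum TEXT, size_bytes INTEGER,
    last_accessed TEXT, backed_up INTEGER)""")
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?, ?)",
    [("/nas1/q3_report.xlsx", "abc1", 120_000, "2014-09-30", 1),
     ("/nas2/q3_report.xlsx", "abc1", 120_000, "2013-01-15", 0),
     ("/nas1/old_logo.psd",   "def2", 900_000, "2011-06-02", 0)])

# "Do I have data that is not being backed up?"
unprotected = conn.execute(
    "SELECT path FROM catalog WHERE backed_up = 0").fetchall()

# "How many copies of a particular file do I have, and where are they?"
copies = conn.execute(
    "SELECT checksum, COUNT(*) FROM catalog "
    "GROUP BY checksum HAVING COUNT(*) > 1").fetchall()

# "Do I have data that has not been accessed for years?"
cutoff = (datetime(2014, 10, 1) - timedelta(days=365 * 2)).strftime("%Y-%m-%d")
stale = conn.execute(
    "SELECT path FROM catalog WHERE last_accessed < ?", (cutoff,)).fetchall()
```

The point of the sketch is that once metadata lives in one queryable place, each of the questions above collapses to a one-line query instead of a per-silo investigation.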

  2. Utilization – How efficiently is your data being stored? It is quite possible that the largest drain on your storage capacity is an excess of copy data. Many primary storage and backup products provide deduplication functionality. However, they tend to de-dupe and store data within silos. You may have de-duplicated data on one storage system but have unwanted duplicates residing on another. Do you have data on primary storage that should be in an offline archive? Not effectively migrating data through its lifecycle is another way to waste expensive primary storage. Creating an enterprise data catalog enables you to query for duplicate files and locate them across your organization. With an enterprise data catalog you can eliminate unnecessary files, free up capacity, and reduce storage costs. You can also query files for age and when they were last accessed to determine which data should be protected and archived.
  3. Security – Is your data properly protected? Is it housed securely in the right location? Do the right people have access to it? These are common concerns for IT departments. What if you could receive a simple report showing all of your unprotected files? What if you could easily determine whether any copy data has leaked onto unauthorized network segments? With increasing security concerns, a data catalog is a necessary addition to your security monitoring solutions. If the NSA can have documents stolen by a contractor, it can happen to anyone. Eliminating data leakage improves data security.
  4. Compliance – Each industry has its own compliance needs and regulations, which impact many aspects of your data. Most industries face strict governmental regulatory compliance requirements, such as HIPAA in healthcare, FDIC and SIPC rules in banking and finance, and Sarbanes-Oxley (SOX) for public corporations. How long should you keep various types of data? Who should have access to different classes of data? How well protected is your data? How quickly can you retrieve your data for eDiscovery? Which data should be destroyed, and when? What are your data privacy requirements? A data catalog becomes a requirement for ensuring compliance and governance.
  5. Efficiency – You may already have processes in place to handle security and compliance, but how much easier would those processes be if you had a global data catalog? The more efficiently you can profile, collate, manage, browse, search, query, and report on your data, the more time you can spend on other projects that drive your business forward and improve productivity. There are real cost savings from implementing an enterprise cataloging solution as part of your IT data protection and data management strategy.
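The duplicate detection described under Utilization can be sketched as a content-hash scan: files with identical hashes are copies of the same data, wherever they live. This is an illustrative Python sketch under simplified assumptions, not Catalogic’s implementation; a real catalog would query stored metadata rather than rehash every file on demand.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(roots):
    """Group files under the given root directories by content hash.

    Any group with more than one path is copy data that could
    potentially be reclaimed. Hashing is done in 1 MB chunks so
    large files are not read into memory at once.
    """
    groups = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                digest = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                groups[digest.hexdigest()].append(path)
    # Keep only hashes seen more than once: those are the duplicates.
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

Running this across two storage mount points would surface exactly the cross-silo duplicates that per-array deduplication misses.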

Is it time for you to get your data under control? Creating an enterprise catalog of your primary and copy data is a great place to start!


Satisfying Storage Growth and Data Protection Needs with Catalogic DPX and NetApp Clustered Data ONTAP

A foodservice industry company was facing the challenge, just as many growing companies do, of having to continually expand its IT infrastructure to meet data growth requirements. Starting with 100 TB of primary capacity supporting critical applications such as SQL databases, email, financials, and payroll applications, the company recently added 120 TB of new primary storage and 220 TB specifically for data protection. The company had already made a significant commitment to NetApp for its storage needs and most recently purchased a NetApp Clustered Data ONTAP (cDOT) primary storage system and a secondary ONTAP system for data protection.

The foodservice company’s previous data protection solution was local, tape-based over a 1Gb network, and involved a lot of trial, error, and time to create complete accurate backups. Some restores took days, depending on the amount of data and type of applications.

Catalogic DPX™ enabled better data protection and sped the migration to NetApp Clustered Data ONTAP. With Catalogic DPX, they now have the ability to centrally manage their data protection process and perform restores in less than an hour.

See how DPX can help you with your data protection on NTAP cDOT.



Effective Management and Migration of Data

Sometimes nothing enables you to appreciate a problem like a real-world experience. This week I had an incident with the backup of my home data. I have nothing too sophisticated at home; 3 PCs might be online at any one time, but that is really it. Still, they contain data that is all too valuable to lose: work documents, 20 years of financial records, photographs, emails, etc. Over the years I have used various methodologies to keep everything protected and have not lost any data in 20 years. This week, though, a potential disaster struck. My backup server failed and it is scrap. Just to clarify, I have no data in the cloud. My wife is not comfortable with it, and that is the end of that. I will not get into the details of what happened, but the server is not recoverable and its data is lost. I am of course lucky enough to have my primary data still intact and my local backups for each PC still in place.

Of course I now had to decide what to do to replace my backup system, and looking at the infrastructure I realized I had 14TB of backup disk for the 3.5TB of primary disk in my PCs. On that 3.5TB of primary disk I had approximately 1.5TB of unique data. That is nearly a 10:1 ratio of backup capacity to actual unique data. My problem, of course, was that I had multiple copies of everything. Too much copy data!
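The ratio above is easy to verify with a quick back-of-the-envelope calculation, using the capacities from the paragraph above:

```python
# Figures from the home-backup example above.
backup_tb = 14.0   # total backup disk in the infrastructure
primary_tb = 3.5   # primary disk across the 3 PCs
unique_tb = 1.5    # unique data within that primary disk

ratio = backup_tb / unique_tb
print(f"{ratio:.1f} TB of backup disk per TB of unique data")  # → 9.3
```

So roughly 9.3:1, which rounds up to the "nearly 10:1" figure, and that is before counting the redundancy inside the primary disk itself (3.5 TB holding 1.5 TB of unique data).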

I have a small home system, but in business, copy data can be a killer when it comes to backup and recovery times and the cost of the total infrastructure. In IT organizations, data copies are made constantly by procedures such as snapshots, backups, replication, DR processes, and test and development. This is made even more problematic by the silos of infrastructure involved in protecting and managing the data, each of which creates its own copies. Indeed, IDC has stated that some organizations keep over 100 copies of some forms of data.

There are of course technologies attempting to address the issue. Storage vendors may offer zero-footprint snapshots and clones; these are read/write virtual volumes that can be instantly mounted to bring data back online. Storage and backup vendors are delivering deduplication technologies to consolidate data at the time of storage or backup. The problem is that each of these is also a silo and may not take into account data in direct-attached storage.

One way to effectively manage your data is to centralize your snapshot infrastructure onto a common target with the ability to quickly mount a snapshot clone from the centralized data. Better still, this infrastructure can use the same support processes, methodologies, and toolsets that you currently use to manage your primary data. Using this methodology, you can get near-continuous data protection with extremely fast recovery times, improving both your RPO (Recovery Point Objective) and RTO (Recovery Time Objective).

Recently, NetApp introduced clustered Data ONTAP (cDOT). With the scalability it offers, you can effectively consolidate much more of your primary and backup storage infrastructure in a single system. cDOT enables consolidation of tens of petabytes of data and thousands of volumes onto a multi-protocol clustered system that provides non-disruptive operations. With the recently released NetApp cDOT 8.2, you can support up to 8 clustered nodes in a SAN environment and up to 24 in a NAS deployment. This delivers up to 69PB of data storage within a single manageable clustered system. Of course, you can also build smaller clustered systems.

Catalogic DPX not only delivers rapid backup and recovery with data reduction, helping solve your copy data problems with near-continuous data protection; it also facilitates migration of both virtual and physical infrastructures to new cDOT-based systems by snapshotting and backing up your servers to an existing NetApp system, and then recovering them to a cDOT system. This offers considerable time savings over traditional migration methods. DPX will take care of disk alignment, and can also migrate iSCSI LUNs to VMDK files on the new clustered storage.

With careful planning and the right infrastructure you can consolidate storage management and get copy data under control. If you don’t, you risk spending more on infrastructure, and wasting valuable time managing duplicate data.

See how one Catalogic customer migrated to a Clustered Data ONTAP infrastructure using DPX.

Take a look at the Catalogic Solution Paper, “Solving the Copy Data Management Dilemma”, for more information.

