Getting more value out of your data
“Data is eating the world” is a well-known variant of Marc Andreessen’s famous quote “Software is eating the world”. In our current output-driven society, everything is measured by an interpretation of a set of data, even by people and businesses that were not traditionally driven by data.
This has led to an increasing demand for data, and thus to an increase in its value. But not just any data: it is the data that provides us with insights that we value most. Whether data is insightful depends on many facets, most of which vary between different groups of stakeholders. But an overall rule of thumb is that for the data to be insightful, it must have a certain level of quality. To reach this level of quality there are four things that play a key role: collection, storage, analysis and availability. For the data to have decent quality, these four facets need to be coordinated correctly.
The first is collection: Not only do you have to decide which data is relevant for the insights you want to gain, but you must also decide on the right method to collect this data. Depending on the data you want to collect, you might have to select reliable sources. Do your users provide the data, is it something collected by an (IoT) device, or is it an existing dataset from a third party? Different types of data come with different measures for determining reliability.
You must also determine whether a minimal amount of data is required and, if so, what that minimal amount is. In the case of benchmarking, for example, you want to compare a sample against a bigger population. To draw reliable conclusions you need to have enough data for each of the benchmarked groups. But whether that is ten or ten million items per group depends on the use case. Finally, you need to know whether your data needs some form of preparation, transformation or optimization. You might want to train some AI models, transform some binary data into something more readable, or distill only the useful pieces of data out of a larger set so they can be retrieved faster and more easily. Depending on the use case, data collection can be a recurring process: there is not one single moment at which you collect data, but you keep collecting it over time.
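As a minimal sketch of the "enough data per group" check for benchmarking, the snippet below counts records per group and reports the groups that fall below a chosen minimum. The function name, the field name `segment`, the sample records and the threshold are all hypothetical; the right minimum depends entirely on your use case.

```python
from collections import Counter

def groups_below_minimum(records, group_key, minimum):
    """Return {group: count} for every group with fewer than `minimum` records."""
    counts = Counter(record[group_key] for record in records)
    return {group: n for group, n in counts.items() if n < minimum}

# Hypothetical sample: survey responses tagged with a company-size segment.
responses = [
    {"segment": "small", "score": 7},
    {"segment": "small", "score": 6},
    {"segment": "small", "score": 8},
    {"segment": "large", "score": 9},
]

# Assumed threshold of 3 records per group, purely for illustration.
too_small = groups_below_minimum(responses, "segment", minimum=3)
print(too_small)  # {'large': 1}
```

A check like this could run every time new data is collected, so you know which benchmark groups are not yet reliable enough to report on.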
Then there is storage: You must decide how to store the collected data in a secure manner. Think about which regulations apply (GDPR, for example) and where you want to store the data. Do you want to store your data in the cloud or on premises, and does the physical location matter, or is it OK that it is stored in a datacenter on the other side of the world?
Which type of storage is best for your data? Will a relational database suffice, or is your data (partially) free format? In the latter case a document store is the more obvious choice, while a graph database is more likely to fit when it is important how parts of your data relate to each other. And is it possible to combine different types of storage if your data requires it?
Do you want the data to contain state, or does it need to be a set of state changes? While the former consumes less storage, the latter makes it possible to keep a data trail and recreate the state of the data at any moment in time. If a data trail is important, event sourcing might be something to look into.
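To illustrate the difference between storing state and storing state changes, here is a small sketch of the event-sourcing idea: only the changes are stored, and the state at any moment is recreated by replaying them. The `StockChanged` event, its fields and the integer timestamps are assumptions made for this example, not a prescribed design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StockChanged:
    timestamp: int  # hypothetical logical clock; a real system would likely use datetimes
    item: str
    delta: int      # positive = stock added, negative = stock removed

def state_at(events, moment):
    """Replay all events up to and including `moment` and return the stock levels."""
    stock = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > moment:
            break
        stock[event.item] = stock.get(event.item, 0) + event.delta
    return stock

# The stored data is the trail of changes, not the current state.
events = [
    StockChanged(1, "bolts", 100),
    StockChanged(2, "nuts", 50),
    StockChanged(3, "bolts", -30),
]

print(state_at(events, 2))  # {'bolts': 100, 'nuts': 50}
print(state_at(events, 3))  # {'bolts': 70, 'nuts': 50}
```

The trade-off mentioned above is visible here: the event list keeps growing with every change, but in return any past state can be reconstructed.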
At this point you have collected (some) data and stored it somewhere. But does it already provide the insights you thought it would? In many cases raw data is not that insightful, and analysis of the data can be a solution. There are many ways to analyze a set of data. You can use a predefined algorithm to derive the insightful information from a larger set, or you can use machine learning to make classifications or predictions for you. In both cases analysis provides new views on the data. But you can also analyze the data to check its quality or to find out whether you collected the right data. With the feedback from that kind of analysis, you can improve your data and tweak the way you collect data in the future. In this way you improve the data you have already collected while making sure that the data you collect in the future is less flawed: a win-win.
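One simple form of analyzing data to check its quality is measuring how often a field is missing. The sketch below assumes records are plain dictionaries; the field names and sensor readings are made up for illustration, and real quality checks would usually cover more than completeness.

```python
def missing_rate(records, fields):
    """Return the fraction of records in which each field is absent or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) is None) / total
        for field in fields
    }

# Hypothetical readings from IoT sensors, some of them incomplete.
readings = [
    {"sensor": "a", "temperature": 21.5},
    {"sensor": "b", "temperature": None},
    {"sensor": "c"},
    {"sensor": "d", "temperature": 19.0},
]

rates = missing_rate(readings, ["sensor", "temperature"])
print(rates)  # {'sensor': 0.0, 'temperature': 0.5}
```

A high missing rate for a field is the kind of feedback that can lead you to fix existing records and to adjust how that field is collected from now on.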
Availability / Accessibility
After you have gotten to the point where your data contains the insights you wanted, it is time to share it with others. The concept of availability might seem contradictory because it is about making your data easily available, but only to the people that are eligible. For the “making your data easily available” part you need to know some things about your users. Who are they? Can you distinguish user groups with different needs? How are you going to visualize data? Do your users want dashboards, or do they want an API to communicate with? In case of dashboards, you need to think about which visualization is best to capture the insights for your users. In case of an API, what is your API strategy? How do you make sure that it is well documented for your users, and do you want to make it easily accessible for applications as well?
For the “only to the people that are eligible” part you must think about how to restrict access to your data. Do you want people to log in first, and do you want to have different access levels? In some cases you want to keep track of who accessed which part of your data, and access logs might come in handy. In that case you need to think about how to store this metadata and which regulations apply when storing it.
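The combination of access levels and access logs could look roughly like the sketch below. The role names, their ordering, the in-memory log and the `read_dataset` function are all hypothetical; a real setup would rely on your identity provider and a durable, regulation-compliant log store.

```python
from datetime import datetime, timezone

# Hypothetical access levels: a higher number means more access.
LEVELS = {"viewer": 1, "analyst": 2, "admin": 3}

# In-memory access log, for illustration only.
access_log = []

def read_dataset(user, role, dataset, required_role):
    """Return True if access is granted; record every attempt in the access log."""
    granted = LEVELS[role] >= LEVELS[required_role]
    access_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "granted": granted,
    })
    return granted

print(read_dataset("alice", "analyst", "sales", required_role="analyst"))  # True
print(read_dataset("bob", "viewer", "sales", required_role="analyst"))     # False
print(len(access_log))  # 2
```

Note that denied attempts are logged as well, since knowing who tried to access which data can be as relevant as knowing who succeeded.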
As you can see, there is way more than meets the eye when it comes to creating more valuable data for your organization. This blog might have left you with more questions than answers. That is partly because the answers vary for each use case, but also because it makes you rethink the choices you have made regarding your own data and the process of making it more valuable.
At Luminis we have developed a set of methods and frameworks that help you find the optimal data solution for your use case. These range from helping you answer the questions in this blog on a more functional level to technical solutions that help you implement your optimal data solution. Over the coming months we will publish more blogs that dive deeper into these methods and frameworks.