IT has become so important that ensuring high levels of availability, performance and cost-effectiveness is now of paramount importance. Yet, at the same time, there is an increased demand for agility and scalability: so-called business agility. Business agility demands more flexible and nimble operations, and it usually results in more frequent changes to software systems. These changes influence not only the logic and boundaries of the systems, but also their data. The role and value of data have changed radically over the past few years. Whether it’s called Big Data, is handled by Data Science or is stored in NoSQL databases, we have come to look at it differently. This is also the world of data platforms. A data platform collects, stores and secures data that is spread across a distributed infrastructure and multiple data stores.
There are a number of data platforms available today. Each data platform offers specific features and benefits to its users. However, they all share a number of core capabilities:
Data is not just a schema and contents; it is also about the nature of the data in terms of dynamics and volume. Examples of the nature of data are fast-growing collections of measurement data, data that is enriched over time and hierarchies of interconnected data. Therefore, it’s important to think of the nature and structure as separate from the data itself, because a single piece of data can have different manifestations. Take data about a student as an example: a graph could be used to look at the classes or working groups he or she is part of, the monitored actions in a learning system may best be represented by a stream, and a document model would work best for assignments.
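The student example can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the identifiers, the field names and the three helper functions are hypothetical, and plain in-memory structures stand in for a real graph, stream or document store.

```python
from collections import defaultdict

# Graph view: membership edges between students and classes or working groups.
edges = [
    ("student:42", "member_of", "class:math-101"),
    ("student:42", "member_of", "group:robotics"),
    ("student:7", "member_of", "class:math-101"),
]

def classmates(student_id, edges):
    """Return the other students who share at least one class or group."""
    members = defaultdict(set)
    for subject, _, target in edges:
        members[target].add(subject)
    shared = set()
    for group in members.values():
        if student_id in group:
            shared |= group
    shared.discard(student_id)
    return sorted(shared)

# Stream view: append-only, time-ordered events from the learning system.
events = [
    {"ts": 1, "student": "student:42", "action": "login"},
    {"ts": 2, "student": "student:42", "action": "open_assignment"},
    {"ts": 3, "student": "student:7", "action": "login"},
]

def actions_for(student_id, events):
    """Return the actions of one student in the order they were recorded."""
    return [e["action"] for e in events if e["student"] == student_id]

# Document view: one self-contained, nested record per assignment.
assignment = {
    "student": "student:42",
    "title": "Essay on data platforms",
    "sections": [{"heading": "Introduction", "words": 250}],
}

def word_count(doc):
    """Sum the word counts of all sections in an assignment document."""
    return sum(section["words"] for section in doc["sections"])
```

The same student appears in all three structures, but each one answers a different kind of question: relationships, behaviour over time, and the content of a single piece of work.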
In addition to the data itself, it is important to manage the metadata about the data. This means that data management requires governance, so you have to think about security and compliance issues as well as quality issues. Examples of security and compliance areas include encryption, access control, provenance and auditing. Quality issues include validation, business rules, cleansing and access. Therefore, in order to use a data platform successfully, it is important to ensure that your data is clean and valid and that you have the right data at the right time.
Performance is one of the key distinguishing features of data platforms. They need to be scalable, fast, fault tolerant and robust. However, when looking at the entire performance spectrum, there really is no “silver bullet”. Not every type of data requires the same performance characteristics. Therefore, it is important to use a data platform that allows you to add storage providers over time and to configure the different performance dimensions.
The increased need for business agility impacts almost all software systems. It results in more changes and in new challenges such as historical perspectives and data provenance. To be able to handle this altered reality, IT professionals have to overcome a number of major stumbling blocks. For example:
Agile business strategies will change how data is defined and stored. Therefore, systems need to be able to handle different versions of data and to introduce changes to data without resorting to major migration strategies or suffering unwanted downtime.
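One common way to handle several versions of data without a bulk migration is to upgrade records when they are read. The sketch below illustrates that idea only; the `version` field, the two record shapes and the function names are assumptions made for this example and do not describe any specific platform.

```python
# Hypothetical record versions: version 1 stores a single "name" field,
# version 2 splits it into "first_name" and "last_name".

def upgrade_v1_to_v2(record):
    """Convert a version 1 record into the version 2 shape."""
    first, _, last = record["name"].partition(" ")
    upgraded = {k: v for k, v in record.items() if k != "name"}
    upgraded.update({"version": 2, "first_name": first, "last_name": last})
    return upgraded

# One upgrade step per source version; later versions add entries here.
UPGRADES = {1: upgrade_v1_to_v2}
CURRENT_VERSION = 2

def read_record(record):
    """Upgrade a stored record step by step until it is at CURRENT_VERSION."""
    record = dict(record)  # work on a copy, leave the stored data untouched
    while record.get("version", 1) < CURRENT_VERSION:
        record = UPGRADES[record.get("version", 1)](record)
    return record
```

Old and new records can then live side by side in the same store, and the stored data only needs to be rewritten when, and if, that becomes convenient.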
Furthermore, the nature of data itself will evolve over time. This means that it is almost impossible to design a single data storage strategy or technology that is fully future-proof.
And, last but definitely not least, developers face the problem of migrating existing systems. Migrating systems built on a single data model or database to an agile data architecture is a challenge in itself, one that can benefit from a number of proven practices.
After Neal Ford coined the term Polyglot Programming in late 2006, a new way of looking at programming languages emerged, one which holds that you should use the right language for the specific job at hand. Instead of assuming a “default” language like Java or C# and then warring over the many available frameworks, polyglot programming is about choosing the right language rather than just the right framework(s). Over time, this has also changed the way we look at persistence.
For a lot of organizations, a single relational database seems to be the accepted and default choice for persistence. Often it is simply what we’re used to, and we may not even consider alternatives. But now, with options like MongoDB, CouchDB and many others, we should reconsider our persistence strategy as well. Enter the world of Polyglot Persistence, which, like polyglot programming, is all about choosing the right persistence option for the task at hand.
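As a rough illustration of choosing the persistence option per task, the following sketch routes each kind of data to its own store. The classes are hypothetical in-memory stand-ins written for this example; they are not the client APIs of MongoDB, CouchDB or any other product.

```python
class DocumentStore:
    """In-memory stand-in for a document database: one value per key."""

    def __init__(self):
        self._docs = {}

    def save(self, key, value):
        self._docs[key] = value

    def load(self, key):
        return self._docs[key]


class EventLog:
    """In-memory stand-in for an append-only stream store."""

    def __init__(self):
        self._events = {}

    def save(self, key, value):
        self._events.setdefault(key, []).append(value)

    def load(self, key):
        return list(self._events.get(key, []))


class PolyglotRepository:
    """Route each kind of data to the store registered for it."""

    def __init__(self, stores):
        self._stores = stores

    def save(self, kind, key, value):
        self._stores[kind].save(key, value)

    def load(self, kind, key):
        return self._stores[kind].load(key)


repo = PolyglotRepository({"assignment": DocumentStore(), "activity": EventLog()})
repo.save("assignment", "essay-1", {"title": "Essay on data platforms"})
repo.save("activity", "student:42", "login")
repo.save("activity", "student:42", "open_assignment")
```

Callers only name the kind of data they are working with. Which store sits behind each kind is a configuration decision that can change as the nature of the data changes.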
As a way forward, polyglot persistence can solve a number of these challenges, but, as usual, it comes at a price. Because managing data involves a number of inherent complexities, such as ACID transactions, authorization and provenance, it is clear that introducing multiple persistence solutions will bring new complications. Enter the world of Polyglot Data Platforms. A Data Platform is a distributed computing system that collects, integrates and manages large sets of structured and unstructured data from disparate sources. An effective data platform offers a unified development and management environment that provides access to consistent, accurate and timely data.