DUBAI, UAE — The data industry has continued to grow at an unprecedented pace in the past few years, and 2023 has witnessed the most significant shift and push for more democratization in the data analytics space.
Data democratization means that everyone has access to data, and there are no gatekeepers that create a bottleneck at the gateway to the data. The goal is to have anyone use data at any time to make decisions without barriers to access or understanding.
Making virtually all data available to all employees allows businesses to maximize their investment in data and business intelligence architecture. Breaking down data silos can ultimately improve decision-making and empower employees.
Bernard Marr, the bestselling author of Big Data in Practice, said, “We’re seeing a new wave of democracy — of data, as IT departments and organizations are allowing more business users to access data to expedite decision making, influence sales, and customer service, and uncover opportunities.”
Speaking to TRENDS, industry specialists said that ‘data democratization’ will catapult companies to new heights of performance in 2023 if done right. “It’s also vital for employees to have access only to the data they need, and in the right format at the right time to maximize its value,” they added.
Talking about the benefits of ‘data democratization,’ Peter Pugh-Jones, Global Industries Lead, Confluent, said, “Whether a business stretches to every corner of the globe or just sits in one country, it’s important that everyone in a business has access to the data they need to make informed decisions. Fast.”
Hurdles must be overcome
Gregg Petersen, Regional Director – MEA at Cohesity, said, “The shift to true data democratization is a journey, not a light switch. Moving from a culture of ‘data gatekeepers’ and silos to one of ‘data citizens’ requires careful thought and planning.”
Petersen said two considerations matter most. Firstly, leadership must see the value of democratizing data and consume that data themselves. Many organizations are now hiring a Chief Digital Officer (CDO) to champion the cause. The number of CDOs has increased more than fourfold in the last decade, indicating a strong shift towards embracing this new approach.
Secondly, data literacy training is critical. Gartner defines data literacy as “the ability to read, write, and communicate data in context, including an understanding of data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case, application, and resulting value.” That’s a lot for an organization to handle! While Business Intelligence (BI) systems have been used for decades, BI systems still require specialized training to both ingest and egest data. More recently, machine learning (ML) and natural language processing (NLP) algorithms have simplified the analysis of data. However, to have truly democratized data, all users must be trained in the basics of data literacy. Obviously, not every employee can meet those requirements on Day One.
Peter Pugh-Jones said, “It is really vital that only data that is suitable for a particular use is shared with the correct consumers in the organization. This is typically managed in Confluent with data streaming pipelines that help construct data products aimed at the individuals consuming it.”
Pugh-Jones added: “With our new governance metadata tags and lineage tracking, we can surface data that we want to make available with the correct levels of sensitivity, so the users of that data know not only where the data came from and how it was constructed, but that what they are seeing is suitable for their intended use. When it comes to data classed as PII sensitive data, for example, it is imperative that data is curated and flagged properly. On the flip side, we have to do all of that while still ensuring the data will be fit for purpose for the use case. Getting that balance right is really what we are aiming to achieve with data streaming SaaS offerings such as Confluent Cloud.”
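To make the idea of sensitivity tags concrete, the following is a minimal Python sketch and not Confluent's API. The sensitivity levels, the `Field` class, and the `project` helper are all hypothetical names chosen for illustration. It shows one way a field-level tag and a simple lineage note could decide which parts of a record a given consumer is allowed to see.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels, ordered from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "pii": 2}


@dataclass(frozen=True)
class Field:
    name: str
    sensitivity: str  # one of the keys in LEVELS
    source: str       # simple lineage note: where the value came from


def visible_fields(schema, clearance):
    """Return the names of fields a consumer with `clearance` may see."""
    limit = LEVELS[clearance]
    return [f.name for f in schema if LEVELS[f.sensitivity] <= limit]


def project(record, schema, clearance):
    """Drop every field whose tag exceeds the consumer's clearance."""
    allowed = set(visible_fields(schema, clearance))
    return {key: value for key, value in record.items() if key in allowed}


# Illustrative schema and record (made-up data).
schema = [
    Field("order_id", "public", "orders stream"),
    Field("region", "internal", "orders stream"),
    Field("email", "pii", "CRM stream"),
]
record = {"order_id": 17, "region": "MEA", "email": "a@example.com"}

# An analyst cleared for internal data never receives the PII field.
analyst_view = project(record, schema, "internal")
```

In this sketch the PII-tagged `email` field is withheld from the analyst, while a consumer with `pii` clearance would receive the full record.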
Counter the inevitable threat to data security strategies
Gregg Petersen said, “Good data democratization is built on good data governance. There will always be confidential and proprietary data in an organization. Knowing where that data is and who has access through classification is the first step. Training end users can help avoid misuse and misinterpretation, and good data management policies can reduce duplication and potential legal exposure.”
Petersen added that while traditional security practices have been seen as a roadblock to the access of data, they still play a very important role in a true data democratization environment. He said, “Threat detection, endpoint protection, and email security are necessary in ensuring data stays in the democracy in which it is intended! Other technologies like encryption and data masking can protect data in motion, while anomaly detection and virus scanning can expose potential bad actors and identify when data is moving in the wrong direction.”
Peter Pugh-Jones said, “Confluent implements and provides layered security controls that data owners can configure, specifically designed to protect and secure their customers’ data. Essentially, ensuring everyone in a business only ever has access to the information they need, and no more, will naturally help with security.” He added, “In the unlikely event of something going awry with the governance policies data owners put in place, due to the resilient, immutable log that Confluent Cloud provides, it is possible to rewind time and view the data flowing just as that event happened, easily helping you identify the point in the lineage where you may need to take corrective action and demonstrating to a regulator or auditor the level of governance and granular control you have over your data streaming platform.”
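The “rewind time” idea rests on an append-only log: events are only ever added, never changed, so the sequence up to any moment can be read back later. The Python sketch below illustrates that general principle under simplifying assumptions (an in-memory list and integer timestamps). The `AppendOnlyLog` class and its methods are hypothetical and do not represent how Confluent Cloud is implemented or accessed.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Toy in-memory log: events can be appended and read, never edited."""
    _events: list = field(default_factory=list)

    def append(self, timestamp, payload):
        # Reject out-of-order writes so the log stays ordered by time.
        if self._events and timestamp < self._events[-1][0]:
            raise ValueError("timestamps must not go backwards")
        self._events.append((timestamp, payload))
        return len(self._events) - 1  # offset of the new event

    def replay_until(self, timestamp):
        """Return every payload recorded at or before `timestamp`."""
        return [payload for ts, payload in self._events if ts <= timestamp]


# Illustrative access-policy events (made-up data).
log = AppendOnlyLog()
log.append(1, {"user": "ana", "action": "grant", "dataset": "sales"})
log.append(2, {"user": "ben", "action": "grant", "dataset": "payroll"})
log.append(3, {"user": "ben", "action": "revoke", "dataset": "payroll"})

# "Rewind" to time 2 to see what had happened just before the revoke.
history_at_2 = log.replay_until(2)
```

Because earlier entries are never overwritten, replaying to time 2 returns the two grant events exactly as they were written, which is the property an auditor would rely on.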
Extracting value from data should be a fundamental part of the job role
Gregg Petersen said, “More and more companies are identifying the value of their data as a line item on their balance sheets. By extrapolation, a company’s data is the vehicle by which they get paid. A company can replace the tangible things they have but take away their data… there’s no more company.”
Petersen added: “It has to be a company’s culture. Every level must be engaged and on board. Training, training, training. Give each user the skills and tools to be successful. Champion success and share where things need improvement. If the draw of data democratization is to find new ways to be successful, extract new value, and establish new efficiencies, many of those insights will come from individuals and departments you would never dream of.”
Peter Pugh-Jones said, “Data Streaming platforms such as ours at Confluent natively provide capabilities for curating data into consumable formats for different types of usage. For example, historically, a data scientist would typically spend way more of their time wrangling data into a shape that they can use for building predictive models than they would actually spend building and testing new models. In the modern world of streaming pipelines, the way they need to receive the data can be managed as a data product designed specifically for modelling, while the data analyst may be looking at some of the same data points in a different way, specifically for their needs, receiving it from their own data pipeline.”
Regional regulatory and compliance concerns
Gregg Petersen said, “Arguably, one of the biggest challenges in the move to the cloud is avoiding cross-border data stores. Each country and region continues to establish and adapt its regulatory policies. First of all, don’t boil the ocean; assess the biggest risks in your plan and prioritize: red, yellow, green, or high, medium, low — however you prefer.”
Peter Pugh-Jones said, “One of the single biggest challenges for a global business today is regulation and the subtleties of differing regulations within sub-regions. Continental Europe is a good example of this. Within the EU, there is the well-known GDPR (General Data Protection Regulation) set of regulations that span the whole of the EU, but then within each individual country within the EU, there are subtleties and differences that must also be taken into account and managed to avoid local liability. In some countries and jurisdictions around the world, it is also not permissible for certain types of consumer and other sensitive data to move outside borders.
“Thankfully, today, all of the above challenges can be addressed by taking advantage of Multi-cloud and Hybrid cloud topologies and principles, linking clusters between when needed, obfuscating sensitive data, and ensuring that restricted data is managed as laws require, all of which are capabilities fully supported by Confluent products.”