Digital Transformation » The unstructured data challenge for chief data officers

Given the copious volume of data generated routinely, organisations need to draw upon the intelligence, insight and knowledge that resides across the enterprise, to help ensure fully informed decision making on all aspects of managing the business.

To support this, organisations typically deploy data warehouses, or even ‘data lakes’.  These enable them to collect, and consolidate information from a variety of core enterprise systems, help to facilitate management reporting, decision making, as well as process optimisation.

A major challenge for chief data officers (CDOs) is that there is a vast amount of unstructured data that is held in information sources such as Excel spreadsheets and end user computing (EUC) tools.

The Excel spreadsheet is the ‘go to’ tool that is widely used as a flexible report writer, for storing key information or as a stop gap to replace missing functionality in core enterprise systems, and for business-critical financial and business modelling.

The difference between enterprise systems and spreadsheet-based processes is that the latter are largely unmonitored, uncontrolled and lacking in transparency.

CDOs are faced with the task of identifying, understanding, managing and ensuring the accuracy of this unstructured information, and then consolidating it with data from the enterprise systems to derive value for the business.

Combining uncontrolled, potentially inaccurate data into a data warehouse has the potential to compromise the quality of an organisation’s golden source of data. This could impact the quality of the decisions of the business, and its management reports, which can have a material impact on its reputation.

Poor quality data also carries a regulatory compliance risk too. Confidential and sensitive personal data routinely sits in uncontrolled and unmonitored spreadsheet-based applications, which can weaken the organisation’s GDPR and CCPA compliance efforts.

Similarly, there are other regulatory regimes too such as SOX, IFRS9, SR 11-7, IFRS 16, SR 11-7 and more, where uncontrolled, unstructured, potentially error-prone data can be a threat to compliance.

Spreadsheet and data warehouse integration challenge

Spreadsheets tools don’t naturally align with data warehouses – the flexibility and dynamism of spreadsheets works against the highly controlled and very systematic nature of a data warehouse. This dynamic has the potential to compromise the integrity of the data in the business.

Modellers and other spreadsheet power users use spreadsheets to develop business-critical models and processes. By their nature, these complex models are constantly changing. Regularly importing spreadsheet-based data that is constantly evolving without reconfiguration into the data warehouse can be problematic, and sometimes even impossible.

Moreover, it’s not always possible to integrate some enterprise applications data directly into the data warehouse, due to priority issues, or a lack of APIs or technical skills. In such situations, data management teams, commonly use spreadsheets to consolidate and stage the data, before passing the records into the data warehouse. Nonetheless, changes to the application and the structure of the staging spreadsheet itself can cause data inconsistency and inaccuracy.

A solution: spreadsheet risk management + Excel data warehousing  

Combining spreadsheet risk management with the systematic data warehousing of that Excel data can help CDOs overcome the challenge of managing unstructured data, cost-effectively. This approach will provide them with the same level of control, consistency and quality of their data from the spreadsheet estate, as they enjoy with their enterprise data sources, while allowing users to continue to use Excel as a strategic application.

Spreadsheet risk management can encompass the process of scanning, identifying and monitoring mission-critical spreadsheets, enhancing the accuracy of the spreadsheet environment.

It gives CDOs a precise picture of the spreadsheet landscape. In large corporations, it’s not unusual to see spreadsheet files and applications in the tens of 1000s. Data management teams can identify the business-critical files, to determine where they reside and understand all the intricate linkages across the spreadsheet landscape.

The inventory of spreadsheet applications can be tiered, based on materiality, importance and the potential risk they present to the business. The data management team is then in a good position to track changes in the most-critical files to maintain quality and accuracy, ensuring they meet the data quality standards of the business.

With this in place the data management teams can simply extract the excel data from the spreadsheet risk management platform by utilising ETL (Extract, Transform and Load) capabilities to establish a framework for importing dynamic spreadsheet data in to the data warehouse environment and thus maintain the integrity of the data warehouse.

Modern spreadsheet management solutions allow users to nominate data sets (ranges of cells) within a spreadsheet that need to be imported into the data warehouse. So, when an Excel user makes extensive changes to their spreadsheet – e.g. changing the location of the data/size of the data set or the name and location of the file – the spreadsheet management solution recognises these changes and ensures that the data sets are consistently stored within the solution to make extraction using an ETL process straightforward.

This approach allows data management teams to review the spreadsheet data before it’s imported in to the data warehouse, so that quality assurance can be applied.

A win-win for CDOs and spreadsheet users

Spreadsheet risk management combined with data warehousing facilitates all enterprise data – both structured and unstructured – to reside in a single location, while strictly enforcing the corporate data policy to all applications including the dynamic, business critical spreadsheet applications. It’s a win-win for both the CDO and the spreadsheet users.