Goldilocks Criteria: Selecting Business Intelligence (BI) Platforms

This is the first in a new series of posts dedicated to helping people select data tools and infrastructure. For each category, I’ve listed out the ‘perfect’ feature set for a dream product. Of course these features rarely exist in a single solution, but if they did, I’d use it! First up: business intelligence platforms.

There are many fiefdoms in the kingdom of data – from product analytics to predictive models to advanced user-facing data applications – and no single platform will address every need. So for the purposes of this discussion, let’s define Business Intelligence platforms as data platforms that allow non-technical business users to explore, prepare and present data germane to their work. These tools should support data-driven insights and decision making but should not require a STEM degree or General Assembly workshop to operate.

Turns out this is ancient stuff. The term Business Intelligence (BI) dates as far back as the ’60s – the 1860s – when Richard Devens coined it in his Cyclopædia of Commercial and Business Anecdotes to describe how Sir Henry Furnese, a banker, gained an advantage over his competitors by using and acting upon the information surrounding him. Over the past few decades BI tools have matured rapidly, shifting from beastly on-premise data warehouses with text-focused UIs to lithe, cloud-based, mobile-first data platforms designed for non-technical users.

https://www.sales-i.com/a-history-of-business-intelligence

BI is generally defined as ‘tools for data analysis and report generation on top of data aggregated from multiple disparate systems’. Some BI platforms sit on top of separate data warehouses, and some modern platforms serve as the data aggregator / data store as well. BI tools pack a ton of functionality, but are typically narrowly scoped. You don’t “do” anything within your Business Intelligence platform. Instead, you investigate, learn and report on how other systems are “doing”. BI surfaces data to guide decisions made elsewhere.

You will also see BI in the form of Embedded Analytics within various tools, like your CRM system or your web analytics platform. Generally, Embedded Analytics help steer micro tasks (which email subject line performed best?) rather than providing a holistic view of data across multiple sources. The best BI tools provide this holistic view, pulling in all of your data to support cross-functional views and insights.

So how does this work in practice? A great use case for BI platforms is to create easy-to-digest OKR dashboards for your company, teams and individuals. Your BI platform should allow teammates from different business units to pull up live views of their progress towards their outcomes / goals… anytime / anywhere… on their phones… without support from business analysts or IT.

OK, enough preamble. Here are the Goldilocks (aka “just right”) criteria I look for in BI platforms:

Integrated data warehouse

Traditionally, BI tools sit on top of separate data platforms managed by engineering teams. More recently, a new class of products has emerged that allows you to upload / connect to your data without engineering support. I find this to be a huge advantage as it allows moderately technical users to get up and running without distracting / relying on external resources. (Self service also leads to challenges with data governance, but that’s another story.)

As an example, imagine easily joining together all of the spreadsheets you store in Google Drive / Dropbox with live data connections to Google Analytics / Facebook Analytics / financial data / more and then exploring and visualizing this data as you choose. That’s what these new platforms do, all without the help of data engineering resources.

Data engineering for dummies

Some of the best data scientists I’ve worked with estimate that they spend 80-90% of their time on data hygiene before they can begin analysis and exploration. The same goes for business analysts.

https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#183675d26f63

What does that mean for BI tools? Any functionality that supports easy data manipulation for the sake of improved clarity is awesome. Joining data together via drag and drop, changing data types with a click, and deduplicating rows without writing SQL are all huge value adds, extending the range of users who can go deep with the data without external assistance.
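To make those three operations concrete, here is a minimal sketch of what a drag-and-drop tool is doing under the hood when it deduplicates rows, changes a data type and joins two tables. The sample rows, column names and helper functions are all hypothetical, invented purely for illustration. No real BI product is being quoted here.

```python
# Hypothetical sample data: names and values are illustrative only.
sales = [
    {"customer_id": "1", "amount": "120.50"},
    {"customer_id": "2", "amount": "80.00"},
    {"customer_id": "2", "amount": "80.00"},  # accidental duplicate row
]
customers = [
    {"customer_id": "1", "region": "East"},
    {"customer_id": "2", "region": "West"},
]


def dedupe(rows):
    """Drop exact duplicate rows while keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def cast_amount(rows):
    """Change the 'amount' column from text to a number."""
    return [{**row, "amount": float(row["amount"])} for row in rows]


def join_on_customer(left, right):
    """Inner-join two tables on the shared customer_id column."""
    lookup = {row["customer_id"]: row for row in right}
    return [
        {**row, **lookup[row["customer_id"]]}
        for row in left
        if row["customer_id"] in lookup
    ]


clean_sales = join_on_customer(cast_amount(dedupe(sales)), customers)
```

The point is not that business users should write this. It is that each of these steps is simple enough that a good platform can hide it behind a click.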

Live data! From the cloud! On your phone!

Data that arrives attached to an email is DOA. This is one of my absolute pet peeves. Further, once people begin discussing and editing the data offline, multiple inaccurate versions / views of the same data set become commonplace.

BI tools need to pull from a live backend at all times. When I pull up a link to view a dashboard, the data should be (pseudo) real-time, up to date, and clearly time-stamped with when the data was last run.

This also means the platform should be mobile-centric. Old timers still want their landscape printouts, but there is nothing more powerful than conversing with colleagues and pulling up live data views on your phone à la minute. 

AI / ML aware

I don’t want to overstate this one as we’re in the very earliest of innings, but your platform should have the foundation to support automated, machine-learning-driven insights. You may not find these immediately valuable (they rarely are out of the box) but in a few years you may be getting voice alerts when your data spikes unpredictably in ways you may not have imagined. There is no sense in investing in a platform that is not actively working on automated data insights.

As a start, I’d like to see my platform present basic statistics around the data that I’ve on-boarded. This means simple distribution and correlation reports. As you play with these statistics you’ll be able to more easily wrap your arms around the data at hand, steering deeper analysis and insights. Simple predictive analytics is another good baby step before full-blown AI.
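For a sense of how basic these “basic statistics” can be, here is a rough sketch of a distribution summary and a correlation check using only Python’s standard library. The weekly figures are made up for illustration, and the function names are my own. A real platform would run something like this automatically across every column you on-board.

```python
import statistics
from math import sqrt

# Hypothetical weekly figures, invented for illustration.
ad_spend = [100.0, 150.0, 200.0, 250.0, 300.0]
signups = [12.0, 18.0, 21.0, 30.0, 33.0]


def describe(values):
    """Summarize the distribution of a single numeric column."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


spend_summary = describe(ad_spend)
r = pearson(ad_spend, signups)  # close to 1.0 means the two move together
```

A strong correlation like this one is a prompt for a question, not an answer. That is exactly why the team training matters.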

This all said, you separately need to invest in training your teams to take advantage of these statistical insights. Leveling up the data fluency of your team is always more valuable than standing up a whiz-bang technology solution.

Narrative & collaboration focused

A perfect platform would support metrics-backed storytelling – and not just the sharing of pie charts. That means as a product owner, I can use a BI platform to explore a set of data and then build a coherent, sharable narrative around it. That could manifest itself as an online presentation with live data at different altitudes, supported by text, images, video and other added insights. It also means that I should be able to draw / pin annotations within the data itself.

Further, the presentation should support active conversation around what’s being presented. Unlimited named user accounts, threaded comments, open annotations, next-step action items, @ mentions and more are a natural fit here.

Governance gone wild

Sad to say, this is critical. Like supercritical. Like, as soon as you create your second dashboard you need extreme governance, otherwise you’ll never find it again or know if the data set that powers it is up to date, approved and official.

I’ve seen smart approaches here and they center around clear labeling of the data, its origins, similar / duplicative data and more. Having an easy way to validate data as “best” or “official” helps too. Ultimately, ML/AI will be a huge help in this arena.

An integrated, dynamic “data catalog” that shows you the breadth of your data, its lineage, stamps of approval, and error reporting is also a must-have.
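As a rough sketch of what a single catalog record might hold, here is one way to model lineage, a stamp of approval and error reporting in a few lines of Python. The `CatalogEntry` fields, the data set names and the one-day freshness rule are all assumptions of mine for illustration, not the schema of any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class CatalogEntry:
    """One data set in a hypothetical data catalog."""
    name: str
    source: str
    last_refreshed: datetime
    derived_from: list = field(default_factory=list)  # lineage: upstream names
    official: bool = False                            # stamp of approval
    errors: list = field(default_factory=list)        # error reporting


def lineage(catalog, name):
    """Return every upstream data set that `name` is built from, nearest first."""
    upstream = []
    queue = list(catalog[name].derived_from)
    while queue:
        parent = queue.pop(0)
        if parent not in upstream:
            upstream.append(parent)
            if parent in catalog:
                queue.extend(catalog[parent].derived_from)
    return upstream


def is_trustworthy(entry, now, max_age=timedelta(days=1)):
    """Approved, error-free and refreshed recently enough (assumed: one day)."""
    return entry.official and not entry.errors and now - entry.last_refreshed <= max_age


refreshed = datetime(2024, 1, 2, 6, 0)
catalog = {
    "crm_export": CatalogEntry("crm_export", "CRM", refreshed),
    "web_events": CatalogEntry("web_events", "web analytics", refreshed),
    "revenue_by_campaign": CatalogEntry(
        "revenue_by_campaign", "BI platform", refreshed,
        derived_from=["crm_export", "web_events"], official=True,
    ),
    "okr_dashboard": CatalogEntry(
        "okr_dashboard", "BI platform", refreshed,
        derived_from=["revenue_by_campaign"], official=True,
    ),
}

dashboard_lineage = lineage(catalog, "okr_dashboard")
dashboard_ok = is_trustworthy(catalog["okr_dashboard"], now=datetime(2024, 1, 2, 12, 0))
```

Even a record this small answers the two questions that matter once you have more than one dashboard: where did this data come from, and can I trust it right now?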

User-level data FTW

BI tools typically play at the aggregated, anonymous altitude. You can see how all your site visitors behave, customer acquisition by location, sales by campaign, etc. Data is viewed at the content, page, campaign or location level, but rarely at the user level. In a perfect world, a graph model would be deployed at the atomic event level, allowing pivots at the above altitudes but also down to the user level.
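Here is a toy illustration of why storing data at the atomic event level matters. It uses a flat list of events rather than a true graph model, and the event fields are hypothetical, but it shows the core idea: if every event keeps its user, the same data can be rolled up to the campaign level or drilled down to the user level on demand.

```python
from collections import defaultdict

# Hypothetical event records; field names are assumptions for illustration.
events = [
    {"user": "u1", "campaign": "spring", "revenue": 20.0},
    {"user": "u2", "campaign": "spring", "revenue": 35.0},
    {"user": "u1", "campaign": "summer", "revenue": 15.0},
]


def pivot(rows, key):
    """Total revenue grouped by any field present on the raw events."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["revenue"]
    return dict(totals)


by_campaign = pivot(events, "campaign")  # the usual aggregated BI view
by_user = pivot(events, "user")          # the user-level view
```

If the platform only stored the campaign totals, the user-level view would be gone for good. Aggregation is a one-way street.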

A new breed of system called Customer Data Platforms is jumping into the fray here, promising a single view of the user. These CDPs are being leveraged today by Marketing and Sales teams, but the potential of applying this user-level view to more typical BI use cases is immense. Perhaps CDPs are the topic of the next post in this series…
