This is the first of a new series of posts dedicated to helping people select data tools and infrastructure. I’ve listed out the ‘perfect’ feature set for a dream product. Of course, these features rarely exist in a single solution, but if they did, I’d use it! First up: business intelligence platforms.
There are many fiefdoms in the kingdom of data – from product analytics to predictive models to advanced user-facing data applications – and no single platform will address every need. So for the purposes of this discussion, let’s define Business Intelligence platforms as data platforms that allow non-technical business users to explore, prepare and present data germane to their work. These tools should support data-driven insights and decision-making but should not require a STEM degree or General Assembly workshop to operate.
Turns out this is ancient stuff. Business Intelligence (BI) was first described back in the ’60s – the 1860s – when Richard Devens coined the term in his Cyclopædia of Commercial and Business Anecdotes. Devens used it to describe how Sir Henry Furnese, a banker, gained an advantage over his competitors by using and acting upon the information surrounding him. Over the past few decades, BI tools have matured rapidly, shifting from beastly on-premise data warehouses with text-focused UIs to cloud-based, mobile-first, lithe data platforms designed for non-technical users.
BI is defined, generally, as ‘tools for data analysis and report generation on top of data aggregated from multiple disparate systems’. Some BI platforms sit on top of separate data warehouses, and some modern platforms serve as the data aggregator/data store as well. BI tools pack a ton of functionality but are typically narrow-scoped. You don’t “do” anything within your Business Intelligence platform. Instead, you investigate, learn and report on how other systems are “doing”. BI surfaces data to guide decisions made elsewhere.
You will also see BI in the form of Embedded Analytics within various tools – like your CRM system or your Web Analytics platform. Generally, Embedded Analytics help inform micro tasks, like working out which email subject line performed best, as opposed to providing a holistic view of data across multiple sources. The best BI tools provide this holistic view – pulling in all of your data to support cross-functional views and insights.
So how does this work in practice? A great use case for BI platforms is to create easy-to-digest OKR dashboards for your company, teams, and individuals. Your BI platform should allow teammates from different business units to pull up live views of their progress towards their outcomes/goals… anytime / anywhere… on their phones… without support from business analysts or IT.
OK, enough preamble. Here are the goldilocks (aka “just right”) criteria I look for in BI platforms:
Integrated data warehouse
Traditionally, BI tools sit on top of separate data platforms managed by engineering teams. More recently, a new class of products has emerged that allows you to upload/connect to your data without engineering support. I find this to be a huge advantage as it allows moderately-technical users to get up and running without distracting/relying on external resources. (Self-service also leads to challenges with data governance but that’s another story.)
As an example, imagine easily joining together all of the spreadsheets you store in Google Drive / Dropbox with live data connections to Google Analytics / Facebook Analytics / financial data / more and then exploring and visualizing this data as you choose. That’s what these new platforms do, all without the help of data engineering resources.
Data engineering for dummies
Some of the best data scientists I’ve worked with estimate that they spend 80-90% of their time on data hygiene before they can begin analysis and exploration.
What does that mean for BI tools? Any functionality that supports easy data manipulation for the sake of improved clarity is awesome. Joining data together via drag and drop, changing data types with a click, and deduplicating rows without writing SQL are all huge value adds, extending the range of users who can go deep with the data without external assistance.
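To make those three chores concrete, here is a minimal sketch of what a platform does behind the drag-and-drop. It is plain Python rather than any vendor’s API, and the sample rows, field names, and helper functions (deduplicate, cast_types, inner_join) are all hypothetical.

```python
from datetime import date

# Hypothetical sample data: a spreadsheet export and a finance export.
signups = [
    {"email": "ann@example.com", "signup_date": "2019-01-05", "plan": "pro"},
    {"email": "bob@example.com", "signup_date": "2019-01-07", "plan": "free"},
    {"email": "ann@example.com", "signup_date": "2019-01-05", "plan": "pro"},  # duplicate
]
revenue = [
    {"email": "ann@example.com", "revenue": "120.50"},
    {"email": "bob@example.com", "revenue": "0"},
]


def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def cast_types(row):
    """Turn text columns into real dates and numbers (the 'one click' type change)."""
    typed = dict(row)
    if "signup_date" in typed:
        typed["signup_date"] = date.fromisoformat(typed["signup_date"])
    if "revenue" in typed:
        typed["revenue"] = float(typed["revenue"])
    return typed


def inner_join(left, right, key):
    """Join two row lists on a shared key, keeping only rows present in both."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


clean = [cast_types(row) for row in inner_join(deduplicate(signups), revenue, "email")]
for row in clean:
    print(row)
```

Every one of these steps is trivial for an engineer and a blocker for most business users, which is exactly why it belongs in the tool.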
Live data! From the cloud! On your phone!
Data that arrives attached to an email is DOA. This is one of my absolute pet peeves. Further, once people begin discussing and editing the data offline, multiple inaccurate versions/views of the same data set become commonplace.
BI tools need to pull from a live backend at all times. When I pull up a link to view a dashboard, the data should be (pseudo) real-time, up-to-date, and clearly time stamped with the date it was last run.
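As a rough illustration of that time stamp, the sketch below builds the kind of label I’d want on every dashboard. The function name freshness_label and the 15-minute threshold are my own assumptions, not a feature of any particular product.

```python
from datetime import datetime, timedelta, timezone


def freshness_label(last_run, now, max_age=timedelta(minutes=15)):
    """Build a dashboard footer showing when the data was last run and whether it is stale.

    max_age is an assumed threshold; pick whatever counts as 'live' for your data.
    """
    status = "live" if now - last_run <= max_age else "stale"
    return f"Data last run: {last_run:%Y-%m-%d %H:%M} UTC ({status})"


now = datetime(2019, 3, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_label(now - timedelta(minutes=5), now))
# prints: Data last run: 2019-03-01 11:55 UTC (live)
print(freshness_label(now - timedelta(hours=2), now))
# prints: Data last run: 2019-03-01 10:00 UTC (stale)
```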
This also means the platform should be mobile-centric. Old-timers still want their landscape printouts, but there is nothing more powerful than conversing with colleagues and pulling up live data views on your phone à la minute.
AI / ML ready
I don’t want to overstate this one as we’re in the very earliest of innings, but your platform should have the foundation to support automated machine-learning-driven insights. You may not find these immediately valuable (they rarely are out of the box) but in a few years, you’ll be getting voice alerts when your data spikes unpredictably in ways you may not have imagined. There is no sense in investing in a platform that is not actively working on automated data insights.
As a start, I’d like to see my platform present basic statistics around the data that I’ve onboarded. This means simple distribution and correlation reports. As you play with these statistics you’ll be able to more easily wrap your arms around the data at hand, steering deeper analysis and insights. Simple predictive analytics is another good baby step before full-blown AI.
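For a sense of how little is needed for that first step, here is a minimal sketch of a distribution summary and a Pearson correlation using only Python’s standard library. The weekly ad_spend and signups figures are made up for illustration, and the helper names describe and pearson are my own.

```python
import math
import statistics


def describe(values):
    """Summarize the distribution of one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical weekly figures, purely for illustration.
ad_spend = [100, 150, 200, 250, 300]
signups = [12, 18, 21, 30, 33]

print(describe(signups))
print(round(pearson(ad_spend, signups), 3))
```

A correlation near 1 here would only tell you that spend and signups move together, not that one causes the other, which is where the team training mentioned next comes in.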
This all said, you separately need to invest in training your teams to take advantage of these statistical insights. Leveling up the data fluency of your team is always more valuable than standing up a whiz-bang technology solution.
Narrative & collaboration focused
A perfect platform would support metrics-backed storytelling – and not just the sharing of pie charts. That means as a product owner, I can use a BI platform to explore a set of data and then build a coherent, sharable narrative around it. That could manifest itself as an online presentation with live data at different altitudes, supported by text, images, video, and other added insights. It also means that I should be able to draw / pin annotations within the data itself.
Further, the presentation should support active conversation around what’s being presented. Unlimited named user accounts, threaded comments, open annotations, next-step action items, @ mentions and more are a natural fit here.
Governance gone wild
Sad to say, this is critical. Like supercritical. Like, as soon as you create your second dashboard you need extreme governance, otherwise you’ll never find it again or know if the data set that powers it is up to date, approved, and official.
I’ve seen smart approaches here and they center around clear labeling of the data, its origins, similar/duplicative data, and more. Having an easy way to validate data as “best” or “official” helps too. Ultimately, ML/AI will be a huge help in this arena.
An integrated, dynamic “data catalog” that shows you the breadth of your data, its lineage, stamps of approval, and error reporting is also a must-have.
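To picture what such a catalog tracks, here is a minimal sketch of a catalog entry and a lineage lookup. The CatalogEntry fields, the data set names, and the helper functions are hypothetical, not the schema of any real product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    name: str
    owner: str
    sources: list          # names of upstream data sets (the lineage)
    last_refreshed: date
    official: bool = False  # the "stamp of approval"
    errors: list = field(default_factory=list)


def lineage(catalog, name):
    """Return every upstream ancestor of a data set, nearest first."""
    ancestors = []
    for source in catalog[name].sources:
        if source not in ancestors:
            ancestors.append(source)
        if source in catalog:
            for ancestor in lineage(catalog, source):
                if ancestor not in ancestors:
                    ancestors.append(ancestor)
    return ancestors


def is_trustworthy(catalog, name):
    """A data set is trustworthy here if it is approved and has no open errors."""
    entry = catalog[name]
    return entry.official and not entry.errors


# Hypothetical catalog contents.
entries = [
    CatalogEntry("raw_orders", "finance", [], date(2019, 3, 1)),
    CatalogEntry("raw_customers", "sales", [], date(2019, 3, 1), errors=["12 rows missing email"]),
    CatalogEntry("orders_clean", "analytics", ["raw_orders"], date(2019, 3, 1), official=True),
    CatalogEntry("revenue_dashboard", "analytics", ["orders_clean", "raw_customers"], date(2019, 3, 1), official=True),
]
catalog = {entry.name: entry for entry in entries}

print(lineage(catalog, "revenue_dashboard"))
# prints: ['orders_clean', 'raw_orders', 'raw_customers']
print(is_trustworthy(catalog, "revenue_dashboard"))
# prints: True
```

Note that the dashboard passes the check even though one of its ancestors has an open error. A real catalog would likely want to propagate problems down the lineage, which is part of why this is harder than it looks.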
User-level data FTW
BI tools typically play in the aggregated, anonymous altitude. You can see how all your site visitors behave, customer acquisition by location, sales by campaign, etc. Data is viewed on the content, page, campaign, and location level – rarely at a user level. In a perfect world, a graph model would be deployed at the atomic data layer allowing pivots by the above altitudes but also on the user level.
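The idea is easier to see with a toy example. The sketch below is not a graph model, just a flat list of hypothetical event rows kept at the atomic level, but it shows why storing data at that grain lets you pivot by campaign, location, or user with the same function.

```python
from collections import defaultdict

# Hypothetical atomic-level rows: one record per user action.
events = [
    {"user": "u1", "campaign": "spring", "location": "NYC", "sales": 40.0},
    {"user": "u2", "campaign": "spring", "location": "SF", "sales": 25.0},
    {"user": "u1", "campaign": "summer", "location": "NYC", "sales": 10.0},
]


def pivot(rows, dimension, measure="sales"):
    """Total a measure by any dimension present in the atomic rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


print(pivot(events, "campaign"))  # the usual aggregated altitude
# prints: {'spring': 65.0, 'summer': 10.0}
print(pivot(events, "user"))      # the user-level view
# prints: {'u1': 50.0, 'u2': 25.0}
```

Once the data has been pre-aggregated by campaign, the user-level view is gone for good, so the grain you store at decides which questions you can ask later.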
A new breed of system called Customer Data Platforms is jumping into the fray here, promising a single view of the user. These CDPs are being leveraged today by Marketing and Sales teams, but the potential of applying this granular view to more typical BI use cases is immense. Perhaps CDPs are the topic of the next post in this series…