If you’re like me, explaining to friends, strangers, and your parents what you do for a living can be an arduous task. (No, I don’t store different companies’ data in a warehouse somewhere, nor do I fix people’s computers.) When I recently registered with Mint.com, however, I saw parallels that could make understanding what Business Intelligence and Data Warehousing are and how they benefit companies simpler, because what is Mint if not an online, personal finance data warehouse?
Staging disparate data in one place
What’s your net worth? Just add your checking and savings account balances, plus the values of any investments you have – 401(k), IRA, individual stocks – and subtract the current balances of your credit cards, mortgage, student loans, car loans, bookie, and any other debts you may have. That’s about a dozen web sites I would have to visit (and countless passwords to remember) to calculate my current net worth, and it’s actually easier for me, because I have gone to the trouble of setting up online access to those accounts. Otherwise, I would have to peruse dated, tree-killing statements to retrieve numbers that change daily. And as timeconsuming as the task is, periodically repeating the process doesn’t make it any faster or easier.
Mint allows you to register all of your accounts in one place – checking and savings, investments, loans, and credit cards – and refresh the balances on demand, even keeping track of historical values. Any question you have about your finances can be answered almost instantly, for faster, better informed decisions.
For a company – a commercial bank, for instance – the equivalent would be tracking multiple lines of business: deposit accounts, loans, credit cards, wire transfers, etc. Without a data warehouse, they would be adding the bottom lines of each of them manually, in a spreadsheet, the same way you would calculate your net worth, despite the fact that they have a few billion more in assets. It’s a tedious process that is dangerously prone to error and not conducive “drilling” into the data, i.e. conducting detailed analyses of numbers that stand out to the report recipients, because the output is anything but dynamic.
Data warehouses can contain a company’s data from multiple source systems, plus custom data, like goals and projections, staged in a single place and organized in a way that is conducive to getting the data back out in the form of reports or dashboards. They keep historical data, and new rows from the source systems are added when the warehouse is refreshed, usually daily or weekly. At that point, the reports and dashboards can be smoothly updated, as they sit on a predictable data structure.
How good is the data in Mint? It makes its best guess as to how to categorize your transactions, but if you want to use its budgeting capabilities, they had better be pretty accurate. For instance, you may think you have already exceeded your entertainment budget for the month, but then come to realize that your cable/internet bill was mistakenly classified in the Entertainment category. Mint gives you the ability to reclassify that transaction as well as having that rule persist for future transactions with the same description.
For data warehouses, data quality is most always problematic. Transactions need to have valid reference data so they can be properly classified, addresses in customer or property data must be validated, and other cleansing/special rules must be done with data cleansing tools or freehand code. In practice, when creating a data warehouse, this can be a lengthy process, as all parties and departments must agree on data definitions, what is valid, how metrics are to be computed, and against what those metrics will be compared, e.g. last year, last month, goals, projections, etc. And every department has a set of rules regarding their data that only they know – “these transactions don’t count towards the total, those transactions are treated differently” – like the ability in Mint to tag rows as reimbursable, tax-related, etc. Fitting everything together within a predictable, repeatable, accepted framework is one of the most challenging aspects of data warehousing. Remember that the processing will ultimately be done by computers, and a human cannot feasibly eyeball every transaction (not in any budgets I’ve seen).
Data visualization, Key Performance Indicators (KPIs), and alerts
Now that your data is all clean and organized, you can confidently look at it.
|The main page is like a dashboard, showing your high-level balances, current amounts spent relative to targets, and any alerts you may have triggered for exceeding those targets.
A BI/DW term commonly used to describe these is KPIs. A metric is anything that is being measured (total dollars spent), while a Key Performance Indicator is a metric with the added context of being measured against a specific target (amount of budget spent).
|A trending page has more spending details, as well as your spending history. Oddly, the “trend” page largely utilizes pie charts – a poor choice for displaying trends (PDF).
The transactions tab, pictured in the Data Cleansing section of this post, contains the row-level data.
|An investments page shows how your current holdings are tracking relative to several common indices. As you can see, I have been killing the market the last few months, only losing about 20% of the total value of my portfolio.|
Some screenshots from Lifehacker
The functions are very familiar to BI tools: a high-level dashboard, various visual representations to help identify trends and outliers, and the ability to access row-level transactions. This is the realm where I spend much of my work time, personally.
A typical bank might have a dashboard containing high level information about revenue, profit, average balances, and new/lost customers, measured against targets for the current year to date and quarter to date. It may also have information about certain types of customers, regions, or services and products offered. A mature data warehouse might even have information about that bank compared to competitors or the market, but getting feeds of that data can be difficult, just as getting people to agree on targets is. Reports contain the granular data needed to investigate notable findings in the dashboard. Depending on the tool, some reports are dynamic, allowing for drilling and other interactivity, while others are static.
A bit more on KPIs in both Mint and BI tools: in both, the user can set rules that, if violated, trigger alerts that notify the appropriate parties. In Mint, if you exceed a budget, you can be notified via email or text message. Similarly, most BI tools have the ability to send notifications (email is most common) if business rules are violated, thereby enabling better exception management.
Data moves in one direction
When Mint alerts you that a credit card bill is due, it does not provide you the functionality to pay it, and you can’t reallocate your portfolio through their interface upon viewing your atrocious recent investment performance. You merely decide what to do by viewing their site, with no ability to manipulate the data.
Similarly, data in a warehouse only travels in one direction: from source systems into the warehouse, and then from the warehouse to various end user applications (reports, dashboards). It is extremely uncommon that an end user would be able to modify the data that they are viewing, as would be the case with data stored in a spreadsheet. The warehouse is the standard, and end users cannot write data back to it. In short, a warehouse expedites decisionmaking, but does not facilitate it.
Mint can do more than I have enumerated here, as can a warehouse, and I’m sure I’ve missed some similarities, but they do have quite a bit in common. Both ideally grant the user better, more accurate, more timely data, affecting behavior positively and leading to faster, better informed decisions. One major area in which they part ways, however, is that Mint is free.