After several years of studying and doing research with RStudio, I joined Answers 2 Analytics as a data science blogger. I report to their Chief of Data and Analytics with the latest word on the street in data science.
It’s been a steep — but very fun — learning curve. Soon after joining, my work made a hard turn into the realm of data engineering when I got my first major project: “set up a data warehouse.”
Sure, but what’s the difference between that and a regular MySQL database? I wondered.
I had never heard of a data warehouse before, so I asked my CDAO. “Data that all goes in one place, and that enables us to identify business opportunities and helps teams easily make data-driven decisions,” she explained.
Having a graduate degree means I should be good at research, right? So I put those skills to use and dived straight into the brave new world of data stacks and data engineering. It really is a “brave new world”: an entire ecosystem of data engineering theory, SaaS vendors, and community Slacks has sprung up, seemingly out of nowhere, to address data needs that didn’t exist a decade ago.
The good news: this ecosystem is full of helpful, engaging people who like to make difficult jobs easier by (1) getting data from one place to another and (2) turning raw data into usable data. The bad news: if you’re not sure where to begin (e.g., “what’s ETL?”) or even which buzzwords to type into Google, it’s easy to get lost. The ecosystem has changed very quickly, so my early research led me to vendors selling expensive legacy data management systems that seemed to be built on outdated assumptions about data. After a while, and seemingly innumerable discussions, my understanding slowly became, “I’m building a data stack.”
So — what’s a data stack?
A data stack makes data usable. Having a data stack is like having a kitchen for data. Think about the process of baking a cake:
See how ingredients turn into a cake after going through the kitchen? Most of the ingredients are not edible by themselves (they have nutrients, true, but you wouldn’t enjoy eating flour or sticks of butter on their own). But after a while in a kitchen with the right equipment (a cake pan, a mixing bowl, a kitchen timer, an oven, a spatula, and spoons) and a chef who follows the right process, these previously inedible ingredients become a beautiful cake.
And that’s the crucial point. Pieces of raw data sitting by themselves in a source system aren’t very edible either. After going through a data stack, they become “edible”: useful fact and dimension tables with understandable field names and types, ready to be consumed by different parts of the company.
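To make “edible” a little more concrete, here is a minimal sketch of that idea in Python. Everything in it is made up for illustration: the raw rows, the cryptic field names (`cust_id`, `ord_dt`, `amt`), and the `build_tables` helper are all assumptions, not anything from a real source system.

```python
from datetime import date

# Hypothetical raw export: cryptic field names, every value stored as a string.
raw_rows = [
    {"cust_id": "17", "cust_nm": "Ada", "ord_dt": "2019-03-01", "amt": "25.50"},
    {"cust_id": "17", "cust_nm": "Ada", "ord_dt": "2019-03-04", "amt": "10.00"},
    {"cust_id": "42", "cust_nm": "Grace", "ord_dt": "2019-03-02", "amt": "99.99"},
]


def build_tables(rows):
    """Split raw rows into a customer dimension table and an order fact table."""
    customers_by_id = {}
    fact_orders = []
    for row in rows:
        customer_id = int(row["cust_id"])
        # Dimension table: one descriptive row per customer.
        customers_by_id[customer_id] = {
            "customer_id": customer_id,
            "customer_name": row["cust_nm"],
        }
        # Fact table: one row per order, with readable names and real types.
        fact_orders.append({
            "customer_id": customer_id,
            "order_date": date.fromisoformat(row["ord_dt"]),
            "order_amount": float(row["amt"]),
        })
    return list(customers_by_id.values()), fact_orders


dim_customers, fact_orders = build_tables(raw_rows)
```

The raw rows repeat the customer name on every order and keep dates and amounts as text; the output separates “who” (the dimension) from “what happened” (the facts) and gives each field a name and type a teammate can read without asking.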
What is within a data stack? It’s not simply a warehouse of data! Data stacks consist of tools that carry out the following 4 functions:
- Loading: move data from one place to another. Vendors for this service include Alooma, Fivetran, Stitch.
- Warehousing: hold it all in one place, typically in the cloud. Vendors for this service include BigQuery, Redshift, Snowflake.
- Transforming: turn raw data into data that is edible. Vendors for this service include dbt, ETLeap, XPlenty.
- Analysis & Business Intelligence: serve the edible data to teams. Vendors for this service include Chartio, Cluvio, Looker, Metabase, Mode, Periscope.
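If it helps to see the four functions side by side, here is a toy sketch in Python. It uses an in-memory SQLite database as a stand-in for a cloud warehouse, and the event data, table names, and function names are all hypothetical. A real stack would use the dedicated tools listed above for each step.

```python
import sqlite3

# Hypothetical source data, standing in for an app database or a SaaS API.
source_events = [
    ("2019-03-01", "signup", "web"),
    ("2019-03-01", "signup", "mobile"),
    ("2019-03-02", "purchase", "web"),
    ("2019-03-02", "signup", "web"),
]


def load(warehouse, rows):
    """Loading: copy raw rows from the source into the warehouse unchanged."""
    warehouse.execute("CREATE TABLE raw_events (d TEXT, typ TEXT, src TEXT)")
    warehouse.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)


def transform(warehouse):
    """Transforming: build a clean table with readable names from the raw one."""
    warehouse.execute(
        """
        CREATE TABLE daily_signups AS
        SELECT d AS signup_date, src AS channel, COUNT(*) AS signup_count
        FROM raw_events
        WHERE typ = 'signup'
        GROUP BY d, src
        """
    )


def analyze(warehouse):
    """Analysis: the kind of query a BI tool might run against the clean table."""
    cursor = warehouse.execute(
        "SELECT channel, SUM(signup_count) FROM daily_signups "
        "GROUP BY channel ORDER BY channel"
    )
    return cursor.fetchall()


# Warehousing: one place that holds everything (in memory here, for the demo).
warehouse = sqlite3.connect(":memory:")
load(warehouse, source_events)
transform(warehouse)
signups_by_channel = analyze(warehouse)
```

Note that the raw table is loaded first and transformed afterwards, inside the warehouse. That ordering is one common choice, not the only one, but it keeps the untouched raw data around in case a transformation needs to change later.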
Any tech firm that has a passion for data needs a data stack that performs these 4 functions. The data stack that I created for Convo meets all of these requirements. All the parts of the system work together like a dream, and teams are beginning to consume the data left and right.
To learn more about data stacks, you can reach out to the nice folks at Fishtown Analytics. Also useful are Stephen Levin’s resources, as well as the following video: Future-Proofing your Analytics Stack. In particular, that video compares all-in-one and modular data stacks, which is important to think about when building a data stack.