2002 Usenet tree map by number of returnees

Image by Marc Smith / CC BY

Two different Big Data Structures

I have a habit of sharing blog articles by Bernard Marr. He writes really well about big data possibilities and pitfalls. A couple of days ago I got a comment on our company Facebook page about big data structures, in response to my share of Bernard's article about Big Data Layers. The comment presented a somewhat different view. I share Bernard's view and see the big data structure as a process (see details in his article). Vincent Granville has a somewhat different view of big data structure, based on how the data is refined: you have raw, linked and summarized data (see his article 'Defining Big Data').

Analogy to Product Data Management

This reminded me of something I realized when I was working with product data management. For product data there is not one data structure but several. Let me give you an example: a car. The completed product (a car) consists of parts. To get the car manufactured you need bills of materials (BOMs) for assembling it. The car has substructures that have their own BOMs. Naturally you don't keep an exact BOM for every possible car model that you could produce; that would be highly inefficient. Each car is configured, and the product data model is modular, so that you can configure different options (e.g. interior, engine size, transmission type). The modules belong to the overall product data model, which enables the configuration of individual cars.
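The modular idea above can be sketched in a few lines of Python. This is only an illustration: the module names, options and part names are made up for the example, and a real product data model would be far richer.

```python
# Hypothetical modules and options; each option carries its own small BOM.
MODULE_OPTIONS = {
    "engine": {
        "1.6L": ["engine block 1.6L", "oil filter"],
        "2.0L": ["engine block 2.0L", "oil filter"],
    },
    "transmission": {
        "manual": ["manual gearbox", "clutch"],
        "automatic": ["automatic gearbox"],
    },
    "interior": {
        "cloth": ["cloth seats"],
        "leather": ["leather seats"],
    },
}


def configure_car(choices):
    """Build the BOM of one individual car from one option per module."""
    bom = []
    for module, options in MODULE_OPTIONS.items():
        if module not in choices:
            raise ValueError(f"no option chosen for module {module!r}")
        option = choices[module]
        if option not in options:
            raise ValueError(f"unknown option {option!r} for module {module!r}")
        # The individual BOM is assembled from module BOMs, not stored per car model.
        bom.extend(options[option])
    return bom


car_bom = configure_car(
    {"engine": "2.0L", "transmission": "manual", "interior": "cloth"}
)
```

Three modules with two options each already cover eight different cars, without a single car-specific BOM being stored anywhere.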

Getting the car ready is not the full picture. Cars have a lifespan, and they are serviced and maintained during it. For maintenance you need a different data structure. The modules are a bit different, and they have attributes that the manufacturing data structure does not, such as maintenance intervals, replacement age and maintenance instructions. When I was responsible for product data management, there were up to five identifiable data structures. All of them were derived from the same set of data.
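Here is a minimal sketch of two structures derived from the same set of data, again in Python. The part records, field names and interval values are assumptions made up for the illustration, not data from a real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Part:
    part_id: str
    name: str
    module: str
    quantity: int
    # Hypothetical maintenance attribute; None means the part is not serviced.
    maintenance_interval_km: Optional[int] = None


# One shared set of product data (invented example values).
PARTS = [
    Part("E-100", "Engine block", "engine", 1),
    Part("E-110", "Oil filter", "engine", 1, 15000),
    Part("T-200", "Gearbox", "transmission", 1, 60000),
    Part("I-300", "Seat", "interior", 4),
]


def manufacturing_view(parts):
    """Group parts by module, keeping only what assembly needs."""
    view = {}
    for part in parts:
        view.setdefault(part.module, []).append((part.part_id, part.quantity))
    return view


def maintenance_view(parts):
    """List only serviced parts, ordered by maintenance interval."""
    serviced = [p for p in parts if p.maintenance_interval_km is not None]
    serviced.sort(key=lambda p: p.maintenance_interval_km)
    return [(p.part_id, p.name, p.maintenance_interval_km) for p in serviced]
```

Both functions read the same `PARTS` list. Neither view stores its own copy of the data, so a change to a part shows up in both structures.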


Is big data structure analogous to product data? I think yes. Different organizations need different views of the data. If you structure it only one way, it might be an average match for almost everybody. If you structure it differently for different needs, you can get the most out of the data.

What does this mean? You can't calculate KPIs differently for different organizations; that will discredit you very quickly. I think it means that your model should be flexible enough that you can reshape it easily. One way to see this is to look at marketing and sales. Marketing has the data shaped differently than sales does. The obvious thought is that the two share some KPIs and have some of their own. That much is obvious, but you could also build more complicated differentiations into the customer model based on the big data structure.
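To make the marketing and sales example concrete, here is a small Python sketch in which one KPI calculation is reshaped for two audiences. The order records and the field names (`campaign`, `sales_rep`, `revenue`) are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical order data shared by both departments.
ORDERS = [
    {"campaign": "spring", "sales_rep": "anna", "revenue": 100.0},
    {"campaign": "spring", "sales_rep": "ben", "revenue": 250.0},
    {"campaign": "autumn", "sales_rep": "anna", "revenue": 150.0},
]


def revenue_by(orders, dimension):
    """Sum revenue per value of the given dimension.

    The KPI (revenue) is always calculated the same way;
    only the grouping changes between audiences.
    """
    totals = defaultdict(float)
    for order in orders:
        totals[order[dimension]] += order["revenue"]
    return dict(totals)


marketing_view = revenue_by(ORDERS, "campaign")  # shaped for marketing
sales_view = revenue_by(ORDERS, "sales_rep")     # shaped for sales
```

Because both views come from the same function and the same orders, their totals always agree. That is the property that keeps the numbers credible when different organizations compare them.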

If you have real-life examples where data has a different model depending on usage, I'm more than happy to hear them.