Best Practice For Building Scalable Data Marts

Making sure the architecture of your Data Marts is scalable is vital for many reasons.

The main two are that it reduces the risk of future data loss and significantly reduces implementation and upgrade costs over time.

Definition Of A Data Mart

We’ve already touched on what a Data Mart is, so we won’t spend too much time defining them, but in short, a Data Mart can be considered a subset of, or precursor to, a Data Warehouse, drawing on a much smaller or condensed subset of data and resources.

They’re subject-oriented databases that will focus on one particular aspect of an organization’s data, either by department, product or a particular focus area.

Best Practice When Designing A Data Mart’s Architecture

If you’re looking to make sure your Data Mart is both efficient and scalable, you won’t go far wrong in following best practice for building a more traditional Data Warehouse. However, there are definitely some differences you’ll want to consider…

Define The Scope Ahead Of Time

The most important step to take, before any work is started in the design or implementation phase of creating a Data Mart, is to take a step back and consider why it’s being created in the first place.

What are the business needs that have to be met, and what are the pressing priorities for all stakeholders, from the CEO/CTO, to the team members, to their end-users/clients?

Once that’s understood (and documented) you can start scoping out the project with a much clearer sense of everyone’s expectations and requirements (as they won’t always be the same thing).

The Logical Data Mart Model Is Important

A logical Data Mart model isn’t a physical ‘thing’.

It’s the theoretical design that some people use when creating Data Marts, describing data through its logical relationships, attributes and entities.

An entity is the thing the data describes (a customer or an order, for example), whilst an attribute is a property that defines how that entity is recorded within the Data Mart.

When you start to map out your Data Mart’s architecture, it’s important to keep the scope you defined in step one in mind and stay focused on the organization’s needs and the stakeholders’ priorities.

With that front and centre in your mind, source data can be mapped to highly specific, subject-oriented information in your Data Mart’s destination schema.

That means, when creating your schema for the first time, the two most vital elements to focus on are the source data model and your user requirements, from staff to end-users.
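As a rough illustration of the idea above, here is a minimal Python sketch of a logical model and a source-to-destination mapping. Every name in it (the entities, attributes and source field paths such as crm.accounts.id) is hypothetical and only there for illustration; your own model will look different.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Attribute:
    """A property of an entity, e.g. a customer's region."""
    name: str
    data_type: str


@dataclass(frozen=True)
class Entity:
    """Something the data describes, e.g. a customer or an order."""
    name: str
    attributes: Tuple[Attribute, ...]


# Hypothetical logical model for a sales-focused Data Mart.
CUSTOMER = Entity("customer", (
    Attribute("customer_id", "integer"),
    Attribute("region", "text"),
))
ORDER = Entity("order", (
    Attribute("order_id", "integer"),
    Attribute("order_total", "real"),
    Attribute("discount", "real"),
))

# Hypothetical mapping: source-system field -> (entity, attribute) in the mart.
SOURCE_TO_MART = {
    "crm.accounts.id": ("customer", "customer_id"),
    "crm.accounts.sales_region": ("customer", "region"),
    "erp.invoices.number": ("order", "order_id"),
    "erp.invoices.total": ("order", "order_total"),
}


def unfed_attributes(entities, mapping):
    """Return (entity, attribute) pairs that no source field feeds yet."""
    covered = set(mapping.values())
    return [
        (entity.name, attribute.name)
        for entity in entities
        for attribute in entity.attributes
        if (entity.name, attribute.name) not in covered
    ]


gaps = unfed_attributes([CUSTOMER, ORDER], SOURCE_TO_MART)
```

Here gaps lists the attributes your users asked for that the source data model can't yet supply, which is exactly the conversation to have with stakeholders before the schema is built.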

Find The Data You’ll Need

Organizations find Data Marts so useful because they can hold a subset of the data normally available to the entire organization that’s specific to a particular department, function or task.

Whilst the data you include is usually defined by immediate business requirements, it’s almost always important to look past those short-term requirements and consider what might be needed going forward as well, to prevent the Data Mart becoming obsolete too quickly.

A good starting point is to take all the business factors that will be relevant to the Data Mart and/or business-critical to anyone using it.

From there you can generate a list of critical data fields based on the requirements of everyone involved in scoping out the Data Mart (and their end-users).

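It’s also probably a good idea to separate your data out into facts (the measurable events, such as sales amounts) and dimensions (the context used to describe them, such as product or region) at this point, to save time when scaling later.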
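To make the split between facts and dimensions more concrete, here is a small sketch using Python’s built-in sqlite3 module and an in-memory database. The table and column names (dim_product, dim_region, fact_sales) and the sample rows are assumptions made up for this example, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimensions hold descriptive context; the fact table holds the measurements
# and points back at the dimensions through their keys.
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    region_id  INTEGER NOT NULL REFERENCES dim_region(region_id),
    quantity   INTEGER NOT NULL,
    amount     REAL NOT NULL
);
""")

# Illustrative sample rows only.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 3, 30.0), (2, 2, 1, 1, 25.0), (3, 1, 2, 2, 20.0)])


def sales_by_region(connection):
    """Total the fact table's amounts, grouped by a dimension attribute."""
    return connection.execute("""
        SELECT r.region_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY r.region_name
        ORDER BY r.region_name
    """).fetchall()


totals = sales_by_region(conn)
```

Keeping facts and dimensions apart like this means a new dimension can usually be added later as one extra table and one extra key column, rather than a redesign.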

Now Narrow Things Down

Once you’ve identified all the potential data your new Data Mart might need, you’ll have to start narrowing down what actually gets included (before you end up with a duplicate Data Warehouse).

With the dimensions and facts you need scoped out, it’s time to look at all the disparate sources that will feed into your Data Mart.

Within your growing architecture, the dimensions will need to be mapped to your lookup tables and the facts to your transactional tables. It’s typically here that you’ll find some of the data you were hoping to use can’t be mapped.

If that happens, the most common reason is that certain fields in your source systems aren’t compatible with the data groups you’ve created in your Data Mart. You’ll then have to decide between limiting the amount of data you ingest and expanding the scope of your Data Mart.
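One way to surface those problem fields early is a simple compatibility check, sketched below in Python. The destination columns, source field names and type labels are all hypothetical, and a real check would probably compare more than just the data type.

```python
# Hypothetical destination columns and the type each one expects.
# Dimensions feed lookup tables; facts feed transactional tables.
DESTINATION = {
    "dim_region.region_name": "text",
    "dim_product.category": "text",
    "fact_sales.amount": "real",
}

# Hypothetical candidates: (source field, intended destination, source type).
CANDIDATES = [
    ("crm.accounts.region", "dim_region.region_name", "text"),
    ("erp.items.category_code", "dim_product.category", "integer"),
    ("erp.invoices.total", "fact_sales.amount", "real"),
    ("web.sessions.referrer", "dim_channel.channel_name", "text"),
]


def check_mappings(candidates, destination):
    """Split candidate fields into those that map cleanly and those that don't."""
    mappable, unmappable = [], []
    for source, target, source_type in candidates:
        if target not in destination:
            unmappable.append((source, "no destination column"))
        elif destination[target] != source_type:
            expected = destination[target]
            unmappable.append(
                (source, f"type mismatch: {source_type} vs {expected}")
            )
        else:
            mappable.append((source, target))
    return mappable, unmappable


mappable, unmappable = check_mappings(CANDIDATES, DESTINATION)
```

Each entry in unmappable is a prompt for the decision described above: drop the field, convert it at source, or widen the Data Mart’s scope to make room for it.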

It’s Time to Populate

Your Data Mart is now starting to take shape and you can start populating it by transferring data. This is the point where you’ll want to set how often your data is updated or refreshed.

A good tip for keeping the data in your newly created Data Mart clean is to overwrite it during each population run, rather than appending to what’s already there.
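A minimal sketch of that overwrite-on-populate approach is shown below, again using Python’s built-in sqlite3 module. The fact_sales table, its columns and the full_refresh helper are assumptions for the example; the point is simply that the delete and the reload happen in one transaction, so a failed load leaves the previous data in place.

```python
import sqlite3


def full_refresh(conn, rows):
    """Overwrite fact_sales with a fresh extract inside one transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so readers never see a half-populated table.
    with conn:
        conn.execute("DELETE FROM fact_sales")
        conn.executemany(
            "INSERT INTO fact_sales (sale_id, amount) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)

# First population run, then a later run that replaces it entirely.
full_refresh(conn, [(1, 10.0), (2, 20.0)])
full_refresh(conn, [(1, 10.0), (3, 35.0)])

current = conn.execute(
    "SELECT sale_id, amount FROM fact_sales ORDER BY sale_id"
).fetchall()
```

How often full_refresh runs is the refresh frequency mentioned above; a scheduler of your choice would call it with each new extract.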

Who Can Access Your Data Mart… And To What Extent?

Now that your Data Mart is up and running with active data, it will likely be used to run queries and generate reports, along with lots of other functions.

The people using it on a day-to-day basis, however, may well not be technical, so a good step to take is adding a meta layer to your Data Mart, in which item names and database structures are translated into easily recognisable corporate terms.

Once done, you’ll also need to set the differing levels of access for anyone using it.
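As a rough sketch of both ideas, the example below uses a database view as the meta layer and a simple role-to-table allow-list for access levels. The cryptic table name fct_sls, the friendly view sales, the roles and the read_table helper are all invented for illustration. In a production database you would typically enforce access with the database’s own permissions rather than in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Physical table with terse, technical names.
CREATE TABLE fct_sls (sls_id INTEGER PRIMARY KEY, rgn_cd TEXT, amt REAL);
INSERT INTO fct_sls VALUES (1, 'N', 30.0), (2, 'S', 20.0);

-- Meta layer: the same data exposed under business-friendly names.
CREATE VIEW sales AS
SELECT sls_id AS sale_number,
       rgn_cd AS region_code,
       amt    AS sale_amount
FROM fct_sls;
""")

# Hypothetical access levels: which objects each role may read.
ROLE_ACCESS = {
    "analyst": {"sales"},
    "engineer": {"sales", "fct_sls"},
}


def read_table(connection, role, name):
    """Return (column names, rows) if the role is allowed to read the object."""
    if name not in ROLE_ACCESS.get(role, set()):
        raise PermissionError(f"role {role!r} may not read {name!r}")
    # The name has been checked against the allow-list above, so it is safe
    # to place in the query text.
    cursor = connection.execute(f'SELECT * FROM "{name}"')
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


columns, rows = read_table(conn, "analyst", "sales")
```

An analyst sees sale_number, region_code and sale_amount without ever needing to know the underlying table exists, while a request for fct_sls from that role raises a PermissionError.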
