A tour of the Data Warehouse Manager
The Data Warehouse Manager, accessed by clicking Manage Data > Data Warehouse in the sidebar, is the portal to your Magento BI Data Warehouse. Using the Data Warehouse Manager, you can manage table and column sync settings, drill-down into a table’s schema, and create calculated columns to use in reports. Note that you need Admin permissions to access the Data Warehouse Manager.
In this article, we'll cover:
- Learning your way around
- Syncing tables and columns
- Creating calculated columns
- Dropping tables and removing columns
- Syncing new tables in the background
- So, when can I use my new columns?
Learning your way around
The left side of Data Warehouse Manager page contains the tables list, allowing you to easily toggle between tables. When you select a table from the list, the table management area will populate with the table's schema where you can make changes to the selected table.
Within the table list, tables are grouped by their connection source. These sources are added under Manage Data > Integrations and may be either a database, an API, or a third-party connector. At the top of the table list is a search box enabling you to easily find desired tables.
Underneath the search box, you will see two options: All Tables and Synced Tables. The All Tables option lists all the tables that you’ve made available to your Data Warehouse, which includes both synced and unsynced tables.
The Synced Tables option shows all tables that have already been added into your Data Warehouse and have data being replicated from the selected columns.
Don’t see the table you’re looking for in the All Tables list? There are a few possible reasons for this:
- The data source hasn’t been added yet
- The data source is a database and the Magento BI user you created doesn’t have access. In this case, you or your database administrator will need to grant access.
- The data source or table was recently added and hasn’t been synced yet
Syncing tables and columns
Syncing New Tables and Native Columns
The Data Warehouse Manager not only gives you the ability to easily view and manage your data sources, you also have the freedom to select the individual tables and columns you want to sync.
- Click the All Tables option and locate the table you wish to sync.
- Click the name of the table to preview the schema. If the table is new, all columns will display as Unsynced.
- Check the columns you want to sync. Note that columns native to a table will have From Your Database in the Location column.
Make sure you check the Primary Key columns - these columns have a key symbol next to the column name. A Primary Key is required to properly sync data into the Data Warehouse.
If you’re syncing a table that comes directly from your database, it’s possible that Primary Keys may not be denoted. In this case, contact your database administrator to request that a primary key or keys be added to the table.
- When finished, click the button.
A Success! message will display and the status will change to Pending for the selected columns. After the next full update completes, the newly synced tables and columns will be available for use in reports; you can also set new replication methods after the initial sync.
Here’s a quick look at the whole process:
Syncing New Tables in the Background
When you a sync a large, new table for the first time, your data warehouse needs to retroactively capture all data points in the table before capturing new data on an ongoing basis. If your table is particularly large, you may not want to have that initial sync run in sequence with your update cycle — in situation, you'll want the initial sync to occur in the background, in parallel with any currently running update.
To make sure that occurs, you should select the Save and Sync Data Immediately option syncing that table for the first time, as below:
Checking for new tables and columns
Your Data Warehouse doesn’t automatically detect new sources, tables, or columns the moment they’re added. A synchronization process runs throughout the week to find new additions and make them available, but you can force a structure synchronization if you want to access newly added tables and columns before the process runs.
Below the search bar in the table list is a Check for new tables and columns link. Clicking this link will force-start the structure sync process; new additions are typically available after 10 minutes. Refresh the page to see the new source, table, or column.
Creating Calculated Columns
Simply being able to see and manage data from all your sources makes gaining insights into your business that much easier. But within the Data Warehouse Manager, you have the ability to go a step further by creating calculated columns inside your tables. Calculated columns derive new information from your existing data.
Let's say you want to add user’s lifetime revenue to your users table to find high value users. Or, if you want to segment revenue by gender, you can add customer’s gender to your orders table.
To help you master creating these columns, we created a tutorial to walk you through it.
Dropping Tables and Removing Columns
Just as you have the ability to select tables and columns to sync to your Data Warehouse, you also have the ability to drop or remove them. Note that dropping a table or removing columns will delete any dependent reports, metrics, filter sets and columns once you confirm the deletion. Be certain you want to do this - this action cannot be undone.
Don’t worry if you click Delete by accident. A dependency check runs before anything is deleted, so you’ll have the chance to review everything before confirming.
**To remove columns, **click the table that the column belongs to. Check the columns you want to remove and click the button.
To remove a synced table, select all columns in the table, and again click the button. This will remove all native and calculated columns that use this table from your data warehouse.
Whether you’re dropping a table or removing columns, a dependency check will run before the deletion process completes. Dependencies are calculated columns, metrics, filter sets, and reports that utilize the table or column(s) being removed. Any discovered dependencies will display - at this point, you can either cancel the process or click Confirm Changes to drop the table/remove the column(s).
While deleted dependencies cannot be restored, the tables and columns will still be available if you need to resync any native columns in the future.
Here's a quick look at removing a column:
So, when can I use my new columns?
New synced columns and new/updated calculated columns will be ready for use after the next full update completes. If an update is not already in progress, you can force an update by clicking the Force update link shown at the top of the Data Warehouse or Integrations page. You can also schedule an email notification upon completion of the update by clicking Email me when complete.
When you’re ready to use your new columns in reports, you’ll need to add them to metrics first. Although data won’t be available until an update completes, you can still use new columns in reports. Data within the report will display when the update is finished.
That's it - we're at the end!
We covered a lot of material in this tutorial. By now, you should have a solid understanding of what a database is, how data is organized, how tables relate to each other, and what you can do with the Data Warehouse Manager.