Within Spotlight, what is data caching and how can it benefit my workflow?
When downstream sources connect to Spotlight to query the latest results from a Dataset, a chain of events is kicked off within Spotlight to source this data. Spotlight crawls through the sources that feed and build the targeted Dataset and returns a current representation of the requested data.
To limit the number of hops required to reach your data, the full set can be cached directly on the Spotlight server, providing a one-stop shop for your curated Dataset.
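The difference between a live query and a cached query can be sketched with a toy model. This is not Spotlight's API: `ToyDataset`, `cache_now`, `query`, and the `use_cached_data` flag are hypothetical names chosen only to mirror the behavior described above, where a live query visits every upstream source and a cached query reads a stored copy.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class ToyDataset:
    """Hypothetical stand-in for a Dataset built from upstream sources."""

    def __init__(self, sources: List[Callable[[], List[Row]]]) -> None:
        self.sources = sources
        self.hops = 0  # how many upstream sources have been contacted so far
        self._cache: Optional[List[Row]] = None

    def _crawl(self) -> List[Row]:
        """Visit every upstream source and assemble the current rows."""
        rows: List[Row] = []
        for fetch in self.sources:
            self.hops += 1
            rows.extend(fetch())
        return rows

    def cache_now(self) -> None:
        """Store a full copy of the data (one crawl, paid up front)."""
        self._cache = self._crawl()

    def query(self, use_cached_data: bool = False) -> List[Row]:
        """Return cached rows when requested and available, else crawl live."""
        if use_cached_data and self._cache is not None:
            return list(self._cache)
        return self._crawl()


dataset = ToyDataset([lambda: [{"id": 1}], lambda: [{"id": 2}]])
dataset.query()                      # live query: contacts both sources
dataset.cache_now()                  # one more crawl to fill the cache
dataset.query(use_cached_data=True)  # served from the cache, no new hops
```

In this sketch the cached query adds nothing to `dataset.hops`, which is the latency saving caching is meant to provide. The trade-off is that cached rows are only as fresh as the last cache run.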
For example, suppose I have pieced together a collection of data within a Spotlight Dataset with the goal of populating Tableau. When refreshing the data from my Tableau Workbook, I notice a degree of latency. To reduce this query latency, I can return to Spotlight, cache the Dataset, and then have direct access to the complete set from Tableau.
Or perhaps I want to control the state of my data when performing downstream activities against it. Suppose the streams of data feeding my Dataset are updated every hour, but I only want to see the data from the 0700 hour. In this case, I can have Spotlight cache the data on a schedule that reflects the window I want to target.
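The scheduled case can be illustrated the same way. The sketch below is a simulation under stated assumptions, not Spotlight's scheduler: `upstream_value`, `ScheduledCache`, and `TARGET_HOUR` are hypothetical names, and the upstream stream is modeled as a value that changes every hour while the cache refreshes only during the 0700 hour.

```python
from datetime import datetime, timedelta
from typing import List, Optional

TARGET_HOUR = 7  # hypothetical schedule: refresh only during the 0700 hour


def upstream_value(now: datetime) -> str:
    """Hypothetical upstream stream whose content changes every hour."""
    return f"data as of {now:%Y-%m-%d %H}00"


class ScheduledCache:
    """Keeps a copy of the upstream data, refreshed only on schedule."""

    def __init__(self, target_hour: int) -> None:
        self.target_hour = target_hour
        self.value: Optional[str] = None  # None until the first refresh

    def tick(self, now: datetime) -> None:
        """Refresh the cached copy only during the scheduled hour."""
        if now.hour == self.target_hour:
            self.value = upstream_value(now)


cache = ScheduledCache(TARGET_HOUR)
start = datetime(2024, 1, 1, 5, 0)
seen: List[Optional[str]] = []
for offset in range(6):  # simulate 0500 through 1000
    now = start + timedelta(hours=offset)
    cache.tick(now)
    seen.append(cache.value)

for value in seen:
    print(value)
```

Although the simulated upstream changes at every step, downstream readers of `cache.value` see the 0700 snapshot from 0700 onward, until the next scheduled refresh.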
To cache your data, do the following:
- Open a Workbench for the Workspace containing the target Dataset
- Select the Dataset from within the Workbench
- Locate and press the "Cache Disabled" button on the right-hand side
- Press "Cache Now" to perform a one-time cache
- If you want to update this cache regularly, press "Enable Scheduler" and configure when you want the cache to refresh
To stop caching, work through the steps above and disable the sliders for "Enable Scheduler" and "Use Cached Data".