Use Case Support
Hello Team Datameer
I have a question about how to approach a simple use case. I have not been able to find documentation or tutorials covering the mechanics involved.
There are cron jobs importing inventory data that contain resource types, and I need to provide volumetrics of the inventory across several time intervals (e.g. the last hour, the last 7 days, the last month, etc.).
Could I please get some guidance from Datameer on this?
Thanks,
Anson
-
Hello Anson.
May I ask you to share more details about the use case, please?
- What exactly does the imported inventory data look like (a mockup of a few records will be sufficient)?
- What result do you need to achieve (please provide a sample based on the mockup from the first question)?
Thank you in advance.
-
Good morning Konsta. Yes, absolutely. I will list my workflow and logic here.
Step 1 - Using JSON_ELEMENTS, JSON_KEYS, and JSON_VALUE, I parse a JSON document that describes the inventory of a Cloud Resource; the output is below.
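For illustration, here is a rough Python mockup of the kind of flattened records this step produces (the document structure, attribute names, and values are examples only, not my actual output):

```python
import json

# Illustrative only: a made-up EC2 inventory snapshot document.
raw = """
{
  "resourceType": "AWS::EC2::Instance",
  "instances": [
    {"InstanceId": "i-0001", "InstanceType": "t3.medium", "AvailabilityZone": "us-east-1a"},
    {"InstanceId": "i-0002", "InstanceType": "m5.large",  "AvailabilityZone": "us-east-1b"}
  ]
}
"""

doc = json.loads(raw)

# Flatten each instance into one record per row, roughly what the
# JSON_ELEMENTS / JSON_KEYS / JSON_VALUE parsing step gives me in the workbook.
records = [
    {
        "resourceType": doc["resourceType"],
        "InstanceId": inst["InstanceId"],
        "InstanceType": inst["InstanceType"],
        "AvailabilityZone": inst["AvailabilityZone"],
    }
    for inst in doc["instances"]
]

for r in records:
    print(r)
```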
Step 2 - Based on the parsed data, I'd like to build something like a "Pivot Table" that provides numeric volumetrics of the resource by attribute: in this case, the number of AWS EC2 instances, further broken down by attributes such as "InstanceType", "AvailabilityZone", etc.
We want to establish a simple table that can be visualized and that answers the question: what do I have in my EC2 inventory now? 7 days ago? 2 weeks ago? a month ago?
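Conceptually, the counting I'm after looks roughly like this (plain Python sketch, illustrative data only):

```python
from collections import Counter

# Records as produced by the flattening step above (illustrative values).
records = [
    {"InstanceType": "t3.medium", "AvailabilityZone": "us-east-1a"},
    {"InstanceType": "t3.medium", "AvailabilityZone": "us-east-1b"},
    {"InstanceType": "m5.large",  "AvailabilityZone": "us-east-1a"},
]

# Total number of EC2 instances in the snapshot.
total = len(records)

# Break the total down by attribute, like a pivot table would.
by_type = Counter(r["InstanceType"] for r in records)
by_az   = Counter(r["AvailabilityZone"] for r in records)

print("total instances:", total)
print("by InstanceType:", dict(by_type))
print("by AvailabilityZone:", dict(by_az))
```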
Question: The raw data is a snapshot taken at time X, and the only identifiable timestamp is "dasLastModified" from the Datameer import. I would also like guidance on how to set up this cron import job so that it builds a dataset that can support the outcome of Step 2. Should we set the cron job to "append" data and use the "dasLastModified" column to filter for the desired volumetrics in the output?
Thank you very much. Please let me know if further information is required; I am very happy to get on a call to explain. Much appreciated.
Anson
-
Hello Anson.
If you do not have a data creation timestamp in your dataset, you could use the following approach.
- Create an ImportJob and include the dasJobExecutionStartTime column in the schema at the DefineFields step of the artifact's configuration dialog. I'm not sure how your data is configured and updated, so I can't confirm that the dasLastModified column is something you could rely on. The dasJobExecutionStartTime column, however, will contain the date when the job was triggered. If you run the job daily, you can then identify on which day particular records were ingested.
- Depending on how you want to transform the data further, you could add time-based partitions to store the data ingested on a particular day/hour separately, and then use a partition filter in a Workbook to choose which portion of the data you'd like to work with. See Partitioning Data in Datameer.
- At the Schedule step of the artifact's configuration dialog, choose the job's execution time and set the Data Retention policy to Append in order to accumulate the data in the Datameer storage.
The above configuration of the ImportJob allows you to have an ingestion timestamp included in the dataset as a separate column, as well as have the data stored in time-based partitions.
The next step would be to apply the required transformations to the raw dataset and use, for example, a Pivot Sheet to bring the data into the desired form.
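Outside of Datameer, the logic would look roughly like the following Python sketch (the column name comes from the ImportJob; the data values are made up for illustration):

```python
from datetime import datetime, timedelta
from collections import Counter

# Appended snapshots, each row carrying the ingestion timestamp that the
# dasJobExecutionStartTime column provides (values are made up).
rows = [
    {"dasJobExecutionStartTime": datetime(2021, 6, 1), "InstanceType": "t3.medium"},
    {"dasJobExecutionStartTime": datetime(2021, 6, 7), "InstanceType": "t3.medium"},
    {"dasJobExecutionStartTime": datetime(2021, 6, 7), "InstanceType": "m5.large"},
]

# "What did I have 7 days ago?" -> keep only the snapshot ingested on that day,
# which is what a partition filter on the ingestion date achieves.
target_day = (datetime(2021, 6, 14) - timedelta(days=7)).date()
snapshot = [r for r in rows if r["dasJobExecutionStartTime"].date() == target_day]

# Then pivot/count the filtered snapshot by attribute.
print(Counter(r["InstanceType"] for r in snapshot))
```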
I hope this helps. Let me know if you have any questions.
-
Anson, apologies. I had enabled you but tagged you incorrectly, so you got hit by the admin certification trigger I was trying to bypass for you explicitly.
You should be able to open tickets now; I've verified the tags on your account against the trigger in question, and they are now correct.