msticpy 0.5.1 release

MSTIC
May 29, 2020

In the latest release of msticpy we have added a new data provider, called LocalData, that allows you to query and load locally stored data sets. It’s designed to assist you in testing, demonstrations, and creating example notebooks where querying of remote data sets is not practical.

Why do we need a data provider to read locally stored data, you might ask? After all, reading a pandas DataFrame from a comma-separated values (CSV) file is as easy as:

pd.read_csv("myfile.csv")

The primary driver for this is to create a data provider that behaves in the same way as our online data providers.

For example, to query the alerts from Azure Sentinel you would enter:

alerts_df = qry_prov.SecurityAlert.list_alerts(start=T1, end=T2)

We have lots of code and notebooks that have that kind of construct. What the LocalData provider allows us to do is replace the “qry_prov” part with a provider that appears to do exactly the same thing but sources its data from local files.

How to Use It

First you need to assemble your data files. These can be either pickled pandas DataFrames or CSV files. We recommend the former, since pickling best preserves data formatting and column types. CSV import can also be inconsistent when processing datetime fields unless you tell pandas exactly what to do with them. Pickled DataFrames must have a .pkl extension.
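The difference in type fidelity between the two formats can be seen with a small pandas round trip. This is a minimal sketch, not part of msticpy: the sample DataFrame, its column names, and the temporary folder are assumptions made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
alerts = pd.DataFrame(
    {
        "AlertName": ["Suspicious logon", "Malware detected"],
        "TimeGenerated": pd.to_datetime(
            ["2020-05-01 10:00:00", "2020-05-02 11:30:00"]
        ),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "alerts_list.pkl"
    csv_path = Path(tmp) / "alerts_list.csv"

    # Save the same data in both formats.
    alerts.to_pickle(pkl_path)
    alerts.to_csv(csv_path, index=False)

    # Reload: pickle keeps dtypes; plain read_csv does not parse dates
    # unless told which columns hold them.
    from_pickle = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)
    from_csv_parsed = pd.read_csv(csv_path, parse_dates=["TimeGenerated"])

is_datetime = pd.api.types.is_datetime64_any_dtype
pickle_is_datetime = is_datetime(from_pickle["TimeGenerated"])
csv_is_datetime = is_datetime(from_csv["TimeGenerated"])
parsed_is_datetime = is_datetime(from_csv_parsed["TimeGenerated"])

print("pickle keeps datetime:", pickle_is_datetime)
print("plain CSV keeps datetime:", csv_is_datetime)
print("CSV with parse_dates keeps datetime:", parsed_is_datetime)
```

If you do use CSV files, checking the column dtypes after loading is a quick way to spot datetime columns that came back as plain text.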

Once you have your files in a folder you can create the query definitions. You do this in a YAML file. The data for each query is very simple: you provide the name that you want the query to be called, a data family, and the name of the data file. The data family can be any string, but we use it to group related queries together into a hierarchical collection. In this case we want to emulate the list_alerts query.

Here is a file showing two queries defined:

metadata:
  version: 1
  description: Local Data Alert Queries
  data_environments: [LocalData]
  data_families: [SecurityAlert, WindowsSecurity, Network]
  tags: ['alert', 'securityalert', 'process', 'account', 'network']
defaults:
sources:
  list_alerts:
    description: Retrieves list of alerts
    metadata:
      data_families: [SecurityAlert]
    args:
      query: alerts_list.pkl
    parameters:
  list_host_logons:
    description: List logons on host
    metadata:
      data_families: [WindowsSecurity]
    args:
      query: host_logons.csv
    parameters:

The data_environments field in the top “metadata” section is important since it tells msticpy that the queries in this file belong to the local data provider. Save this file with a .yaml extension.

By default, the LocalData provider will try to find data files and yaml query definition files in the current directory.

# Creating a query provider with the "LocalData" parameter
from msticpy.data import QueryProvider

qry_prov = QueryProvider("LocalData")

To override this, you can specify folders for each when you create the provider (both can point to the same folder). Note that each of these parameters takes a list of strings, so if you have only one folder, enclose it in square brackets.

data_path = "./my_data"
query_path = "./myqueries"
qry_prov = QueryProvider("LocalData", data_paths=[data_path], query_paths=[query_path])

Now we can check which queries we have available:

qry_prov.list_queries()

SecurityAlert.list_alerts
WindowsSecurity.list_host_logons

Each of these queries is an executable function. To execute one, just type it in and add parentheses. Although the provider ignores parameter values, it accepts any that you choose to give it (which is exactly what we want if we are trying to emulate one of our existing query providers).

alerts_df = qry_prov.SecurityAlert.list_alerts(start=T1, end=T2)
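To make this behavior concrete, here is a toy stand-in that mimics the pattern: queries grouped under a data family, callable as functions, accepting any keyword arguments and ignoring them. It is an illustrative sketch only and is not how msticpy implements LocalData; the ToyLocalProvider class, its constructor arguments, and the sample data are all hypothetical.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

import pandas as pd


class ToyLocalProvider:
    """Illustrative stand-in only; NOT msticpy's implementation."""

    def __init__(self, data_path, queries):
        # queries maps "Family.query_name" -> data file name (hypothetical format).
        self._data_path = Path(data_path)
        self._queries = dict(queries)
        for full_name, file_name in self._queries.items():
            family, name = full_name.split(".")
            # Create one namespace per data family, e.g. self.SecurityAlert.
            if not hasattr(self, family):
                setattr(self, family, SimpleNamespace())
            setattr(getattr(self, family), name, self._make_query(file_name))

    def _make_query(self, file_name):
        path = self._data_path / file_name

        def run_query(**kwargs):
            # Parameters such as start/end are accepted but deliberately ignored.
            if path.suffix == ".pkl":
                return pd.read_pickle(path)
            return pd.read_csv(path)

        return run_query

    def list_queries(self):
        return sorted(self._queries)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files, named to match the query file shown earlier.
    pd.DataFrame({"AlertName": ["a", "b"]}).to_pickle(Path(tmp) / "alerts_list.pkl")
    pd.DataFrame({"Account": ["x"]}).to_csv(Path(tmp) / "host_logons.csv", index=False)

    toy_prov = ToyLocalProvider(
        tmp,
        {
            "SecurityAlert.list_alerts": "alerts_list.pkl",
            "WindowsSecurity.list_host_logons": "host_logons.csv",
        },
    )
    query_names = toy_prov.list_queries()
    # The start/end values are ignored, so any placeholders work here.
    alerts_df = toy_prov.SecurityAlert.list_alerts(start="T1", end="T2")
    logons_df = toy_prov.WindowsSecurity.list_host_logons()

print(query_names)
print(len(alerts_df), len(logons_df))
```

Because the call signature matches what the notebook code already uses, nothing downstream of the provider needs to change when the data source is swapped.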

With a simple one-line change at the start of our notebook, we can switch our code over to the locally cached equivalent of the data and run everything offline.

Read full details at msticpy.readthedocs. Install msticpy from PyPI or check us out on GitHub. Thanks for reading.

This is the account of the Microsoft Threat Intelligence Center (MSTIC).