MSTICPy has been around for nearly two and a half years, so we decided that it was time to bring it out of beta and let it roam free in the big wide world.
(This article is based on the earlier MSTICPy release-candidate article but does contain some updates, so please scan through even if you read the previous one.)
The 1.0 release has some new features and some significant updates to existing features:
- Implementing pip “extras” to split dependencies into optional chunks for faster installation
- Settings editor and management tools for easier configuration
- User defaults — preload data providers and other components automatically when you initialize MSTICPy at the start of each notebook
- SQL to KQL translator for cross compatibility
- Pivot functions updates
- Improvements to notebook initialization
Installing
You can install the current version with (but see the next section on “extras”):
pip install msticpy
MSTICPy Dependencies and “extras”
MSTICPy has grown organically over the past 2+ years. As we’ve added new functionality, more Python packages have been added as dependencies:
- Common packages such as pandas, numpy and bokeh
- Security/network-specific packages like tldextract, dnspython and ipwhois.
- Kqlmagic — the core client library for Azure Sentinel queries.
- More specialized packages used by only one or two MSTICPy components: scipy, statsmodels, scikit-learn, virustotal, splunk.
Including all of these in the base install meant that, every time you installed or upgraded MSTICPy, you got all of those dependencies whether you needed them or not. This made for slow install times and more opportunities for package version conflicts.
Python setuptools supports the concept of “extras”. The setuptools docs don’t do a great job of describing this, but it’s essentially a way to divide your dependencies into core and optional subsets. Installing as normal installs just the core dependencies; however, you can add parameters to your pip install command to say that you want some additional dependencies installed as well. For example, MSTICPy uses Python statsmodels (which in turn requires scipy) in its time series analysis; both are sizeable packages. Previously you would always get these, even if you never did any time series analysis. From v1.0.0, if you want these additional packages, you install MSTICPy with the “ml” extra (this also includes scikit-learn, hence the name).
pip install msticpy[ml]
One of our core dependencies, Kqlmagic, has also released a version that uses extras. This is a huge benefit to MSTICPy users, since around half of our previous dependencies were things brought in by the Kqlmagic default install. The “extra-ized” version, KqlmagicCustom, is still in pre-release, but the benefits of using it were so great that we decided to switch to it. Depending on what is already installed in your environment, you can expect the MSTICPy installation to take as little as 10–20% of the previous install time.
We’ve also done some work to guard the code that uses extras, so if you try to use a MSTICPy component for which the dependency isn’t installed, you’ll get a somewhat friendly message telling you what to do.
More details of how we’ve broken out the dependencies and what extras are available are in our Installing documentation.
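The guard works roughly like the sketch below. This is an illustrative, invented helper (the function name and message wording are not MSTICPy's actual code); it just shows the pattern of catching a missing optional dependency and pointing the user at the right extra:

```python
import importlib


def import_extra(module_name: str, extra: str):
    """Import a module, or raise a friendly error naming the pip extra.

    Illustrative sketch only: this is the general guard pattern,
    not MSTICPy's actual implementation.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"The feature you are using requires '{module_name}', "
            f"which is not installed.\n"
            f"Install it with: pip install msticpy[{extra}]"
        ) from err
```

A time series module could then call something like import_extra("statsmodels.api", "ml") at import time, so the failure happens only when you actually use the feature.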
Settings editor and settings management
Much like the dependencies, the MSTICPy settings (contained in msticpyconfig.yaml) have grown organically and are now quite intimidating for a new user to understand and manage.
We’ve created an interactive settings editor (for use in Jupyter notebooks) that lets you view, change and add settings.
As well as giving you a visual editing tool for settings, the editor also lets you upload secrets (such as API keys) to be stored in, and used from, Azure Key Vault.
To use the editor, open a notebook and run the following:
from msticpy.config import MpConfigEdit

# set an environment variable pointing to your existing config file
# %env MSTICPYCONFIG=./msticpyconfig2.yaml
mpedit = MpConfigEdit()
mpedit
There is also a companion tool — MpConfigFile — that lets you load and validate settings, import workspace settings from Azure Sentinel and view your Key Vault secrets. MpConfigFile can be used as a GUI or programmatically.
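For reference, the settings these tools manage live in msticpyconfig.yaml. A minimal fragment might look something like the sketch below (hedged: the workspace values are placeholders and the provider layout is simplified):

```yaml
AzureSentinel:
  Workspaces:
    Default:
      WorkspaceId: "<your-workspace-guid>"
      TenantId: "<your-tenant-guid>"
TIProviders:
  VirusTotal:
    Args:
      AuthKey: "<your-api-key-or-keyvault-ref>"
    Primary: true
    Provider: VirusTotal
```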
Read more about these features in our documentation.
User Defaults
At the start of each notebook it’s common to have to load and initialize a bunch of components: data query provider(s), threat intelligence providers, geo-ip providers, and so on. With some of the new capabilities in MSTICPy, such as Pivot functions, we wanted to be able to speed up this repetitive start-up process. We’ve added a new settings section (also exposed in the MpConfigEdit settings editor) where you can specify which data query providers and other components you want loaded automatically.
The auto-load of components is all driven from the init_notebook() function, which is usually one of the first things you’ll see happening in our notebooks. There is some support for passing parameters to each component from the auto-load functions. Most components have their own section in msticpyconfig.yaml, but in some cases you can override or supplement these with the auto-load settings. For example, you can specify whether a data query provider should connect (authenticate) on loading, or just load and wait for you to provide the authentication details.
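The corresponding msticpyconfig.yaml section looks something like the sketch below (hedged: the aliases and component entries shown are illustrative placeholders, loosely based on the documented UserDefaults format):

```yaml
UserDefaults:
  QueryProviders:
    AzureSentinel:
      Default:
        alias: azsent
        connect: false
  LoadComponents:
    TILookup:
    GeoIpLookup:
      provider: GeoLiteLookup
```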
As each component is loaded, it is assigned to a variable in the Jupyter kernel global namespace (in many cases you can specify the names of these variables). If you forget what they are, you can find them in an attribute of the msticpy main module, current_providers:
>>> msticpy.current_providers
{'qry_azsent': <msticpy.data.data_providers.QueryProvider at 0x21604110ac8>,
'qry_myworkspace': <msticpy.data.data_providers.QueryProvider at 0x216041459c8>,
'qry_cybersoc': <msticpy.data.data_providers.QueryProvider at 0x21660d41308>,
'qry_splunk': <msticpy.data.data_providers.QueryProvider at 0x21661127208>,
'qry_local': <msticpy.data.data_providers.QueryProvider at 0x216605a7c48>,
'ti_lookup': <msticpy.sectools.tilookup.TILookup at 0x216611c7908>,
'geoip': <msticpy.sectools.geoip.GeoLiteLookup at 0x21660659c88>,
'pivot': <msticpy.datamodel.pivot.Pivot at 0x216602d8e88>,
'az_data': <msticpy.data.azure_data.AzureData at 0x21668aaf708>,
'azs_api': <msticpy.data.azure_sentinel.AzureSentinel at 0x21603f42388>,
'nb': <module 'msticnb' from 'e:\\src\\msticnb\\msticnb\\__init__.py'>}
SQL to KQL Translator
This is an “experimental” feature (“experimental” here meaning that we’re not going to invest a lot of time fixing bugs or adding new features). We wrote it to do some bulk translation of Spark SQL queries for OTRF Mordor and thought it might be useful to others, so we added it to MSTICPy. It uses Mozilla’s awesome moz_sql_parser to do the hard work of parsing the SQL statements. The module translates a SQL query into a Kusto Query Language (KQL) equivalent. It supports a subset of Spark SQL syntax and functions, as well as many ANSI SQL constructs and functions. These include:
- Where and Select
- Joins and Unions
- Sub-queries
- Grouping
You can also supply a table-mapping dictionary to rename table references in the output query (no support as yet for renaming columns though!).
from msticpy.data.sql_to_kql import sql_to_kql

sql = """
SELECT Message
FROM apt29Host a
INNER JOIN (
SELECT ProcessGuid
FROM apt29Host
WHERE Channel = "Microsoft-Windows-Sysmon/Operational"
AND EventID = 1
AND LOWER(ParentImage) RLIKE '.*partial_string.*'
AND LOWER(Image) LIKE '%cmd.exe'
) b
ON a.ParentProcessGuid = b.ProcessGuid
WHERE Channel = "Microsoft-Windows-Sysmon/Operational"
AND EventID = 1
AND LOWER(Image) LIKE '%powershell.exe'
"""
table_map = {"apt29Host": "SecurityEvent", "MyTable": "SigninLogs"}
kql = sql_to_kql(sql, table_map)
print(kql)
Output
SecurityEvent
| join kind=inner (SecurityEvent
| where Channel == 'Microsoft-Windows-Sysmon/Operational'
and EventID == 1
and tolower(ParentImage) matches regex '.*partial.string.*'
and tolower(Image) endswith 'cmd.exe'
| project ProcessGuid) on $left.ParentProcessGuid == $right.ProcessGuid
| where Channel == 'Microsoft-Windows-Sysmon/Operational'
and EventID == 1
and tolower(Image) endswith 'powershell.exe'
| project Message
This might be useful as an entry point for someone learning KQL and also if you have a bunch of SQL queries that you want to bulk translate (which was our motivation for creating it).
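To give a flavor of one kind of mapping involved, here is a toy sketch of translating simple SQL LIKE patterns into KQL string operators. This is an illustration only; the real module parses full SQL with moz_sql_parser rather than pattern-matching strings, and the function here is invented for this example:

```python
def like_to_kql(column: str, pattern: str) -> str:
    """Translate a simple SQL LIKE pattern into a KQL string predicate.

    Illustrative sketch only, covering just leading/trailing '%'
    wildcards; not MSTICPy's actual implementation.
    """
    if pattern.startswith("%") and pattern.endswith("%"):
        # wildcard on both ends: substring match
        return f"{column} contains '{pattern[1:-1]}'"
    if pattern.startswith("%"):
        # leading wildcard: suffix match
        return f"{column} endswith '{pattern[1:]}'"
    if pattern.endswith("%"):
        # trailing wildcard: prefix match
        return f"{column} startswith '{pattern[:-1]}'"
    # no wildcard: exact match
    return f"{column} == '{pattern}'"
```

This is why LOWER(Image) LIKE '%cmd.exe' in the input becomes tolower(Image) endswith 'cmd.exe' in the output above: KQL has idiomatic string operators where SQL relies on wildcards.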
Please don’t complain too much about either the quality of our KQL output or my poor code (I know I should have used the Visitor pattern — one day, when I get time!).
Read more about this feature here.
Pivot Updates
We released the pivot functions feature earlier this year. This release adds a few significant updates (although we have a bunch more to follow soon):
- Pivot browser — shows you entities and their pivot functions, along with the help documentation for each function. You can also search for pivot functions by keyword, matching anything in the name or docstring of a function.
- Persistent pipelines — we’ve extended this feature to allow you to save multi-step pandas pipelines as YAML files and re-execute them with different input DataFrames. We’ve also added a couple of useful pandas accessor methods that you can use in your pipelines. As a reminder, an example of a pandas pipeline using pivot functions is shown below.
(
suspicious_ips
# Lookup IPs at VT
.mp_pivot.run(IpAddress.ti.lookup_ipv4_VirusTotal, column="IPAddress")
# Filter on high severity
.query("Severity == 'high'")
# lookup whois info for IPs
.mp_pivot.run(IpAddress.util.whois, column="Ioc", join="left")
# display sample of intermediate results
.mp_pivot.display(title="TI High Severity IPs", cols=["Ioc", "Provider", "Reference"], head=5)
.mp_pivot.tee(var_name="ti_whois_df")
# Query IPs that have login attempts
.mp_pivot.run(IpAddress.AzureSentinel.list_aad_signins_for_ip, ip_address_list="Ioc")
# Send the output of this to a plot
.mp_timeline.plot(
title="High Severity IPs with Logon attempts",
source_columns=["UserPrincipalName", "IPAddress", "ResultType", "ClientAppUsed", "UserAgent", "Location"],
group_by="UserPrincipalName"
)
)
A pipeline packaged as a YAML file would look something like this. This example is truncated for space reasons and is for a different pipeline than shown above.
pipelines:
pipeline1:
description: Pipeline 1 description
steps:
- name: get_logons
step_type: pivot
function: util.whois
entity: IpAddress
comment: Standard pivot function
params:
column: IpAddress
join: inner
- name: disp_logons
step_type: pivot_display
comment: Pivot display
params:
title: "The title"
cols:
- Computer
- Account
query: Computer.str.startswith('MSTICAlerts')
head: 10
- name: tee_logons
step_type: pivot_tee
comment: Pivot tee
params:
var_name: var_df
clobber: True
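Conceptually, a runner for such a file just dispatches each step to a handler chosen by its step_type, passing along the step's params. The stdlib-only sketch below shows that shape; it is an illustration of the idea, not MSTICPy's actual pipeline runner:

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    data: Any,
    steps: List[Dict[str, Any]],
    handlers: Dict[str, Callable[..., Any]],
) -> Any:
    """Apply each pipeline step to `data` in order.

    Each step dict has a `step_type` key selecting a handler and an
    optional `params` dict passed as keyword arguments. Illustrative
    sketch only, mirroring the YAML step structure.
    """
    for step in steps:
        handler = handlers[step["step_type"]]
        data = handler(data, **step.get("params", {}))
    return data
```

In this model a "pivot" handler would run the named pivot function, while "pivot_display" and "pivot_tee" handlers would display or save intermediate results and pass the DataFrame through unchanged.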
Note that we’ve completed a further update of Pivot functions that will have the following improvements:
- Shortened, friendlier names for many functions
- Shortcuts — a subset of more popular functions are available as direct methods of the entities (rather than in a subcontainer)
- Entity instances as input — for “shortcut” methods, these also work as instance methods. This means that if you have an initialized IpAddress object with the “Address” attribute populated, you can run pivot functions from this entity. Rather than supplying the address as a parameter, the pivot will take the Address value of the entity as the input to the function.
- Several new pandas “accessor” extensions allowing you to do simple row and column filtering, sort data, expand lists and json values to rows and columns
- All pivot functions will support “join” functionality.
We decided not to rush this into v1.0.0 but look for a v1.1.0 in the next couple of weeks with this update.
Read more about the pivot functionality here.
Notebook Startup and Initialization
MSTICPy has a module, nbinit, and we use its main function, init_notebook(), to streamline a lot of the setup needed to get a notebook up and running. It does things like importing modules, installing required packages, checking configuration and, now, auto-loading components (see the earlier User Defaults section for more details).
We’ve made a few significant updates to this module including:
- Defaulting to using the cool native data browser in Azure Machine Learning notebooks
- Tidying up and minimizing output so that warnings are clearer and help URLs are displayed clearly
- Enabling single sign-on for Azure Sentinel notebooks using Azure CLI (all notebooks in a session will use the Azure CLI authenticated session to obtain tokens for your Azure Sentinel workspaces). To use this, run az login from an Azure ML compute terminal, or run !az login from a notebook cell.
- Plus a bunch of other usability fixes and improvements.
In combination with this release of MSTICPy, we are releasing an update of our Azure Sentinel Notebooks to take advantage of these improvements.
Conclusion
It’s a huge relief to finally get MSTICPy v1.0.0 out of the door and be able to replace the “Beta” tag on PyPi with a “Production” one! Thanks to my co-workers and collaborators — especially Pete Bryan — for devoting endless hours to make this possible.
Read all the gory details in the GitHub release notes for this release.
Please send any feedback to msticpy@microsoft.com or file an issue at our GitHub repo https://github.com/microsoft/msticpy/issues.
Feel free to reach out to me on Twitter: ianhellen.