We’ve decided that it’s time to bring MSTICPy out of beta and let it fend for itself in the big wide world.
The 1.0 release has some significant features that we’ll be testing and would like you to try out:
- Implementing pip “extras” to split dependencies into optional chunks
- Settings editor and management tools
- User defaults — preload data providers and other components automatically when you initialize MSTICPy at the start of each notebook
- SQL to KQL translator
- Pivot functions updates
Installing
Since this is currently a pre-release, it won’t install with a plain pip install msticpy.
You’ll need to specify the version explicitly:
pip install msticpy==1.0.0rc3
By all means check for future release candidates at PyPI and try them out — MSTICPy PyPI Versions.
MSTICPy Dependencies and “extras”
MSTICPy has grown organically over the past 2+ years. As we’ve added new functionality, more Python packages have been added as dependencies. As a result, every time you install or upgrade MSTICPy you get all of those dependencies whether you need them or not. This creates slow install times and more opportunities for package version conflicts.
Python setuptools supports the concept of “extras”. The setuptools docs don’t do a great job of describing this, but it’s essentially a way to parameterize your pip install command to request a superset of the core dependencies. For example, our time series analysis uses Python statsmodels, which in turn requires scipy — both of which are sizeable packages. Previously you would always get these even if you never did any time series analysis. From v1.0.0, if you want these additional packages, you install MSTICPy with a “timeseries” extra parameter.
pip install msticpy[timeseries]
We’ve done some work to guard the code that uses extras, so if you try to use a component for which the dependency isn’t installed, you’ll get a somewhat friendly message telling you what to do.
More details of how we’ve broken out the dependencies and what extras are available are in our Installing documentation.
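The guard pattern works roughly like the following sketch (illustrative only — this is not MSTICPy’s actual code): before using a feature, check whether its optional dependency is importable, and raise a friendly error naming the extra to install if it isn’t.

```python
from importlib import util


def check_extra(package: str, extra: str) -> None:
    """Raise a friendly error if an optional dependency is missing."""
    if util.find_spec(package) is None:
        raise ImportError(
            f"The '{package}' package is required for this feature.\n"
            f"Install it with: pip install msticpy[{extra}]"
        )


# A time series function would call something like this before
# importing statsmodels:
# check_extra("statsmodels", "timeseries")
```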
Settings editor and settings management
Much like the dependencies, the MSTICPy settings (contained in msticpyconfig.yaml) have grown organically and are now quite intimidating for a new user to understand and manage.
We’ve created an interactive settings editor (for use in Jupyter notebooks) that lets you view, change and add settings.
Among other things, the tool allows you to migrate secrets to be stored in Azure Key Vault. Open a notebook, install MSTICPy and run the following:
# set an environment variable pointing to your existing config file
# %env MSTICPYCONFIG=./msticpyconfig2.yaml
from msticpy.config import MpConfigEdit

mpedit = MpConfigEdit()
mpedit
There is also a tool that lets you load and validate settings, import workspace settings from Azure Sentinel and view your Key Vault secrets. MpConfigFile can be used as a GUI or programmatically.
Read more about these features in our documentation.
User Defaults
At the start of each notebook it’s common to have to load and initialize a bunch of components: data query provider(s), threat intel providers, geo-IP providers, and so on. With some of the new capabilities in MSTICPy, such as Pivot functions, we wanted to speed up this repetitive start-up process. We’ve added a new settings section (also exposed in the MpConfigEdit settings editor) where you can specify which data query providers and other components you want loaded.
This is all driven from the init_notebook()
function, which is usually one of the first things you’d see happening in our notebooks. There is some support for passing parameters to each component from the auto-load functions. Most components will have their own section in msticpyconfig.yaml
anyway, but in some cases you can override or supplement these with the auto-load settings. For example, you can specify whether a data query provider should connect (authenticate) on loading or just sit there waiting for you to provide the authentication details.
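For illustration, a UserDefaults section in msticpyconfig.yaml might look something like the fragment below. Treat the exact keys and component names here as a sketch — the full schema is described in the MSTICPy documentation.

```yaml
UserDefaults:
  # Data query providers to load, with optional per-provider settings
  QueryProviders:
    AzureSentinel:
      Default:           # workspace name from your AzureSentinel settings
        alias: azsent
        connect: False   # load the provider but don't authenticate yet
    Splunk:
      connect: False
  # Other components to load (TI lookup, GeoIP, Pivot, etc.)
  LoadComponents:
    TILookup:
    GeoIpLookup:
      provider: GeoLiteLookup
    Pivot:
```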
As each component is loaded it is assigned to a variable in the Jupyter kernel global namespace (in many cases you can specify the names of these variables). If you forget what they are, you can find them in an attribute of the msticpy
main module — current_providers
>>> msticpy.current_providers
{'qry_azsent': <msticpy.data.data_providers.QueryProvider at 0x21604110ac8>,
'qry_myworkspace': <msticpy.data.data_providers.QueryProvider at 0x216041459c8>,
'qry_cybersoc': <msticpy.data.data_providers.QueryProvider at 0x21660d41308>,
'qry_splunk': <msticpy.data.data_providers.QueryProvider at 0x21661127208>,
'qry_local': <msticpy.data.data_providers.QueryProvider at 0x216605a7c48>,
'ti_lookup': <msticpy.sectools.tilookup.TILookup at 0x216611c7908>,
'geoip': <msticpy.sectools.geoip.GeoLiteLookup at 0x21660659c88>,
'pivot': <msticpy.datamodel.pivot.Pivot at 0x216602d8e88>,
'az_data': <msticpy.data.azure_data.AzureData at 0x21668aaf708>,
'azs_api': <msticpy.data.azure_sentinel.AzureSentinel at 0x21603f42388>,
'nb': <module 'msticnb' from 'e:\\src\\msticnb\\msticnb\\__init__.py'>}
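Mechanically, the auto-load step behaves something like this sketch (illustrative, not MSTICPy’s actual code): each loaded component is bound to a variable in the notebook’s global namespace and also recorded in a current_providers dictionary so the names can be rediscovered later.

```python
def publish_components(components: dict, namespace: dict) -> None:
    """Bind each loaded component to a named variable in `namespace`."""
    for name, component in components.items():
        namespace[name] = component
    # Keep a record of everything that was loaded
    namespace["current_providers"] = dict(components)


# Hypothetical loaded components (strings stand in for real provider objects)
loaded = {"qry_azsent": "QueryProvider-instance", "ti_lookup": "TILookup-instance"}
publish_components(loaded, globals())
```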
SQL to KQL Translator
This is an “experimental” feature (“experimental” here meaning that we’re not going to invest a lot of time fixing bugs or adding new features). We wrote it for some internal experimentation and thought that it might be useful to others, so we added it to MSTICPy. It uses Mozilla’s awesome moz_sql_parser to do the hard work of parsing the SQL statements. The module will translate a SQL query into a Kusto Query Language (KQL) equivalent. It supports a subset of Spark SQL syntax and functions as well as many ANSI SQL constructs and functions. These include:
- Where and Select
- Joins and Unions
- Sub-queries
- Grouping
You can also supply a table-mapping dictionary to rename table references in the output query (no support for renaming columns though!).
from msticpy.data.sql_to_kql import sql_to_kql

sql = """
SELECT Message
FROM apt29Host a
INNER JOIN (
SELECT ProcessGuid
FROM apt29Host
WHERE Channel = "Microsoft-Windows-Sysmon/Operational"
AND EventID = 1
AND LOWER(ParentImage) RLIKE '.*partial_string.*'
AND LOWER(Image) LIKE '%cmd.exe'
) b
ON a.ParentProcessGuid = b.ProcessGuid
WHERE Channel = "Microsoft-Windows-Sysmon/Operational"
AND EventID = 1
AND LOWER(Image) LIKE '%powershell.exe'
"""
table_map = {"apt29Host": "SecurityEvent", "MyTable": "SigninLogs"}
kql = sql_to_kql(sql, table_map)
print(kql)
Output
SecurityEvent
| join kind=inner (SecurityEvent
| where Channel == 'Microsoft-Windows-Sysmon/Operational'
and EventID == 1
and tolower(ParentImage) matches regex '.*partial.string.*'
and tolower(Image) endswith 'cmd.exe'
| project ProcessGuid) on $left.ParentProcessGuid == $right.ProcessGuid
| where Channel == 'Microsoft-Windows-Sysmon/Operational'
and EventID == 1
and tolower(Image) endswith 'powershell.exe'
| project Message
This might be useful as an entry point for someone learning KQL and also if you have a bunch of SQL queries that you want to bulk translate (which was our motivation for creating it).
Please don’t complain too much about either the quality of our KQL output or my poor code (I know I should have used the Visitor pattern — one day, when I get time!).
Read more about this feature here.
Pivot Updates
We released the pivot functions feature earlier this year. This release adds a few significant updates (although we have a bunch more to follow soon):
- Pivot browser — shows you entities and their pivot functions along with the help documentation for each function. You can also search for pivot functions by keyword, matching anything in the name or docstring of a function.
- Persistent pipelines — we’ve extended this feature to allow you to save multi-step pandas pipelines as YAML files and re-execute them with different input DataFrames. We’ve also added a couple of useful pandas accessor methods that you can use in your pipelines. As a reminder, an example of a pandas pipeline using pivot functions is shown below.
(
    suspicious_ips
    # Lookup IPs at VT
    .mp_pivot.run(IpAddress.ti.lookup_ipv4_VirusTotal, column="IPAddress")
    # Filter on high severity
    .query("Severity == 'high'")
    # Lookup whois info for IPs
    .mp_pivot.run(IpAddress.util.whois, column="Ioc", join="left")
    # Display sample of intermediate results
    .mp_pivot.display(
        title="TI High Severity IPs", cols=["Ioc", "Provider", "Reference"], head=5
    )
    .mp_pivot.tee(var_name="ti_whois_df")
    # Query IPs that have login attempts
    .mp_pivot.run(IpAddress.AzureSentinel.list_aad_signins_for_ip, ip_address_list="Ioc")
    # Send the output of this to a plot
    .mp_timeline.plot(
        title="High Severity IPs with Logon attempts",
        source_columns=["UserPrincipalName", "IPAddress", "ResultType",
                        "ClientAppUsed", "UserAgent", "Location"],
        group_by="UserPrincipalName",
    )
)
A pipeline packaged as a YAML file would look something like this. This example is truncated for space reasons and is for a different pipeline than the one shown above.
pipelines:
  pipeline1:
    description: Pipeline 1 description
    steps:
      - name: get_logons
        step_type: pivot
        function: util.whois
        entity: IpAddress
        comment: Standard pivot function
        params:
          column: IpAddress
          join: inner
      - name: disp_logons
        step_type: pivot_display
        comment: Pivot display
        params:
          title: "The title"
          cols:
            - Computer
            - Account
          query: Computer.str.startswith('MSTICAlerts')
          head: 10
      - name: tee_logons
        step_type: pivot_tee
        comment: Pivot tee
        params:
          var_name: var_df
          clobber: True
Read more about the pivot functionality here.
Conclusion
That’s all for this post — next time we hope to be announcing this as v1.0.0!!!
GitHub release notes for this release.
Please send any feedback to msticpy@microsoft.com or file an issue at our GitHub repo https://github.com/microsoft/msticpy/issues.
Feel free to reach out to me on twitter — ianhellen.