MSTICPy v1.0.0 and Jupyter Notebooks for CyberSec

With the recent release of v1.0.0 of MSTICPy, we thought it was a good time to write an overview article. This is based on an article in the Azure Sentinel Technical Community blog, but since that one focuses on MSTICPy’s use in Azure Sentinel and MSTICPy is ostensibly SIEM-agnostic, we thought it would be good to do another version of it here.

What is MSTICPy?

MSTICPy is a package of Python tools for security analysts to assist them in investigations and threat hunting. It is primarily designed for use in Jupyter notebooks. If you haven’t used notebooks for security analysis before we’ve put together a (completely unbiased) guide on why you should!

  1. Improve the usability of notebooks by reducing the amount of code needed in notebooks.
  2. Make the functionality open and available to all, to both use and contribute to.

1000-foot view

MSTICPy is organized into several functional areas:

  • Data Enrichment — focuses on components such as threat intelligence and geo-location lookups that provide additional context to events found in the data. It also includes Azure APIs to retrieve details about Azure resources such as virtual machines and subscriptions.
  • Data Analysis — packages here focus on more advanced data processing: clustering, time series analysis, anomaly identification, base64 decoding and Indicator of Compromise (IoC) pattern extraction. Another component that we include here but really spans all of the first three categories is pivot functions — these give access to many MSTICPy functions via entities (for example, all IP address related functions are accessible as methods of the IpAddress entity class.)
  • Visualization — this includes components to visualize data or results of analyses such as: event timelines, process trees, mapping, morph charts, and time series visualization. Also included under this heading are a large number of notebook widgets that help speed up or simplify tasks such as setting query date ranges and picking items from a list. Also included here are a number of browsers for complex data (like the threat intel and alert browsers) or to help you navigate internal functionality (like the query and pivot function browsers).
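To give a feel for the kind of processing listed under Data Analysis, here is a minimal standard-library sketch of IoC pattern extraction and base64 decoding. It is not MSTICPy's implementation or API: the function names, the regular expressions and the sample command line are all assumptions made for illustration, and the patterns are far simpler than anything you would want in production.

```python
import base64
import binascii
import re

# Deliberately simple patterns for illustration only. Real IoC patterns must
# cope with defanged values, IPv6, internationalized domains and more.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"https?://[^\s'\"<>]+"),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}
# Runs of 20+ base64 alphabet characters, optionally followed by padding.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")


def extract_iocs(text):
    """Return a dict mapping IoC type to the sorted unique matches in text."""
    found = {}
    for ioc_type, pattern in IOC_PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[ioc_type] = matches
    return found


def decode_base64_strings(text):
    """Decode substrings that look like base64 and are valid UTF-8 once decoded."""
    decoded = []
    for candidate in B64_CANDIDATE.findall(text):
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text after decoding
    return decoded


# Made-up command line containing an encoded argument, a URL and an IP address.
encoded = base64.b64encode(b"Write-Host hello world").decode("ascii")
cmdline = f"runme.exe --data {encoded} --url http://evil.example/payload.bin --ip 203.0.113.7"
print(extract_iocs(cmdline))
print(decode_base64_strings(cmdline))
```

Returning plain dicts and lists keeps the sketch easy to load into a DataFrame afterwards, for example one row per extracted observable.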

Companion Notebook

This article has a companion notebook. This is the source of the examples in the article and you can download and run the notebook for yourself. The notebook has some additional sections that are not covered in the article.

Notebook Initialization

Assuming that you have a blank notebook running, what do you do next?

  1. Imports MSTICPy components.
  2. Loads and authenticates a query provider to be able to start querying data.

Wait! I don’t have a SIEM to query data from

As long as you can get your data into a pandas DataFrame, you can use most of MSTICPy and the examples in the notebook. We have a few “pickled” (load them with pandas pd.read_pickle()) or CSV sample data sets in these two locations:

Data Queries

Once this data provider is loaded and authenticated, we’re at the stage where we can start doing interesting things!

Timespans

Nearly all queries need time range parameters. You can specify these as parameters to the query function but who wants to type long date-time strings? It is usually easier to use the QueryTime widget to set your desired time range and just pass it to the query. In the example below we can see how to load the QueryTime widget. You pass the widget itself to the query function, where the start and end values will be inserted into the query before being run.
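To make the idea concrete, here is a minimal sketch of a time-range object whose start and end values are substituted into a query template before it runs. It is not the QueryTime widget or any MSTICPy API: TimeRange, build_query and the LOGON_QUERY template are hypothetical names, and the table and column names in the template are only illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimeRange:
    """Hypothetical stand-in for a time-range picker: it only holds start and end."""

    start: datetime
    end: datetime

    @classmethod
    def last(cls, days, now=None):
        """Build a range covering the last `days` days, ending at `now` (default: current UTC time)."""
        end = now or datetime.now(timezone.utc)
        return cls(start=end - timedelta(days=days), end=end)


def build_query(template, time_range, **params):
    """Fill a query template with ISO-formatted start/end values plus any other parameters."""
    return template.format(
        start=time_range.start.isoformat(),
        end=time_range.end.isoformat(),
        **params,
    )


# Illustrative KQL-style template; the table and column names are assumptions.
LOGON_QUERY = (
    "SecurityEvent"
    " | where TimeGenerated >= datetime({start})"
    " and TimeGenerated <= datetime({end})"
    " | where Computer == '{host}'"
)

time_range = TimeRange.last(days=1)
print(build_query(LOGON_QUERY, time_range, host="ws01"))
```

Plain string substitution is fine for a sketch, but treat any parameter that comes from untrusted input with care before dropping it into a query.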

  1. You can write a query from scratch as a string and just execute it.

Visualizing Data

Now that we can get data, let’s do something more interesting with it.

Event Timelines

One of the most basic but also most useful visualizations is to project events onto a timeline. You can do this using MSTICPy’s separate Timeline function or, more conveniently, call it directly from a DataFrame using the mp_timeline pandas extension.
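If you want to see what "projecting events onto a timeline" means mechanically, the sketch below buckets timestamped events into fixed-width columns and draws one text row per event group. It is only an illustration of the idea using the standard library, not how MSTICPy renders its interactive timelines; ascii_timeline and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def ascii_timeline(events, start, end, width=24):
    """Return {group: row}, where each row marks the time buckets that contain events.

    events is an iterable of (timestamp, group) tuples.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    span = end - start
    rows = {}
    for timestamp, group in events:
        if not start <= timestamp <= end:
            continue  # outside the plotted window
        # Integer timedelta arithmetic avoids floating-point rounding at bucket edges.
        column = min(((timestamp - start) * width) // span, width - 1)
        row = rows.setdefault(group, ["."] * width)
        row[column] = "*"
    return {group: "".join(row) for group, row in rows.items()}


# Made-up events over one day, plotted with one column per hour.
day_start = datetime(2021, 4, 1)
sample_events = [
    (day_start + timedelta(hours=6, minutes=30), "logon"),
    (day_start + timedelta(hours=6, minutes=45), "logon"),
    (day_start + timedelta(hours=23), "process"),
]
for group, row in ascii_timeline(sample_events, day_start, day_start + timedelta(days=1)).items():
    print(f"{group:>8} |{row}|")
```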

Process Tree

A process tree is another common visualization used when investigating endpoint (host) data.
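The core of any process tree is linking each process to its parent. Here is a minimal sketch of that step with plain dictionaries. The record keys (pid, ppid, name) and both function names are assumptions for illustration. Real endpoint logs also need to account for process ID reuse, for example by keying on process ID plus creation time, which this sketch ignores.

```python
from collections import defaultdict


def build_process_tree(processes):
    """Group process records by parent and return (roots, children_by_parent_pid).

    Each record is a dict with the hypothetical keys pid, ppid and name.
    """
    known_pids = {proc["pid"] for proc in processes}
    children = defaultdict(list)
    roots = []
    for proc in processes:
        if proc["ppid"] in known_pids and proc["ppid"] != proc["pid"]:
            children[proc["ppid"]].append(proc)
        else:
            roots.append(proc)  # the parent was not captured in this data set
    return roots, children


def render_tree(processes):
    """Return the tree as indented text lines, walking depth-first from each root."""
    roots, children = build_process_tree(processes)
    lines = []

    def walk(proc, depth):
        lines.append(f"{'  ' * depth}{proc['name']} ({proc['pid']})")
        for child in children[proc["pid"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Made-up process creation records.
sample_processes = [
    {"pid": 100, "ppid": 1, "name": "explorer.exe"},
    {"pid": 200, "ppid": 100, "name": "cmd.exe"},
    {"pid": 300, "ppid": 200, "name": "powershell.exe"},
    {"pid": 400, "ppid": 100, "name": "notepad.exe"},
]
print("\n".join(render_tree(sample_processes)))
```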

Alert Viewer

As mentioned at the start, MSTICPy has a number of special-purpose viewers for things like alerts, where it is often difficult to see the required data when it’s in tabular format.

Data Enrichment with Threat Intelligence, WhoIs and GeoIP

MSTICPy contains many enrichment components for geo-location, ASN/whois, threat intelligence, Azure resource data and others.

Side note — Pivot functions

If what you want to do is entity-related, there is a good chance that the MSTICPy function will appear as an entity pivot function. Queries, enrichment functions and analysis functions that relate to a particular entity type are all exposed as pivot functions of that entity.

Back to Enrichment

A nice side benefit of pivot functions using DataFrames as both input and output is that we can chain several together in a pandas pipeline. Here we’re taking IP addresses from an alert and successively getting WhoIs data and then geo-location data. After the geo-location step we insert an mp_pivot.display function to peek at the intermediate data in the pipeline. Finally, we’re querying multiple Threat Intelligence providers to see if they have any data about the IP addresses. At each stage we’re asking for the newly obtained data to be joined to the output of the previous stage (via the join parameter), although joining is optional.
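The chaining pattern itself is easy to sketch without MSTICPy or pandas. In the example below, lists of dicts stand in for DataFrames and small in-memory tables stand in for the WhoIs, geo-location and threat intelligence services. Everything here is hypothetical (lookup_stage, peek, run_pipeline and the FAKE_* tables with their values); the point is only to show how each stage consumes the previous stage's output and how a join option decides whether earlier columns are carried forward.

```python
from functools import reduce

# Made-up lookup tables standing in for WhoIs, GeoIP and threat intel services.
FAKE_WHOIS = {"203.0.113.7": {"asn": "AS64500"}}
FAKE_GEO = {"203.0.113.7": {"country": "ZZ"}}
FAKE_TI = {"203.0.113.7": {"ti_severity": "high"}}


def lookup_stage(table, key="ip", join=True):
    """Create a stage that looks up each row's key and optionally joins the result."""

    def stage(rows):
        enriched = []
        for row in rows:
            new_data = dict(table.get(row[key], {}))
            # join=True keeps earlier columns; join=False keeps only the key and new data.
            base = dict(row) if join else {key: row[key]}
            enriched.append({**base, **new_data})
        return enriched

    return stage


def peek(rows):
    """Pass-through stage that prints intermediate rows, handy when debugging a pipeline."""
    for row in rows:
        print("intermediate:", row)
    return rows


def run_pipeline(rows, *stages):
    """Apply each stage to the output of the previous one."""
    return reduce(lambda data, stage: stage(data), stages, rows)


alert_ips = [{"ip": "203.0.113.7"}, {"ip": "198.51.100.1"}]
result = run_pipeline(
    alert_ips,
    lookup_stage(FAKE_WHOIS),
    lookup_stage(FAKE_GEO),
    peek,
    lookup_stage(FAKE_TI),
)
print(result)
```

Because every stage takes rows in and hands rows back, stages can be reordered, removed or inspected with a pass-through step without touching the others.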

Using advanced analysis (aka simple machine learning)

MSTICPy has several more-advanced analysis components that help with identifying anomalous patterns in large data sets.
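As a taste of what "identifying anomalous patterns" can look like at its very simplest, the sketch below flags values in a series of event counts that sit far from the mean. This z-score check is a stand-in chosen for brevity, not one of MSTICPy's analysis components, and find_anomalies and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def find_anomalies(counts, threshold=3.0):
    """Return (index, value, z_score) for values more than `threshold` standard deviations from the mean."""
    if len(counts) < 2:
        return []
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [
        (index, value, (value - mu) / sigma)
        for index, value in enumerate(counts)
        if abs(value - mu) / sigma > threshold
    ]


# Made-up hourly logon counts with one obvious spike.
hourly_logons = [12, 15, 11, 14, 13, 12, 16, 13, 14, 12, 95, 13]
for index, value, z_score in find_anomalies(hourly_logons):
    print(f"hour {index}: {value} logons (z-score {z_score:.2f})")
```

A single global mean and standard deviation ignore daily and weekly rhythms in the data, which is one reason time series decomposition and clustering are usually a better fit for real logs.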

Documentation and Resources

We’ve invested a lot of time in documentation (since nothing is more frustrating than an interesting package that you cannot work out how to use!). Most of this is in the form of user guides for the different components (see msticpy ReadTheDocs), and we also have a set of example notebooks for many components. You can also find a set of more applied notebooks in the Azure-Sentinel-Notebooks repo (although some of these could do with an update or two, which we’re working on). We also try to document our code well so that even the API documents are often informative enough to work things out (if you find examples where this isn’t the case, please let us know).

Conclusion

Thanks for sticking with me through this marathon article. I hope it has given you a flavor of the power of Jupyter notebooks in CyberSec hunting and investigation tasks and a reasonable overview of many of the capabilities in MSTICPy.

Take-aways and actions.

  • The very obvious first action is to go and start playing with notebooks for your CyberSec investigations.
  • Second would be to install MSTICPy and kick the tires. We are always looking for feedback on what does and doesn’t work and are always open to requests or suggestions for new features.
  • Read the docs.
  • If you’re feeling adventurous, consider contributing to MSTICPy. Contributions could be ideas of your own that you think would be helpful to the CyberSec community. If you’re a bit stuck for ideas but love security and Python coding, we have a few ideas of our own and way too few people to implement them.
  • File an issue, feature request, create a PR or just poke around the code on our GitHub repo.
  • Read a summary of the latest release.
  • Follow me (@ianhellen), Pete (@MSSPete) and Ashwin (@ashwinpatil) on Twitter.
  • You can also reach us at msticpy@microsoft.com

This is the account of the Microsoft Threat Intelligence Center (MSTIC).
