MSTICPy 0.8.0 Release

MSTIC
Sep 21, 2020

We recently released version 0.8.0 of MSTICPy. The most significant features in this release are:

  • New widgets, a mechanism for selected widgets to persist and recall their state, and the ability to drive widget values from notebook parameters.
  • Data obfuscation functions — not something you’ll likely need during the average investigation but useful for disguising sensitive data if you are presenting it externally.
  • Interactive browsers for data queries and Threat Intel results.

Widget Updates

New Widgets

We’ve added two new simple widgets.

GetText is a simple wrapper around the ipywidgets Text widget. Why would you use it rather than the Text widget? GetText is a RegisteredWidget (see more below), which lets it do some nice tricks that the plain Text widget can’t.

from msticpy.nbtools import nbwidgets

nbwidgets.GetText(prompt="Enter a value", auto_display=True)

OptionButtons is a multi-option button with a timeout and a default value. Again, this is nothing revolutionary, but it does let you get an option from the user without blocking notebook execution. In many cases we’ve been using Python’s built-in input function to prompt the user for some input (e.g. “do you want to run this optional section of the notebook?”). The input function, however, blocks notebook execution, making it impossible to run the notebook in an automated way (for example, with Papermill or in a CI pipeline).

If you simply create and display an OptionButtons instance, it behaves as you would expect. You supply your options (a list of strings) as the buttons parameter; if you omit this parameter, it uses “Yes”, “No” and “Cancel”. When you click a button, the widget’s value is set to the option you chose.

However, if you display it with the display_async method, it shows a countdown timer. Choosing an option sets the widget’s value to that option and stops the countdown. If the timeout expires, the value is set to the default option (this is the first option by…er…default, but you can override this using the default parameter).

from msticpy.nbtools import nbwidgets

# Using display_async runs the widget with a visible countdown
# timer. As soon as an option is chosen, it is kept as the value
# of the widget.value property (the default is used on timeout).
opt = nbwidgets.OptionButtons(description="Continue?", timeout=10)
await opt.display_async()

Note that since this method uses asyncio to run the timer, you need to invoke it with the await keyword.
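To make the timeout-with-default behavior concrete, here is a minimal sketch in plain asyncio. It is not the OptionButtons source: choose_with_timeout and demo are hypothetical names, and an asyncio Future resolved by a (simulated) click handler stands in for the buttons. It mirrors the rules described above: the chosen option wins if it arrives in time, otherwise the default (the first option unless overridden) is used.

```python
import asyncio
from typing import List, Optional


async def choose_with_timeout(
    options: List[str],
    timeout: float,
    choice: "asyncio.Future[str]",
    default: Optional[str] = None,
) -> str:
    """Return the chosen option, or the default if the timeout expires first."""
    # As with OptionButtons, fall back to the first option if no default given.
    fallback = default if default is not None else options[0]
    try:
        # Wait for something (here, a simulated click) to resolve the future.
        return await asyncio.wait_for(choice, timeout=timeout)
    except asyncio.TimeoutError:
        return fallback


async def demo():
    loop = asyncio.get_running_loop()
    options = ["Yes", "No", "Cancel"]
    # Nobody clicks: after the timeout, the default (first option) is used.
    no_click = loop.create_future()
    timed_out = await choose_with_timeout(options, 0.1, no_click)
    # Simulate a click handler choosing "No" shortly after display.
    clicked = loop.create_future()
    loop.call_later(0.01, clicked.set_result, "No")
    chosen = await choose_with_timeout(options, 1.0, clicked)
    return timed_out, chosen


# In a script use asyncio.run; in Jupyter (which already runs an event loop)
# you would write `timed_out, chosen = await demo()` instead.
timed_out, chosen = asyncio.run(demo())
print(timed_out, chosen)  # Yes No
```

Because the wait is an awaitable rather than a blocking call, the notebook only pauses the cell that awaits it, which is also why the real widget has to be invoked with await.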

Admission of failure: my original goal with this second widget was to actually block notebook execution for the timeout period but I couldn’t get this to work. I wanted to selectively block the Jupyter event loop, allowing it to process the async timer but not continue on to the next cell. I saw a couple of examples that supposedly did this but I couldn’t get them to work reliably. If anyone has a bright idea here I’d appreciate suggestions.

Registered Widgets

Have you ever typed some text or set a date range in a widget cell and then accidentally re-executed the same cell, wiping the data you just entered? I do it all the time (it’s easy to do since the focus is on the cell where the widget is displayed). The problem here is that when you re-execute the cell, you are typically creating a whole new instance of the widget object while the instance with your carefully-chosen time range is making its way to the Python garbage collector.

We’ve made a few of our widgets a bit smarter to help avoid this. These registered widgets save their current values in a per-kernel registry (an in-memory dictionary, so it only works within a single kernel session). Each widget uses a subset of the parameters it was created with to calculate a unique-ish ID, and uses this ID as the key for the registry “slot” in which it saves (or from which it retrieves) its data. For the GetText widget, the ID parameter set is typically just the description parameter. Here’s an example:

The second instance of the widget remembered that I’d entered “Ian” into its first incarnation. Other RegisteredWidgets include QueryTime and EnvironmentString.

Note: If you have multiple instances of the same widget with the same input parameters, they will overwrite each other’s saved values, so be sure to use unique parameter values if you want to use this feature.
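The registry idea, including the overwrite caveat, can be illustrated with a toy class. This is a sketch under stated assumptions, not the RegisteredWidget code: RegisteredText is hypothetical, it has no UI, and it derives its ID from a SHA-256 hash of the description parameter only, whereas the real widgets may compute their IDs differently.

```python
import hashlib
from typing import Dict


class RegisteredText:
    """Toy text 'widget' that recalls its value when it is re-created."""

    # Class-level dict shared by all instances: it lives only as long as the
    # Python process (the notebook kernel) that created it.
    _registry: Dict[str, str] = {}

    def __init__(self, description: str, default: str = ""):
        self.description = description
        # Derive the registry key from the identifying parameter(s).
        self._id = hashlib.sha256(description.encode("utf-8")).hexdigest()
        # Recall a saved value if this ID has been seen, else use the default.
        self._value = self._registry.get(self._id, default)

    @property
    def value(self) -> str:
        return self._value

    @value.setter
    def value(self, new_value: str) -> None:
        # Save every change into this widget's registry slot.
        self._value = new_value
        self._registry[self._id] = new_value


first = RegisteredText(description="Name")
first.value = "Ian"
# Re-executing the cell creates a new object, but the value survives.
second = RegisteredText(description="Name")
print(second.value)  # Ian
# A different widget with the same description shares the same slot,
# so setting its value overwrites what "first" saved.
other = RegisteredText(description="Name")
other.value = "Pete"
```

Keying on a hash of the creation parameters is what makes the value survive re-execution, and it is also exactly why two widgets created with identical parameters collide.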

Read more about this functionality here.

Populating Widget Values with Notebook Parameters

This second feature of registered widgets lets you use parameters to pre-populate the widget. This is in anticipation of some work we are doing to make our notebooks work in an unattended execution environment like Papermill. The setup is a little clunky but this is a somewhat advanced option.

First, you need to set up a dictionary specifying which widget values should be populated and the names of the predefined variables to populate them from (you might need to scour the source code if you want to use this). Papermill parameters work by inserting a cell that initializes one or more variables to the parameter values. Papermill places this cell after the notebook cell tagged “parameters”, or at the start of the notebook if no cell has that tag.

params_dict = {
    "widget_attr1": "var_name1",
    "widget_attr2": "var_name2",
    # ...
}

Then create the widget, passing this dictionary as the nb_params parameter. If the variables named in the dictionary are defined in the notebook’s global namespace, their values are used to initialize the widget.

Text widget showing pre-population of values from Papermill-defined variables
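The lookup that nb_params implies (widget attribute → variable name → value, if the variable exists) can be sketched as a small function. apply_nb_params is a hypothetical helper, not the msticpy implementation, and the attribute and variable names are made up; a plain dict stands in for the notebook’s globals() after Papermill’s injected cell has run.

```python
from typing import Any, Dict, Mapping


def apply_nb_params(
    widget_values: Dict[str, Any],
    nb_params: Mapping[str, str],
    namespace: Mapping[str, Any],
) -> Dict[str, Any]:
    """Override widget values with notebook variables, where they exist."""
    updated = dict(widget_values)
    for widget_attr, var_name in nb_params.items():
        # Only use the variable if it was actually defined, e.g. by the
        # cell that Papermill injects; otherwise keep the widget's own value.
        if var_name in namespace:
            updated[widget_attr] = namespace[var_name]
    return updated


# Stand-in for globals() after a Papermill-injected cell has run.
# In a notebook you would pass globals() instead.
nb_namespace = {"start_date": "2020-09-01", "host_name": "myhost"}

params_dict = {
    "start": "start_date",
    "host": "host_name",
    "end": "end_date",  # not defined, so the widget keeps its default
}
values = apply_nb_params(
    {"start": None, "host": "", "end": None}, params_dict, nb_namespace
)
print(values)  # {'start': '2020-09-01', 'host': 'myhost', 'end': None}
```

Skipping undefined variables means the same notebook still works interactively: without Papermill parameters, the widget simply keeps its normal defaults.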

Data Obfuscation functions

This is something that you don’t often need but when you do need it, it can be pretty painful to implement. I write documents and give demonstrations of notebooks. Sometimes it’s OK to use synthetic data but occasionally you might want to show details of a case that involves real customer/user data. On these occasions it’s imperative to protect privacy, while still giving a reasonable representation of the data.

The obfuscation functionality uses hashing and random lookup tables to obfuscate data. It tries to do so intelligently, so that the original structure of the data is preserved: an obfuscated IPv4 address, SID or UUID still looks like what it is supposed to be rather than something like “fmlmbnlpdcbnbnn”.

There is a set of data-type-specific obfuscation functions, for example for IP addresses and UUIDs.

> from msticpy.data.data_obfus import hash_ip
> hash_ip('192.168.3.1')
160.21.239.194

> hash_ip('2001:0db8:85a3:0000:0000:8a2e:0370:7334')
85d6:7819:9cce:9af1:9af1:24ad:d338:7d03

> hash_ip(['192.168.3.1', '192.168.5.2', '192.168.10.2'])
['160.21.239.194', '160.21.103.84', '160.21.149.84']
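To show what “structure-preserving” means in practice, here is a sketch of the two ingredients mentioned above: a random lookup table for IPv4 octets and hashing for UUIDs. This is not how hash_ip is implemented; obfuscate_ipv4 and obfuscate_uuid are hypothetical, and the seeded octet permutation is an assumption chosen for illustration. A fixed permutation like this is easy to reverse if someone knows a few real addresses, so treat it as a demonstration of shape preservation, not of strong anonymization.

```python
import hashlib
import ipaddress
import random
import uuid

# Fixed seed so the same input always produces the same output.
_rng = random.Random(42)
# Random lookup table: a shuffled permutation of 0-255 used to remap octets.
_OCTET_MAP = list(range(256))
_rng.shuffle(_OCTET_MAP)


def obfuscate_ipv4(addr: str) -> str:
    """Remap each octet through the lookup table; the result is a valid IPv4."""
    octets = ipaddress.IPv4Address(addr).packed  # 4 bytes, iterates as ints
    return ".".join(str(_OCTET_MAP[octet]) for octet in octets)


def obfuscate_uuid(value: str) -> str:
    """Hash a UUID string and re-format 16 bytes of the digest as a UUID."""
    digest = hashlib.sha256(value.lower().encode("utf-8")).digest()
    return str(uuid.UUID(bytes=digest[:16]))


ips = ["192.168.3.1", "192.168.5.2", "192.168.10.2"]
masked = [obfuscate_ipv4(ip) for ip in ips]
# Addresses that share a prefix still share an (obfuscated) prefix.
print(masked)
masked_uuid = obfuscate_uuid("5ad9c0a6-4b27-4d8e-b0f8-3b5a62bd2f6e")
print(masked_uuid)
```

Because the octet table is a permutation, distinct addresses stay distinct and shared subnets stay shared, which is the same pattern visible in the hash_ip output above (all three 192.168.x.x addresses map to 160.21.x.x).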

There is also functionality to apply these functions to an entire pandas DataFrame. The DataFrame functionality uses a mapping dictionary to decide which transform to apply to a given column. You can also supply your own mapping to supplement or override the built-in mappings.

Read more about this here.

Performance note: obfuscating DataFrames is not particularly speedy. Unless you have a lot of time on your hands, I’d recommend obfuscating only the result sets that you intend to show.
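The column-mapping approach for DataFrames can be sketched with pandas: a built-in column-to-transform dictionary, merged with an optional user mapping that adds or overrides entries. The names mask_string, mask_df and DEFAULT_MASK_MAP are hypothetical (chosen so they don’t clash with msticpy’s own functions), and the column names are illustrative rather than msticpy’s actual built-in mapping.

```python
import hashlib
from typing import Callable, Dict, Optional

import pandas as pd

Transform = Callable[[object], object]


def mask_string(value: object) -> str:
    """Replace a value with the first 12 hex digits of its SHA-256 hash."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


# Built-in column -> transform mapping (column names are illustrative).
DEFAULT_MASK_MAP: Dict[str, Transform] = {
    "Account": mask_string,
    "Computer": mask_string,
}


def mask_df(
    data: pd.DataFrame, col_map: Optional[Dict[str, Transform]] = None
) -> pd.DataFrame:
    """Return a copy of data with each mapped column transformed."""
    # User-supplied entries add new columns or override the defaults.
    mapping = {**DEFAULT_MASK_MAP, **(col_map or {})}
    result = data.copy()
    for column, transform in mapping.items():
        if column in result.columns:
            # Series.apply calls a Python function once per value, which is
            # one reason this kind of obfuscation is slow on large DataFrames.
            result[column] = result[column].apply(transform)
    return result


events = pd.DataFrame(
    {
        "Account": ["alice", "bob"],
        "Computer": ["host1", "host2"],
        "EventID": [4624, 4625],
    }
)
masked_events = mask_df(events, col_map={"Computer": lambda _: "REDACTED-HOST"})
print(masked_events)
```

Unmapped columns (EventID here) pass through untouched, and the original DataFrame is left unchanged because the transforms are applied to a copy.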

Query Browser and TI Browser

The Threat Intel results browser has been around for a few weeks, but I think we may have forgotten to mention it in our previous blog post. The Query Browser is new. It lets you step through each data query available for a query provider (Azure Sentinel, Splunk, etc.) and see the required parameters, the actual query that will be run, and sample usage.

Query browser

We’ll be making some changes to allow you to invoke this browser directly from the query provider (just as you can call list_queries()).

The TI Browser works in the same way, with some basic formatting to make the results easier to read.

You can also invoke the TI Browser directly from the TILookup class.

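from msticpy.sectools.tilookup import TILookup

# ti_df is a DataFrame of TI results, e.g. returned by lookup_iocs()
TILookup.browse_results(ti_df)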

Other Things

There are a few other minor fixes and improvements in this release. You can read the details on our GitHub repo.


MSTIC

This is the account of the Microsoft Threat Intelligence Center (MSTIC).