ML Security with the Adversarial Robustness Toolbox

Part 3: Building an API for Model Defense Testing

Kedion
17 min read · May 30, 2022
Cover by Sean McClure

Written by Tigran Avetisyan.

This is PART 3 in the 3 PART series on the Adversarial Robustness Toolbox and ML security.

You can read PART 1 here and PART 2 here.

In PART 3, we are going to build a Python API that will allow you to test your TensorFlow Keras models against adversarial attacks. We are also going to use the React framework to add a front end to the API.

Let’s start this tutorial with an overview of our application!

Overview of Our React Application

You can find the React application in this GitHub repository.

Our app implements the following attacks:

· Fast Gradient Method (evasion).

· Poisoning Backdoor (poisoning).

· Copycat CNN (extraction).

The app is designed for models trained on the MNIST digits dataset — the training/testing data is hardcoded in the API. The app also expects TensorFlow Keras models. It should work with various model architectures, but we haven’t tested it with many different models.

We provide pretrained test models in the app project’s models directory. The pretrained models are as follows:

· vulnerable_model_fgm.h5. This is a basic model that is vulnerable to all attacks implemented in the app. You can use this model with any of the attacks, but it is intended to serve as a reference model for the Fast Gradient Method.

· robust_model_fgm.h5. This model has been trained with the Adversarial Trainer to be resistant to the Fast Gradient Method Attack.

· poisoned_model.h5. This is a model that’s been trained on poisoned data. We used ART’s poisoning backdoor attack to generate the poisoned data to train this model.

· postprocessed_model.h5. This model has a custom output layer that applies ART’s Reverse Sigmoid to standard softmax output.

We’ll take a look at the details of these models a little bit later.

If you want to train your own models from scratch, use the Jupyter notebooks in the models directory. The code there is the same code we used to train our test models: reverse_sigmoid_model.ipynb contains the code for the model with the custom Reverse Sigmoid output, while poisoning_fgm_models.ipynb contains the code for the poisoned model and the model that is resistant to the Fast Gradient Method attack.

Prerequisites for the Project

To be able to build the API and run the React app, you will need Node.js (download from here) and Docker Desktop (download from here). Windows users can also use Git Bash to run shell commands and work with GitHub repositories.

For the Python side of things, you will need:

· FastAPI.

· Uvicorn.

· Pydantic.

· Aiofiles.

· TensorFlow.

· NumPy.

· Adversarial Robustness Toolbox.

You should already have TensorFlow, NumPy, and ART if you’ve read PART 1 and PART 2.

To install the packages, use the command pip install [package-name]. Conda users should use conda install -c conda-forge [package-name] instead.

Building Utility Functions for the API

Before we start building our API, we need to write a few utility functions for it. The code for these functions is located in api/util_functions.py in the web app’s directory. Let’s go over our utility functions one by one.

Importing dependencies

The dependencies for our utility functions are as follows:

We are essentially importing a few things to build a TF Keras model and to create a function for dataset poisoning.
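For reference, the imports might look something like this (a minimal sketch; the exact list in api/util_functions.py may differ):

import numpy as np
import tensorflow as tf

# ART pieces used by the dataset poisoning helper below
from art.attacks.poisoning import PoisoningAttackBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd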

train_step

The function train_step defines the training step for our model. We need to explicitly define a training step to be able to use ART’s wrapper class TensorFlowV2Classifier.

We are using TensorFlowV2Classifier instead of KerasClassifier because the latter is implemented in TF 1 and was not working properly in the API. The code below will help you get started with TensorFlowV2Classifier.

Here’s what our function train_step looks like:

On lines 3 and 4, we define the loss and optimizer objects for our train step to use. We are using categorical cross-entropy loss and the Adam optimizer — the usual stuff that we use to train models in our tutorials.

The function train_step does the following:

1. Predicts on the current set of inputs (line 15) and computes the loss for current predictions (line 16). We do these calculations in the context of tf.GradientTape so that TensorFlow can record the operations for automatic differentiation.

2. Calculates the gradients for the loss value with respect to our model’s weights (lines 19 to 21).

3. Applies the calculated gradients to our model’s weights (line 24).

If you want to learn more about automatic differentiation and its uses in TensorFlow, read this guide on the TF website. At a high level, directly using tf.GradientTape allows you to bypass your model’s fit method and define custom logic for training.
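As a rough illustration, a train_step built around tf.GradientTape might look like the following minimal sketch, assuming categorical cross-entropy and the Adam optimizer as described above:

import tensorflow as tf

# Loss and optimizer objects used by the training step
loss_object = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def train_step(model, images, labels):
    with tf.GradientTape() as tape:
        # Predict on the current batch and compute the loss
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)

    # Compute gradients of the loss with respect to the model's trainable weights
    gradients = tape.gradient(loss, model.trainable_variables)

    # Apply the gradients to update the weights
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))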

create_model

The function create_model looks like this:

This function helps us quickly initialize TF Keras models. It’s the same function as the one we used in PART 2.
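As a minimal sketch, such a helper could look like the following small convolutional MNIST classifier (the exact architecture from PART 2 may differ):

import tensorflow as tf

def create_model():
    # A small CNN for 28x28x1 MNIST images with 10 output classes
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model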

poison_dataset

The function poison_dataset is intended to help us quickly poison datasets:

This function is similar to the poison_dataset function we used in PART 2, but there are a few differences.

First, we are not returning the poison indicator list is_poison because we don’t need it.

Second, along with the poisoned data, we also return the clean labels y_clean so that we can assess our model’s performance on poisoned images with respect to clean labels. That is, alongside y_poison, which contains clean labels for clean samples and poisoned labels for poisoned samples, we also return the clean labels for the poisoned samples.
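A simplified sketch of a poison_dataset function along these lines, built on ART's PoisoningAttackBackdoor with the add_pattern_bd perturbation:

import numpy as np
from art.attacks.poisoning import PoisoningAttackBackdoor
from art.attacks.poisoning.perturbations import add_pattern_bd

def poison_dataset(x_clean, y_clean, percent_poison, target_labels):
    # Backdoor attack that stamps a small pattern onto the selected images
    backdoor = PoisoningAttackBackdoor(add_pattern_bd)

    x_poison = np.copy(x_clean)
    y_poison = np.copy(y_clean)

    num_classes = y_clean.shape[1]
    for source_class in range(num_classes):
        # Pick a random subset of this class's samples to poison
        indices = np.where(np.argmax(y_clean, axis=1) == source_class)[0]
        num_to_poison = round(len(indices) * percent_poison)
        selected = np.random.choice(indices, num_to_poison, replace=False)

        # One-hot label that the backdoored samples should trigger
        target = np.zeros(num_classes)
        target[target_labels[source_class]] = 1.0

        # Depending on your ART version, you may need to drop and re-add the channel axis
        poisoned_x, poisoned_y = backdoor.poison(
            x_poison[selected], y=target, broadcast=True
        )
        x_poison[selected] = poisoned_x
        y_poison[selected] = poisoned_y

    # y_poison mixes clean and attacker-chosen labels; y_clean keeps the originals
    return x_poison, y_poison, y_clean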

Class ReverseSigmoidLayer

Finally, we have the class ReverseSigmoidLayer, which inherits from tf.keras.layers.Layer. This is a custom layer that implements ART’s Reverse Sigmoid postprocessing defense.

The reason why we implemented this defense as a Keras layer is that ART doesn’t provide an easy way of saving its postprocessing defense as part of the model that you want to protect. Without a custom layer, you would need to manually initialize the postprocessing defense every time you wanted to deploy your unprotected model. With our custom layer, you can save and load the postprocessed model just as easily as any standard Keras model.

Let’s quickly go over the methods in our class:

· __init__ (lines 4 to 7) is the constructor, where we call the parent class’s constructor and initialize our own custom attributes.

· call (lines 10 to 53) is the method that defines the forward pass for our custom layer. Notice that it accepts a training parameter (defaulting to None) so that we can distinguish between training and inference. When the model is training, it returns standard softmax predictions without any postprocessing (lines 12 and 13); it only applies postprocessing at inference time. We adapted the code for postprocessing from the source code of ART’s class art.defences.postprocessor.ReverseSigmoid.

· get_config (lines 56 to 62) allows us to serialize our custom layer. Without this method, TF would not be able to save our custom layer in an .h5 file.

· sigmoid (lines 65 and 66) just implements the standard sigmoid function.

If you want to learn more about creating custom layers in TF Keras, read this guide.
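For illustration, here is a condensed sketch of such a layer, adapted from the multiclass branch of art.defences.postprocessor.ReverseSigmoid (the beta and gamma defaults here are assumptions):

import tensorflow as tf

class ReverseSigmoidLayer(tf.keras.layers.Layer):
    def __init__(self, beta=1.0, gamma=0.1, **kwargs):
        super().__init__(**kwargs)
        self.beta = beta
        self.gamma = gamma

    def call(self, inputs, training=None):
        # During training, pass the softmax outputs through unchanged
        if training:
            return inputs

        clip_min, clip_max = 1e-9, 1.0 - 1e-9
        preds_clipped = tf.clip_by_value(inputs, clip_min, clip_max)

        # Perturb the probabilities with the Reverse Sigmoid transformation
        perturbation = self.beta * (
            self.sigmoid(-self.gamma * tf.math.log((1.0 - preds_clipped) / preds_clipped)) - 0.5
        )
        preds_perturbed = tf.clip_by_value(inputs - perturbation, 0.0, 1.0)

        # Re-normalize so the outputs still sum to 1
        alpha = 1.0 / tf.reduce_sum(preds_perturbed, axis=-1, keepdims=True)
        return alpha * preds_perturbed

    def get_config(self):
        # Needed so that TF can serialize the layer into an .h5 file
        config = super().get_config()
        config.update({"beta": self.beta, "gamma": self.gamma})
        return config

    @staticmethod
    def sigmoid(z):
        return 1.0 / (1.0 + tf.exp(-z))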

In our pretrained model postprocessed_model.h5, ReverseSigmoidLayer is the final layer. You can see the usage of ReverseSigmoidLayer in this Jupyter notebook.

In the notebook, we first trained a model without ReverseSigmoidLayer and added the layer only after training, so that we would have separate protected and unprotected models for comparison. In practice, you could add ReverseSigmoidLayer to the model before training — the outputs wouldn’t be postprocessed at training time.

Building the API

Now that we have our utility functions, we can start building the API itself!

Importing dependencies and defining data models

First up, we import dependencies:

Next, let’s set up the data models for our endpoints:

The data models contain some of the parameters of their respective attacks. To keep things simple, we haven’t implemented all of the parameters. A sketch of these data models follows the parameter lists below.

For the Fast Gradient Method attack, you can adjust these parameters (FGMArgs):

· eps — the attack step size or input variation (effectively determines the strength of the attack).

· eps_step — the step size of eps for minimal perturbation computation.

· batch_size — the batch size for processing clean images.

For the poisoning backdoor attack, you can adjust the following parameters (BackdoorArgs):

· percent_poison — the fraction of the original data to poison.

· target_labels — the fake labels that you want to replace the original clean labels with. Our API expects a string of integers separated by commas.

For the Copycat CNN attack, you can adjust the following parameters (CopycatCNNArgs):

· batch_size_fit and batch_size_query — the batch size for fitting the stolen model and querying the victim model respectively. You can use larger batch sizes to speed up extraction, but mind OOM (out of memory) issues.

· nb_epochs — the number of epochs that Copycat CNN trains the stolen model for. More epochs should bring better extraction results, but the training time will increase as well.

· nb_stolen — effectively determines the size of the training set for the stolen classifier. More queries equal better results at the cost of longer training times.
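As a sketch, the Pydantic data models might look something like this (the field defaults shown here are assumptions):

from pydantic import BaseModel

class FGMArgs(BaseModel):
    eps: float = 0.3
    eps_step: float = 0.1
    batch_size: int = 32

class BackdoorArgs(BaseModel):
    percent_poison: float = 0.5
    target_labels: str = "1,2,3,4,5,6,7,8,9,0"

class CopycatCNNArgs(BaseModel):
    batch_size_fit: int = 256
    batch_size_query: int = 256
    nb_epochs: int = 10
    nb_stolen: int = 10000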

And as the last step for this section, let’s load the MNIST digits dataset to be able to run our attacks:

Loading our data like this means that the data is hardcoded into the API. This is fine for the purposes of our app, but you won’t be able to use the app for other datasets without modifying the API.
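One way to load the data is with ART's load_mnist helper, which returns preprocessed, one-hot-encoded splits:

from art.utils import load_mnist

# Loaded once at startup; every request reuses the same in-memory arrays
(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()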

Initializing the API

We initialize the API and define its CORS policy like this:

On line 20, we initialize a dictionary that will store our uploaded models. You’ll see this dictionary in action when we start going over our endpoints.
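A minimal sketch of this setup (the permissive CORS settings here are assumptions):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the React front end, which runs on a different origin, to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Uploaded, wrapped models are stored here, keyed by filename
app.classifiers = {}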

Setting up API endpoints

Our API will have the following endpoints:

· upload-model — accepts TF Keras .h5 files and loads them to the web server.

· run-fgm — runs the Fast Gradient Method attack on your model.

· run-backdoor — runs the poisoning backdoor attack on your model.

· run-copycatcnn — runs the Copycat CNN attack on your model.

Let’s go over these endpoints one by one.

upload-model

The code for this endpoint is as follows:

upload-model requires two parameters:

· filename — the filename that the API should use to save the model.

· model — the model that we want to test.

In this endpoint, we do the following:

1. Read the provided model as bytes (line 10) and then save it in an .h5 file on the web server (line 13).

2. Load our model from the saved .h5 file (lines 16 to 19). Notice that we are passing our custom Reverse Sigmoid layer as custom_objects={“ReverseSigmoidLayer”: util_functions.ReverseSigmoidLayer}. This is necessary so that TF knows how to read the custom layer.

3. Compile the model (lines 22 to 26).

4. Wrap our model in the TensorFlowV2Classifier class (lines 29 to 35). Notice that we are passing the number of classes, the input shape, the loss object, and our train step function to the wrapper. Also notice that the wrapped model is saved in our dictionary app.classifiers under the key filename.

This endpoint will be used to upload not only the model that we want to test but also the reference vulnerable model that the Fast Gradient Method will use to generate adversarial images. The model that you want to test will be stored under the key tested_model, while the vulnerable model will be stored under the key vuln_model.
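Putting these steps together, a condensed sketch of the endpoint might look like this (the response fields and exact parameter handling are assumptions):

import tensorflow as tf
from fastapi import File, Form, UploadFile
from art.estimators.classification import TensorFlowV2Classifier

import util_functions

@app.post("/upload-model")
async def upload_model(filename: str = Form(...), model: UploadFile = File(...)):
    # Read the uploaded bytes and save them to an .h5 file on the server
    contents = await model.read()
    with open(f"{filename}.h5", "wb") as file:
        file.write(contents)

    # Load the model, telling TF how to deserialize our custom layer
    loaded_model = tf.keras.models.load_model(
        f"{filename}.h5",
        custom_objects={"ReverseSigmoidLayer": util_functions.ReverseSigmoidLayer},
    )
    loaded_model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )

    # Wrap the model for ART and store it under the provided key
    app.classifiers[filename] = TensorFlowV2Classifier(
        model=loaded_model,
        nb_classes=10,
        input_shape=(28, 28, 1),
        loss_object=tf.keras.losses.CategoricalCrossentropy(),
        train_step=util_functions.train_step,
    )
    return {"status": "Model uploaded"}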

run-fgm

The endpoint run-fgm runs the Fast Gradient Method attack on the model you provide.

In this endpoint, we do the following:

1. Define the Fast Gradient Method attack, using the uploaded vulnerable model and the received arguments (lines 5 to 10).

2. Generate adversarial images (line 13).

3. Evaluate the performance of the supplied robust classifier on clean images (line 16) and adversarial images (line 17).

4. Return the model’s performance metrics, rounded to three decimal places (lines 20 to 25).
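A sketch of what this endpoint could look like, assuming the dictionary keys described earlier and hypothetical response field names:

from art.attacks.evasion import FastGradientMethod

@app.post("/run-fgm")
def run_fgm(args: FGMArgs):
    # Build the attack around the uploaded vulnerable model
    attack = FastGradientMethod(
        estimator=app.classifiers["vuln_model"],
        eps=args.eps,
        eps_step=args.eps_step,
        batch_size=args.batch_size,
    )

    # Generate adversarial versions of the clean test images
    x_test_adv = attack.generate(x=x_test)

    # Evaluate the tested model on clean and adversarial images
    tested_model = app.classifiers["tested_model"].model
    clean_loss, clean_acc = tested_model.evaluate(x_test, y_test, verbose=0)
    adv_loss, adv_acc = tested_model.evaluate(x_test_adv, y_test, verbose=0)

    return {
        "clean_loss": round(float(clean_loss), 3),
        "clean_accuracy": round(float(clean_acc), 3),
        "adv_loss": round(float(adv_loss), 3),
        "adv_accuracy": round(float(adv_acc), 3),
    }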

run-backdoor

The endpoint run-backdoor runs the poisoning backdoor attack on your model.

In this endpoint, we:

1. Break down the received string with the target labels into integers (lines 5 to 12).

2. Poison our training dataset, using our utility function poison_dataset (lines 15 to 20). The function will replace a portion (percent_poison) of the clean labels with target_labels.

3. Evaluate the performance of the model on poisoned images with respect to the poisoned labels (line 23). poisoned_labels contains clean labels for clean images and poisoned labels for poisoned images.

4. Evaluate the performance of the model on poisoned images with respect to the clean labels (line 26). clean_labels contains clean labels for both clean and poisoned images.

5. Return the results (lines 29 to 34).

We’ll explain the reason for calculating the performance of the model on poisoned images with respect to both poisoned and clean labels when we start using the app.
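A condensed sketch of this endpoint, with hypothetical response field names:

@app.post("/run-backdoor")
def run_backdoor(args: BackdoorArgs):
    # Parse the comma-separated string of target labels into integers
    target_labels = [int(label) for label in args.target_labels.split(",")]

    # Poison a portion of the training data with the utility function from earlier
    x_poisoned, poisoned_labels, clean_labels = util_functions.poison_dataset(
        x_train, y_train, args.percent_poison, target_labels
    )

    tested_model = app.classifiers["tested_model"].model

    # Performance on poisoned images with respect to the poisoned labels...
    poisoned_loss, poisoned_acc = tested_model.evaluate(
        x_poisoned, poisoned_labels, verbose=0
    )
    # ...and with respect to the clean labels
    clean_loss, clean_acc = tested_model.evaluate(x_poisoned, clean_labels, verbose=0)

    return {
        "poisoned_labels_loss": round(float(poisoned_loss), 3),
        "poisoned_labels_accuracy": round(float(poisoned_acc), 3),
        "clean_labels_loss": round(float(clean_loss), 3),
        "clean_labels_accuracy": round(float(clean_acc), 3),
    }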

run-copycatcnn

The final endpoint in our API is run-copycatcnn. This endpoint runs an extraction attack against your model using probabilistic Copycat CNN.

In this endpoint, we:

1. Initialize the attack, using the model that we want to test (lines 4 to 12).

2. Initialize the base model for Copycat CNN to train (lines 15 to 21).

3. Extract the victim classifier (lines 24 to 28).

4. Evaluate the performance of the victim classifier (line 31) and the stolen classifier (line 32) for comparison.

5. Return the results (lines 35 to 40).
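Putting these steps together, a sketch of the endpoint (response field names are assumptions):

from art.attacks.extraction import CopycatCNN

@app.post("/run-copycatcnn")
def run_copycatcnn(args: CopycatCNNArgs):
    # Set up the probabilistic Copycat CNN attack around the tested (victim) model
    attack = CopycatCNN(
        classifier=app.classifiers["tested_model"],
        batch_size_fit=args.batch_size_fit,
        batch_size_query=args.batch_size_query,
        nb_epochs=args.nb_epochs,
        nb_stolen=args.nb_stolen,
        use_probability=True,
    )

    # A fresh, untrained model for the attacker to train as the stolen copy
    stolen_classifier = TensorFlowV2Classifier(
        model=util_functions.create_model(),
        nb_classes=10,
        input_shape=(28, 28, 1),
        loss_object=tf.keras.losses.CategoricalCrossentropy(),
        train_step=util_functions.train_step,
    )

    # Extract (steal) the victim classifier
    stolen_classifier = attack.extract(
        x=x_train, y=y_train, thieved_classifier=stolen_classifier
    )

    # Compare the victim and the stolen model on the clean test set
    victim_loss, victim_acc = app.classifiers["tested_model"].model.evaluate(
        x_test, y_test, verbose=0
    )
    stolen_loss, stolen_acc = stolen_classifier.model.evaluate(x_test, y_test, verbose=0)

    return {
        "victim_loss": round(float(victim_loss), 3),
        "victim_accuracy": round(float(victim_acc), 3),
        "stolen_loss": round(float(stolen_loss), 3),
        "stolen_accuracy": round(float(stolen_acc), 3),
    }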

Launching the API

Finally, we launch the API:

As a reminder, you can go to http://localhost:5000/docs to check out the interactive docs for your API. You will see information about your endpoints and their parameters there.
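A minimal launch sketch, assuming port 5000 as in the docs URL above:

import uvicorn

if __name__ == "__main__":
    # host 0.0.0.0 so the API is reachable from outside the Docker container
    uvicorn.run(app, host="0.0.0.0", port=5000)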

Using our Web Application

We can now see our API in action! We’ve integrated the API into a React application that allows you to run attacks on your model to evaluate its defenses. You can find the code for the application here.

Follow the instructions below to launch and use the app.

Cloning the app

First off, clone the app’s GitHub repository to your machine:

git clone https://github.com/tavetisyan95/art_web_app.git

You can use Git Bash if you are on Windows.

Setting up the API endpoint configuration

If necessary, you can change the API endpoint config in api/art_web_app/src/config.js:

Don’t put any backslashes in api_url. Additionally, keep in mind that config.js doesn’t actually define which host, port, or endpoints the API will use — it only helps our app send HTTP requests to the correct endpoints.

Launching the Docker container

To start the application’s Docker container, launch Docker Desktop. Then, navigate to the root directory of the application, launch the terminal, and run the following command:

docker-compose -f docker-compose.yaml up -d --build

It may take some time for the app to spin up. Once you see terminal messages that the containers are up, navigate to http://localhost:3000 in your web browser to open the application’s webpage.

ALTERNATIVELY, you can run the start.sh shell script to start the web app without Docker. Run the command bash start.sh in the terminal to launch the shell script. If you are on Windows, you can use Git Bash to run shell scripts. Note that you may need to modify the paths in the API and JS files for the app to work outside of Docker.

Running a Fast Gradient Method attack against your model

The first attack that you can try in the app is the Fast Gradient Method attack.

As mentioned above, you need to upload two models to the app’s web server to run a Fast Gradient Method attack:

· Tested model – this is the model that you want to test. You could use a robust model trained with the Adversarial Trainer as the Tested model. You can train a robust model from scratch or use the pretrained model models/robust_model_fgm.h5.

· Vulnerable model – this is the model that the Fast Gradient Method will use to generate adversarial image samples. This model should be similar to the tested model. You can again train a vulnerable model from scratch or use models/vulnerable_model_fgm.h5.

Our assumption for this attack is that if your model performs similarly both on clean and adversarial data, then it most likely is resistant to the attack.

To upload the models, click the UPLOAD buttons in their respective boxes, then click Upload Model to upload the tested model and Upload Vulnerable Model to upload the vulnerable model.

After that, you can adjust the parameters eps, eps_step, and batch_size. Use the default values for the first run.

Finally, to run the attack, click Run an FGM Attack. While the attack is running, you will see the message Running attack… in the button’s box. After the attack is complete, you will see the performance metrics of the tested model:

Our robust model achieves good performance, though its test loss is a bit high.

If you instead upload vulnerable_model_fgm.h5 as Tested model, your results would be like this:

The model performs very poorly on the adversarial data, so we can conclude that it’s not resistant to the Fast Gradient Method.

To conclude this attack, let’s try to upload robust_model_fgm.h5 both as Tested model and Vulnerable Model. Your results might look like this:

The attack was even more effective! Based on the results, we could conclude that robust models serve as a better basis for the Fast Gradient Method attack.

Running a poisoning backdoor attack

The second attack you can try in the app is the poisoning backdoor attack.

This attack only requires one model. You can use the model models/poisoned_model.h5 to test this attack. Or train a poisoned model from scratch if you want!

We trained poisoned_model.h5 with the add_pattern_bd perturbation. We generated fake labels by incrementing the real labels by 1. In other words, 0 -> 1, 1 -> 2, and so on, except for the digit 9, whose fake label is 0.

Now, remember that our endpoint run-backdoor returns test scores on poisoned images with respect to both poisoned and clean labels? This is to help us identify if there is actually a backdoor in the model.

This test can produce one of three outcomes:

1. Metrics with respect to poisoned labels are worse than with respect to clean labels. This probably means that your model doesn’t have backdoors. The model performs well against clean labels because it can correctly identify the samples, while the performance against poisoned labels is bad because the backdoor in the samples doesn’t trigger the attacker’s desired output.

2. Metrics with respect to clean labels are worse than with respect to poisoned labels. This probably means that your model has a backdoor and that your target labels reflect the actual poisoned labels the model has been trained on. If the model performs well with respect to poisoned labels, it means that the backdoor in the samples triggers the attacker’s desired output.

3. Metrics with respect to both clean and poisoned labels are poor. This would probably mean that your model has a backdoor, but your chosen target labels aren’t the actual poisoned labels the model has been trained on. The model performs badly against clean labels because it has a backdoor, but the performance against poisoned labels is also bad because you’ve picked incorrect target labels.

Based on these assumptions, you should be able to determine whether or not your model is poisoned. Now, let’s see if these assumptions hold up in the real world!

Let’s upload poisoned_model.h5 to the web server and run a poisoning attack on it. After you upload the model, click Run a Backdoor Attack to run the attack.

For the poisoned model, your results should look something like this:

Because the model’s performance metrics are good against poisoned labels and bad against clean labels, we can assume that the model is poisoned.

Next, let’s try changing Target labels to see how the performance metrics change. Let’s type in ten 9s:

The results would be as follows:

The model performs poorly with respect to both poisoned and clean labels. Because the performance on the clean labels is bad, we can assume that the model has a backdoor. But because the performance with respect to our target fake labels is also bad, we can assume that the original fake labels weren’t all 9s (which is true). So the target labels differ from the fake labels that were used to train the model.

Finally, let’s try a model without a backdoor, like vulnerable_model_fgm.h5. Your results on a clean model should look like this:

With results like these, it would be reasonable to conclude that the model doesn’t have any backdoors.

To conclude this section, our assumptions from above appear to hold up!

Running an extraction attack with Copycat CNN

Finally, let’s try running a Copycat CNN attack.

You can use postprocessed_model.h5 for this test, as well as any of the other models we provide with the app. postprocessed_model.h5 uses the custom Reverse Sigmoid layer as its output, so it has some degree of protection against extraction attacks from probabilistic extractors. The other models don’t have any defense measures against extraction attacks.

You should interpret the results of this attack as follows:

· If the stolen model performs similarly to the victim model, it probably means that the victim model is NOT resistant to extraction attacks.

· Otherwise, if the stolen model performs notably worse than the victim model, it probably means that the victim model IS resistant to extraction attacks.

For the first run, try the default parameter values. Don’t increase any of the parameters too much if you don’t have a powerful machine.

Let’s try to run the attack on postprocessed_model.h5 first. The results would be as follows:

It appears that this model is resistant to the Copycat CNN attack because the accuracy of the stolen model is low.

Note that the metrics of the stolen model may vary. We’ve gotten accuracies ranging from 0.1 to about 0.5.

Additionally, note that the test loss of the postprocessed model is meaningless. The test loss is high, but that doesn’t mean that the model performs poorly.

The high loss is due to Reverse Sigmoid. At training time, the model calculated gradients based on standard outputs from the softmax activation function. Because Reverse Sigmoid changes the output probabilities at inference time, we get drastically higher losses. This is an issue that you would probably want to work around in a production setting by converting your Reverse Sigmoid outputs back to softmax.

Let’s now try an unprotected model — vulnerable_model_fgm.h5. Your results should look like this:

The stolen model performs close to the victim model, but it’s not quite there yet. Increasing the number of queries (nb_stolen) should allow you to drastically improve the performance of the stolen classifier.

Limitations of the App

Our app has a number of limitations that you should keep in mind, including:

· The MNIST digits dataset is hardcoded into the API. So the app is only useful for models that have been trained on MNIST digits.

· Only a fraction of the adjustable parameters for each attack have been implemented in the app.

· Only three attacks have been implemented, whereas ART has many more.

· The app doesn’t show detailed error messages. The only message you will see in the UI is Something went wrong!. If you encounter errors, you’ll have to check your web browser’s console to see what exactly went wrong.

Next Steps

This concludes our 3 PART series on the Adversarial Robustness Toolbox. You now know how to use the Adversarial Robustness Toolbox to test the resistance of your ML models to adversarial attacks!

As your next step, you could try adding more attacks to the app. Even better, you could modify the app so that it provides insight beyond just test accuracies and losses! You can also use the code of the app as a launchpad for your own projects!

Until next time!

