Blog • September 16th 2020 • 7 min read

Auto augmentations: optimize through image augmentations

Event detection models are a foundation of Assaia ApronAI technology. The events happening around any aircraft on a stand are tracked by cameras and then fed to detection mechanisms. These mechanisms provide Assaia with real-time data about how a turnaround unfolds.

The quality of this data depends mostly on the quality of the detection mechanisms, which may differ from stand to stand, or from event to event. In this article, we will demonstrate typical challenges our computer vision system faces and discuss an efficient solution for dealing with them — AutoAugmentations.

Challenging data

Sometimes, there is just no way to see the region of interest because of vehicles moving around, weather-caused corruptions, lighting changes, etc.


Obstacles that make it difficult to detect the events that are happening on the aft part of the aircraft.

This may corrupt the signal (the probability of an event happening, measured at each timestep) and lead to a wrong representation of the data.

The common way to overcome these obstacles is to augment the data with a set of corruptions in order to train deep learning models that are more robust to corruption and generalise better. Such an approach not only helps to create a system that is robust to occlusions and other obstacles, but also usually leads to models that produce better results.


Choose wisely

So, if applying augmentations to a dataset improves the overall quality of the models, then why doesn't one simply stack all possible augmentations and apply them to the training set? The answer is straightforward: the more augmentations you apply, the more information you lose. Some cases become impossible to solve because of severe information corruption.

For example, imagine that you have to identify a road sign. Compare three images of a stop sign with different corruption intensities:


If you blur the image too strongly, it is almost impossible to determine what sign it is.
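To make the information loss concrete, here is a minimal sketch in plain Python. The 8x8 toy image, the box filter and the `contrast` helper are our own illustrative assumptions, not part of the ApronAI pipeline. It shows how the contrast between the bright "sign" and the dark background shrinks as the blur radius grows.

```python
def box_blur(image, radius):
    """Blur a 2D grayscale image (list of rows) with a square box filter."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average over the window, clipped at the image borders.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def contrast(image):
    """Difference between the brightest and the darkest pixel."""
    values = [v for row in image for v in row]
    return max(values) - min(values)


# Hypothetical toy "sign": a bright 4x4 square on a dark 8x8 background.
image = [[1.0 if 2 <= x <= 5 and 2 <= y <= 5 else 0.0 for x in range(8)]
         for y in range(8)]

for radius in (0, 1, 3):
    print(radius, round(contrast(box_blur(image, radius)), 3))
```

With a large enough radius the square and the background become nearly indistinguishable, which is the situation a classifier faces when the augmentation is too strong.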

So, how does one decide which augmentations to use and how intense they should be?

Overview of the Methods

In this section, we review the main strategies for building a suitable set of augmentations.

The first method to answer the question above is a random search algorithm. Its idea is simply to pick several random augmentation policies and compare the resulting quality. It is easy to use, but each random pick ignores the information gained from the previous picks.
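A random search can be sketched in a few lines. The operation names and the `toy_score` function below are hypothetical stand-ins for real augmentations and for the quality of a model trained with a given policy.

```python
import random

OPERATIONS = ["blur", "rotate", "brightness", "cutout"]  # hypothetical names


def sample_policy(rng):
    """Draw one random policy: an operation, its probability and its magnitude."""
    return {"op": rng.choice(OPERATIONS), "p": rng.random(), "m": rng.random()}


def toy_score(policy):
    """Hypothetical stand-in for model quality (higher is better)."""
    return 1.0 - abs(policy["p"] - 0.5) - abs(policy["m"] - 0.3)


def random_search(score_fn, n_trials, seed=0):
    rng = random.Random(seed)
    best_policy, best_score = None, float("-inf")
    for _ in range(n_trials):
        policy = sample_policy(rng)  # drawn independently of all earlier trials
        score = score_fn(policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score


best_policy, best_score = random_search(toy_score, n_trials=50)
print(best_policy, round(best_score, 3))
```

Note that `sample_policy` never looks at the scores of earlier trials, which is exactly the weakness described above.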

One can easily improve the method above by defining the search space more precisely and turning the random search into a grid search. This method only evaluates policies inside the chosen search space, so the policies may become more consistent with your dataset. It still has the disadvantage of not using the information from previously evaluated policies.

In 2019, Google Brain came up with a completely different approach. It treats the search for a good augmentation policy as an optimization problem solved with Reinforcement Learning methods.
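As a rough illustration, a grid search enumerates every combination of a few hand-picked values. The grid values and the `toy_score` function here are assumptions made for the example only.

```python
import itertools


def toy_score(operation, probability, magnitude):
    """Hypothetical stand-in for model quality (higher is better)."""
    bonus = 0.1 if operation == "blur" else 0.0
    return bonus - abs(probability - 0.5) - abs(magnitude - 0.3)


def grid_search(score_fn, operations, probabilities, magnitudes):
    """Evaluate every grid point and return the best one and the grid size."""
    grid = list(itertools.product(operations, probabilities, magnitudes))
    best = max(grid, key=lambda candidate: score_fn(*candidate))
    return best, len(grid)


best, n_evaluated = grid_search(
    toy_score,
    operations=["blur", "rotate", "cutout"],
    probabilities=[0.0, 0.25, 0.5, 0.75, 1.0],
    magnitudes=[0.1, 0.3, 0.5, 0.7, 0.9],
)
print(best, n_evaluated)
```

The number of evaluations is the product of the grid sizes, so the cost grows quickly as more operations or finer steps are added.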

Specifically, you have to define:

    • augmentations search space

    • controller (RNN-based policy generator)

    • child model (model that is trained with given augmentation policy)

    • optimization method (Proximal Policy Optimization algorithm)

According to the paper, ablation experiments indicate that even data augmentation policies that are randomly sampled from the defined search space can lead to improvements on CIFAR-10 over the baseline augmentation policy. However, the improvements exhibited by random policies are smaller than those shown by the AutoAugment policy (the AutoAugment policy reaches a 2.6% ± 0.1% error rate versus 3.0% ± 0.1% for random policies).

In general, the computational cost of such an approach may be very high, because the RNN controller usually needs a large number of optimization steps.

So now we are stuck between methods that are too simple and methods that are too complex. Is there a way to search for a good augmentation policy that is efficient without being too complex? It turns out there is.

Bayesian optimization

There is a compromise between the methods mentioned above. It is computationally cheaper than the RL-based method, yet it uses the information from the optimization steps that were done before.

This class of methods uses Bayesian Optimization instead of an RNN controller with the Proximal Policy Optimization algorithm. As a result, the computational cost of each iteration is lower, while the information from previous optimization steps is still utilized.

The overall scheme of the Bayesian Optimization (BO) method is illustrated below.


The child model is a lightweight deep learning model. It is used to approximately estimate the learning quality achieved with a given augmentation policy. The quality is expressed as a value that can be fed to the optimizer as the optimization loss. The child model and its training hyperparameters are chosen by the experimenter, and the overall pipeline complexity depends on this choice.

The experimenter can also choose between different BO algorithms, which may be based on Gaussian processes, Random Forests or Tree-structured Parzen Estimators.
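To give a feeling for how such an optimizer reuses earlier results, here is a deliberately simplified, TPE-inspired loop in plain Python. It is a sketch under our own assumptions and not the implementation of any particular library: the search space is a single magnitude in [0, 1], `toy_loss` is a hypothetical stand-in for training the child model, and the densities are crude Gaussian kernel estimates.

```python
import math
import random


def toy_loss(magnitude):
    """Hypothetical stand-in for 'train the child model and return its loss'."""
    return (magnitude - 0.3) ** 2  # assumed optimum at magnitude 0.3


def density(x, points, bandwidth=0.1):
    """Unnormalised kernel density estimate built from Gaussian bumps."""
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / len(points)


def tpe_like_search(loss_fn, n_startup=10, n_steps=40, n_candidates=20,
                    gamma=0.25, seed=0):
    rng = random.Random(seed)
    # Start with a few purely random evaluations.
    history = [(x, loss_fn(x)) for x in (rng.random() for _ in range(n_startup))]
    for _ in range(n_steps):
        # Split past evaluations into a "good" and a "bad" group by loss.
        history.sort(key=lambda pair: pair[1])
        n_good = max(1, int(gamma * len(history)))
        good = [x for x, _ in history[:n_good]]
        bad = [x for x, _ in history[n_good:]]
        # Propose candidates around good points, clipped to [0, 1].
        candidates = [
            min(1.0, max(0.0, rng.gauss(rng.choice(good), 0.1)))
            for _ in range(n_candidates)
        ]
        # Evaluate the candidate that looks most "good" and least "bad".
        chosen = max(candidates,
                     key=lambda x: density(x, good) / (density(x, bad) + 1e-12))
        history.append((chosen, loss_fn(chosen)))
    return min(history, key=lambda pair: pair[1])


best_magnitude, best_loss = tpe_like_search(toy_loss)
print(f"best magnitude ~ {best_magnitude:.3f}, loss = {best_loss:.5f}")
```

Unlike random or grid search, every new candidate here depends on the losses observed so far, while each step only costs one child-model evaluation plus a cheap density computation.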

The table below describes the practical differences between the mentioned methods:

[Table image not reproduced: practical differences between the methods]

Search space

Google's work inspired our design of the search space. The search space is a space of policies. Each policy consists of 5 sub-policies, and each sub-policy consists of two transformations, each with its own probability of application (p) and magnitude (m).

The image below illustrates an example of a policy.


All sub-policies are equally likely to be chosen. There is also a probability of applying no transformations at all (we call it 'keep image').
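One possible way to represent such a policy in code is sketched below. The operation names are hypothetical, and the value of `keep_prob` is an assumption of this example, since the exact 'keep image' probability is not specified here.

```python
import random

OPERATIONS = ["blur", "rotate", "brightness", "cutout", "shear"]  # hypothetical
N_SUB_POLICIES = 5
N_TRANSFORMS = 2


def sample_policy(rng):
    """A policy: 5 sub-policies, each made of two (operation, p, m) triples."""
    return [
        [(rng.choice(OPERATIONS), round(rng.random(), 2), round(rng.random(), 2))
         for _ in range(N_TRANSFORMS)]
        for _ in range(N_SUB_POLICIES)
    ]


def choose_transformations(policy, rng, keep_prob=1 / 6):
    """Decide what to apply to one image.

    keep_prob is the assumed probability of the 'keep image' case.
    """
    if rng.random() < keep_prob:
        return []  # keep image: no transformation at all
    sub_policy = rng.choice(policy)  # all sub-policies are equally likely
    # Each transformation fires with its own probability p, at magnitude m.
    return [(op, m) for op, p, m in sub_policy if rng.random() < p]


rng = random.Random(0)
policy = sample_policy(rng)
for _ in range(3):
    print(choose_transformations(policy, rng))
```

An optimizer then only needs to search over the operation, p and m of each of the ten slots.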



The following hardware was used in the experiment:

  • Tesla V100 GPU
  • Intel Xeon (16 cores / one thread per core)

Running 100 optimization steps takes approximately 3 hours, with mobilenet_v2 used as the child model and trained for 8 epochs.


To train the child model, we use the following set of images:

[Table image not reproduced: the set of images used to train the child model]

Experiment details

We use the dataset described above to evaluate each BO optimization step as follows: the child model is trained on the Train fold, the best model is selected on the Validation fold, and the objective loss of the optimization step is measured on the Test fold. To prevent overfitting, one may draw the test images as a sample from a large, dedicated held-out set. Moreover, the overall quality is measured on data that was not used during the BO steps.

We use the Tree-structured Parzen Estimator (TPE) approach implemented in the hyperopt Python library.

Each iteration yields an augmentation policy together with its objective loss. The resulting iterations are sorted by loss, and the top k policies are selected to measure the final scores.

Final scores are evaluated on a full-size dataset (independent of the dataset above) using Assaia ApronAI training pipelines with the found augmentation policies.
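The selection step can be sketched as follows. The trial records and policy names are made up for illustration, and the `policy` and `loss` keys are our own naming.

```python
def top_k_policies(trials, k):
    """Return the policies of the k trials with the lowest objective loss."""
    ranked = sorted(trials, key=lambda trial: trial["loss"])
    return [trial["policy"] for trial in ranked[:k]]


# Hypothetical optimization history: one record per iteration.
trials = [
    {"policy": "policy_a", "loss": 0.42},
    {"policy": "policy_b", "loss": 0.17},
    {"policy": "policy_c", "loss": 0.31},
    {"policy": "policy_d", "loss": 0.25},
]

print(top_k_policies(trials, k=2))
```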


We consider the following set of metrics:

  • F1 score, measured over detected (and missed) start/end events
  • RMSE, measured between the ground-truth signal and the corresponding prediction

The Assaia ApronAI pipeline produces continuous predictions of whether each event is happening on a stand. These predictions form a set of signals. Signals may be distorted by visual complications such as adverse weather conditions, occlusions, etc.
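As a minimal sketch of the two metrics listed above, assuming the start/end events have already been matched into true positives, false positives and false negatives (the matching rule itself is not described here), the computations look like this:

```python
import math


def rmse(truth, prediction):
    """Root mean squared error between two signals of equal length."""
    if len(truth) != len(prediction):
        raise ValueError("signals must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(truth, prediction)]
    return math.sqrt(sum(squared) / len(squared))


def f1_score(true_positives, false_positives, false_negatives):
    """F1 from event counts; returns 0.0 when there are no events at all."""
    denominator = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denominator if denominator else 0.0


# Hypothetical ground-truth signal and prediction for one event.
truth = [0.0, 1.0, 1.0, 0.0]
prediction = [0.0, 1.0, 0.0, 0.0]
print(rmse(truth, prediction))
print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```

RMSE reflects how noisy the continuous signal is, while F1 reflects whether the discrete start and end events derived from it are correct.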


Comparison between good (upper) and noisy (lower) signals


To compare the different approaches, we measure the scores on several datasets:


As we can see, the Auto-Augmentation approach mostly outperforms the other approaches in terms of F1 scores.

In addition, we can see that the 'the more you stack, the more you get' approach sometimes performs worst of all, which is a sign for us to use augmentations carefully.

Let's see what happens to the signals:

[Figure: RMSE_v2.png]

Here we can see that the BO approach achieved notably good scores on dataset #1 and scores close to those of the other policies on datasets #2, #3 and #4.

You can check out some examples of the policies used below:

[Image not reproduced: examples of the policies used]


In this post we discussed different approaches to generating augmentation strategies and ran an experiment with the BO approach. The results have inspired our team to explore the application of BO in Assaia ApronAI pipelines and possibly to include BO-based augmentation optimization in the pipelines, as it seems to perform relatively well without causing a large computational or engineering overhead.

Alexey Chekmachev has over 3 years of experience in the Data Science industry, including work with time series data, computer vision and predictive analytics. He is a researcher at the Machine Learning department of Assaia, currently focusing on time series analysis of AI-generated data.