AI Engineer at Lockheed Martin
Summary
AI Engineer at Lockheed Martin Space for Test Anomaly Detection
May 2022 – Aug 2022
Applied image classification techniques to anomaly detection for space vehicle testing using Python and PyTorch
- Researched many different approaches to AI anomaly detection as part of an Agile team
- Implemented transfer learning from the AlexNet deep convolutional neural network in PyTorch
- Developed custom data loading methods, model architecture, and validation procedures
- Designed custom binary and multi-class classifier modules
- Achieved over 98% multi-class classification accuracy on synthetic validation data
What I Did
I was part of a research team developing an AI model to search for anomalies in space vehicle test data. In this role, I:
- Took part in discussions to design model architecture
- Researched new techniques to improve model performance
- Shared information with other team members, some working on the same model and others working on different models in parallel
Programming the Model: I was also the primary model programmer on our three-person team: one teammate worked on synthetic data generation, and the third member was our direct supervisor. I wrote the code for data loading and normalization, model training and inference, and model validation, implementing all of these features in Python with the PyTorch library.
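To give a flavor of that code, here is a simplified sketch of the data-loading and training-loop structure; the class name, tensor shapes, and helper function are illustrative stand-ins, not the actual project code:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TestSampleDataset(Dataset):
    """Wraps pre-loaded sample tensors and applies simple normalization."""
    def __init__(self, samples, labels, mean, std):
        self.samples = samples            # shape: (N, C, H, W)
        self.labels = labels              # shape: (N,)
        self.mean, self.std = mean, std

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = (self.samples[idx] - self.mean) / self.std   # normalize each sample
        return x, self.labels[idx]

def train_one_epoch(model, loader, optimizer, loss_fn, device="cpu"):
    """One pass over the training data."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)       # e.g. CrossEntropyLoss for multi-class
        loss.backward()
        optimizer.step()
```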
Team Presentation: At the end of my internship, I gave an in-depth presentation of my contributions to the entire research team. Our model achieved over 98% validation accuracy on the synthetic dataset, which was generated from an impressively small number of real-world samples. The presentation prompted several interesting questions from the group, and the model was selected for further development after I left.
What I Learned
Though this work was in an industry setting, my specific group was structured more like a research lab. I really enjoyed this environment, since it gave me the opportunity to learn why and how things work while at the same time building a practical product.
Transfer Learning: The high-level concepts I leveraged for the model were transfer learning and convolutional neural networks (CNNs). CNNs are commonly used for image recognition problems, which at first glance seems unrelated to anomaly detection in test data, the focus of our work. However, to a computer an image is just a matrix of numbers, and so is the test data. Transfer learning allows us to take a model already trained for image recognition and, with minimal tweaking, retrain it to detect anomalies in our data.
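A minimal sketch of what this looks like in PyTorch, assuming a recent torchvision and a hypothetical four-class output (the real class count and training setup were project-specific):

```python
import torch
from torch import nn
from torchvision import models

# Start from AlexNet pretrained on ImageNet.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)

# Freeze the pretrained convolutional feature extractor so that,
# at least initially, only the new classifier head is trained.
for param in model.features.parameters():
    param.requires_grad = False

# Replace AlexNet's final fully connected layer with one sized for our task.
num_classes = 4  # hypothetical; the real class count was project-specific
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)

# Optimize only the parameters that are still trainable.
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)
```

Freezing the pretrained features and training only the new head is a common first step when the target dataset is small, since it keeps the general-purpose features intact while the task-specific classifier adapts.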
Overfitting: One of the first problems I faced with this model was overfitting: the phenomenon where a model “memorizes” the training data rather than actually learning the underlying pattern. To combat this, I first decreased the number of epochs the model was allowed to train for and lowered the learning rate. However, this did not completely fix the problem. My research for solutions eventually led me to a technique called k-fold validation, where the data is split into k subsets (“folds”); in each round the model trains on all but one fold and is validated against the held-out remainder, rotating which fold is held out so every sample is eventually used for validation.
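A rough sketch of how k-fold splitting can be wired up in PyTorch, using scikit-learn's KFold to generate the index splits (the dataset here is a dummy stand-in for the real synthetic data):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset
from sklearn.model_selection import KFold

# Dummy stand-in dataset: 100 samples, 4 hypothetical classes.
dataset = TensorDataset(torch.randn(100, 3, 64, 64), torch.randint(0, 4, (100,)))

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(np.arange(len(dataset)))):
    train_loader = DataLoader(Subset(dataset, train_idx.tolist()),
                              batch_size=16, shuffle=True)
    val_loader = DataLoader(Subset(dataset, val_idx.tolist()), batch_size=16)
    # ...train on train_loader for this round, then validate on val_loader...
```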
Dying ReLU: Possibly the hardest problem I dealt with was dying ReLU. ReLU layers introduce non-linearity into the model to help it learn non-linear functions. A ReLU does this by clamping all negative inputs to 0, which becomes a problem if the model reaches a configuration where most or all of a layer's inputs are negative. In that case nearly all of the ReLU layer's outputs are 0, so information is lost, and the gradient flowing backward through the ReLU is also 0, meaning the layers before it effectively stop learning. To combat this, I tested several alternative non-linearities, such as the sigmoid and PReLU functions. Ultimately, the change that produced the biggest improvement was implementing batch normalization, which shifts and scales the activations between layers toward zero mean and unit variance. This prevented configurations where most or all of a layer's inputs were negative, and allowed the model to keep learning effectively.
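As an illustrative sketch (the layer sizes are hypothetical), a convolutional block with batch normalization placed before the ReLU looks like this in PyTorch:

```python
import torch
from torch import nn

# BatchNorm2d keeps each layer's inputs centered near zero, so the ReLU
# is unlikely to see all-negative inputs and "die".
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),   # rescale activations toward zero mean, unit variance
    nn.ReLU(),
    # nn.PReLU() is a drop-in alternative with a learnable negative slope.
)

x = torch.randn(8, 3, 64, 64)   # dummy batch of 8 samples
y = block(x)                    # shape: (8, 16, 64, 64)
```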