Sound of Silence: Reducing Noise with Transfer Learning & Synthetic Data


Much research is dedicated to novel deep learning architectures and complex feature engineering approaches that improve model performance on NER (named entity recognition) tasks. Far less attention goes to the everyday challenge of working with data that is just plain “meh.”


The majority of text data generated in real-world applications is quite messy. Transforming the mess into large amounts of high-quality, labeled training data may not be feasible given budget or time constraints.

In this presentation at the UC Data Science Symposium, CoStrategix demonstrated the results of addressing noise by varying the training data used for an NER machine learning pipeline, effectively sidestepping these constraints.
