The customized spaCy training loop

Enrico Alemani
2 min read · May 10, 2020


Customization and implementation of tips and advice for NER training

In this post, I explain how to customize the spaCy Named Entity Recognition (NER) training loop from the comfort of your Jupyter notebook, including how to implement the performance optimization tips suggested in the spaCy docs.

NB: the code snippets use spaCy v2.


The issue
spaCy lets users fully customize the training process through the Command Line Interface (see docs). For example, NER training can be customized by changing the learning rate or the L2 regularisation. I spent a bit of time digging into the source code and docs, but I couldn’t find any clear example of how to achieve the same level of customization while working in a Python interpreter (e.g. a Jupyter notebook).
So I decided to write up a quick solution, together with this blog post, hoping it will be helpful to others experimenting with the spaCy library.

The solution
The solution is to overwrite the default values once the model is initialised. This method can be used to customize:

  • learning rate;
  • L2 norm;
  • Adam optimizer beta1 and beta2 coefficients;
  • CNN window width;
  • CNN depth;

and many other parameters available from the Command Line Interface.

The code
In the example, I tweaked the spaCy NER training example to customize the following parameters:

  • convolution window : conv_window = 3
  • learning rate : learn_rate = 0.3

The explanation
As shown in lines 55 to 61, customization is achieved by the following:

  • component_cfg={"ner":{"conv_window":3}}

component_cfg is a keyword argument of nlp.begin_training() that can be used to modify the default values of many model parameters, such as the convolution window width (conv_window) set here.

  • custom_optimizer. This is a simple function that overwrites the default hyperparameter values of the Adam optimizer, such as the learning rate. The function is the following:
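A minimal sketch of what such a function could look like is given here. It is not the exact code from the original notebook. The attribute names (alpha, b1, b2, L2) are an assumption about the optimizer object that spaCy v2 returns from nlp.begin_training(), so check them against your installed version. The default values are placeholders, and a stand-in object is used so the snippet runs without spaCy installed.

```python
from types import SimpleNamespace


def custom_optimizer(optimizer, learn_rate=0.001, beta1=0.9, beta2=0.999, L2=1e-6):
    """Overwrite Adam hyperparameters on an already-created optimizer.

    The attribute names below are assumptions about the optimizer object;
    the default values are illustrative placeholders.
    """
    optimizer.alpha = learn_rate  # assumed attribute holding the learning rate
    optimizer.b1 = beta1          # assumed attribute for Adam beta1
    optimizer.b2 = beta2          # assumed attribute for Adam beta2
    optimizer.L2 = L2             # assumed attribute for the L2 penalty
    return optimizer


# Stand-in for the optimizer, so the sketch runs on its own.
dummy_optimizer = SimpleNamespace(alpha=0.001, b1=0.9, b2=0.999, L2=1e-6)
tuned = custom_optimizer(dummy_optimizer, learn_rate=0.3)
print(tuned.alpha, tuned.b1, tuned.b2, tuned.L2)
```

In the notebook setting described in this post, the object passed in would be the optimizer returned by nlp.begin_training(component_cfg={"ner": {"conv_window": 3}}), and the tuned optimizer would then be handed to the training updates.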

In addition to this, I have also implemented three other tips suggested in the spaCy docs: dropout decay (lines 67 and 77), parameter averaging (line 95) and batch compounding (line 71).
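To give a feel for two of these tips, here is a small pure-Python sketch of a compounding batch-size schedule and a decaying dropout schedule. spaCy v2 ships its own helpers for this in spacy.util (compounding, decaying and minibatch); the functions below are simplified stand-ins with hypothetical names, written for illustration only, and the linear decay step is an assumption rather than a statement about the library's behaviour.

```python
from itertools import islice


def compounding_schedule(start, stop, compound):
    """Yield values that grow by a constant factor, capped at `stop`."""
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound


def decaying_schedule(start, stop, decay):
    """Yield values that shrink by a constant step, floored at `stop`."""
    value = float(start)
    while True:
        yield max(value, stop)
        value -= decay


def batch_by_schedule(items, sizes):
    """Split `items` into consecutive batches whose sizes come from `sizes`."""
    items = iter(items)
    while True:
        batch = list(islice(items, int(next(sizes))))
        if not batch:
            return
        yield batch


# Toy data standing in for (text, annotations) training examples.
train_data = list(range(10))
batch_sizes = compounding_schedule(1.0, 4.0, 2.0)
dropouts = decaying_schedule(0.6, 0.2, 0.1)

for batch in batch_by_schedule(train_data, batch_sizes):
    drop = next(dropouts)  # one dropout value per batch
    print(len(batch), round(drop, 2))
```

The idea is that early batches are small and heavily regularised, while later batches grow and the dropout rate falls. In a real spaCy v2 loop the per-batch dropout value would be passed as the drop argument of nlp.update().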

And this is how you can experiment with and customize spaCy’s NER model.
Happy training :)


