Enrico Alemani – Medium

Gazetteer deduplication in Pandas

Gazetteer deduplication is for matching a messy dataset against a ‘canonical’ dataset (i.e. a gazetteer). The former contains misspellings…

Dec 19, 2020

Record linkage in Pandas

Record linkage is the process of linking records from different data sources (e.g. pandas dataframes) using any fields in common between…

Dec 15, 2020

Records deduplication in Pandas

How many times have you found yourself in a situation where you had to deal with messy data, especially reconciling misspellings, short forms…

Nov 20, 2020

Flatten nested dictionaries in pandas using glom

Pandas is great! You can do pretty much everything with it: from data cleaning to quick data viz. How about working with nested dictionary…

Jun 23, 2020

The customized spaCy training loop

Customization and implementation of tips and advice for NER training

May 10, 2020

How to create training data for spaCy NER models using ipywidgets

In this post, I present the spacy-annotator: a library to create training data for the spaCy Named Entity Recognition (NER) model using…

May 3, 2020

Fake reviews detection and transfer learning

We apply the Universal Language Model Fine-Tuning (ULMFiT) by Howard and Ruder (2018) to fake reviews detection and demonstrate that deep…

Jan 4, 2020

Enrico Alemani

Data Scientist | Economist. Curious.
