Enrico Alemani – Medium

Enrico Alemani

Gazetteer deduplication in Pandas

Gazetteer deduplication matches a messy dataset against a ‘canonical’ dataset (i.e. a gazette). The former contains misspellings…

3 min read · Dec 19, 2020

Enrico Alemani

Record linkage in Pandas

Record linkage is the process of linking records from different data sources (e.g. pandas dataframes) using any fields in common between…

3 min read · Dec 15, 2020

Enrico Alemani

Records deduplication in Pandas

How many times have you found yourself in a situation where you had to deal with messy data, especially reconciling misspellings, short forms…

3 min read · Nov 20, 2020

Enrico Alemani

Flatten nested dictionaries in pandas using glom

Pandas is great! You can do pretty much everything with it: from data cleaning to quick data viz. How about working with nested dictionary…

2 min read · Jun 23, 2020

Enrico Alemani

The customized spaCy training loop

Customization and implementation of tips and advice for NER training

2 min read · May 10, 2020

Enrico Alemani

How to create training data for spaCy NER models using ipywidgets

In this post, I present the spacy-annotator: a library to create training data for the spaCy Named Entity Recognition (NER) model using…

3 min read · May 3, 2020

Enrico Alemani

Fake reviews detection and transfer learning

We apply the Universal Language Model Fine-Tuning (ULMFiT) by Howard and Ruder (2018) to fake reviews detection and demonstrate that deep…

7 min read · Jan 4, 2020

Enrico Alemani

Data Scientist | Economist. Curious.
