José Angel Daza Arévalo

Amsterdam · Netherlands · josdaza.a [at] gmail

Computer Systems Engineer with a master's degree in Computer Science and a Ph.D. in Computational Linguistics. I recently joined the CLTL group at the Vrije Universiteit Amsterdam as a postdoc, where I work with Prof. Dr. Antske Fokkens and Prof. Dr. Piek Vossen, collaborating in particular on the InTaVia project for European Cultural Heritage.


NLP Researcher

Vrije Universiteit Amsterdam. The Netherlands.

Development of NLP models and text mining strategies for automatic processing, integration and dynamic language analysis of biographies and other historiographical texts. This work is part of the InTaVia project, which aims to address major research challenges and bridge the semantic gap between large object databases, biography databases, and present-day users across Europe.

March 2021 - Current

NLP Developer

Leibniz-Institut für Deutsche Sprache. Mannheim, Germany.

Implementation of SOTA lemmatizers and part-of-speech taggers for large-scale (~50 billion tokens) German resources.

August 2020 - February 2021

Research Scientist

Heidelberg University. Germany.

My research focuses on finding effective methods for creating more training data for the task of Semantic Role Labeling (SRL) in several languages (particularly German). I approach SRL both as a sequence classification task and as a sequence generation task. In my research I have worked with Recurrent Neural Networks, LSTMs, Sequence-to-Sequence models and multilingual Neural Language Models such as ELMo and BERT. I implemented my research code in PyTorch and also used state-of-the-art frameworks such as spaCy, Transformers and AllenNLP when building more complex models.

April 2017 - July 2020

Research Intern

RIKEN Center for Advanced Intelligence Project (AIP). Tokyo, Japan.

I collaborated on a project to build a multilingual fine-grained entity classifier.

June 2019 - August 2019

Senior Developer & Data Scientist

Innomius Technologies

I focused on implementing dashboards for business analytics using Tableau. Some of my projects included developing machine learning algorithms (clustering and classification) using Python and Scikit-learn. My latest project was a demo for Customer Sentiment Analysis on Twitter.

January 2016 - March 2017

Mobile & Web Developer


Mobile app prototype design and development. I used Appcelerator Titanium for cross-platform apps and also implemented a couple of native apps using Swift (for iPhone) and Java (for Android). As a web developer I used frameworks such as WordPress. When building projects from scratch, I focused on the backend (mainly NodeJS and MongoDB); however, I also wrote some frontend code (mainly using React).

January 2012 - March 2017

Assistant Consultant

Management Solutions

I assisted on financial consulting projects such as: building software in Microsoft Excel that generates executive reports automatically, querying and analyzing Bank Risk Management databases using SAS, and analysis for improving the performance and documentation of the credit-card system at BBVA.

August 2012 - April 2013


Heidelberg University

Doctor of Philosophy
Computational Linguistics / Natural Language Processing
Thesis: Cross-lingual Semantic Role Labeling through Translation and Multilingual Learning
April 2017 - May 2021 (Expected)

Instituto Politécnico Nacional (CIC IPN)

Master of Science
Computer Science
Thesis: Automatic Text Generation by Learning from Literary Structures
August 2013 - December 2015

Tecnológico de Monterrey

Bachelor of Science
Computer Systems Engineering
Specialization in Artificial Intelligence
August 2007 - December 2011

Tecnológico de Monterrey

Interdisciplinary Program for Highly-skilled Students
August 2004 - July 2007


Programming Languages & Frameworks
  • Python 2.7 and Python 3 (Advanced)
  • JavaScript (Intermediate)
  • Java (Intermediate)
  • NodeJS
  • ReactJS
Machine Learning & Data Science
  • PyTorch
  • NLTK
  • spaCy
  • AllenNLP
  • Transformers
  • Tableau
  • NumPy
  • Pandas
  • Scikit-learn


X-SRL: A Parallel Cross-Lingual Semantic Role Labeling Dataset

Daza, A. and Frank, A. (2020). To be presented at EMNLP 2020 - Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Online conference. [PDF]

Translate and Label! An Encoder-Decoder Approach for Cross-lingual Semantic Role Labeling

Daza, A. and Frank, A. (2019). Published at EMNLP-IJCNLP 2019 - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Hong Kong, China. [PDF]

A Sequence-to-Sequence Model for Semantic Role Labeling

Daza, A. and Frank, A. (2018). Published at ACL 2018 - Proceedings of the 3rd Workshop on Representation Learning for NLP (RepL4NLP), Melbourne, Australia. [PDF]

Automatic Text Generation by Learning from Literary Structures

Daza, A., Calvo, H. and Figueroa-Nazuno, J. (2016). Published at NAACL 2016 - Workshop on Computational Linguistics for Literature, San Diego, California. [PDF]


When I am not in front of a computer, I enjoy reading both non-fiction (science, history, philosophy) and fiction: Borges, Dostoevsky, Mishima and Rulfo are some of my favorite authors. I am also a big football fan and particularly enjoy watching the Champions League after a nice day at work.

I love traveling and getting to know new cultures and languages (I speak Spanish and English fluently and have learned some Portuguese, German and, lately, Dutch). Photography and movies are also pleasant weekend hobbies.