FINSIM 2020 shared task on Financial Word Embeddings Evaluation
Held at IJCAI-PRICAI 2020 as part of the FinNLP 2020 workshop.
11-12 July, Yokohama, Japan.
Shared Task URL: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp2020/shared-task-finsim
Workshop URL: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp2020/
Participation Form: https://forms.gle/sBC459xK8YuVw3Zz8
The FINSIM 2020 shared task is colocated with the FinNLP workshop and offers the challenge of automatically learning effective and precise semantic models to process documents from the financial domain. This task hopes to spark interest from communities in NLP, ML/AI, Knowledge Representation and Financial document processing.
Going beyond the mere representation of words is a key step to industrial applications that make use of Natural Language Processing (NLP).
This is typically addressed using either
- Unsupervised corpus-derived representations like word embeddings, which are typically opaque to human understanding but very useful in NLP applications;
- Manually crafted resources such as corpora, lexica, taxonomies and ontologies, which typically have low coverage and contain inconsistencies, but provide a deeper understanding of the target domain.
These two methods form the two ends of a spectrum which a number of approaches have attempted to combine, particularly in tasks aiming at expanding the coverage of manual resources using automatic methods.
- The SemEval community has organized several evaluation campaigns to stimulate the development of methods which extract semantic/lexical relations between concepts/words (Camacho-Collados et al. 2018, Bordea et al. 2015, Bordea et al. 2016).
- There are also a large number of datasets and challenges that specifically look at how to automatically populate knowledge bases such as DBpedia or Wikidata (e.g. KBP challenges).
- To the best of our knowledge, FINSIM 2020 is the first time such a task is proposed for the Financial domain.
Participants will be given a list of carefully selected terms from the Financial domain, such as “European depositary receipt” and “Interest rate swaps”, and will be asked to design a system which can automatically assign each term to its most relevant hypernym (or top-level) concept in an external ontology. For example, given the set of concepts “Bonds”, “Unclassified”, “Share”, “Loan”, the most relevant hypernym of “European depositary receipt” is “Share”.
Participants will be given a large corpus of in-domain data to facilitate learning semantic representations, as well as a set of concepts extracted from an ontology, the Financial Industry Business Ontology (FIBO).
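As an illustration of the task setup only, the following minimal sketch assigns a term to the concept whose label is closest in an embedding space. It is not a reference system: the three-dimensional vectors in `TOY_VECTORS` are invented for the example (real vectors would be learned from the in-domain corpus), and the helper names `embed`, `cosine` and `classify` are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real system would learn its vectors from the in-domain corpus.
TOY_VECTORS = {
    "depositary": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "share": [1.0, 0.0, 0.0],
    "bonds": [0.1, 0.9, 0.2],
    "loan": [0.2, 0.7, 0.5],
}
DIM = 3


def embed(phrase):
    """Average the vectors of the known words in a phrase (zero vector if none)."""
    words = [w for w in phrase.lower().split() if w in TOY_VECTORS]
    if not words:
        return [0.0] * DIM
    return [sum(TOY_VECTORS[w][i] for w in words) / len(words) for i in range(DIM)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def classify(term, concepts, fallback="Unclassified"):
    """Return the concept label most similar to the term, or the fallback label."""
    term_vec = embed(term)
    scored = [(cosine(term_vec, embed(c)), c) for c in concepts if c != fallback]
    best_score, best = max(scored, default=(0.0, fallback))
    return best if best_score > 0 else fallback


concepts = ["Bonds", "Unclassified", "Share", "Loan"]
print(classify("European depositary receipt", concepts))  # Share
```

A term with no known words gets a zero vector and falls back to “Unclassified”, which mirrors the role that label plays in the concept set above.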
Systems will be evaluated according to the accuracy with which financial terms are classified, and according to recall (based on the total number of predictions). We are interested in systems which make creative use of relevant resources such as ontologies and lexica, as well as systems which make use of large existing word embeddings such as BERT (Devlin et al. 2018).
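The accuracy measure can be sketched as follows. This is only an assumed reading of the metric (the fraction of terms whose predicted hypernym equals the gold hypernym, with missing predictions counted as errors), not the official scoring script. The function name `accuracy` and the placeholder terms are hypothetical, and the recall measure is left out because its exact definition is not given here.

```python
def accuracy(gold, predicted):
    """Fraction of gold terms whose predicted label equals the gold label.

    Terms missing from `predicted` count as errors (an assumption of this sketch).
    """
    if not gold:
        raise ValueError("gold must contain at least one term")
    correct = sum(1 for term, label in gold.items() if predicted.get(term) == label)
    return correct / len(gold)


# Only the first gold pair comes from the task description;
# the other terms are hypothetical placeholders.
gold = {
    "European depositary receipt": "Share",
    "hypothetical term A": "Bonds",
    "hypothetical term B": "Loan",
    "hypothetical term C": "Share",
}
predicted = {
    "European depositary receipt": "Share",
    "hypothetical term A": "Bonds",
    "hypothetical term B": "Share",  # wrong label
    # "hypothetical term C" has no prediction and counts as an error
}
print(accuracy(gold, predicted))  # 0.5
```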
A USD$1000 prize will be awarded to the best-performing teams.
To register your interest in participating in the FinSim shared task, please use the participation form linked above by no later than May 8th, 2020.
March 13th, 2020: Registration opens.
March 30th, 2020: Release of training set & scoring scripts.
May 1st, 2020: Release of test set.
May 8th, 2020: Registration deadline.
May 15th, 2020: Submission deadline.
May 31st, 2020: Release of results.
July 11th, 2020: Workshop day.
The dates may change due to the current situation.
For any questions on the shared task please contact us on:
Shared task organizers:
– Ismail El Maarouf, Fortia Financial Solutions
– Youness Mansar, Fortia Financial Solutions
– Virginie Mouilleron, Fortia Financial Solutions
– Dialekti Valsamou-Stanislawski, Fortia Financial Solutions
Georgeta Bordea, Paul Buitelaar, Stefano Faralli, and Roberto Navigli (2015). “SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)”. In Proceedings of SemEval 2015, co-located with NAACL HLT 2015, Denver, CO, USA.
Georgeta Bordea, Els Lefever, and Paul Buitelaar (2016). “SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)”. In Proceedings of the 10th International Workshop on Semantic Evaluation, San Diego, CA, USA.
Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, and Horacio Saggion (2018). “SemEval-2018 Task 9: Hypernym Discovery”. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, United States. Association for Computational Linguistics.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. https://arxiv.org/abs/1810.04805v2.
David Jurgens and Mohammad Taher Pilehvar (2016). “SemEval-2016 Task 14: Semantic Taxonomy Enrichment”. In Proceedings of SemEval-2016, NAACL-HLT.
The Financial Industry Business Ontology (FIBO): https://spec.edmcouncil.org/fibo/