Update README.md (1ed55a80) · Commits · hoepfl / diac_lm_bilingual_dictionary_inference

README.md

+3 −1

		@@ -2,4 +2,6 @@
		Create a conda env by running
		`conda env create --file requirements.yaml`<br><br>
		Then, either download the data from the Leipzig corpus (or other data) and use the respective Python scripts `combine_sources_into_corpus.py` and `handle_giellalt_data.py` to create a training corpus, or use the corpora in the data folder. <br><br>
		-Then use the `submit.sh` script to create fastText embeddings and run VecMap.
		+To create new fastText embeddings, clone the fastText GitHub repository, go to the fastText folder, and run `make`. <br>Details can be found on the fastText website. <br>
		+If other methods or libraries are used to create embeddings, their output must be in the word2vec format (recognizable by the `.vec` ending, with file content in the format `[word] [k-dim vector]`).<br><br>
		+After installing fastText, the `submit.sh` script can be used to create fastText embeddings and run VecMap.
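
The word2vec text format mentioned in the README can be illustrated with a minimal sketch. It assumes the common layout that fastText writes to `.vec` files: a header line with the vocabulary size and the dimension, followed by one `[word] [k-dim vector]` line per word. The helper names `write_vec` and `read_vec` are hypothetical and are not part of this repository, fastText, or VecMap.

```python
import io


def write_vec(embeddings, handle):
    """Write a {word: [floats]} mapping in word2vec text format.

    Hypothetical helper for illustration; not part of the repository.
    """
    dim = len(next(iter(embeddings.values())))
    # Header line: "<vocabulary size> <dimension>"
    handle.write(f"{len(embeddings)} {dim}\n")
    for word, vector in embeddings.items():
        if len(vector) != dim:
            raise ValueError(f"vector for {word!r} does not have {dim} dimensions")
        # One line per word: the word, then its k values, space-separated
        handle.write(word + " " + " ".join(f"{x:.6f}" for x in vector) + "\n")


def read_vec(handle):
    """Read word2vec text format back into a {word: [floats]} mapping."""
    count, dim = (int(field) for field in handle.readline().split())
    embeddings = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        embeddings[word] = values
    if len(embeddings) != count:
        raise ValueError(f"header announced {count} words, found {len(embeddings)}")
    return embeddings


if __name__ == "__main__":
    buffer = io.StringIO()
    write_vec({"hello": [0.1, 0.2], "world": [0.3, 0.4]}, buffer)
    print(buffer.getvalue())
```

Embeddings produced by another library could be checked with a reader like this before they are passed on to the rest of the pipeline.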