Commit 1ed55a80 authored by hoepfl

Update README.md

parent 550b441c
+3 −1
@@ -2,4 +2,6 @@
Create a conda env by running 
`conda env create --file requirements.yaml`<br><br>
Then, either download the data from the Leipzig corpus (or other data) and use the respective Python scripts `combine_sources_into_corpus.py` and `handle_giellalt_data.py` to create a training corpus, or use the corpora in the *data* folder. <br><br>
-Then use the `submit.sh` script to create fastText embeddings and run *VecMap*. 
+To create new fastText embeddings, clone the fastText GitHub repository, change into the fastText folder, and run `make`. <br>Details can be found on the fastText website. <br>
+If other methods or libraries are used to create embeddings, their output should be in the word2vec format (recognizable by the `.vec` ending, with file content in the format `[word] [k-dim vector]`).<br><br>
+After installing fastText, the `submit.sh` script can be used to create fastText embeddings and run *VecMap*.
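
The word2vec text format mentioned in the README can be illustrated with a minimal Python sketch. It assumes the layout that fastText itself writes to `.vec` files: a header line with the vocabulary size and the dimension, followed by one `[word] [k-dim vector]` line per word. The header line is an assumption beyond what the README states, the helper names `write_vec` and `read_vec` are hypothetical and not part of this repository, and the sample words and values are made up.

```python
import io


def write_vec(embeddings, handle):
    """Write a {word: vector} mapping in word2vec text format.

    Assumed layout: header "<vocab_size> <dim>", then "word v1 v2 ... vk".
    """
    dim = len(next(iter(embeddings.values())))
    handle.write(f"{len(embeddings)} {dim}\n")
    for word, vector in embeddings.items():
        if len(vector) != dim:
            raise ValueError(f"{word!r} has {len(vector)} values, expected {dim}")
        handle.write(word + " " + " ".join(f"{x:.4f}" for x in vector) + "\n")


def read_vec(handle):
    """Read word2vec text format and check it against its header."""
    count, dim = (int(field) for field in handle.readline().split())
    embeddings = {}
    for line in handle:
        # rstrip() also drops a trailing space before the newline, if present.
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"{word!r} has {len(values)} values, expected {dim}")
        embeddings[word] = values
    if len(embeddings) != count:
        raise ValueError(f"header announces {count} words, found {len(embeddings)}")
    return embeddings


# Round trip through an in-memory file; the vectors are illustrative only.
buffer = io.StringIO()
write_vec({"hus": [0.1, 0.2], "viessu": [0.3, 0.4]}, buffer)
buffer.seek(0)
loaded = read_vec(buffer)
```

A check like `read_vec` can be run on embeddings produced by another library before passing them on, so that a wrong dimension or a missing header is caught early.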