Commit 44f597f1 authored by D.H.D. Nguyen

current version

parent b69718ba

README.md

# Web Interface for Best-Worst-Scaling

(authored by Dung Nguyen, Maryna Charniuk, Sanaz Safdel, 2019-2020)

This project aims to create a user-friendly website for annotating data 
using **Best-Worst-Scaling** ([Kiritchenko and Mohammad 2016](https://saifmohammad.com/WebPages/BestWorst.html)).

## Requirements

* [Python 3.6](https://www.python.org/downloads/release/python-369/) or later
* [Flask](https://flask.palletsprojects.com/)
* [Flask-Bootstrap](https://pythonhosted.org/Flask-Bootstrap/)
* [Flask-Login](https://flask-login.readthedocs.io/en/latest/)
* [Flask-WTF](https://flask-wtf.readthedocs.io/en/stable/)
* [Flask-SQLAlchemy](https://flask-sqlalchemy.palletsprojects.com/en/2.x/)
* [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/mturk.html)
* [Pytest](https://docs.pytest.org/en/latest/)
* [Sphinx](https://www.sphinx-doc.org/en/master/)

## Installation
* Create a virtual environment using [venv](https://docs.python.org/3/library/venv.html)
 or [virtualenv](https://virtualenv.pypa.io/en/latest/) to manage dependencies for this repository. (*recommended*)
* Clone the repository:
```sh
$ git clone https://gitlab.cl.uni-heidelberg.de/nguyen/swp.git
$ cd swp/
```

* After activating the virtual environment, install the project's requirements:
```sh
$ pip install -r requirements.txt
```

## How to

### 1. Web Application
In `swp/`, run:
```sh
$ python main.py
```
The application runs locally. Open this [URL](http://127.0.0.1:5000/ "Local development system") in any browser to access the web application.

#### § Structure
    
The following scheme shows the directory structure of the web application:
```bash   
swp/
├── README.md
├── __init__.py
├── config.py
├── doc
│   └── ...
├── examples
│   ├── first_10_characters_examples.txt
│   └── movie_reviews_examples.txt
├── index.rst
├── main.py
├── project
│   ├── __init__.py
│   ├── annotator
│   │   ├── __init__.py
│   │   ├── account.py
│   │   ├── annotation.py
│   │   ├── forms.py
│   │   ├── helpers.py
│   │   └── views.py
│   ├── generator.py
│   ├── models.py
│   ├── start
│   │   ├── __init__.py
│   │   └── routes.py
│   ├── static
│   │   └── styles.css
│   ├── templates
│   │   ├── annotator
│   │   │   ├── batch.html
│   │   │   ├── index.html
│   │   │   └── project.html
│   │   ├── questions.xml
│   │   ├── start.html
│   │   └── user
│   │       ├── index.html
│   │       ├── login.html
│   │       ├── profile.html
│   │       ├── project.html
│   │       ├── signup.html
│   │       └── upload-project.html
│   ├── user
│   │   ├── __init__.py
│   │   ├── account.py
│   │   ├── forms.py
│   │   ├── helpers.py
│   │   ├── home.py
│   │   ├── inputs.py
│   │   ├── outputs.py
│   │   └── views.py
│   └── validators.py
├── requirements.txt
└── tests
   └── ...
```

#### § Short User Manual
* In order to upload a project, you need an account first. Then, follow the instructions on the website.
* For the project, upload only non-empty **_.txt_**-files. 
* There are two options for how the annotation works:
	* Option 1: Local annotator system - you find the annotators yourself.
	* Option 2: ***Mechanical Turk*** - the project is created as HITs on Amazon's 
crowdsourcing platform, [Mechanical Turk](https://www.mturk.com/). Workers who are 
interested in the HITs accept and complete the annotations, so you do not need to 
find any annotators yourself.
* At any time (once at least one annotator has submitted a batch), two files can be downloaded:
	* *scores.txt*: the calculated scores of the items
	* *report.txt*: a report with the raw annotation data
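
How the scores in *scores.txt* are computed is not spelled out here; a common approach for Best-Worst-Scaling is simple counting (Orme 2009): an item's score is (#times chosen best − #times chosen worst) / #appearances. The sketch below illustrates that counting method only; the annotation format and function name are assumptions, not this project's actual API:

```python
from collections import Counter

def bws_scores(annotations):
    """Counting-based BWS scores.

    ``annotations`` is a list of (tuple_items, best, worst) entries,
    where ``tuple_items`` is the list of items shown to the annotator.
    Score = (#chosen best - #chosen worst) / #appearances.
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)   # count every appearance of every item
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

ann = [
    (["a", "b", "c", "d"], "a", "d"),
    (["a", "b", "c", "d"], "a", "c"),
]
print(bws_scores(ann))  # {'a': 1.0, 'b': 0.0, 'c': -0.5, 'd': -0.5}
```

Scores fall in [-1, 1]: an item always picked as best scores 1, one always picked as worst scores -1.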
    
### 2. Testing
To run the tests:
```sh
$ pytest
```

#### § Structure
```sh
swp/
├── ...
│   └── ...
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── functional
    │   ├── __init__.py
    │   ├── test_annotators.py
    │   ├── test_batches.py
    │   ├── test_projects.py
    │   ├── test_users.py
    │   └── test_wrong_cases_input_required.py
    └── unit
        ├── __init__.py
        ├── test_generator.py
        └── test_models.py
```

#### § Tests
##### 1. Unit Tests
* Test creating and saving data in any table; test relationships between tables
* Test adding uploaded items, creating tuples, creating batches
	+ Every uploaded item must be included.
	+ Every item must appear in at least one tuple.
	+ Items must appear in roughly the same number of tuples: 2 conditions
		1. Most items have a frequency in the range (`average frequency - 2`, 
		`average frequency + 3`). This is because tuple creation in the source 
		code is based on randomization and shuffling.
		2. `Max frequency` and `min frequency` are within `± 5` of the `average frequency`.
	+ Batches must be divided roughly equally: 2 cases
		+ *Case 1*: for all batches: `normal batch size` ≤ `batch size` ≤ `normal batch size + (minimum batch size - 1)`.
		**E.g.**: `normal batch size` = 20, `minimum batch size` = 5 => 20 ≤ `batch size` ≤ 24.
		+ *Case 2*: exactly one batch with `minimum batch size` ≤ `batch size` < `normal batch size` 
		is accepted, and all remaining batches have the size `normal batch size`. 
		**E.g.**: `normal batch size` = 20, `minimum batch size` = 5 => one batch with 
		5 ≤ `batch size` < 20, all others of size 20.
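The actual tuple generation lives in `project/generator.py`; as an illustrative sketch only (the function name and parameters are assumptions, not the project's API), the shuffle-and-chunk approach the tests describe can look like this:

```python
import random

def make_tuples(items, tuple_size=4, rounds=2, seed=0):
    """Shuffle the items each round and chunk them into tuples.

    Over several rounds, every item appears in at least one tuple and
    per-item frequencies stay close to the average, as the unit tests
    above require.
    """
    rng = random.Random(seed)
    tuples = []
    for _ in range(rounds):
        pool = items[:]
        rng.shuffle(pool)          # randomization and shuffling
        for i in range(0, len(pool) - tuple_size + 1, tuple_size):
            tuples.append(tuple(pool[i:i + tuple_size]))
    return tuples

items = [f"item{i}" for i in range(12)]
ts = make_tuples(items)            # 6 tuples of 4 items each
freq = {it: sum(it in t for t in ts) for it in items}
```

When the item count is divisible by the tuple size, every item appears exactly `rounds` times; otherwise leftover items cause the small frequency spread the tests tolerate.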
        
##### 2. Functional Tests
* Test validations in user registration
    * Username and email have not been used before.
    * Username contains no special characters and meets the length requirement.
    * Email must be in a valid email format.
    * Password must meet the length requirement.
* Test validations in user login
    * A username that has not been signed up returns an error.
    * An invalid password for a valid username is not accepted.
* Test validations in uploading a project
    * There must be at least one non-empty `txt` file.
    * The project must contain at least 5 uploaded items.
    * The project description must be long enough (at least 20 characters).
    * The **Best** and **Worst** definitions must not be the same.
* Test validation in annotator login
    * If a keyword has already been used, the pseudonym must match the previously 
given pseudonym. (No two annotators may have the same keyword.)
* Test validations in annotating a batch
    * Every field is required.
    * In a tuple, an item is not allowed to be chosen as both **Best** and **Worst**.
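
A rule like the best/worst exclusivity check above can be sketched as a plain validator plus a pytest case (the names here are illustrative, not the project's actual functions):

```python
def validate_annotation(tuple_items, best, worst):
    """Reject annotations where one item is chosen as both best and worst."""
    if best not in tuple_items or worst not in tuple_items:
        raise ValueError("choices must come from the shown tuple")
    if best == worst:
        raise ValueError("an item cannot be both best and worst")

def test_best_equals_worst_rejected():
    import pytest
    # choosing the same item as best and worst must fail
    with pytest.raises(ValueError):
        validate_annotation(["a", "b", "c", "d"], "a", "a")
```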

> **Note**: This project has no validation of required inputs for form attributes 
defined as `MultipleFileField`, `StringField`, `PasswordField` or `TextAreaField` 
from the module [wtforms.fields](https://wtforms.readthedocs.io/en/stable/fields.html).
> **Reason**: The validator `InputRequired` from the module 
[wtforms.validators](https://wtforms.readthedocs.io/en/stable/validators.html#wtforms.validators.InputRequired) 
can validate this requirement directly on the web server, but during backend testing 
those fields are misinterpreted (due to the [source code](https://github.com/wtforms/wtforms/blob/master/src/wtforms/fields/core.py)). 
For more information, see the cases in `tests/functional/test_wrong_cases_input_required.py`.

### 3. Documentation
* To build and read the documentation, run:
```sh
$ cd doc/
$ make html
```
then open `build/html/index.html` in a browser.



## Additional Resources
* Bryan K. Orme. *MaxDiff analysis: Simple counting, individual-level logit, and
HB*. 2009. [URL](https://www.sawtoothsoftware.com/download/techpap/indivmaxdiff.pdf)
* Saif Mohammad and Peter D. Turney. *Crowdsourcing a word-emotion association
lexicon*. CoRR, abs/1308.6297, 2013. [URL](http://arxiv.org/abs/1308.6297)
* Svetlana Kiritchenko and Saif M. Mohammad. *Best-worst scaling more reliable than
rating scales: A case study on sentiment intensity annotation*. CoRR,
abs/1712.01765, 2017. [URL](http://arxiv.org/abs/1712.01765)

__init__.py


Empty file added.

config.py

# -*- coding: utf-8 -*-
"""

Module ``config``
********************

This module defines different Config objects for different servers.

"""

import os
basedir = os.path.abspath(os.path.dirname(__file__))


class Config:
	"""
	Base Configurations used for all servers.
	"""
	SECRET_KEY = os.environ.get('SECRET_KEY') or 'Thisissupposedtobesecret!'
	SQLALCHEMY_TRACK_MODIFICATIONS = False
	BASE_DIR = basedir
	SQLALCHEMY_DATABASE_URI = 'sqlite:///%s'%(os.path.join(basedir, 'database.db'))

	@staticmethod
	def init_app(app):
		pass

class DevelopmentConfig(Config):
	"""
	Configurations used during development.
	"""
	FLASK_ENV = 'development'
	DEBUG = True
	SQLALCHEMY_DATABASE_URI = os.environ.get('DEV_DATABASE_URL') or \
								 'sqlite:///%s'%(os.path.join(basedir, 'database-dev.db'))
	MTURK_URL = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'  # in production mode this will be None!
	# Never hard-code AWS credentials; read them from the environment.
	AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
	AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
	MTURK_SHOW_UP_URL = "https://workersandbox.mturk.com/"

class TestingConfig(Config):
	"""
	Configurations used during testing.
	"""
	DEBUG = False
	TESTING = True
	SQLALCHEMY_DATABASE_URI = os.environ.get('TEST_DATABASE_URL') or \
								 'sqlite:///%s'%(os.path.join(basedir, 'database-test.db'))
	WTF_CSRF_ENABLED = False

config = {
	'development': DevelopmentConfig,
	'testing': TestingConfig,
	'default': DevelopmentConfig
}
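
The `config` dictionary is typically consumed by an application factory that picks a configuration by name. A minimal sketch of that pattern (the factory name is an assumption, not necessarily what this project's `project/__init__.py` does):

```python
from flask import Flask
from config import config

def create_app(config_name='default'):
    # Look up the Config subclass by name and apply it to the app.
    app = Flask(__name__)
    app.config.from_object(config[config_name])
    config[config_name].init_app(app)
    return app

# e.g. use the testing configuration in the pytest fixtures
app = create_app('testing')
```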

doc/Makefile

# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SPHINXPROJ    = BWS
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+3.11 KiB

File added. No diff preview for this file type.