Commit ec1f162d authored by pirapakaran's avatar pirapakaran
Update README.md

# Discovering frames of arguments

This project is part of the course "Softwareprojekt WS 2020/21" of the Computational Linguistics program at Heidelberg University and is supervised by Prof. Dr. Anette Frank. <br>
In this project, we take the approach of Paul and Frank and extend it by filtering for human needs in argumentative texts.

To execute the project, you must first select a dataset. <br>
The dataset has to be annotated manually; this annotation is then used as the gold standard. <br>
For annotation we used Maslow and Reiss motives. Maslow's hierarchy of needs is a motivational theory in psychology comprising a five-tier model of human needs, often depicted as hierarchical levels within a pyramid. Reiss builds on this pyramid, adding finer terms to each category. Each of the four of us annotated one hundred of the four hundred essays with a Maslow and a Reiss motive. We then used Fleiss' kappa to calculate the inter-annotator agreement. <br> The data must then be prepared accordingly as a test file and a train file; their exact format can be taken from our attached files.
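For reference, Fleiss' kappa can be computed directly from the per-essay annotation counts. The following is a minimal, self-contained sketch (not taken from the project code):

```python
def fleiss_kappa(ratings):
    """ratings: one row per item; each row lists how many annotators
    assigned each category, e.g. [[3, 1], [4, 0]] for 4 annotators
    and 2 motive categories."""
    N = len(ratings)         # number of items (essays)
    n = sum(ratings[0])      # annotators per item (must be constant)
    k = len(ratings[0])      # number of categories
    # observed per-item agreement
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N
    # chance agreement from marginal category proportions
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

print(round(fleiss_kappa([[4, 0], [0, 4]]), 4))  # perfect agreement -> 1.0
```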

## Steps to get started <br>

### Pre-work
First of all, two code files must be executed: ***Comparer.py*** and ***Readhumans.py***. *Comparer.py* requires the conceptnet-assertions-5.6.0 file and the selected dataset. The output is a concept-filtered and lemmatized set of words from each input set. When executing the Comparer code, the concrete path to the dataset must be specified. After that, ontology_create has to be executed for the list with the concepts. Then the sets are split and lemmatized. Run matching_dicts for the final result with concepts. <br> Second, the *Readhumans.py* file is executed. As training set we use the ROCStories dev-set. The output is a file with the individual components of the set (e.g. file name). To choose the right directory, line 338 of the file has to be edited specifically (dev -> motiv -> allcharlines).
<br> <br>

> Graphpath: path to the constructed concept graph concept_graph_full <br>
> Outputpath: path to an (empty) file; the purpose is appended as _[purpose] <br>
> Purpose: --dev | --train | --test
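As a rough illustration of the concept-filtering step (the function and variable names here are invented for the example and are not the project's actual API), lemmatizing the input words and intersecting them with the ConceptNet vocabulary could look like this:

```python
def filter_concepts(tokens, concept_vocab, lemma_map):
    """Keep only lemmas that occur as ConceptNet concepts.

    lemma_map stands in for a real lemmatizer (token -> lemma);
    concept_vocab is a set of ConceptNet concept labels."""
    lemmas = {lemma_map.get(t.lower(), t.lower()) for t in tokens}
    return sorted(lemmas & concept_vocab)

vocab = {"run", "water", "food"}                 # toy ConceptNet vocabulary
lemmas = {"running": "run", "waters": "water"}   # toy lemma table
print(filter_concepts(["Running", "waters", "quickly"], vocab, lemmas))
# "quickly" is dropped because it is not in the concept vocabulary
```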

### Start coding
We follow the steps provided by Debjit Paul in "Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs" (NAACL 2019). However, we had to modify and adapt these steps, which is why our steps differ slightly from his. The code files also differ from Debjit Paul's original files, because his code did not run reliably for us, so we wrote our own accordingly. <br>

### *Step 1:* Construct ConceptNet into a graph
Prerequisite for this step is that conceptnet-assertions-5.6.0.csv.gz has already been downloaded.

```
python src/graph_model/conceptnet2graph.py /path_to_unzipped_conceptnet-assertions-5.6.0.csv_File
```
> Inputfile: path to the .txt file of the train and test data <br>
> Graphpath: path to the constructed concept graph concept_graph_full <br>
> Outputpath: path to an (empty) file; the purpose is appended as _[purpose] <br>
> Purpose: --dev | --train | --test
>
> Output is the file concept_graph_full
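Conceptually, turning the assertions dump into a graph amounts to reading the tab-separated edges and indexing them by concept. A minimal sketch (assuming the standard ConceptNet CSV layout of URI, relation, start, end per row; this is not the project's conceptnet2graph.py code):

```python
import csv
from collections import defaultdict

def build_graph(lines):
    """Build an adjacency list over English concepts from assertion rows."""
    graph = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        _, rel, start, end = row[:4]
        # keep only English concepts, e.g. /c/en/dog
        if start.startswith("/c/en/") and end.startswith("/c/en/"):
            graph[start.split("/")[3]].add((rel, end.split("/")[3]))
    return graph

sample = [
    "/a/x\t/r/RelatedTo\t/c/en/dog\t/c/en/animal\t{}",
    "/a/y\t/r/IsA\t/c/en/dog\t/c/de/hund\t{}",  # non-English end, skipped
]
g = build_graph(sample)
```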

### *Step 2:* Construct subgraph per sentence for train and test files

In this step we construct the subgraph for each sentence. <br>
> Input: ConceptNet Graph, Data for train/test <br>
```
python src/graph_model/make_sub_graph_server.py "inputfile" "graphpath" "outputpath" --purpose [purpose]
e.g. python src/graph_model/make_sub_graph_server.py .\training_neu.txt .\concept_graph_full .\output --purpose train
```
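The idea of this step can be sketched as follows, using an assumed dict-of-sets graph representation (not the project's actual data structure): keep only the edges whose endpoints are both concepts of the sentence.

```python
def sentence_subgraph(graph, concepts):
    """graph: node -> set of (relation, neighbor); concepts: nodes of one sentence."""
    sub = {}
    for node in concepts:
        # retain only edges that stay inside the sentence's concept set
        edges = {(rel, nb) for rel, nb in graph.get(node, set()) if nb in concepts}
        if edges:
            sub[node] = edges
    return sub

full = {"dog": {("/r/IsA", "animal"), ("/r/RelatedTo", "cat")}, "animal": set()}
print(sentence_subgraph(full, {"dog", "animal"}))
# only the dog -> animal edge survives; "cat" is not in the sentence
```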
Inputformat:
```
"unique_ID" + "\t" + "HumanNeed | None" + "\t" + "sentence" + "\t" + "concepts"
```
Separate sentences with newlines.
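As an illustration of the format above, a line could be assembled like this (the exact encoding of the concepts field is an assumption here; check the attached files for the authoritative layout):

```python
def make_input_line(uid, human_need, sentence, concepts):
    """Serialize one sentence into the tab-separated input format."""
    # a missing human need is written as the literal string "None"
    return "\t".join([uid, human_need or "None", sentence, " ".join(concepts)])

line = make_input_line("essay42_s3", "safety", "We need stable jobs.", ["job", "stability"])
print(line)
```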


### *Step 3:* Extracting relevant knowledge paths from subgraphs
> Inputpath: path to the inputfile from step 2 <br>
> Outputpath: path to the .txt file where the extracted knowledge paths are saved <br>
> Input: subgraphs and inputfile from step 2 <br>
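The extraction can be thought of as enumerating relation-labelled paths between two concepts up to a fixed number of hops. A simplified breadth-first sketch (not the project's ranking/selection code):

```python
from collections import deque

def knowledge_paths(graph, start, goal, max_hops=2):
    """graph: node -> set of (relation, neighbor); returns all labelled
    paths from start to goal with at most max_hops edges."""
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) < max_hops:
            for rel, nb in graph.get(node, set()):
                queue.append((nb, path + [(node, rel, nb)]))
    return paths

g = {"job": {("/r/RelatedTo", "money")}, "money": {("/r/RelatedTo", "security")}}
print(knowledge_paths(g, "job", "security"))
```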