HamidBekam committed on
Commit
34aa7ff
·
verified ·
1 Parent(s): 1fb8a93

Update README.md

Files changed (1)
  1. README.md +59 -0
README.md CHANGED
@@ -3,6 +3,65 @@
3
  An updated version of Standard-Based Impact Classification (SBIC) method of CSR report analysis in accordance with GRI framework
4
 
5
 
6
  ---
7
  license: gpl-3.0
8
  ---
 
3
  An updated version of Standard-Based Impact Classification (SBIC) method of CSR report analysis in accordance with GRI framework
4
 
5
 
6
+ The following section explains how to run the code.
7
+
8
+ ---
9
+
10
+ # **Multilabel Classification Step**
11
+
12
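+ This code assigns multiple labels to report text based on embeddings: it retrieves the most similar labeled sentences by **cosine similarity**, aggregates their labels with the **K-Nearest Neighbor (KNN) algorithm**, and applies a **sigmoid activation function** with a threshold to produce the final classification.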
13
+
14
+ ## **Prerequisites**
15
+
16
+ Ensure you have the following installed before running the script:
17
+
18
+ - Python 3.8+
19
+ - Required Python libraries (install using the command below)
20
+
21
+ ```bash
22
+ pip install numpy pandas torch sentence-transformers scikit-learn
23
+ ```
24
+
25
+ ## **Input Files**
26
+
27
+ Before running the script, make sure you have the following input files in the working directory:
28
+
29
+ 1. **Report Data Files**:
30
+ - `df_360k_41lables_05012023.csv`
31
+ - `german_plc_all_paragraphs_unnested_only.csv`
32
+
33
+ 2. **Precomputed Embeddings**:
34
+ - Dataset for prediction: `embeddings_paragraphs_07012023.pkl`
35
+ - Labeled dataset: `embeddings_sentences_360k_09012023.pkl`
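
The input files listed above could be loaded along the following lines. This is a minimal sketch, not the repository's script: the helper name `load_inputs` is hypothetical, and it assumes that the `.pkl` files were written with Python's `pickle` module and that each embedding file pairs with the CSV file its name suggests. The README confirms neither point.

```python
import pickle
from pathlib import Path

import pandas as pd


def load_inputs(data_dir="."):
    """Load the two CSV tables and the two pickled embedding files.

    Hypothetical helper: the pairing of CSV and .pkl files is inferred
    from the file names, not stated in the README.
    """
    data_dir = Path(data_dir)

    # Tabular data: labeled sentences and the paragraphs to classify.
    labeled_df = pd.read_csv(data_dir / "df_360k_41lables_05012023.csv")
    paragraphs_df = pd.read_csv(
        data_dir / "german_plc_all_paragraphs_unnested_only.csv"
    )

    # Assumption: the embeddings were saved with pickle.dump.
    with open(data_dir / "embeddings_sentences_360k_09012023.pkl", "rb") as fh:
        labeled_emb = pickle.load(fh)
    with open(data_dir / "embeddings_paragraphs_07012023.pkl", "rb") as fh:
        paragraph_emb = pickle.load(fh)

    return labeled_df, paragraphs_df, labeled_emb, paragraph_emb
```

Because `pickle.load` can execute arbitrary code contained in the file, only load embedding files from a source you trust.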
36
+
37
+ ## **Running the Script**
38
+
39
+ Run the script using the following command:
40
+
41
+ ```bash
42
+ python script.py
43
+ ```
44
+
45
+ ## **Processing Steps**
46
+
47
+ The script follows these main steps:
48
+
49
+ 1. **Load Data & Pretrained Embeddings**
50
+ 2. **Perform Cosine Similarity Search**: Finds the most relevant reports (sentences) using `semantic_search` from `sentence-transformers`.
51
+ 3. **Apply K-Nearest Neighbor (KNN) Algorithm**: Selects top similar reports (sentences) and aggregates predictions.
52
+ 4. **Use Sigmoid Activation for Classification**: Passes the aggregated scores through a sigmoid function and applies a threshold to generate the final classification outputs.
53
+ 5. **Save Results**: Exports `df_results_0_50k.csv` containing the processed data.
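
Steps 2 to 4 above can be illustrated with a small, self-contained sketch. It is an illustration under stated assumptions, not the repository's implementation: it uses plain NumPy for the cosine-similarity search instead of `semantic_search` from `sentence-transformers`, and the function names, the unweighted neighbor vote, and the `k`, `scale`, and `threshold` parameters are all hypothetical. How the actual script combines the KNN vote with the sigmoid is not described in this README.

```python
import numpy as np


def top_k_cosine(query_emb, corpus_emb, k):
    """Return (indices, scores) of the k most cosine-similar corpus rows per query."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = q @ c.T                                  # shape (n_queries, n_corpus)
    idx = np.argsort(-sims, axis=1)[:, :k]          # k best matches per query
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores


def knn_multilabel(query_emb, corpus_emb, corpus_labels,
                   k=5, scale=10.0, threshold=0.5):
    """Hypothetical KNN vote followed by a sigmoid and a threshold.

    corpus_labels is a binary matrix of shape (n_corpus, n_labels).
    """
    idx, _ = top_k_cosine(query_emb, corpus_emb, k)
    neighbor_labels = corpus_labels[idx]            # (n_queries, k, n_labels)
    votes = neighbor_labels.mean(axis=1)            # fraction of neighbors per label

    # Assumption: center the vote at 0.5 and squash it with a sigmoid.
    probs = 1.0 / (1.0 + np.exp(-scale * (votes - 0.5)))
    preds = (probs >= threshold).astype(int)
    return probs, preds


# Toy data: four labeled "sentences" in 2-D with two possible labels.
corpus_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
corpus_labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
query_emb = np.array([[1.0, 0.05]])

probs, preds = knn_multilabel(query_emb, corpus_emb, corpus_labels, k=2)
print(preds)  # the query lies next to the first two sentences, so only label 0 is set
```

With real data, `query_emb` would hold the paragraph embeddings and `corpus_emb` the labeled sentence embeddings. For a corpus of several hundred thousand sentences, a chunked search such as `semantic_search` avoids building the full similarity matrix in memory at once.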
54
+
55
+ ## **Output File**
56
+
57
+ The processed results will be saved in:
58
+
59
+ - `df_results_0_50k.csv`
60
+
61
+ ## **Execution Time**
62
+
63
+ Execution time depends on the number of test samples and system resources. The script prints the total processing time upon completion.
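
A total processing time like the one the script prints can be measured as in the sketch below. The `timed` helper is hypothetical and only shows one common pattern with `time.perf_counter`; the script may measure its run time differently.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in practice this would be the classification run.
total, elapsed = timed(sum, range(1_000_000))
print(f"Total processing time: {elapsed:.2f} s")
```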
64
+
65
  ---
66
  license: gpl-3.0
67
  ---