ia-nechaev/sbic-method2

Text Classification


HamidBekam committed on Mar 19

Commit 34aa7ff (verified) · 1 parent: 1fb8a93

Update README.md

Files changed (1)

README.md +59 -0

README.md CHANGED

@@ -3,6 +3,65 @@
 An updated version of the Standard-Based Impact Classification (SBIC) method for analyzing CSR reports in accordance with the GRI framework.
 ---
 license: gpl-3.0
 ---

+---
+# **Multilabel Classification Step**
+This code runs a report similarity search based on **cosine similarity**, the **K-Nearest Neighbor (KNN) algorithm**, and a **sigmoid activation function** to assign multiple labels to each report from its embeddings.
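As a rough formula, and only as an assumption since the README does not specify how neighbor labels are aggregated or which threshold is used: let $\mathbf{q}$ be the embedding of a report paragraph, $\mathbf{e}_j$ the embedding of labeled sentence $j$, $y_{j\ell} \in \{0, 1\}$ its indicator for label $\ell$, and $\mathcal{N}_k(\mathbf{q})$ the $k$ labeled sentences with the highest cosine similarity to $\mathbf{q}$. With a hypothetical neighborhood size $k$, a similarity-weighted vote, and a hypothetical threshold $\tau$, the three ingredients could combine as:

$$
\cos(\mathbf{q}, \mathbf{e}_j) = \frac{\mathbf{q} \cdot \mathbf{e}_j}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{e}_j \rVert},
\qquad
s_\ell(\mathbf{q}) = \sum_{j \in \mathcal{N}_k(\mathbf{q})} \cos(\mathbf{q}, \mathbf{e}_j) \, y_{j\ell},
\qquad
\hat{y}_\ell = \mathbb{1}\big[\sigma(s_\ell(\mathbf{q})) \ge \tau\big],
\quad \sigma(x) = \frac{1}{1 + e^{-x}}
$$

Because $\sigma(0) = 0.5$, a label that none of the $k$ neighbors carries gets a score of exactly 0.5, so under this formulation a useful $\tau$ lies above 0.5.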
+## **Prerequisites**
+Ensure you have the following installed before running the script:
+- Python 3.8+
+- Required Python libraries (install using the command below)
+```bash
+pip install numpy pandas torch sentence-transformers scikit-learn
+```
+## **Input Files**
+Before running the script, make sure you have the following input files in the working directory:
+1. **Data Files**:
+   - `df_360k_41lables_05012023.csv`
+   - `german_plc_all_paragraphs_unnested_only.csv`
+2. **Precomputed Embeddings**:
+   - Dataset for prediction: `embeddings_paragraphs_07012023.pkl`
+   - Labeled dataset: `embeddings_sentences_360k_09012023.pkl`
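A minimal sketch of loading the precomputed embeddings, assuming each `.pkl` file holds a single 2-D array-like (a NumPy array, a list of equal-length vectors, or a PyTorch tensor) with one row per paragraph or sentence. The README does not document the pickle layout, and `load_embeddings` is a hypothetical helper name, not part of the original script.

```python
import pickle

import numpy as np


def load_embeddings(path: str) -> np.ndarray:
    """Hypothetical helper: load a pickled embedding matrix as float32, shape (n_items, dim)."""
    # pickle can execute arbitrary code while loading, so only open files you trust.
    with open(path, "rb") as f:
        obj = pickle.load(f)
    # Assumption: a PyTorch tensor is detached, moved to CPU, and converted to NumPy;
    # NumPy arrays and lists of equal-length vectors go straight through np.asarray.
    if hasattr(obj, "detach"):
        obj = obj.detach().cpu().numpy()
    emb = np.asarray(obj, dtype=np.float32)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D embedding matrix, got shape {emb.shape}")
    return emb
```

After loading `embeddings_paragraphs_07012023.pkl` and `embeddings_sentences_360k_09012023.pkl`, checking that both matrices have the same second dimension catches a mismatch early, since cosine similarity is only defined between vectors of equal length.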
+## **Running the Script**
+Run the script using the following command:
+```bash
+python script.py
+```
+## **Processing Steps**
+The script follows these main steps:
+1. **Load Data & Pretrained Embeddings**
+2. **Perform Cosine Similarity Search**: Finds the labeled reports (sentences) most similar to each input using `semantic_search` from `sentence-transformers`.
+3. **Apply K-Nearest Neighbor (KNN) Algorithm**: Selects the top *k* most similar reports (sentences) and aggregates their labels into per-label predictions.
+4. **Use Sigmoid Activation for Classification**: Passes the aggregated scores through a sigmoid and applies a threshold to produce the final multilabel classification.
+5. **Save Results**: Exports `df_results_0_50k.csv` containing the processed data.
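The search, KNN, and sigmoid steps above can be sketched in NumPy as follows. This is a minimal illustration, not the script itself: the script uses `semantic_search` from `sentence_transformers.util`, which returns, for each query, a list of dicts with `corpus_id` and `score` keys. Here the same cosine top-k search is written out directly so it runs without a model. The function names, the similarity-weighted sum used to aggregate neighbor labels, and the default `k` and `threshold` values are hypothetical assumptions.

```python
import numpy as np


def cosine_top_k(queries: np.ndarray, corpus: np.ndarray, k: int, chunk_size: int = 1024):
    """Return (indices, scores) of the k corpus rows most cosine-similar to each query row."""
    # L2-normalize so a dot product equals cosine similarity (rows must be non-zero).
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    all_idx, all_scores = [], []
    for start in range(0, len(queries), chunk_size):
        q = queries[start:start + chunk_size]
        q = q / np.linalg.norm(q, axis=1, keepdims=True)
        sims = q @ corpus_n.T                    # (chunk, n_corpus) cosine similarities
        idx = np.argsort(-sims, axis=1)[:, :k]   # k nearest neighbors, most similar first
        all_idx.append(idx)
        all_scores.append(np.take_along_axis(sims, idx, axis=1))
    return np.vstack(all_idx), np.vstack(all_scores)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def knn_multilabel(queries, corpus, corpus_labels, k=5, threshold=0.8):
    """Hypothetical KNN multilabel classifier: similarity-weighted votes, sigmoid, threshold."""
    idx, scores = cosine_top_k(queries, corpus, k)
    # corpus_labels[idx] has shape (n_queries, k, n_labels): the 0/1 labels of each query's neighbors.
    votes = (corpus_labels[idx] * scores[..., None]).sum(axis=1)
    probs = sigmoid(votes)
    return (probs >= threshold).astype(int), probs
```

Processing the queries in chunks keeps the similarity matrix at `chunk_size × n_corpus` rather than `n_queries × n_corpus`, which matters for a labeled set of 360k sentences; `semantic_search` chunks in the same spirit through its `query_chunk_size` and `corpus_chunk_size` parameters.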
+## **Output File**
+The processed results will be saved in:
+- `df_results_0_50k.csv`
+## **Execution Time**
+Execution time depends on the number of test samples and system resources. The script prints the total processing time upon completion.