cycloevan committed on
Commit f032c92 · verified · 1 Parent(s): 6f26109

Update README.md

Files changed (1): README.md +146 -1
README.md CHANGED

---
language: en
license: mit
library_name: tensorflow
tags:
- malware-detection
- text-classification
- tensorflow
- keras
datasets:
- custom
- iosifache/DikeDataset
metrics:
- accuracy
- precision
- recall
- f1
---

# Model Card for MalConv

## Model Details

**Model Description:** This is a TensorFlow 2 implementation of the MalConv model, a deep neural network for malware detection from raw byte sequences. MalConv is a convolutional neural network (CNN) designed to classify executable files as either malicious or benign. It takes the raw bytes of an entire executable file as input. This makes it an end-to-end malware detector that needs no hand-engineered features.
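
The "raw bytes in, one score out" idea is easier to see in code. The following is a minimal pure-Python sketch of a MalConv-style forward pass, based on our understanding of the original paper. Each byte is embedded. Two parallel strided 1-D convolutions run over the embeddings, and one gates the other through a sigmoid. A global max pool over time follows, then a fully connected layer and a sigmoid output. This is not this repository's TensorFlow code. The toy layer sizes, the random weights from the hypothetical `make_params`, the ReLU on the dense layer, and the convention of reserving token 0 for padding are all illustrative assumptions.

```python
import math
import random

# Toy sizes for illustration only (assumption); real MalConv layers are much larger.
VOCAB = 257      # 256 byte values plus index 0 reserved for padding (assumed convention)
EMBED_DIM = 4
WINDOW = 8       # kernel size == stride, so windows do not overlap
FILTERS = 3
HIDDEN = 5
MAX_LEN = 64     # inputs are truncated to this many bytes (assumption)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def make_params(seed: int = 0) -> dict:
    """Random weights standing in for trained parameters (hypothetical)."""
    rng = random.Random(seed)

    def r() -> float:
        return rng.uniform(-0.5, 0.5)

    def conv() -> list:
        # One FILTERS x WINDOW x EMBED_DIM weight tensor as nested lists.
        return [[[r() for _ in range(EMBED_DIM)] for _ in range(WINDOW)]
                for _ in range(FILTERS)]

    return {
        "embed": [[r() for _ in range(EMBED_DIM)] for _ in range(VOCAB)],
        "conv_a": conv(), "bias_a": [r() for _ in range(FILTERS)],
        "conv_b": conv(), "bias_b": [r() for _ in range(FILTERS)],
        "dense": [[r() for _ in range(FILTERS)] for _ in range(HIDDEN)],
        "dense_b": [r() for _ in range(HIDDEN)],
        "out": [r() for _ in range(HIDDEN)],
        "out_b": r(),
    }


def conv_window(weights, bias, window) -> float:
    """Response of one filter to one WINDOW x EMBED_DIM slice of embeddings."""
    return bias + sum(
        w * x for w_row, x_row in zip(weights, window) for w, x in zip(w_row, x_row)
    )


def malconv_forward(raw: bytes, params: dict) -> float:
    # 1. Truncate, map byte b -> token b + 1, pad to whole windows, embed.
    tokens = [b + 1 for b in raw[:MAX_LEN]] or [0]
    tokens += [0] * (-len(tokens) % WINDOW)
    embedded = [params["embed"][t] for t in tokens]

    # 2. Gated convolution: filter A's response is scaled by sigmoid(filter B's
    #    response); the maximum over all window positions is kept per filter.
    pooled = [-math.inf] * FILTERS
    for start in range(0, len(embedded), WINDOW):
        window = embedded[start:start + WINDOW]
        for f in range(FILTERS):
            a = conv_window(params["conv_a"][f], params["bias_a"][f], window)
            g = sigmoid(conv_window(params["conv_b"][f], params["bias_b"][f], window))
            pooled[f] = max(pooled[f], a * g)

    # 3. Fully connected layer (ReLU assumed here) and a sigmoid output unit.
    hidden = [
        max(0.0, b + sum(w * p for w, p in zip(row, pooled)))
        for row, b in zip(params["dense"], params["dense_b"])
    ]
    logit = params["out_b"] + sum(w * h for w, h in zip(params["out"], hidden))
    return sigmoid(logit)  # interpreted as P(malware)
```

The kernel size equals the stride, so each filter reads every byte exactly once. This keeps the cost linear in file length. The global max pool makes the score largely insensitive to where in the file the strongest evidence appears.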

* **Developed by:** [Your Name or Organization] (this implementation), based on the original work by Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, and Charles Nicholas.
* **Model type:** Binary classification
* **Language(s) (NLP):** Not applicable
* **License:** MIT
* **Finetuned from model:** Not applicable

## Uses

### Direct Use

This model is intended for classifying executable files as either malicious or benign.

### Downstream Use

This model can be used as a base for further fine-tuning on other malware classification tasks.

### Out-of-Scope Use

This model is designed for binary classification of executables. It should not be used for other tasks, such as malware generation or analysis of other file types.
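
The model only covers executables, so a caller may want to reject other inputs before scoring them. The following is a minimal sketch that assumes the target files are Windows PE executables. It checks for the DOS `MZ` magic, and then for the `PE` signature followed by two NUL bytes at the offset stored in the `e_lfanew` header field. The helper names `looks_like_pe` and `is_in_scope` are hypothetical. This is a coarse filter, not a full PE parser. OLE documents, for example, start with a different signature and are rejected.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Coarse check that `data` starts like a Windows PE executable.

    Requires the DOS 'MZ' magic at offset 0 and the 'PE' signature plus two
    NUL bytes at the offset stored in e_lfanew (little-endian uint32 at 0x3C).
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def is_in_scope(path: str) -> bool:
    """Hypothetical pre-filter: only PE files are passed on to the model."""
    with open(path, "rb") as fh:
        return looks_like_pe(fh.read())
```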

## Bias, Risks, and Limitations

This model's performance is highly dependent on the dataset it was trained on. If the training data is not representative of the types of malware you are trying to detect, the model will likely perform poorly. The model may also be susceptible to adversarial attacks. For example, an attacker may modify an executable's bytes so that it keeps its malicious behavior but is classified as benign.

### Recommendations

Users should be aware of these biases and limitations. We recommend training the model on a large and diverse dataset of malware and benign files.

## How to Get Started with the Model

### 1. Data Preparation

Before training or tuning the model, you need to prepare your dataset. You can do this in two ways:

* **CSV File:** Create a CSV file with two columns: `filepath` and `label`. The `filepath` column should contain the absolute path to each executable file, and the `label` column should contain the corresponding label (0 for benign, 1 for malware).

* **Directories:** Organize your malware and benign files into separate directories.
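
The two preparation options are related: a directory layout can be turned into the CSV format described above. The following sketch writes a `filepath,label` CSV with absolute paths, using 1 for malware and 0 for benign. The function names `iter_files` and `build_label_csv` are hypothetical and are not part of `src/train.py`. The sketch also assumes that every regular file under each directory, searched recursively, is a sample.

```python
import csv
import os


def iter_files(root):
    """Yield absolute paths of all regular files under `root`, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.abspath(os.path.join(dirpath, name))


def build_label_csv(malware_dir, benign_dir, out_csv):
    """Write a `filepath,label` CSV (1 = malware, 0 = benign); return row count."""
    rows = [(path, 1) for path in iter_files(malware_dir)]
    rows += [(path, 0) for path in iter_files(benign_dir)]
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filepath", "label"])
        writer.writerows(rows)
    return len(rows)
```

Write the CSV outside both sample directories. Otherwise a later run would pick it up as a sample.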

### 2. Training

To train the MalConv model, use the `src/train.py` script. You can provide the training data using either a CSV file or directories.

**Using a CSV file:**

```bash
python src/train.py --csv /path/to/your/data.csv
```

**Using directories:**

```bash
python src/train.py --malware_dir /path/to/malware --benign_dir /path/to/benign
```

The trained model will be saved to `models/malconv_model.h5` by default. You can change this with the `--save_path` argument.

### 3. Prediction

To make predictions on new executable files, use the `src/predict.py` script.

**Predicting a single file:**

```bash
python src/predict.py /path/to/your/model.h5 --file /path/to/your/executable.exe
```
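
Before inference, a raw-byte model needs each file turned into a fixed-length sequence of integers. The following is a minimal sketch of one common convention. It assumes byte values are shifted by one so that 0 can mean padding, and that inputs are truncated or zero-padded to a fixed `max_len`. Both choices, the 2,000,000-byte default, and the names `encode_bytes` and `encode_file` are assumptions. Check `src/predict.py` for the values it actually uses.

```python
def encode_bytes(raw: bytes, max_len: int = 2_000_000) -> list[int]:
    """Turn raw file bytes into a fixed-length list of token ids.

    Assumed convention: byte value b -> token b + 1, leaving token 0 free for
    padding; inputs longer than `max_len` are truncated.
    """
    tokens = [b + 1 for b in raw[:max_len]]
    tokens.extend([0] * (max_len - len(tokens)))  # right-pad with the padding token
    return tokens


def encode_file(path: str, max_len: int = 2_000_000) -> list[int]:
    """Read at most `max_len` bytes from `path` and encode them."""
    with open(path, "rb") as fh:
        return encode_bytes(fh.read(max_len), max_len)
```

Reading at most `max_len` bytes keeps memory bounded for very large files. A plain list is used here for clarity. A real pipeline would hand the model an array or tensor instead.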

**Predicting a batch of files from a CSV:**

```bash
python src/predict.py /path/to/your/model.h5 --csv /path/to/your/files.csv --output /path/to/your/predictions.csv
```
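
The layout of the predictions file is not specified here. The following is therefore only a sketch of what a batch run involves. It reads `filepath` entries from an input CSV, scores each file, and writes a probability together with a thresholded label. The `score` callable stands in for the trained model. The output column names, the 0.5 threshold, and the function name `predict_csv` are assumptions.

```python
import csv
from typing import Callable


def predict_csv(in_csv: str, out_csv: str, score: Callable[[str], float],
                threshold: float = 0.5) -> int:
    """Score every `filepath` in `in_csv`; write filepath, probability, label."""
    count = 0
    with open(in_csv, newline="") as src, open(out_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["filepath", "probability", "label"])
        for row in csv.DictReader(src):
            prob = score(row["filepath"])  # model output, assumed to be P(malware)
            writer.writerow([row["filepath"], f"{prob:.6f}", int(prob >= threshold)])
            count += 1
    return count
```

The threshold is a deployment choice. Raising it trades recall for fewer false positives on benign software.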

## Training Details

**Training Data:** This model should be trained on a large and diverse dataset of malware and benign executable files. The original paper used a dataset of 1.2 million files. Another option is the [DIKE dataset](https://github.com/iosifache/dike-dataset), which contains both benign and malicious PE and OLE files.

**Training Procedure:** The model was trained using the Adam optimizer with a learning rate of 0.001 and a batch size of 512. The training procedure is described in detail in the original paper.
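
For reference, the following spells out one Adam update on a single scalar parameter using the stated learning rate of 0.001. The values β₁ = 0.9, β₂ = 0.999 and ε = 1e-7 are the defaults of `tf.keras.optimizers.Adam`. Whether this implementation overrides them is not stated, so treat them as assumptions. The sketch uses the textbook form from Kingma and Ba. Keras's internal formula differs slightly in where ε enters. `AdamState` and `adam_step` are illustrative names, and in practice Keras applies the optimizer.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamState:
    m: float = 0.0  # running mean of gradients (first moment)
    v: float = 0.0  # running mean of squared gradients (second moment)
    t: int = 0      # number of steps taken so far


def adam_step(theta: float, grad: float, state: AdamState, lr: float = 0.001,
              beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-7) -> float:
    """Return the updated parameter after one Adam step (textbook form)."""
    state.t += 1
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    m_hat = state.m / (1 - beta1 ** state.t)  # bias-corrected first moment
    v_hat = state.v / (1 - beta2 ** state.t)  # bias-corrected second moment
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)
```

On the first step, the bias-corrected ratio m̂/√v̂ is ±1. The update is therefore about `lr` in magnitude whatever the gradient's scale.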

## Evaluation

**Testing Data:** The model should be evaluated on a held-out test set of malware and benign executable files.
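
One simple way to carve out a held-out test set is a stratified split, which keeps the benign/malware ratio the same in the training and test parts. The following is a minimal standard-library sketch that works on `(filepath, label)` pairs like those in the CSV format. The 20% test fraction, the fixed seed, and the function name `stratified_split` are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=42):
    """Split (filepath, label) pairs into train/test, preserving label ratios."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```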

**Metrics:** The model's performance can be evaluated using the following metrics:

* Accuracy
* Precision
* Recall
* F1-score
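
All four metrics follow from the confusion-matrix counts, with malware (label 1) as the positive class. The following is a minimal sketch. Returning 0.0 when a denominator is zero is a convention chosen here. scikit-learn exposes the same choice through the `zero_division` parameter of `precision_score` and related functions. The function name `binary_metrics` is illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 (malware) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

For larger experiments, `sklearn.metrics.precision_recall_fscore_support` computes the same quantities.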

**Results:** The results of the evaluation will depend on the dataset used. The original paper reported an AUC of 0.99.
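
AUC (area under the ROC curve) is computed from the model's scores rather than its thresholded labels. It equals the probability that a randomly chosen malware sample scores higher than a randomly chosen benign one, with ties counting as half. The following is a minimal sketch of that definition. It compares every positive with every negative, so it is only practical for small sets. `sklearn.metrics.roc_auc_score` computes the same quantity efficiently. The function name `roc_auc` is illustrative.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```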

## Model Card Authors

[Your Name or Organization]

## Model Card Contact

[your-email@example.com]

## References

- [Malware Detection by Eating a Whole EXE (arXiv:1710.09435)](https://arxiv.org/abs/1710.09435)

## Citation

If you use this code in your research, please cite the original MalConv paper:

```
@inproceedings{raff2017malware,
  title={Malware detection by eating a whole exe},
  author={Raff, Edward and Barker, Jon and Sylvester, Jared and Brandon, Robert and Catanzaro, Bryan and Nicholas, Charles},
  pages={461--470},
  year={2017},
  organization={IEEE}
}
```

```
@article{raff2017malware_arxiv,
  title={Malware Detection by Eating a Whole EXE},
  author={Edward Raff and Jon Barker and Jared Sylvester and Robert Brandon and Bryan Catanzaro and Charles Nicholas},
  year={2017},
  eprint={1710.09435},
  archivePrefix={arXiv},
  primaryClass={cs.CR}
}
```