Model Card for MalConv

Model Details

Model Description: This is a TensorFlow 2 implementation of the MalConv model, a deep neural network for malware detection from raw byte sequences. MalConv is a convolutional neural network (CNN) designed to classify executable files as either malicious or benign. It takes the raw bytes of an entire executable file as input, making it an end-to-end, feature-free malware detection model.
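As a rough illustration of what "raw bytes as input" can look like in practice, the sketch below encodes a file as a fixed-length sequence of integer token IDs. The function names, the 2,000,000-byte cap, and the convention of shifting byte values by one so that 0 can act as padding are assumptions made for this example. Check the code under src/ for the preprocessing this implementation actually uses.

```python
from pathlib import Path

MAX_LEN = 2_000_000  # assumed input length cap; not taken from this repository
PAD_ID = 0           # assumed padding token; real byte values are shifted to 1..256


def bytes_to_sequence(data: bytes, max_len: int = MAX_LEN) -> list[int]:
    """Map raw bytes to integer token IDs, truncated or padded to max_len."""
    # Shift each byte (0..255) up by one so PAD_ID never collides with real data.
    tokens = [b + 1 for b in data[:max_len]]
    # Right-pad short files so every sequence has the same length.
    tokens.extend([PAD_ID] * (max_len - len(tokens)))
    return tokens


def file_to_sequence(path: str, max_len: int = MAX_LEN) -> list[int]:
    """Read at most max_len bytes from an executable and encode them."""
    with Path(path).open("rb") as handle:
        return bytes_to_sequence(handle.read(max_len), max_len)
```

Reserving a separate padding ID lets a model tell a genuine 0x00 byte from the padding appended to short files.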

Developed by: This implementation by [Your Name or Organization], based on the original work by Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, and Charles Nicholas.
Model type: Binary Classification
Language(s) (NLP): Not applicable
License: MIT
Finetuned from model: Not applicable

Uses

Direct Use

This model is intended for classifying executable files as either malicious or benign.

Downstream Use

This model can be used as a base for further fine-tuning on other malware classification tasks.

Out-of-Scope Use

This model is designed for binary classification of executables and should not be used for other tasks such as malware generation or analysis of other file types.

Bias, Risks, and Limitations

The model's performance depends heavily on its training data. If that data does not represent the malware and benign software you expect to encounter, detection quality will degrade. Like other byte-level classifiers, the model may also be susceptible to adversarial attacks, such as modifications to an executable that preserve its behavior but change the prediction.

Recommendations

Users should keep these biases and limitations in mind. Train the model on a large, diverse dataset of malware and benign files, and evaluate it on data that resembles the deployment environment before relying on its predictions.

How to Get Started with the Model

1. Data Preparation

Before training or tuning the model, you need to prepare your dataset. You can do this in two ways:

CSV File: Create a CSV file with two columns: filepath and label. The filepath column should contain the absolute path to each executable file, and the label column should contain the corresponding label (0 for benign, 1 for malware).
Directories: Organize your malware and benign files into separate directories.
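If your files are already organized into two directories, the CSV layout described above can be generated from them. The following is a minimal sketch, not part of this repository: the helper name `build_label_csv` is hypothetical, and it assumes every regular file under each directory is a sample.

```python
import csv
from pathlib import Path


def build_label_csv(malware_dir: str, benign_dir: str, csv_path: str) -> int:
    """Write a filepath,label CSV from two directories; return the row count."""
    rows = []
    for directory, label in ((benign_dir, 0), (malware_dir, 1)):
        # Sort for a reproducible order and keep regular files only.
        for path in sorted(Path(directory).rglob("*")):
            if path.is_file():
                # resolve() yields the absolute path the CSV format expects.
                rows.append((str(path.resolve()), label))

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filepath", "label"])
        writer.writerows(rows)
    return len(rows)
```

For example, `build_label_csv("/path/to/malware", "/path/to/benign", "data.csv")` writes one row per file, with label 0 for benign and 1 for malware.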

2. Training

To train the MalConv model, use the src/train.py script. You can provide the training data using either a CSV file or directories.

Using a CSV file:

python src/train.py --csv /path/to/your/data.csv

Using directories:

python src/train.py --malware_dir /path/to/malware --benign_dir /path/to/benign

The trained model will be saved to models/malconv_model.h5 by default. You can change this with the --save_path argument.

3. Prediction

To make predictions on new executable files, use the src/predict.py script.

Predicting a single file:

python src/predict.py /path/to/your/model.h5 --file /path/to/your/executable.exe

Predicting a batch of files from a CSV:

python src/predict.py /path/to/your/model.h5 --csv /path/to/your/files.csv --output /path/to/your/predictions.csv

Training Details

Training Data: This model should be trained on a large and diverse dataset of malware and benign executable files. The original paper used a dataset of 1.2 million files. Another option is the DIKE dataset, which contains both benign and malicious PE and OLE files.

Training Procedure: The model was trained using the Adam optimizer with a learning rate of 0.001 and a batch size of 512. The training procedure is described in detail in the original paper.

Evaluation

Testing Data: The model should be evaluated on a held-out test set of malware and benign executable files.
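One simple way to obtain a held-out test set is a stratified random split, which keeps the malware-to-benign ratio roughly equal in both parts. The sketch below assumes the samples are available as (filepath, label) pairs. The function name and the 20% default are illustrative choices, not part of this repository.

```python
import random


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (filepath, label) rows into train and test lists, label by label."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group rows by label so each class is split independently.
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

A random split is only a starting point: if near-identical samples end up on both sides, test results can look better than performance on genuinely new files.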

Metrics: The model's performance can be evaluated using the following metrics:

Accuracy
Precision
Recall
F1-score
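All four metrics follow from the confusion-matrix counts. A minimal sketch, assuming labels and predictions are lists of 0 (benign) and 1 (malware); the function name is illustrative and not part of this repository:

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 with malware (1) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

On imbalanced datasets, accuracy alone can be misleading, so precision and recall are worth reporting alongside it.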

Results: The results of the evaluation will depend on the dataset used. The original paper reported an AUC of 0.99.
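AUC is computed from raw model scores rather than thresholded labels. It equals the probability that a randomly chosen malware sample receives a higher score than a randomly chosen benign one. The sketch below implements that definition directly. The function name is illustrative, and the pairwise loop is meant for clarity on small test sets rather than speed.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (malware, benign) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malware scored above benign: correct ranking
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

For large test sets, a rank-based implementation from an established library is a better fit than this quadratic loop.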

Model Card Authors

Seokhee Chang

Model Card Contact

cycloevan97@gmail.com

References

Malware Detection by Eating a Whole EXE (arXiv:1710.09435)

Citation

If you use this code in your research, please cite the original MalConv paper:

@article{raff2017malware_arxiv,
      title={Malware Detection by Eating a Whole EXE},
      author={Edward Raff and Jon Barker and Jared Sylvester and Robert Brandon and Bryan Catanzaro and Charles Nicholas},
      year={2017},
      eprint={1710.09435},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}
