	---
	license: apache-2.0
	base_model: bartpho
	tags:
	- vietnamese
	- aspect-based-sentiment-analysis
	- VLSP-ABSA
	datasets:
	- visolex/VLSP2018-ABSA-Hotel
	metrics:
	- accuracy
	- macro-f1
	model-index:
	- name: bartpho-absa-hotel
	  results:
	  - task:
	      type: text-classification
	      name: Aspect-based Sentiment Analysis
	    dataset:
	      name: VLSP2018-ABSA-Hotel
	      type: VLSP-ABSA
	    metrics:
	    - type: accuracy
	      value: 0.9016
	    - type: macro-f1
	      value: 0.1161
	    - type: macro_precision
	      value: 0.2827
	    - type: macro_recall
	      value: 0.0730
	---

	# bartpho-absa-hotel: Aspect-based Sentiment Analysis for Vietnamese Reviews

	This model is a fine-tuned version of [bartpho](https://huggingface.co/bartpho)
	on the VLSP2018-ABSA-Hotel dataset for aspect-based sentiment analysis in Vietnamese reviews.

	## Model Details

	* Base Model: bartpho
	* Description: BartPho for Vietnamese ABSA
	* Dataset: VLSP2018-ABSA-Hotel
	* Fine-tuning Framework: HuggingFace Transformers
	* Task: Aspect-based Sentiment Classification (3 classes)

	### Hyperparameters

	* Batch size: `32`
	* Learning rate: `3e-5`
	* Epochs: `100`
	* Max sequence length: `256`
	* Weight decay: `0.01`
	* Warmup steps: `500`
	* Optimizer: AdamW
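
To make the interaction between the learning rate and the warmup steps concrete, here is a minimal sketch of a linear-warmup schedule using the values above. The linear decay after warmup and the `total_steps` value are assumptions for illustration; this card does not state which scheduler was used or how many optimizer steps the run took. The helper name `lr_at` is hypothetical.

```python
# Hyperparameters as listed in this card.
PEAK_LR = 3e-5
WARMUP_STEPS = 500


def lr_at(step, total_steps):
    """Learning rate at a given optimizer step.

    Linear warmup to PEAK_LR over WARMUP_STEPS, then linear decay to zero.
    The decay shape is an assumption, not something the card specifies.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(total_steps - step, 0)
    return PEAK_LR * remaining / max(total_steps - WARMUP_STEPS, 1)


# total_steps is hypothetical; it depends on dataset size, batch size, and epochs.
total_steps = 10_000
for step in (0, 250, 500, 5_250, 10_000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

Under these assumptions the rate climbs from zero to `3e-5` over the first 500 steps and then falls back toward zero by the final step.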

	## Dataset

	The model was trained on the VLSP2018-ABSA-Hotel dataset for aspect-based sentiment analysis.

	### Sentiment Labels:

	* 0 - Negative (Tiêu cực): Negative opinions
	* 1 - Neutral (Trung lập): Neutral, objective opinions
	* 2 - Positive (Tích cực): Positive opinions

	### Aspect Categories:

	The model is trained to analyze sentiment for the following aspects:

	- FACILITIES#CLEANLINESS
	- FACILITIES#COMFORT
	- FACILITIES#DESIGN&FEATURES
	- FACILITIES#GENERAL
	- FACILITIES#MISCELLANEOUS
	- FACILITIES#PRICES
	- FACILITIES#QUALITY
	- FOOD&DRINKS#MISCELLANEOUS
	- FOOD&DRINKS#PRICES
	- FOOD&DRINKS#QUALITY
	- FOOD&DRINKS#STYLE&OPTIONS
	- HOTEL#CLEANLINESS
	- HOTEL#COMFORT
	- HOTEL#DESIGN&FEATURES
	- HOTEL#GENERAL
	- HOTEL#MISCELLANEOUS
	- HOTEL#PRICES
	- HOTEL#QUALITY
	- LOCATION#GENERAL
	- ROOMS#CLEANLINESS
	- ROOMS#COMFORT
	- ROOMS#DESIGN&FEATURES
	- ROOMS#GENERAL
	- ROOMS#MISCELLANEOUS
	- ROOMS#PRICES
	- ROOMS#QUALITY
	- ROOM_AMENITIES#CLEANLINESS
	- ROOM_AMENITIES#COMFORT
	- ROOM_AMENITIES#DESIGN&FEATURES
	- ROOM_AMENITIES#GENERAL
	- ROOM_AMENITIES#MISCELLANEOUS
	- ROOM_AMENITIES#PRICES
	- ROOM_AMENITIES#QUALITY
	- SERVICE#GENERAL
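
Each category follows an `ENTITY#ATTRIBUTE` pattern, so the full list can be rebuilt from a small entity-to-attributes mapping. The sketch below assumes `#` is always the separator; the names `ATTRIBUTES_BY_ENTITY` and `build_aspect_labels` are illustrative and not part of the model's API.

```python
# Illustrative mapping derived from the category list above.
FULL = ["CLEANLINESS", "COMFORT", "DESIGN&FEATURES", "GENERAL",
        "MISCELLANEOUS", "PRICES", "QUALITY"]

ATTRIBUTES_BY_ENTITY = {
    "FACILITIES": FULL,
    "FOOD&DRINKS": ["MISCELLANEOUS", "PRICES", "QUALITY", "STYLE&OPTIONS"],
    "HOTEL": FULL,
    "LOCATION": ["GENERAL"],
    "ROOMS": FULL,
    "ROOM_AMENITIES": FULL,
    "SERVICE": ["GENERAL"],
}


def build_aspect_labels():
    """Return 'ENTITY#ATTRIBUTE' strings in the same order as the list above."""
    return [
        f"{entity}#{attribute}"
        for entity, attributes in ATTRIBUTES_BY_ENTITY.items()
        for attribute in attributes
    ]


aspect_labels = build_aspect_labels()
print(len(aspect_labels))  # 34

# Splitting a label back into its two parts.
entity, attribute = aspect_labels[0].split("#", 1)
print(entity, attribute)
```

Grouping by entity this way can also be handy when aggregating predictions, for example to report one summary per entity rather than 34 separate rows.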

	## Evaluation Results

	The model was evaluated on the test set with the following metrics:

	* Accuracy: `0.9016`
	* Macro-F1: `0.1161`
	* Weighted-F1: `0.2486`
	* Macro-Precision: `0.2827`
	* Macro-Recall: `0.0730`
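
The reported Macro-F1 is close to the harmonic mean of the reported Macro-Precision and Macro-Recall, which the short check below reproduces. This is an observation about the published numbers only; how the evaluation script actually aggregates F1 is not stated here, so treat the formula as an assumption.

```python
# Values copied from the evaluation results above.
macro_precision = 0.2827
macro_recall = 0.0730
reported_macro_f1 = 0.1161

# Harmonic mean of precision and recall.
f1_from_pr = 2 * macro_precision * macro_recall / (macro_precision + macro_recall)

print(f"harmonic mean of P and R: {f1_from_pr:.4f}")
print("within 0.001 of reported Macro-F1:", abs(f1_from_pr - reported_macro_f1) < 1e-3)
```

The small remaining difference is what one would expect from the inputs being rounded to four decimal places.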

	## Usage Example

	```python
	import torch
	from transformers import AutoTokenizer, AutoModel

	# Load model and tokenizer
	repo = "visolex/bartpho-absa-hotel"
	tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
	model = AutoModel.from_pretrained(repo, trust_remote_code=True)
	model.eval()

	# Aspect labels for VLSP2018-ABSA-Hotel
	aspect_labels = [
	    "FACILITIES#CLEANLINESS",
	    "FACILITIES#COMFORT",
	    "FACILITIES#DESIGN&FEATURES",
	    "FACILITIES#GENERAL",
	    "FACILITIES#MISCELLANEOUS",
	    "FACILITIES#PRICES",
	    "FACILITIES#QUALITY",
	    "FOOD&DRINKS#MISCELLANEOUS",
	    "FOOD&DRINKS#PRICES",
	    "FOOD&DRINKS#QUALITY",
	    "FOOD&DRINKS#STYLE&OPTIONS",
	    "HOTEL#CLEANLINESS",
	    "HOTEL#COMFORT",
	    "HOTEL#DESIGN&FEATURES",
	    "HOTEL#GENERAL",
	    "HOTEL#MISCELLANEOUS",
	    "HOTEL#PRICES",
	    "HOTEL#QUALITY",
	    "LOCATION#GENERAL",
	    "ROOMS#CLEANLINESS",
	    "ROOMS#COMFORT",
	    "ROOMS#DESIGN&FEATURES",
	    "ROOMS#GENERAL",
	    "ROOMS#MISCELLANEOUS",
	    "ROOMS#PRICES",
	    "ROOMS#QUALITY",
	    "ROOM_AMENITIES#CLEANLINESS",
	    "ROOM_AMENITIES#COMFORT",
	    "ROOM_AMENITIES#DESIGN&FEATURES",
	    "ROOM_AMENITIES#GENERAL",
	    "ROOM_AMENITIES#MISCELLANEOUS",
	    "ROOM_AMENITIES#PRICES",
	    "ROOM_AMENITIES#QUALITY",
	    "SERVICE#GENERAL"
	]

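	# Sentiment labels, in the index order this example assumes for the model's output head.
	# Note: the "Sentiment Labels" section lists 0 = Negative, 1 = Neutral, 2 = Positive;
	# check the checkpoint's config if the decoded sentiments look swapped.
	sentiment_labels = ["POSITIVE", "NEGATIVE", "NEUTRAL"]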

	# Example review text
	text = "Khách sạn rất sạch sẽ, phòng ốc thoải mái nhưng giá hơi cao."

	# Tokenize
	inputs = tokenizer(
	    text,
	    return_tensors="pt",
	    padding=True,
	    truncation=True,
	    max_length=256
	)
	inputs.pop("token_type_ids", None)

	# Predict
	with torch.no_grad():
	    outputs = model(**inputs)

	# Get logits: shape [1, num_aspects, num_sentiments + 1]
	logits = outputs.logits.squeeze(0) # [num_aspects, num_sentiments + 1]
	probs = torch.softmax(logits, dim=-1)

	# Predict for each aspect
	none_id = probs.size(-1) - 1 # Index of "none" class
	results = []

	for i, aspect in enumerate(aspect_labels):
	    prob_i = probs[i]
	    pred_id = int(prob_i.argmax().item())

	    # Keep the aspect only if a sentiment class (not "none") wins with enough confidence
	    if pred_id != none_id and pred_id < len(sentiment_labels):
	        score = prob_i[pred_id].item()
	        if score >= 0.5:  # threshold
	            results.append((aspect, sentiment_labels[pred_id].lower()))

	print(f"Text: {text}")
	print(f"Predicted aspects: {results}")
	# Output format: a list of (aspect_label, sentiment) tuples, one for each aspect
	# whose predicted sentiment is not "none" and passes the threshold
	```
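
The decoding step in the example above can also be tried in isolation, without downloading the model. The sketch below applies the same softmax, "none" check, and threshold to hand-made logits in plain Python. The logits are invented for illustration and are not model output; the label order is copied from the usage example, and the helper names `softmax` and `decode` are hypothetical.

```python
import math

# Order follows the usage example above; the last index is the assumed "none" class.
sentiment_labels = ["POSITIVE", "NEGATIVE", "NEUTRAL"]
NONE_ID = len(sentiment_labels)


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, aspect_labels, threshold=0.5):
    """Return (aspect, sentiment) pairs whose top class is not "none" and meets the threshold."""
    results = []
    for aspect, row in zip(aspect_labels, logits):
        probs = softmax(row)
        pred_id = max(range(len(probs)), key=probs.__getitem__)
        if pred_id != NONE_ID and probs[pred_id] >= threshold:
            results.append((aspect, sentiment_labels[pred_id].lower()))
    return results


# Hand-made logits for three aspects (not model output).
toy_aspects = ["HOTEL#CLEANLINESS", "ROOMS#COMFORT", "HOTEL#PRICES"]
toy_logits = [
    [4.0, 0.0, 0.0, 0.0],  # index 0 wins with high probability
    [0.0, 0.0, 0.0, 5.0],  # "none" wins: aspect treated as not mentioned
    [1.0, 1.2, 0.9, 1.0],  # index 1 wins, but its probability is below the threshold
]
decoded = decode(toy_logits, toy_aspects)
print(decoded)
```

Only the first toy aspect survives: the second is assigned to "none", and the third has a near-uniform distribution, so its top probability falls under `0.5`.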

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{visolex_absa_bartpho_absa_hotel,
	title={BartPho for Vietnamese ABSA for Vietnamese Aspect-based Sentiment Analysis},
	author={ViSoLex Team},
	year={2025},
	url={https://huggingface.co/visolex/bartpho-absa-hotel}
	}
	```

	## License

	This model is released under the Apache-2.0 license.

	## Acknowledgments

	* Base model: [bartpho](https://huggingface.co/bartpho)
	* Dataset: VLSP2018-ABSA-Hotel
	* ViSoLex Toolkit

	---