add LICENSE and README
Browse files
LICENSE
ADDED
|
@@ -0,0 +1,78 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT
|
| 2 |
+
Tencent HunyuanVideo 1.5 Release Date: November 21, 2025
|
| 3 |
+
THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
|
| 4 |
+
By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
|
| 5 |
+
1. DEFINITIONS.
|
| 6 |
+
a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
|
| 7 |
+
b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent Hunyuan Works or any portion or element thereof set forth herein.
|
| 8 |
+
c. “Documentation” shall mean the specifications, manuals and documentation for Tencent Hunyuan made publicly available by Tencent.
|
| 9 |
+
d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
|
| 10 |
+
e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent Hunyuan Works for any purpose and in any field of use.
|
| 11 |
+
f. “Materials” shall mean, collectively, Tencent’s proprietary Tencent Hunyuan and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
|
| 12 |
+
g. “Model Derivatives” shall mean all: (i) modifications to Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; (ii) works based on Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent Hunyuan or any Model Derivative of Tencent Hunyuan, to that model in order to cause that model to perform similarly to Tencent Hunyuan or a Model Derivative of Tencent Hunyuan, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent Hunyuan or a Model Derivative of Tencent Hunyuan for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
|
| 13 |
+
h. “Output” shall mean the information and/or content output of Tencent Hunyuan or a Model Derivative that results from operating or otherwise using Tencent Hunyuan or a Model Derivative, including via a Hosted Service.
|
| 14 |
+
i. “Tencent,” “We” or “Us” shall mean the applicable entity or entities in the Tencent corporate family that own(s) intellectual property or other rights embodied in or utilized by the Materials.
|
| 15 |
+
j. “Tencent Hunyuan” shall mean the large language models, text/image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us, including, without limitation to, Tencent HunyuanVideo 1.5 released at [https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5, https://huggingface.co/tencent/HunyuanVideo-1.5].
|
| 16 |
+
k. “Tencent Hunyuan Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
|
| 17 |
+
l. “Territory” shall mean the worldwide territory, excluding the territory of the European Union, United Kingdom and South Korea.
|
| 18 |
+
m. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
|
| 19 |
+
n. “including” shall mean including but not limited to.
|
| 20 |
+
2. GRANT OF RIGHTS.
|
| 21 |
+
We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
|
| 22 |
+
3. DISTRIBUTION.
|
| 23 |
+
You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent Hunyuan Works, exclusively in the Territory, provided that You meet all of the following conditions:
|
| 24 |
+
a. You must provide all such Third Party recipients of the Tencent Hunyuan Works or products or services using them a copy of this Agreement;
|
| 25 |
+
b. You must cause any modified files to carry prominent notices stating that You changed the files;
|
| 26 |
+
c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent Hunyuan Works; and (ii) mark the products or services developed by using the Tencent Hunyuan Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and
|
| 27 |
+
d. All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2025 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.”
|
| 28 |
+
e. In the event that You use, integrate, implement, or otherwise deploy the Tencent Hunyuan Works, in whole or in part, to provide, enable, or support any service, product, or functionality to third parties, You shall clearly, accurately, and prominently disclose to all end users the full legal name and entity of the actual provider of such service, product, or functionality. You shall expressly and conspicuously state that Tencent is not affiliated with, associated with, sponsoring, or endorsing any such service, product, or functionality. You shall not use or display any name, logo, trademark, trade name, or other indicia of Tencent in any manner that could be construed as, or be likely to create, confusion, deception, or a false impression regarding any relationship, affiliation, sponsorship, or endorsement by Tencent.
|
| 29 |
+
You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent Hunyuan Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.
|
| 30 |
+
4. ADDITIONAL COMMERCIAL TERMS.
|
| 31 |
+
If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.
|
| 32 |
+
5. RULES OF USE.
|
| 33 |
+
a. Your use of the Tencent Hunyuan Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent Hunyuan Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent Hunyuan Works and You must provide notice to subsequent users to whom You distribute that Tencent Hunyuan Works are subject to the use restrictions in these Sections 5(a) and 5(b).
|
| 34 |
+
b. You must not use the Tencent Hunyuan Works or any Output or results of the Tencent Hunyuan Works to improve any other AI model (other than Tencent Hunyuan or Model Derivatives thereof).
|
| 35 |
+
c. You must not use, reproduce, modify, distribute, or display the Tencent Hunyuan Works, Output or results of the Tencent Hunyuan Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement.
|
| 36 |
+
6. INTELLECTUAL PROPERTY.
|
| 37 |
+
a. Subject to Tencent’s ownership of Tencent Hunyuan Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.
|
| 38 |
+
b. No trademark licenses are granted under this Agreement, and in connection with the Tencent Hunyuan Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent Hunyuan Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.
|
| 39 |
+
c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent Hunyuan Works.
|
| 40 |
+
d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.
|
| 41 |
+
7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
|
| 42 |
+
a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent Hunyuan Works or to grant any license thereto.
|
| 43 |
+
b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUAN WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
|
| 44 |
+
c. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
|
| 45 |
+
8. SURVIVAL AND TERMINATION.
|
| 46 |
+
a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
|
| 47 |
+
b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent Hunyuan Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.
|
| 48 |
+
9. GOVERNING LAW AND JURISDICTION.
|
| 49 |
+
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
|
| 50 |
+
b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.
|
| 51 |
+
|
| 52 |
+
EXHIBIT A
|
| 53 |
+
ACCEPTABLE USE POLICY
|
| 54 |
+
|
| 55 |
+
Tencent reserves the right to update this Acceptable Use Policy from time to time.
|
| 56 |
+
Last modified: November 5, 2024
|
| 57 |
+
|
| 58 |
+
Tencent endeavors to promote safe and fair use of its tools and features, including Tencent Hunyuan. You agree not to use Tencent Hunyuan or Model Derivatives:
|
| 59 |
+
1. Outside the Territory;
|
| 60 |
+
2. In any way that violates any applicable national, federal, state, local, international or any other law or regulation;
|
| 61 |
+
3. To harm Yourself or others;
|
| 62 |
+
4. To repurpose or distribute output from Tencent Hunyuan or any Model Derivatives to harm Yourself or others;
|
| 63 |
+
5. To override or circumvent the safety guardrails and safeguards We have put in place;
|
| 64 |
+
6. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
|
| 65 |
+
7. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;
|
| 66 |
+
8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
|
| 67 |
+
9. To intentionally defame, disparage or otherwise harass others;
|
| 68 |
+
10. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;
|
| 69 |
+
11. To generate or disseminate personal identifiable information with the purpose of harming others;
|
| 70 |
+
12. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including –through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;
|
| 71 |
+
13. To impersonate another individual without consent, authorization, or legal right;
|
| 72 |
+
14. To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);
|
| 73 |
+
15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
|
| 74 |
+
16. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;
|
| 75 |
+
17. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;
|
| 76 |
+
18. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
|
| 77 |
+
19. For military purposes;
|
| 78 |
+
20. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
|
NOTICE
ADDED
|
@@ -0,0 +1,160 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
This project is built on and with the aid of the following open source projects. Credits are given to these projects.
|
| 2 |
+
|
| 3 |
+
In case you believe there have been errors in the attribution below, you may submit the concerns to us for review and correction.
|
| 4 |
+
|
| 5 |
+
The below software in this distribution may have been modified by Tencent ("Tencent Modifications"). All Tencent Modifications
|
| 6 |
+
are Copyright(C)Tencent.
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
Open Source Software Licensed under the Apache-2.0:
|
| 10 |
+
--------------------------------------------------------------------
|
| 11 |
+
1. Code from Glyph-ByT5
|
| 12 |
+
Copyright (c) 2025 Glyph-ByT5 original author and authors
|
| 13 |
+
|
| 14 |
+
2.flex block attn
|
| 15 |
+
Copyright (C) 2025 Tencent. All rights reserved.
|
| 16 |
+
|
| 17 |
+
Terms of the Apache-2.0:
|
| 18 |
+
--------------------------------------------------------------------
|
| 19 |
+
Apache License
|
| 20 |
+
Version 2.0, January 2004
|
| 21 |
+
http://www.apache.org/licenses/
|
| 22 |
+
|
| 23 |
+
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
| 24 |
+
|
| 25 |
+
1. Definitions.
|
| 26 |
+
|
| 27 |
+
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
|
| 28 |
+
|
| 29 |
+
"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
|
| 30 |
+
|
| 31 |
+
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
|
| 32 |
+
|
| 33 |
+
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
|
| 34 |
+
|
| 35 |
+
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
|
| 36 |
+
|
| 37 |
+
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
|
| 38 |
+
|
| 39 |
+
"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
|
| 40 |
+
|
| 41 |
+
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
|
| 42 |
+
|
| 43 |
+
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
|
| 44 |
+
|
| 45 |
+
"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
|
| 46 |
+
|
| 47 |
+
2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
|
| 48 |
+
|
| 49 |
+
3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
|
| 50 |
+
|
| 51 |
+
4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
|
| 52 |
+
|
| 53 |
+
You must give any other recipients of the Work or Derivative Works a copy of this License; and
|
| 54 |
+
You must cause any modified files to carry prominent notices stating that You changed the files; and
|
| 55 |
+
You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
|
| 56 |
+
If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
|
| 57 |
+
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
|
| 58 |
+
|
| 59 |
+
5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
|
| 60 |
+
|
| 61 |
+
6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
|
| 62 |
+
|
| 63 |
+
7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
|
| 64 |
+
|
| 65 |
+
8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
|
| 66 |
+
|
| 67 |
+
9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
|
| 68 |
+
|
| 69 |
+
END OF TERMS AND CONDITIONS
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
|
| 74 |
+
|
| 75 |
+
Open Source Software Licensed under the LicenseRef-Tencent-Hunyuan-Community-License:
|
| 76 |
+
--------------------------------------------------------------------
|
| 77 |
+
1. Code from HunyuanVideo
|
| 78 |
+
Copyright © 2024 Tencent
|
| 79 |
+
Terms of the LicenseRef-Tencent-Hunyuan-Community-License:
|
| 80 |
+
--------------------------------------------------------------------
|
| 81 |
+
TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT
|
| 82 |
+
Tencent HunyuanVideo Release Date: December 3, 2024
|
| 83 |
+
THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
|
| 84 |
+
By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
|
| 85 |
+
1. DEFINITIONS.
|
| 86 |
+
a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
|
| 87 |
+
b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent Hunyuan Works or any portion or element thereof set forth herein.
|
| 88 |
+
c. “Documentation” shall mean the specifications, manuals and documentation for Tencent Hunyuan made publicly available by Tencent.
|
| 89 |
+
d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
|
| 90 |
+
e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent Hunyuan Works for any purpose and in any field of use.
|
| 91 |
+
f. “Materials” shall mean, collectively, Tencent’s proprietary Tencent Hunyuan and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
|
| 92 |
+
g. “Model Derivatives” shall mean all: (i) modifications to Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; (ii) works based on Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent Hunyuan or any Model Derivative of Tencent Hunyuan, to that model in order to cause that model to perform similarly to Tencent Hunyuan or a Model Derivative of Tencent Hunyuan, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent Hunyuan or a Model Derivative of Tencent Hunyuan for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
|
| 93 |
+
h. “Output” shall mean the information and/or content output of Tencent Hunyuan or a Model Derivative that results from operating or otherwise using Tencent Hunyuan or a Model Derivative, including via a Hosted Service.
|
| 94 |
+
i. “Tencent,” “We” or “Us” shall mean THL A29 Limited.
|
| 95 |
+
j. “Tencent Hunyuan” shall mean the large language models, text/image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us, including, without limitation to, Tencent HunyuanVideo released at [https://github.com/Tencent/HunyuanVideo].
|
| 96 |
+
k. “Tencent Hunyuan Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
|
| 97 |
+
l. “Territory” shall mean the worldwide territory, excluding the territory of the European Union.
|
| 98 |
+
m. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
|
| 99 |
+
n. “including” shall mean including but not limited to.
|
| 100 |
+
2. GRANT OF RIGHTS.
|
| 101 |
+
We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
|
| 102 |
+
3. DISTRIBUTION.
|
| 103 |
+
You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent Hunyuan Works, exclusively in the Territory, provided that You meet all of the following conditions:
|
| 104 |
+
a. You must provide all such Third Party recipients of the Tencent Hunyuan Works or products or services using them a copy of this Agreement;
|
| 105 |
+
b. You must cause any modified files to carry prominent notices stating that You changed the files;
|
| 106 |
+
c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent Hunyuan Works; and (ii) mark the products or services developed by using the Tencent Hunyuan Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and
|
| 107 |
+
d. All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2024 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.”
|
| 108 |
+
You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent Hunyuan Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.
|
| 109 |
+
4. ADDITIONAL COMMERCIAL TERMS.
|
| 110 |
+
If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.
|
| 111 |
+
5. RULES OF USE.
|
| 112 |
+
a. Your use of the Tencent Hunyuan Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent Hunyuan Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent Hunyuan Works and You must provide notice to subsequent users to whom You distribute that Tencent Hunyuan Works are subject to the use restrictions in these Sections 5(a) and 5(b).
|
| 113 |
+
b. You must not use the Tencent Hunyuan Works or any Output or results of the Tencent Hunyuan Works to improve any other AI model (other than Tencent Hunyuan or Model Derivatives thereof).
|
| 114 |
+
c. You must not use, reproduce, modify, distribute, or display the Tencent Hunyuan Works, Output or results of the Tencent Hunyuan Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement.
|
| 115 |
+
6. INTELLECTUAL PROPERTY.
|
| 116 |
+
a. Subject to Tencent’s ownership of Tencent Hunyuan Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.
|
| 117 |
+
b. No trademark licenses are granted under this Agreement, and in connection with the Tencent Hunyuan Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent Hunyuan Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.
|
| 118 |
+
c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent Hunyuan Works.
|
| 119 |
+
d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.
|
| 120 |
+
7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
|
| 121 |
+
a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent Hunyuan Works or to grant any license thereto.
|
| 122 |
+
b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUAN WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
|
| 123 |
+
c. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
|
| 124 |
+
8. SURVIVAL AND TERMINATION.
|
| 125 |
+
a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
|
| 126 |
+
b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent Hunyuan Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.
|
| 127 |
+
9. GOVERNING LAW AND JURISDICTION.
|
| 128 |
+
a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
|
| 129 |
+
b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.
|
| 130 |
+
|
| 131 |
+
EXHIBIT A
|
| 132 |
+
ACCEPTABLE USE POLICY
|
| 133 |
+
|
| 134 |
+
Tencent reserves the right to update this Acceptable Use Policy from time to time.
|
| 135 |
+
Last modified: November 5, 2024
|
| 136 |
+
|
| 137 |
+
Tencent endeavors to promote safe and fair use of its tools and features, including Tencent Hunyuan. You agree not to use Tencent Hunyuan or Model Derivatives:
|
| 138 |
+
1. Outside the Territory;
|
| 139 |
+
2. In any way that violates any applicable national, federal, state, local, international or any other law or regulation;
|
| 140 |
+
3. To harm Yourself or others;
|
| 141 |
+
4. To repurpose or distribute output from Tencent Hunyuan or any Model Derivatives to harm Yourself or others;
|
| 142 |
+
5. To override or circumvent the safety guardrails and safeguards We have put in place;
|
| 143 |
+
6. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
|
| 144 |
+
7. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;
|
| 145 |
+
8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
|
| 146 |
+
9. To intentionally defame, disparage or otherwise harass others;
|
| 147 |
+
10. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;
|
| 148 |
+
11. To generate or disseminate personal identifiable information with the purpose of harming others;
|
| 149 |
+
12. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including –through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;
|
| 150 |
+
13. To impersonate another individual without consent, authorization, or legal right;
|
| 151 |
+
14. To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);
|
| 152 |
+
15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
|
| 153 |
+
16. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;
|
| 154 |
+
17. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;
|
| 155 |
+
18. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
|
| 156 |
+
19. For military purposes;
|
| 157 |
+
20. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
|
| 158 |
+
|
| 159 |
+
==================================================
|
| 160 |
+
End of the Attribution Notice of this project.
|
README.md
ADDED
|
@@ -0,0 +1,326 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[中文文档](./README_CN.md)
|
| 2 |
+
|
| 3 |
+
# HunyuanVideo-1.5
|
| 4 |
+
|
| 5 |
+
<div align="center">
|
| 6 |
+
|
| 7 |
+
<img src="./assets/logo.png" alt="HunyuanVideo-1.5 Logo" width="80%">
|
| 8 |
+
|
| 9 |
+
# 🎬 HunyuanVideo-1.5: A leading lightweight video generation model
|
| 10 |
+
|
| 11 |
+
</div>
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
<div align="center">
|
| 15 |
+
<!-- <img src="./assets/banner.png" alt="HunyuanVideo-1.5 Banner" width="800"> -->
|
| 16 |
+
|
| 17 |
+
</div>
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with only 8.3B parameters, significantly lowering the barrier to usage. It runs smoothly on consumer-grade GPUs, making it accessible for every developer and creator. This repository provides the implementation and tools needed to generate creative videos.
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
<div align="center">
|
| 24 |
+
<a href="https://hunyuan.tencent.com/video/zh?tabIndex=0" target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
|
| 25 |
+
<a href=https://huggingface.co/tencent/HunyuanVideo-1.5 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
|
| 26 |
+
<a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
|
| 27 |
+
<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
|
| 28 |
+
<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
|
| 29 |
+
</div>
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
<p align="center">
|
| 33 |
+
👏 Join our <a href="./assets/wechat.png" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
|
| 34 |
+
💻 <a href="https://hunyuan.tencent.com/video/zh?tabIndex=0">Official website Try our model!</a>  
|
| 35 |
+
</p>
|
| 36 |
+
|
| 37 |
+
## 🔥🔥🔥 News
|
| 38 |
+
👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo.
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
## 🎥 Demo
|
| 42 |
+
<div align="center">
|
| 43 |
+
<video src="https://github.com/user-attachments/assets/d45ec78e-ea40-47f1-8d4d-f4d9a0682e2d" width="60%"> </video>
|
| 44 |
+
</div>
|
| 45 |
+
|
| 46 |
+
## 🧩 Community Contributions
|
| 47 |
+
|
| 48 |
+
If you develop/use HunyuanVideo in your projects, welcome to let us know.
|
| 49 |
+
|
| 50 |
+
- **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): A powerful and modular diffusion model GUI with a graph/nodes interface. ComfyUI supports HunyuanVideo-1.5 with various engineering optimizations for fast inference.
|
| 51 |
+
|
| 52 |
+
- **LightX2V** - [LightX2V](https://github.com/ModelTC/LightX2V): A lightweight and efficient video generation framework that integrates HunyuanVideo-1.5, supporting multiple engineering acceleration techniques for fast inference.
|
| 53 |
+
|
| 54 |
+
## 📑 Open-source Plan
|
| 55 |
+
- HunyuanVideo-1.5 (T2V/I2V)
|
| 56 |
+
- [x] Inference Code and checkpoints
|
| 57 |
+
- [ ] Diffusers Support
|
| 58 |
+
- [ ] Release all model weights (Sparse attention, distill model, and SR models)
|
| 59 |
+
|
| 60 |
+
## 📋 Table of Contents
|
| 61 |
+
- [🔥🔥🔥 News](#-news)
|
| 62 |
+
- [🎥 Demo](#-demo)
|
| 63 |
+
- [🧩 Community Contributions](#-community-contributions)
|
| 64 |
+
- [📑 Open-source Plan](#-open-source-plan)
|
| 65 |
+
- [📖 Introduction](#-introduction)
|
| 66 |
+
- [✨ Key Features](#-key-features)
|
| 67 |
+
- [📜 System Requirements](#-system-requirements)
|
| 68 |
+
- [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
|
| 69 |
+
- [🧱 Download Pretrained Models](#-download-pretrained-models)
|
| 70 |
+
- [📝 Prompt Guide](#-prompt-guide)
|
| 71 |
+
- [🔑 Usage](#-usage)
|
| 72 |
+
- [Prompt Enhancement](#prompt-enhancement)
|
| 73 |
+
- [Text to Video](#text-to-video)
|
| 74 |
+
- [Image to Video](#image-to-video)
|
| 75 |
+
- [Command Line Arguments](#command-line-arguments)
|
| 76 |
+
- [🧱 Models Cards](#-models-cards)
|
| 77 |
+
- [🎬 More Examples](#-more-examples)
|
| 78 |
+
- [📊 Evaluation](#-evaluation)
|
| 79 |
+
- [📚 Citation](#-citation)
|
| 80 |
+
- [🙏 Acknowledgements](#-acknowledgements)
|
| 81 |
+
- [🌟 Github Star History](#-github-star-history)
|
| 82 |
+
|
| 83 |
+
|
| 84 |
+
## 📖 Introduction
|
| 85 |
+
We present HunyuanVideo 1.5, a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture with selective and sliding tile attention(SSTA), enhanced bilingual understanding through glyph-aware text encoding , progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source models. By releasing the code and weights of HunyuanVideo 1.5, we provide the community with a high-performance foundation that significantly lowers the cost of video creation and research, making advanced video generation more accessible to all.
|
| 86 |
+
|
| 87 |
+
|
| 88 |
+
## ✨ Key Features
|
| 89 |
+
- **Lightweight High-Performance Architecture**: We propose an efficient architecture that integrates an 8.3B-parameter Diffusion Transformer (DiT) with a 3D causal VAE, achieving compression ratios of 16× in spatial dimensions and 4× along the temporal axis. Additionally, the innovative SSTA (Selective and Sliding Tile Attention) mechanism prunes redundant spatiotemporal kv blocks, significantly reduces computational overhead for long video sequences and accelerates inference, achieving an end-to-end speedup of $1.87 \times$ in 10-second 720p video synthesis compared to FlashAttention-3.
|
| 90 |
+
|
| 91 |
+
<div align="center">
|
| 92 |
+
<img src="./assets/hy_video_1_5_dit.png" alt="HunyuanVideo-1.5 DiT" width="600">
|
| 93 |
+
</div>
|
| 94 |
+
|
| 95 |
+
|
| 96 |
+
- **Video Super-Resolution Enhancement**: We develop an efficient few-step super-resolution network that upscales outputs to 1080p. It enhances sharpness while correcting distortions, thereby refining details and overall visual texture.
|
| 97 |
+
|
| 98 |
+
<div align="center">
|
| 99 |
+
<img src="./assets/hy_video_1_5_vsr.png" alt="HunyuanVideo-1.5 VSR" width="600">
|
| 100 |
+
</div>
|
| 101 |
+
|
| 102 |
+
- **End-to-End Training Optimization**: This work employs a multi-stage, progressive training strategy covering the entire pipeline from pre-training to post-training. Combined with the Muon optimizer to accelerate convergence, this approach holistically refines motion coherence, aesthetic quality, and human preference alignment, achieving professional-grade content generation.
|
| 103 |
+
|
| 104 |
+
## 📜 System Requirements
|
| 105 |
+
|
| 106 |
+
### Hardware Requirements
|
| 107 |
+
|
| 108 |
+
- **GPU**: NVIDIA GPU with CUDA support
|
| 109 |
+
- **Minimum GPU Memory**: 14 GB (with model offloading enabled)
|
| 110 |
+
|
| 111 |
+
> **Note:** The memory requirements above are measured with model offloading enabled. If your GPU has sufficient memory, you may disable offloading for improved inference speed.
|
| 112 |
+
|
| 113 |
+
### Software Requirements
|
| 114 |
+
|
| 115 |
+
- **Operating System**: Linux
|
| 116 |
+
- **Python**: Python 3.10 or higher
|
| 117 |
+
- **CUDA**: Compatible CUDA version for your PyTorch installation
|
| 118 |
+
|
| 119 |
+
## 🛠️ Dependencies and Installation
|
| 120 |
+
|
| 121 |
+
### Step 1: Clone the Repository
|
| 122 |
+
|
| 123 |
+
```bash
|
| 124 |
+
git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5.git
|
| 125 |
+
cd HunyuanVideo-1.5
|
| 126 |
+
```
|
| 127 |
+
|
| 128 |
+
### Step 2: Install Basic Dependencies
|
| 129 |
+
|
| 130 |
+
```bash
|
| 131 |
+
pip install -r requirements.txt
|
| 132 |
+
pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python
|
| 133 |
+
```
|
| 134 |
+
|
| 135 |
+
### Step 3: Install Attention Libraries
|
| 136 |
+
|
| 137 |
+
* Flash Attention
|
| 138 |
+
It's recommended to install Flash Attention for faster inference and reduced GPU memory consumption.
|
| 139 |
+
Detailed installation instructions are available at [Flash Attention](https://github.com/Dao-AILab/flash-attention).
|
| 140 |
+
|
| 141 |
+
* Flex-Block-Attention
|
| 142 |
+
flex-block-attn is only required for sparse attention to achieve faster inference and can be installed by the following command:
|
| 143 |
+
```bash
|
| 144 |
+
git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
|
| 145 |
+
cd flex-block-attn
|
| 146 |
+
python3 setup.py install
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
* SageAttention
|
| 150 |
+
```bash
|
| 151 |
+
git clone https://github.com/cooper1637/SageAttention.git
|
| 152 |
+
cd SageAttention
|
| 153 |
+
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 # Optional
|
| 154 |
+
python3 setup.py install
|
| 155 |
+
```
|
| 156 |
+
|
| 157 |
+
## 🧱 Download Pretrained Models
|
| 158 |
+
|
| 159 |
+
Download the pretrained models before generating videos. Detailed instructions are available at [checkpoints-download.md](checkpoints-download.md).
|
| 160 |
+
|
| 161 |
+
## 📝 Prompt Guide
|
| 162 |
+
### Prompt Writing Handbook
|
| 163 |
+
Prompt enhancement plays a crucial role in enabling our model to generate high-quality videos. By writing longer and more detailed prompts, the generated video will be significantly improved. We encourage you to craft comprehensive and descriptive prompts to achieve the best possible video quality. we recommend community partners consulting our official guide on how to write effective prompts.
|
| 164 |
+
|
| 165 |
+
**Reference:** **[HunyuanVideo 1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk)**
|
| 166 |
+
|
| 167 |
+
### System Prompts for Automatic Prompt Enhancement
|
| 168 |
+
For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
|
| 169 |
+
|
| 170 |
+
## 🔑 Usage
|
| 171 |
+
### Video Generation
|
| 172 |
+
|
| 173 |
+
For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
|
| 174 |
+
|
| 175 |
+
For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-video) have different recommended models and environment variables:
|
| 176 |
+
|
| 177 |
+
- T2V: use [Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507), configure `T2V_REWRITE_BASE_URL` and `T2V_REWRITE_MODEL_NAME`
|
| 178 |
+
- I2V: use [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct), configure `I2V_REWRITE_BASE_URL` and `I2V_REWRITE_MODEL_NAME`
|
| 179 |
+
|
| 180 |
+
> You may set the above model names to any other vLLM-compatible models you have deployed (including HuggingFace models).
|
| 181 |
+
> Rewriting is enabled by default; to disable it explicitly, use the `--disable_rewrite` flag. If no vLLM endpoint is configured, the pipeline runs without remote rewriting.
|
| 182 |
+
|
| 183 |
+
Example: Generate a video (works for both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
|
| 184 |
+
|
| 185 |
+
```bash
|
| 186 |
+
export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
|
| 187 |
+
export T2V_REWRITE_MODEL_NAME="<your_model_name>"
|
| 188 |
+
export I2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
|
| 189 |
+
export I2V_REWRITE_MODEL_NAME="<your_model_name>"
|
| 190 |
+
|
| 191 |
+
PROMPT="A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
|
| 192 |
+
|
| 193 |
+
IMAGE_PATH=./data/reference_image.png # Optional, 'none' or <image path>
|
| 194 |
+
SEED=1
|
| 195 |
+
ASPECT_RATIO=16:9
|
| 196 |
+
RESOLUTION=480p
|
| 197 |
+
OUTPUT_PATH=./outputs/output.mp4
|
| 198 |
+
|
| 199 |
+
# Configuration
|
| 200 |
+
N_INFERENCE_GPU=8 # Parallel inference GPU count
|
| 201 |
+
CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
|
| 202 |
+
SPARSE_ATTN=true # Inference with sparse attention
|
| 203 |
+
SAGE_ATTN=false # Inference with SageAttention
|
| 204 |
+
MODEL_PATH=ckpts # Path to pretrained model
|
| 205 |
+
|
| 206 |
+
torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
|
| 207 |
+
--prompt "$PROMPT" \
|
| 208 |
+
--image_path $IMAGE_PATH \
|
| 209 |
+
--resolution $RESOLUTION \
|
| 210 |
+
--aspect_ratio $ASPECT_RATIO \
|
| 211 |
+
--seed $SEED \
|
| 212 |
+
--cfg_distilled $CFG_DISTILLED \
|
| 213 |
+
--sparse_attn $SPARSE_ATTN \
|
| 214 |
+
--use_sageattn $SAGE_ATTN \
|
| 215 |
+
--output_path $OUTPUT_PATH \
|
| 216 |
+
--save_pre_sr_video \
|
| 217 |
+
--model_path $MODEL_PATH
|
| 218 |
+
```
|
| 219 |
+
|
| 220 |
+
### Command Line Arguments
|
| 221 |
+
|
| 222 |
+
| Argument | Type | Required | Default | Description |
|
| 223 |
+
|----------|------|----------|---------|-------------|
|
| 224 |
+
| `--prompt` | str | Yes | - | Text prompt for video generation |
|
| 225 |
+
| `--negative_prompt` | str | No | `''` | Negative prompt for video generation |
|
| 226 |
+
| `--resolution` | str | Yes | - | Video resolution: `480p` or `720p` |
|
| 227 |
+
| `--model_path` | str | Yes | - | Path to pretrained model directory |
|
| 228 |
+
| `--aspect_ratio` | str | No | `16:9` | Aspect ratio of the output video |
|
| 229 |
+
| `--num_inference_steps` | int | No | `50` | Number of inference steps |
|
| 230 |
+
| `--video_length` | int | No | `121` | Number of frames to generate |
|
| 231 |
+
| `--seed` | int | No | `123` | Random seed for reproducibility |
|
| 232 |
+
| `--image_path` | str | No | `None` | Path to reference image (enables i2v mode). Use `none` or `None` to explicitly use text-to-video mode |
|
| 233 |
+
| `--output_path` | str | No | `None` | Output file path (if not provided, saves to `./outputs/output_{transformer_version}_{timestamp}.mp4`) |
|
| 234 |
+
| `--sr` | bool | No | `true` | Enable super resolution (use `--sr false` or `--sr 0` to disable) |
|
| 235 |
+
| `--save_pre_sr_video` | bool | No | `false` | Save original video before super resolution (use `--save_pre_sr_video` or `--save_pre_sr_video true` to enable, only effective when super resolution is enabled) |
|
| 236 |
+
| `--rewrite` | bool | No | `true` | Enable prompt rewriting (use `--rewrite false` or `--rewrite 0` to disable, may result in lower quality video generation) |
|
| 237 |
+
| `--cfg_distilled` | bool | No | `false` | Enable CFG distilled model for faster inference (~2x speedup, use `--cfg_distilled` or `--cfg_distilled true` to enable) |
|
| 238 |
+
| `--sparse_attn` | bool | No | `false` | Enable sparse attention for faster inference (~1.5-2x speedup, requires H-series GPUs, auto-enables CFG distilled, use `--sparse_attn` or `--sparse_attn true` to enable) |
|
| 239 |
+
| `--offloading` | bool | No | `true` | Enable CPU offloading (use `--offloading false` or `--offloading 0` to disable for faster inference if GPU memory allows) |
|
| 240 |
+
| `--group_offloading` | bool | No | `None` | Enable group offloading (default: None, automatically enabled if offloading is enabled. Use `--group_offloading` or `--group_offloading true/1` to enable, `--group_offloading false/0` to disable) |
|
| 241 |
+
| `--dtype` | str | No | `bf16` | Data type for transformer: `bf16` (faster, lower memory) or `fp32` (better quality, slower, higher memory) |
|
| 242 |
+
| `--use_sageattn` | bool | No | `false` | Enable SageAttention (use `--use_sageattn` or `--use_sageattn true/1` to enable, `--use_sageattn false/0` to disable) |
|
| 243 |
+
| `--sage_blocks_range` | str | No | `0-53` | SageAttention blocks range (e.g., `0-5` or `0,1,2,3,4,5`) |
|
| 244 |
+
| `--enable_torch_compile` | bool | No | `false` | Enable torch compile for transformer (use `--enable_torch_compile` or `--enable_torch_compile true/1` to enable, `--enable_torch_compile false/0` to disable) |
|
| 245 |
+
|
| 246 |
+
**Note:** Use `--nproc_per_node` to specify the number of GPUs. For example, `--nproc_per_node=8` uses 8 GPUs.
|
| 247 |
+
|
| 248 |
+
|
| 249 |
+
## 🧱 Models Cards
|
| 250 |
+
|ModelName| Download |
|
| 251 |
+
|-|---------------------------|
|
| 252 |
+
|HunyuanVideo 1.5-480P-T2V|[480P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v) |
|
| 253 |
+
|HunyuanVideo 1.5-480p-I2V |[480p-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
|
| 254 |
+
|HunyuanVideo 1.5-480p-T2V-distill | [480p-T2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
|
| 255 |
+
|HunyuanVideo 1.5-480p-I2V-distill |[480p-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
|
| 256 |
+
|HunyuanVideo 1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
|
| 257 |
+
|HunyuanVideo 1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
|
| 258 |
+
|HunyuanVideo 1.5-720P-T2V-distiill| Comming soon |
|
| 259 |
+
|HunyuanVideo 1.5-720P-I2V-distiill |[720P-I2V-distiill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled) |
|
| 260 |
+
|HunyuanVideo 1.5-720P-T2V-sparse-distiill| Comming soon |
|
| 261 |
+
|HunyuanVideo 1.5-720P-I2V-sparse-distiill |[720P-I2V-sparse-distiill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled_sparse) |
|
| 262 |
+
|HunyuanVideo 1.5-720p-sr |[720p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_sr_distilled) |
|
| 263 |
+
|HunyuanVideo 1.5-1080p-sr |[1080p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/1080p_sr_distilled) |
|
| 264 |
+
|
| 265 |
+
|
| 266 |
+
|
| 267 |
+
## 🎬 More Examples
|
| 268 |
+
|Features|Demo1|Demo2|
|
| 269 |
+
|------|------|------|
|
| 270 |
+
|Strong Instruction Following|<video src="https://github.com/user-attachments/assets/fdc3c27b-69f5-46a1-b707-0b57510fa32f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```一名哀伤的黑发中国女子凝望天空,复古胶片风格烘托出怀旧戏剧氛围``` </details> <details><summary>📋 Show rewrite prompt</summary> ```俯视角度,一位有着深色,略带凌乱的长卷发的年轻中国女性,佩戴着闪耀的珍珠项链和圆形金色耳环,她凌乱的头发被风吹散,她微微抬头,望向天空,神情十分哀伤,眼中含着泪水。嘴唇涂着红色口红。背景是带有华丽红色花纹的图案。画面呈现复古电影风格,色调低饱和,带着轻微柔焦,烘托情绪氛围,质感仿佛20世纪90年代的经典胶片风格,营造出怀旧且富有戏剧性的感觉。``` </details>|<video src="https://github.com/user-attachments/assets/3fcb42cc-cdd3-4651-86a6-645a858561c4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```建筑蓝图上的线条化为实体,瞬间生长出一个完整的复古工业风办公空间。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```一座空旷的现代阁楼里,有一张铺展在地板中央的建筑蓝图。忽然间,图纸上的线条泛起微光,仿佛被某种无形的力量唤醒。紧接着,那些发光的线条开始向上延伸,从平面中挣脱,勾勒出立体的轮廓——就像在空中进行一场无声的3D打印。随后,奇迹在加速发生:极简的橡木办公桌、优雅的伊姆斯风格皮质椅、高挑的工业风金属书架,还有几盏爱迪生灯泡,以光纹为骨架迅速“生长”出来。转瞬间,线条被真实的材质填充——木材的温润、皮革的质感、金属的冷静,都在眨眼间完整呈现。最终,所有家具稳固落地,蓝图的光芒悄然褪去。一个完整的办公空间,就这样从二维的图纸中诞生。``` </details>|
|
| 271 |
+
|Smooth Motion Generation|<video src="https://github.com/user-attachments/assets/21f9da05-33d0-4521-b188-ea009e7fdd3f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A cosmic loaf of bread, with a volcanic black crust, is precisely sliced open to reveal a swirling nebula interior.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Cinematic 8K footage, with a stark, moody aesthetic. Under a dramatic top-down spotlight, a loaf of what appears to be bread rests on a slab of polished marble, which is flecked with silver that glitters like a starfield. The loaf's crust is a deep, matte black, cracked like cooled volcanic rock. A sleek, modern santoku knife, its sharp edge gleaming under the single light source, begins a series of clean, rhythmic cuts. With each precise, repetitive slice that falls away, the loaf’s impossible interior is revealed: not dough, but a compressed, swirling nebula of deep purples and blues, alive with pinpricks of glittering light. As the knife continues its precise motion, a fine, shimmering dust of cosmic particles settles on the marble. The extreme macro view focuses on the mesmerizing contrast between the blade’s cold steel and the ethereal, galaxy-filled substance of the bread. This is hyper-realistic macro videography at its finest.``` </details>|<video src="https://github.com/user-attachments/assets/49057fe8-a102-4fd7-bd92-e9561abb9f45" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A figure skater performs a rapid, graceful Biellmann spin, captured from all angles.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The video captures a figure skater performing a Biellmann spin on ice. The subject is a female skater in a glittering costume. Initially, she spins on one leg. Then, she reaches back and pulls her free leg up. Next, she spins rapidly, becoming a blur of motion, with ice shavings spraying from her skate blade. The background is an ice rink with blurred advertising boards. The camera circles around the subject to capture the spin from all angles. The lighting is spotlit, creating lens flares and sparkles on her costume. The overall video presents a graceful artistic sports style.``` </details>|
|
| 272 |
+
|Cinematic Aesthetics|<video src="https://github.com/user-attachments/assets/4098cf72-357d-4b81-97df-6752064ce0c3" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```固定镜头,焦点在图片里的挂钟上,镜头轻微摇晃营造手持摄影感,wjw,filmphotos,Film Grain,Reversal film photography,Wong Kar-wai movies,cinematic photography, HK film style,neon lighting, in the style of Wong Kar Wai film``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Handheld lens shooting, the camera focuses on the wall clock hanging on the green-toned wall, shaking slightly. The second hand sweeps steadily across the clock face, and the shadow of the clock cast on the wall shifts subtly with the movement of the lens.``` </details>|<video src="https://github.com/user-attachments/assets/2b4575e5-79f1-4011-bed0-e8380198f7c9" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The leaves of calamus shine in the sunlight, dotted with dewdrops that trickle down to the ground with the breeze.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A macro shot focuses on long, slender calamus leaves, rendered in a cinematic photography realistic style. The main leaf, a vibrant, deep green, is positioned diagonally across the frame. Its surface is covered in tiny, glistening spherical dewdrops that catch and refract the bright morning sunlight, creating sparkling highlights. Initially, a larger, perfectly round dewdrop clings to the upper section of the leaf, its surface tension holding it in place. Then, as the leaf sways almost imperceptibly, the dewdrop begins to slowly dislodge. Next, it starts to trickle down the central vein of the leaf, its shape elongating slightly as it moves, leaving a subtle, glistening wet trail in its path. Finally, it reaches the pointed tip of the leaf, hangs for a brief moment, and falls out of the bottom of the frame. In the background, other leaves and blades of grass are softly blurred, creating a beautiful bokeh effect with soft, out-of-focus circles of light. The environment is bathed in the warm, golden glow of early morning sunlight, which streams in from behind the leaves, backlighting them and causing their wet edges to shine brilliantly. The overall impression is one of serene, natural beauty, captured in a highly realistic and detailed manner. This is a macro shot. The camera tilts down very slowly, following the path of the main dewdrop as it travels down the leaf. The lighting is soft and natural, with strong backlighting to create a radiant, glowing effect on the dewdrops and leaf edges, characteristic of professional nature photography. The atmosphere is peaceful and serene. The overall video presents a cinematic photography realistic style.``` </details>|
|
| 273 |
+
|Text Rendering|<video src="https://github.com/user-attachments/assets/7c964fc5-c27e-4bd0-bf3f-eb8fca2caef6" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```赛博朋克风格的夜晚街角,一个巨大的招牌上, “Hunyuan Video 1.5”的霓虹灯管轮廓已经安装好。镜头推进,霓虹灯从“H”开始,伴随着‘滋滋’的电流声,每个字母依次亮起粉紫色的光芒,直到全部点亮,照亮了潮湿的街道。赛博朋克,城市美学``` </details> <details><summary>📋 Show rewrite prompt</summary> ```On a wet street corner in a cyberpunk city at night, a large neon sign reading "Hunyuan Video 1.5" lights up sequentially, illuminating the dark, rainy environment with a pinkish-purple glow. he scene is a dark, rain-slicked street corner in a futuristic, cinematic cyberpunk city. Mounted on the metallic, weathered facade of a building is a massive, unlit neon sign. The sign's glass tube framework clearly spells out the words "Hunyuan Video 1.5". Initially, the street is dimly lit, with ambient light from distant skyscrapers creating shimmering reflections on the wet asphalt below. Then, the camera zooms in slowly toward the sign. As it moves, a low electrical sizzling sound begins. In the background, the dense urban landscape of the cyberpunk metropolis is visible through a light atmospheric haze, with towering structures adorned with their own flickering advertisements. A complex web of cables and pipes crisscrosses between the buildings. The shot is at a low angle, looking up at the sign to emphasize its grand scale. The lighting is high-contrast and dramatic, dominated by the neon glow which creates sharp, specular reflections and deep shadows. The atmosphere is moody and tech-noir. The overall video presents a cinematic photography realistic style.,``` </details>|<video src="https://github.com/user-attachments/assets/94ce62d9-5788-4912-8e89-b7dc84d7bdc4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```黑色背景上展示着艺术字体"Hunyuan Video 1.5",每个字母都由不同的流体构成,持续缓慢流动。多种不同质地、不互溶的彩色液体(如金属、牛奶、透明凝胶)在无重力环境中漂浮、碰撞``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The artistic words "Hunyuan Video 1.5" are rendered in the center of the screen, with each character composed of a unique, slowly moving fluid, set against a deep black background, while colorful, immiscible liquid blobs float and collide around them in a zero-gravity environment. The main subject is the text "Hunyuan Video 1.5". The characters for "Hunyuan" are filled with a lustrous, molten gold liquid that swirls slowly. The letters for "Video" are composed of a creamy, opaque white fluid resembling milk, with gentle currents visible beneath its surface. The numbers "1.5" are made from a viscous, transparent blue gel that subtly undulates. Each fluid moves independently within the confines of its character's shape, creating a mesmerizing internal motion. This high-quality 3D CGI animation presents the fluids with photorealistic textures. In the surrounding space, several immiscible liquid blobs drift in zero gravity. A large, spherical blob of pearlescent liquid slowly floats from the upper left. A smaller, amorphous blob of shimmering, metallic silver drifts from the lower right, and a translucent, pink gelatinous mass wobbles nearby. Initially, these blobs drift aimlessly. Then, the silver blob slowly collides with the larger pearlescent one. As they make contact, their surfaces deform and ripple dynamically, but the liquids do not mix, pushing against each other before gently bouncing off and continuing their slow, separate paths in the pristine black void. The shot is at an eye-level angle, presenting a front view of the text. The camera remains static, ensuring the entire text "Hunyuan Video 1.5" is fully visible throughout the shot. The scene is lit by a soft, diffused light that highlights the brilliant reflections on the metallic fluids and the inner glow of the translucent gels, enhancing the high-quality 3D CGI animation. The atmosphere is quiet, abstract, and mesmerizing. The overall video has the polished look of a high-quality 3D CGI animation with a focus on abstract fluid dynamics.``` </details>|
|
| 274 |
+
|Physics Compliance|<video src="https://github.com/user-attachments/assets/07fa4dcd-0bd1-4935-bb89-323428cce6fc" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The wind blows through the shabby bookshelf, and the pages flutter on it. ``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a dimly lit, dusty room, a gentle wind causes the pages of old books on a shabby wooden bookshelf to flutter. The bookshelf, made of dark, weathered wood, shows signs of age with peeling varnish, scratches, and a fine layer of settled dust on its surfaces. Several old books with faded, worn covers are arranged on the shelves; some stand upright while others lie on their sides. Initially, the scene is quiet. Then, a soft breeze enters the frame from the left, disturbing the dust on the shelves. Next, the yellowed, brittle pages of an open book lying flat begin to lift and ripple delicately. As the breeze continues, the pages of other books also start to flutter, some turning over slowly and gracefully, revealing aged text and faint illustrations within. In the background, the wall has faded, peeling wallpaper, and the overall atmosphere is one of quiet neglect and the passage of time. The shot is at an eye-level angle with the main subject. The camera pans to the left slowly. Soft, diffused sunlight filters through a dusty, off-camera window, creating distinct beams of light that cut through the dimness. This lighting highlights the texture of the old wood and the floating dust particles in the air, enhancing the photorealistic detail of the scene. The mood is melancholic and peaceful. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/81065925-c008-421b-8cf0-b3cbf1e77eac" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```An intact soda can is slowly crushed by a hand.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a medium close-up, a hand slowly crushes an intact red and white soda can on a wooden table. A male hand with visible, realistic skin texture is wrapped firmly around the middle of an intact, pristine red and white aluminum soda can. The can, covered in glistening condensation droplets, rests on a dark, polished wooden surface. The cinematic realism captures every minute detail of the scene. Initially, the hand's grip is steady, with the can's cylindrical shape perfectly preserved. Then, the fingers begin to tighten slowly, the knuckles whitening slightly from the exertion. Next, the smooth aluminum surface starts to buckle under the controlled pressure, a sharp crease forming vertically down its side as the metallic sheen distorts. As the hand continues its deliberate squeeze, the can collapses inward progressively, the vibrant red paint wrinkling as the metal structure crumples. Finally, the can is left significantly crushed, its form now an irregular, crumpled shape held tightly in the fist. The scene takes place on a dark, polished wooden tabletop that catches soft, diffuse reflections. The grain of the wood is faintly discernible, adding a layer of texture to the foreground. The background is completely out of focus, rendered as a soft, dark, and non-descript blur, which isolates the main action and enhances the photorealistic quality of the shot. The shot is a medium close-up, presented in a cinematic photography realistic style. The camera remains static at a slightly high angle, looking down to provide a clear and unobstructed view of the can's deformation. Soft side lighting creates high contrast, sculpting the muscles and tendons of the hand while casting specular highlights on the metallic can and the water droplets. The atmosphere is focused and intense. The overall video presents a cinematic photography realistic style.``` </details>|
|
| 275 |
+
|Camera Movement|<video src="https://github.com/user-attachments/assets/6deacbfe-4cca-48d7-a2be-cb638a3e01cb" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```圣诞节的家中,小女孩靠着妈妈听妈妈读书,背景是下着雪的窗外,镜头缓慢下移,一只可爱的长毛小白猫戴着圣诞帽趴在温暖的地摊上``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a cozy home on Christmas, a young girl leans against her mother as they read a book, and the camera moves down to reveal a fluffy white cat in a Santa hat resting on a warm rug. In a warmly lit living room on a snowy Christmas evening, a young mother and her little daughter are sitting together on a comfortable sofa. The mother, with a gentle expression and wearing a cream-colored knitted sweater, holds an open storybook with colorful illustrations. Her daughter, a small girl with brown hair in pigtails and a red pajama set, leans her head affectionately on her mother's shoulder, her eyes fixed on the book. On the floor below them, a fluffy, long-haired white cat is curled up on a plush, beige wool rug. The cat wears a tiny red and white Santa hat perched between its ears. Initially, the shot focuses on the mother and daughter, capturing their quiet, shared moment. The mother’s finger gently rests on the page of the book. Then, the camera slowly moves downward, gliding past the book and their laps. Finally, the camera settles at a low angle, bringing the adorable white cat into sharp focus as the primary subject. The cat's chest gently rises and falls with each breath, its eyes peacefully closed. Through a large window in the background, large, soft snowflakes can be seen falling silently against the dark blue twilight sky, creating a peaceful and serene backdrop. Faint, out-of-focus golden Christmas lights twinkle in the corner of the room, adding to the warm, festive atmosphere. The scene is imbued with a sense of comfort and holiday warmth, creating a beautiful cinematic photography realistic image. The camera slowly moves downward. The shot uses soft, warm interior lighting that casts gentle shadows, creating a high-contrast, cinematic look. A shallow depth of field keeps the focus on the subjects while beautifully blurring the background elements. The mood is heartwarming, peaceful, and festive. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/8e72ed0f-f8ac-445b-97e5-eb4b16fbc121" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead, while the sun position shifts from afternoon to dusk, casting increasingly longer shadows across the terrain as the figure becomes smaller in the frame.``` </details>|
|
| 276 |
+
|Multi-Style Support|<video src="https://github.com/user-attachments/assets/65b2c5a5-e6ba-43be-9462-a98b03b675f1" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```Have the cake man begin to take chunks out of himself and eat it.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The cake man sits on the chair, with his hands resting on his knees. Then, he slowly raises his right hand and breaks off a piece of cake from his left shoulder. Next, he brings the piece of cake to his mouth and begins to chew. At the same time, his eyes widen slightly, and his mouth parts gently. After that, he raises his right hand again, breaks off another piece of cake from his right arm, and repeats the action of bringing it to his mouth to chew.``` </details>|<video src="https://github.com/user-attachments/assets/de5f7480-b79c-4fc1-b345-c5880a3b5f9e" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A little girl, carrying a colorful handbag, skips through the garden. The video uses claymation style.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A little girl with a colorful handbag skips through a whimsical claymation garden. In a vibrant garden constructed entirely from clay, a young girl, meticulously crafted in a claymation style, skips joyfully. She has chunky, sculpted yellow clay hair tied in pigtails that bounce with a slight stiffness, simple black button eyes, and a wide, permanently etched smile. She wears a simple pink clay dress with a white collar. In her left hand, she carries a small handbag molded from bright red and blue clay, which swings in a slightly jerky arc as she moves. Initially, the girl lifts her right leg high, her body momentarily suspended in a classic stop-motion pose. Then, she hops forward, landing lightly as her left leg swings through for the next skip. Her arms move in an exaggerated, back-and-forth rhythm, characteristic of stop-motion animation. Her movements are intentionally not perfectly fluid, highlighting the frame-by-frame nature of the claymation technique. The garden around her is a whimsical, textured world. In the foreground and mid-ground, oversized flowers with swirled purple and orange petals stand on thick green stems. The ground is a textured mat of green clay, showing subtle fingerprints and tool marks that add to the handmade charm. In the background, a pale blue clay backdrop features a simplified, smiling sun molded from yellow clay. The shot is at an eye-level angle with the main subject. The camera follows the subject, moving smoothly to the right to keep her in the frame. The lighting is bright and even, casting soft shadows that emphasize the rounded, three-dimensional forms of the clay models. The overall video presents a charming and detailed claymation style.``` </details>|
|
| 277 |
+
|High Image-Video Consistency|<img src="https://github.com/user-attachments/assets/3bc8e55d-c211-454e-8067-128c0e215eb6"> <video src="https://github.com/user-attachments/assets/3e6b7ee9-ec66-4e46-a446-801b1c1a1c81" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女孩放下书,站起身,转身向屋内走去。镜头拉远。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女孩合上手中的书,将书放在身侧的窗台上。随后,她缓缓站起身,转身向屋内走去,身影逐渐没入门后的阴影中。镜头缓缓拉远,露出更多被绿植覆盖的屋檐和墙体。``` </details>|<img src="https://github.com/user-attachments/assets/7657ce60-90b5-4fdc-b713-0eaa55829b09"> <video src="https://github.com/user-attachments/assets/9ca24021-2353-40d5-8a4d-0f8e67d51826" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女人手上的鸟亲了女人一口``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女人手臂上的白色鹦鹉缓缓转过头,将喙轻轻触碰女人的脸颊,随后收回头部。女人嘴角微微上扬,目光温柔地注视着鹦鹉。背景中的绿植保持静止。``` </details>|
|
| 278 |
+
|
| 279 |
+
|
| 280 |
+
|
| 281 |
+
## 📊 Evaluation
|
| 282 |
+
|
| 283 |
+
### Rating
|
| 284 |
+
We assess text-to-video generation using a comprehensive rating methodology that considers five key dimensions: text-video consistency, visual quality, structural stability, motion effects, and the aesthetic quality of individual frames. For image-to-video generation, the evaluation encompasses image-video consistency, instruction responsiveness, visual quality, structural stability, and motion effects.
|
| 285 |
+
|
| 286 |
+
<div align="center">
|
| 287 |
+
<img src="./assets/T2V_Rating.png" alt="rating result of t2v" width="800">
|
| 288 |
+
</div>
|
| 289 |
+
|
| 290 |
+
---
|
| 291 |
+
|
| 292 |
+
<div align="center">
|
| 293 |
+
<img src="./assets/I2V_Rating.png" alt="rating result of i2v" width="800">
|
| 294 |
+
</div>
|
| 295 |
+
|
| 296 |
+
|
| 297 |
+
### GSB
|
| 298 |
+
The GSB(Good/Same/Bad) approach is widely used to evaluate the relative performance of two models based on overall video perception quality.We carefully construct 300 diverse text prompts and 300 image samples to cover balanced application scenarios for both text-to-video and image-to-video tasks. For each prompt or image input, an equal number of video samples are generated by each model in a single run to ensure comparability. To maintain fairness, inference is performed only once per input without any cherry-picking of results. All competing models are evaluated using their default configurations. The evaluation is conducted by over 100 professional assessors
|
| 299 |
+
|
| 300 |
+
<div align="center">
|
| 301 |
+
<img src="./assets/T2V_GSB.png" alt="gsb result of t2v" width="800">
|
| 302 |
+
</div>
|
| 303 |
+
|
| 304 |
+
---
|
| 305 |
+
|
| 306 |
+
<div align="center">
|
| 307 |
+
<img src="./assets/I2V_GSB.png" alt="gsb result of i2v" width="800">
|
| 308 |
+
</div>
|
| 309 |
+
|
| 310 |
+
|
| 311 |
+
## 📚 Citation
|
| 312 |
+
|
| 313 |
+
```bibtex
|
| 314 |
+
@misc{hunyuanvideo2025,
|
| 315 |
+
title={HunyuanVideo 1.5 Technical Report},
|
| 316 |
+
author={Tencent Hunyuan Foundation Model Team},
|
| 317 |
+
year={2025},
|
| 318 |
+
publisher = {GitHub},
|
| 319 |
+
howpublished = {\url{https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5}},
|
| 320 |
+
}
|
| 321 |
+
```
|
| 322 |
+
|
| 323 |
+
## 🙏 Acknowledgements
|
| 324 |
+
We would like to thank the contributors to the [Transformers](https://github.com/huggingface/transformers), [Diffusers](https://github.com/huggingface/diffusers) , [HuggingFace](https://huggingface.co/) and [Qwen-VL](https://github.com/QwenLM/Qwen-VL), for their open research and exploration.
|
| 325 |
+
|
| 326 |
+
## 🌟 Github Star History
|
README_CN.md
ADDED
|
@@ -0,0 +1,328 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[Read in English](./README.md)
|
| 2 |
+
|
| 3 |
+
# HunyuanVideo-1.5
|
| 4 |
+
|
| 5 |
+
<div align="center">
|
| 6 |
+
|
| 7 |
+
<img src="./assets/logo.png" alt="HunyuanVideo-1.5 Logo" width="80%">
|
| 8 |
+
|
| 9 |
+
# 🎬 HunyuanVideo-1.5: 一款领先的轻量级视频生成模型
|
| 10 |
+
|
| 11 |
+
</div>
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
<div align="center">
|
| 15 |
+
<!-- <img src="./assets/banner.png" alt="HunyuanVideo-1.5 Banner" width="800"> -->
|
| 16 |
+
|
| 17 |
+
</div>
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即可提供顶级画质,大幅降低使用门槛。该模型在消费级显卡上运行流畅,让每位开发者和创作者都能轻松使用。本代码库提供生成创意视频所需的实现方案与工具集。
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
<div align="center">
|
| 24 |
+
<a href="https://hunyuan.tencent.com/video/zh?tabIndex=0" target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
|
| 25 |
+
<a href=https://huggingface.co/tencent/HunyuanVideo-1.5 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
|
| 26 |
+
<a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
|
| 27 |
+
<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
|
| 28 |
+
<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
|
| 29 |
+
</div>
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
<p align="center">
|
| 33 |
+
👏 加入我们的 <a href="./assets/wechat.png" target="_blank">微信社区</a> 和 <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
|
| 34 |
+
💻 <a href="https://hunyuan.tencent.com/video/zh?tabIndex=0">官方网站 立即体验模型!</a>  
|
| 35 |
+
</p>
|
| 36 |
+
|
| 37 |
+
## 🔥🔥🔥 最新动态
|
| 38 |
+
👋 2025年11月20日: 我们开源了 HunyuanVideo-1.5的代码和推理权重
|
| 39 |
+
|
| 40 |
+
## 🎥 演示视频
|
| 41 |
+
<div align="center">
|
| 42 |
+
<video src="https://github.com/user-attachments/assets/d45ec78e-ea40-47f1-8d4d-f4d9a0682e2d" width="60%"> </video>
|
| 43 |
+
</div>
|
| 44 |
+
|
| 45 |
+
## 🧩 社区贡献
|
| 46 |
+
|
| 47 |
+
如果您在项目中使用或开发了 HunyuanVideo,欢迎告知我们。
|
| 48 |
+
|
| 49 |
+
- **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): 一个强大且模块化的扩散模型图形界面,采用节点式工作流。ComfyUI 支持 HunyuanVideo-1.5,并提供多种工程加速优化以实现快速推理。
|
| 50 |
+
|
| 51 |
+
- **LightX2V** - [LightX2V](https://github.com/ModelTC/LightX2V): 一个轻量级高效的视频生成框架,集成了 HunyuanVideo-1.5,支持多种工程加速技术以实现快速推理。
|
| 52 |
+
|
| 53 |
+
## 📑 开源计划
|
| 54 |
+
- HunyuanVideo-1.5 (文生视频/图生视频)
|
| 55 |
+
- [x] 推理代码和模型权重
|
| 56 |
+
- [ ] Diffusers 支持
|
| 57 |
+
- [ ] 发布所有模型权重(稀疏注意力、蒸馏模型和超分辨率模型)
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
## 📋 目录
|
| 61 |
+
- [🔥🔥🔥 最新动态](#-最新动态)
|
| 62 |
+
- [🎥 演示视频](#-演示视频)
|
| 63 |
+
- [🧩 社区贡献](#-社区贡献)
|
| 64 |
+
- [📑 开源计划](#-开源计划)
|
| 65 |
+
- [📖 模型介绍](#-模型介绍)
|
| 66 |
+
- [✨ 核心特性](#-核心特性)
|
| 67 |
+
- [📜 系统要求](#-系统要求)
|
| 68 |
+
- [🛠️ 依赖安装](#️-依赖安装)
|
| 69 |
+
- [🧱 下载预训练模型](#-下载预训练模型)
|
| 70 |
+
- [📝 提示词指南](#-提示词指南)
|
| 71 |
+
- [🔑 使用方法](#-使用方法)
|
| 72 |
+
- [视频生成](#视频生成)
|
| 73 |
+
- [命令行参数](#命令行参数)
|
| 74 |
+
- [🧱 模型卡片](#-模型卡片)
|
| 75 |
+
- [🎬 更多示例](#-更多示例)
|
| 76 |
+
- [📊 性能评估](#-性能评估)
|
| 77 |
+
- [📚 引用](#-引用)
|
| 78 |
+
- [🙏 致谢](#-致谢)
|
| 79 |
+
- [🌟 GitHub Star 历史](#-github-star-历史)
|
| 80 |
+
|
| 81 |
+
|
| 82 |
+
## 📖 Introduction
|
| 83 |
+
我们推出了 HunyuanVideo 1.5,一个轻量级但功能强大的视频生成模型。该模型仅使用8.3B参数就实现了开源最先进的视觉质量和运动连贯性,并能在消费级 GPU 上进行高效推理。这一成果基于几个关键组件,包括精细的数据整理、采用稀疏注意力SSTA的DiT 架构、通过专用 OCR 编码增强的双语理解能力、渐进式预训练和后训练,以及高效的视频超分辨率网络。利用这些设计,我们开发了一个统一的框架,能够跨多种时长和分辨率生成高质量的文生视频和图生视频。大量实验证明,这个紧凑而高效的模型在开源模型中确立了新的技术标杆。通过发布 HunyuanVideo 1.5 的代码和权重,我们为社区提供了一个高性能的基础,显著降低了视频创作和研究的成本,使先进的视频生成技术对所有人更加触手可及。
|
| 84 |
+
|
| 85 |
+
## ✨ Key Features
|
| 86 |
+
- **轻量级高性能架构**:我们提出了一种高效架构,将 83 亿参数的 Diffusion Transformer(DiT)与 3D 因果 VAE 相结合,在空间维度实现了 16 倍的压缩,在时间轴上实现了 4 倍的压缩。此外,创新的 SSTA机制修剪了冗余的时空 kv 块,显著减少了长视频序列的计算开销,并加速了推理,在 10 秒 720p 视频合成中,相比 FlashAttention-3 实现了端到端 $1.87 \times $ 的加速。
|
| 87 |
+
|
| 88 |
+
|
| 89 |
+
<div align="center">
|
| 90 |
+
<img src="./assets/hy_video_1_5_dit.png" alt="HunyuanVideo-1.5 DiT" width="600">
|
| 91 |
+
</div>
|
| 92 |
+
|
| 93 |
+
|
| 94 |
+
- **视频超分辨率增强**:我们开发了一个高效的少步数超分辨率网络,可将输出上采样至 1080p。它在增强锐度的同时校正失真,从而优化细节和整体视觉纹理。
|
| 95 |
+
|
| 96 |
+
<div align="center">
|
| 97 |
+
<img src="./assets/hy_video_1_5_vsr.png" alt="HunyuanVideo-1.5 VSR" width="600">
|
| 98 |
+
</div>
|
| 99 |
+
|
| 100 |
+
- **端到端训练优化**:本工作采用了多阶段、渐进式的训练策略,覆盖了从预训练到后训练的整个流程。结合 Muon 优化器加速收敛,这种方法整体上优化了运动连贯性、美学质量和对人类偏好的对齐,实现了专业级的内容生成。
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
## 📜系统要求
|
| 104 |
+
|
| 105 |
+
### 硬件要求
|
| 106 |
+
|
| 107 |
+
- **GPU**:支持 CUDA 的 NVIDIA GPU
|
| 108 |
+
- **最低 GPU 显存**:14 GB(启用模型卸载时)
|
| 109 |
+
|
| 110 |
+
> **注意:** 上述内存要求是在启用模型卸载的情况下测量的。如果您的 GPU 有足够的显存,可以禁用卸载以提高推理速度。
|
| 111 |
+
|
| 112 |
+
### 软件要求
|
| 113 |
+
|
| 114 |
+
- **操作系统**:Linux
|
| 115 |
+
- **Python**:Python 3.10 或更高版本
|
| 116 |
+
- **CUDA**:与您的 PyTorch 安装兼容的 CUDA 版本
|
| 117 |
+
|
| 118 |
+
## 🛠️ 依赖安装
|
| 119 |
+
|
| 120 |
+
### 步骤 1:克隆仓库
|
| 121 |
+
|
| 122 |
+
```bash
|
| 123 |
+
git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5.git
|
| 124 |
+
cd HunyuanVideo-1.5
|
| 125 |
+
```
|
| 126 |
+
|
| 127 |
+
### 步骤 2:安装基础依赖
|
| 128 |
+
|
| 129 |
+
```bash
|
| 130 |
+
pip install -r requirements.txt
|
| 131 |
+
pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python
|
| 132 |
+
```
|
| 133 |
+
|
| 134 |
+
### 步骤 3:安装注意力库
|
| 135 |
+
|
| 136 |
+
* Flash Attention
|
| 137 |
+
建议安装 Flash Attention 以实现更快的推理速度和更低的 GPU 内存消耗。
|
| 138 |
+
详细安装说明请参考 [Flash Attention](https://github.com/Dao-AILab/flash-attention)。
|
| 139 |
+
|
| 140 |
+
* Flex-Block-Attention
|
| 141 |
+
flex-block-attn 仅在使用稀疏注意力以实现更快推理时需要,可以通过以下命令安装:
|
| 142 |
+
```bash
|
| 143 |
+
git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
|
| 144 |
+
cd flex-block-attn
|
| 145 |
+
python3 setup.py install
|
| 146 |
+
```
|
| 147 |
+
|
| 148 |
+
* SageAttention
|
| 149 |
+
|
| 150 |
+
```bash
|
| 151 |
+
git clone https://github.com/cooper1637/SageAttention.git
|
| 152 |
+
cd SageAttention
|
| 153 |
+
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 # Optional
|
| 154 |
+
python3 setup.py install
|
| 155 |
+
```
|
| 156 |
+
|
| 157 |
+
## 🧱 下载预训练模型
|
| 158 |
+
|
| 159 |
+
在生成视频之前,请先下载预训练模型。详细说明请参考 [checkpoints-download.md](checkpoints-download.md)。
|
| 160 |
+
|
| 161 |
+
## 📝 提示词指南
|
| 162 |
+
### 提示词撰写手册
|
| 163 |
+
提示词增强在我们的模型生成高质量视频方面起着至关重要的作用。通过撰写更长、更详细的提示词,生成的视频质量将得到显著改善。我们鼓励您编写全面且描述性的提示词,以获得最佳的视频质量。我们建议社区伙伴参考我们的官方指南,了解如何撰写有效的提示词。
|
| 164 |
+
|
| 165 |
+
|
| 166 |
+
**参考:** **[HunyuanVideo 1.5 提示词手册](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNhei2zzNUS8O4mKop?scode=AJEAIQdfAAoE1dhviFAAkA-gaeACk)**
|
| 167 |
+
|
| 168 |
+
|
| 169 |
+
### 自动提示词增强的系统提示词
|
| 170 |
+
对于希望为其他大模型优化提示词的用户,建议参考文件 `hyvideo/utils/rewrite/t2v_prompt.py` 中 `t2v_rewrite_system_prompt` 的定义来指导文生视频的提示词重写。同样,对于图生视频重写,请参考 `hyvideo/utils/rewrite/i2v_prompt.py` 中 `i2v_rewrite_system_prompt` 的定义。
|
| 171 |
+
|
| 172 |
+
|
| 173 |
+
## 🔑 使用方法
|
| 174 |
+
### 视频生成
|
| 175 |
+
|
| 176 |
+
对于提示词重写,我们推荐使用 Gemini 或通过 vLLM 部署的大模型。当前代码库仅支持兼容 vLLM 接口的模型,如果您希望使用 Gemini,需自行实现相关接口调用。
|
| 177 |
+
|
| 178 |
+
对于 vLLM 接口的模型,需要注意 T2V 和 I2V 推荐使用不同的模型和环境变量:
|
| 179 |
+
|
| 180 |
+
- 文生视频(T2V):推荐使用 [Qwen3-235B-A22B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507),并配置 `T2V_REWRITE_BASE_URL` 与 `T2V_REWRITE_MODEL_NAME`
|
| 181 |
+
- 图生视频(I2V):推荐使用 [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct),并配置 `I2V_REWRITE_BASE_URL` 与 `I2V_REWRITE_MODEL_NAME`
|
| 182 |
+
|
| 183 |
+
> 你也可以将上述模型名替换为任何你已部署、与 vLLM 兼容的模型(包括 HuggingFace 等模型)。
|
| 184 |
+
>
|
| 185 |
+
> 默认为开启提示词重写。若需显式关闭,可以使用 `--rewrite false` 或 `--rewrite 0`。如果未配置 vLLM 提示词重写相关服务,管道会在本地直接生成,无远程重写。
|
| 186 |
+
|
| 187 |
+
示例:生成视频(支持 T2V/I2V。T2V 模式下设置 `IMAGE_PATH=none`,I2V 模式下指定图像路径)
|
| 188 |
+
|
| 189 |
+
```bash
|
| 190 |
+
export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
|
| 191 |
+
export T2V_REWRITE_MODEL_NAME="<your_model_name>"
|
| 192 |
+
export I2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
|
| 193 |
+
export I2V_REWRITE_MODEL_NAME="<your_model_name>"
|
| 194 |
+
|
| 195 |
+
PROMPT="A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
|
| 196 |
+
|
| 197 |
+
IMAGE_PATH=./data/reference_image.png # 可选,'none' 或 <图像路径>
|
| 198 |
+
SEED=1
|
| 199 |
+
ASPECT_RATIO=16:9
|
| 200 |
+
RESOLUTION=480p
|
| 201 |
+
OUTPUT_PATH=./outputs/output.mp4
|
| 202 |
+
|
| 203 |
+
# 配置
|
| 204 |
+
N_INFERENCE_GPU=8 # 并行推理 GPU 数量
|
| 205 |
+
CFG_DISTILLED=true # 使用 CFG 蒸馏模型进行推理,2倍加速
|
| 206 |
+
SPARSE_ATTN=true # 使用稀疏注意力进行推理
|
| 207 |
+
SAGE_ATTN=false # 使用 SageAttention 进行推理
|
| 208 |
+
MODEL_PATH=ckpts # 预训练模型路径
|
| 209 |
+
|
| 210 |
+
torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
|
| 211 |
+
--prompt "$PROMPT" \
|
| 212 |
+
--image_path $IMAGE_PATH \
|
| 213 |
+
--resolution $RESOLUTION \
|
| 214 |
+
--aspect_ratio $ASPECT_RATIO \
|
| 215 |
+
--seed $SEED \
|
| 216 |
+
--cfg_distilled $CFG_DISTILLED \
|
| 217 |
+
--sparse_attn $SPARSE_ATTN \
|
| 218 |
+
--use_sageattn $SAGE_ATTN \
|
| 219 |
+
--output_path $OUTPUT_PATH \
|
| 220 |
+
--save_pre_sr_video \
|
| 221 |
+
--model_path $MODEL_PATH
|
| 222 |
+
```
|
| 223 |
+
|
| 224 |
+
### 命令行参数
|
| 225 |
+
|
| 226 |
+
| 参数 | 类型 | 是否必需 | 默认值 | 描述 |
|
| 227 |
+
|----------|------|----------|---------|-------------|
|
| 228 |
+
| `--prompt` | str | 是 | - | 用于视频生成的文本提示 |
|
| 229 |
+
| `--negative_prompt` | str | 否 | `''` | 用于视频生成的负向提示词 |
|
| 230 |
+
| `--resolution` | str | 是 | - | 视频分辨率:`480p` 或 `720p` |
|
| 231 |
+
| `--model_path` | str | 是 | - | 预训练模型目录的路径 |
|
| 232 |
+
| `--aspect_ratio` | str | 否 | `16:9` | 输出视频的宽高比 |
|
| 233 |
+
| `--num_inference_steps` | int | 否 | `50` | 推理步数 |
|
| 234 |
+
| `--video_length` | int | 否 | `121` | 要生成的帧数 |
|
| 235 |
+
| `--seed` | int | 否 | `123` | 随机种子,用于可复现性 |
|
| 236 |
+
| `--image_path` | str | 否 | `None` | 参考图像的路径(启用图生视频模式)。使用 `none` 或 `None` 可明确使用文生视频模式 |
|
| 237 |
+
| `--output_path` | str | 否 | `None` | 输出文件路径(如果未提供,则保存到 `./outputs/output_{transformer_version}_{timestamp}.mp4`) |
|
| 238 |
+
| `--sr` | bool | 否 | `true` | 启用超分辨率(使用 `--sr false` 或 `--sr 0` 来禁用) |
|
| 239 |
+
| `--save_pre_sr_video` | bool | 否 | `false` | 保存超分辨率处理前的原始视频(使用 `--save_pre_sr_video` 或 `--save_pre_sr_video true` 来启用,仅在启用超分辨率时有效) |
|
| 240 |
+
| `--rewrite` | bool | 否 | `true` | 启用提示词重写(使用 `--rewrite false` 或 `--rewrite 0` 来禁用,禁用可能导致视频生成质量降低) |
|
| 241 |
+
| `--cfg_distilled` | bool | 否 | `false` | 启用 CFG 蒸馏模型以加速推理(约 2 倍加速,使用 `--cfg_distilled` 或 `--cfg_distilled true` 来启用) |
|
| 242 |
+
| `--sparse_attn` | bool | 否 | `false` | 启用稀疏注意力以加速推理(约 1.5-2 倍加速,需要 H 系列 GPU,会自动启用 CFG 蒸馏,使用 `--sparse_attn` 或 `--sparse_attn true` 来启用) |
|
| 243 |
+
| `--offloading` | bool | 否 | `true` | 启用 CPU 卸载(使用 `--offloading false` 或 `--offloading 0` 来禁用,如果 GPU 内存允许,禁用后速度会更快) |
|
| 244 |
+
| `--group_offloading` | bool | 否 | `None` | 启用组卸载(默认:None,如果启用了 offloading 则自动启用。使用 `--group_offloading` 或 `--group_offloading true/1` 来启用,`--group_offloading false/0` 来禁用) |
|
| 245 |
+
| `--dtype` | str | 否 | `bf16` | Transformer 的数据类型:`bf16`(更快,内存占用更低)或 `fp32`(质量更好,速度更慢,内存占用更高) |
|
| 246 |
+
| `--use_sageattn` | bool | 否 | `false` | 启用 SageAttention(使用 `--use_sageattn` 或 `--use_sageattn true/1` 来启用,`--use_sageattn false/0` 来禁用) |
|
| 247 |
+
| `--sage_blocks_range` | str | 否 | `0-53` | SageAttention 块范围(例如:`0-5` 或 `0,1,2,3,4,5`) |
|
| 248 |
+
| `--enable_torch_compile` | bool | 否 | `false` | 启用 torch compile 以优化 transformer(使用 `--enable_torch_compile` 或 `--enable_torch_compile true/1` 来启用,`--enable_torch_compile false/0` 来禁用) |
|
| 249 |
+
|
| 250 |
+
**注意:** 使用 `--nproc_per_node` 指定使用的 GPU 数量。例如,`--nproc_per_node=8` 表示使用 8 个 GPU。
|
| 251 |
+
|
| 252 |
+
|
| 253 |
+
## 🧱 模型卡片
|
| 254 |
+
|模型名称| 下载链接 |
|
| 255 |
+
|-|---------------------------|
|
| 256 |
+
|HunyuanVideo 1.5-480P-T2V|[480P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v) |
|
| 257 |
+
|HunyuanVideo 1.5-480p-I2V |[480p-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
|
| 258 |
+
|HunyuanVideo 1.5-480p-T2V-distill | [480p-T2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
|
| 259 |
+
|HunyuanVideo 1.5-480p-I2V-distill |[480p-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
|
| 260 |
+
|HunyuanVideo 1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
|
| 261 |
+
|HunyuanVideo 1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
|
| 262 |
+
|HunyuanVideo 1.5-720P-T2V-distiill| Comming soon |
|
| 263 |
+
|HunyuanVideo 1.5-720P-I2V-distiill |[720P-I2V-distiill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled) |
|
| 264 |
+
|HunyuanVideo 1.5-720P-T2V-sparse-distiill| Comming soon |
|
| 265 |
+
|HunyuanVideo 1.5-720P-I2V-sparse-distiill |[720P-I2V-sparse-distiill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled_sparse) |
|
| 266 |
+
|HunyuanVideo 1.5-720p-sr |[720p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_sr_distilled) |
|
| 267 |
+
|HunyuanVideo 1.5-1080p-sr |[1080p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/1080p_sr_distilled) |
|
| 268 |
+
|
| 269 |
+
|
| 270 |
+
## 🎬 更多示例
|
| 271 |
+
|特性|示例1|示例2|
|
| 272 |
+
|------|------|------|
|
| 273 |
+
|指令跟随能力|<video src="https://github.com/user-attachments/assets/fdc3c27b-69f5-46a1-b707-0b57510fa32f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```一名哀伤的黑发中国女子凝望天空,复古胶片风格烘托出怀旧戏剧氛围``` </details> <details><summary>📋 Show rewrite prompt</summary> ```俯视角度,一位有着深色,略带凌乱的长卷发的年轻中国女性,佩戴着闪耀的珍珠项链和圆形金色耳环,她凌乱的头发被风吹散,她微微抬头,望向天空,神情十分哀伤,眼中含着泪水。嘴唇涂着红色口红。背景是带有华丽红色花纹的图案。画面呈现复古电影风格,色调低饱和,带着轻微柔焦,烘托情绪氛围,质感仿佛20世纪90年代的经典胶片风格,营造出怀旧且富有戏剧性的感觉。``` </details>|<video src="https://github.com/user-attachments/assets/3fcb42cc-cdd3-4651-86a6-645a858561c4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```建筑蓝图上的线条化为实体,瞬间生长出一个完整的复古工业风办公空间。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```一座空旷的现代阁楼里,有一张铺展在地板中央的建筑蓝图。忽然间,图纸上的线条泛起微光,仿佛被某种无形的力量唤醒。紧接着,那些发光的线条开始向上延伸,从平面中挣脱,勾勒出立体的轮廓——就像在空中进行一场无声的3D打印。随后,奇迹在加速发生:极简的橡木办公桌、优雅的伊姆斯风格皮质椅、高挑的工业风金属书架,还有几盏爱迪生灯泡,以光纹为骨架迅速“生长”出来。转瞬间,线条被真实的材质填充——木材的温润、皮革的质感、金属的冷静,都在眨眼间完整呈现。最终,所有家具稳固落地,蓝图的光芒悄然褪去。一个完整的办公空间,就这样从二维的图纸中诞生。``` </details>|
|
| 274 |
+
|流畅运动生成|<video src="https://github.com/user-attachments/assets/21f9da05-33d0-4521-b188-ea009e7fdd3f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A cosmic loaf of bread, with a volcanic black crust, is precisely sliced open to reveal a swirling nebula interior.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Cinematic 8K footage, with a stark, moody aesthetic. Under a dramatic top-down spotlight, a loaf of what appears to be bread rests on a slab of polished marble, which is flecked with silver that glitters like a starfield. The loaf's crust is a deep, matte black, cracked like cooled volcanic rock. A sleek, modern santoku knife, its sharp edge gleaming under the single light source, begins a series of clean, rhythmic cuts. With each precise, repetitive slice that falls away, the loaf’s impossible interior is revealed: not dough, but a compressed, swirling nebula of deep purples and blues, alive with pinpricks of glittering light. As the knife continues its precise motion, a fine, shimmering dust of cosmic particles settles on the marble. The extreme macro view focuses on the mesmerizing contrast between the blade’s cold steel and the ethereal, galaxy-filled substance of the bread. This is hyper-realistic macro videography at its finest.``` </details>|<video src="https://github.com/user-attachments/assets/49057fe8-a102-4fd7-bd92-e9561abb9f45" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A figure skater performs a rapid, graceful Biellmann spin, captured from all angles.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The video captures a figure skater performing a Biellmann spin on ice. The subject is a female skater in a glittering costume. Initially, she spins on one leg. Then, she reaches back and pulls her free leg up. Next, she spins rapidly, becoming a blur of motion, with ice shavings spraying from her skate blade. The background is an ice rink with blurred advertising boards. The camera circles around the subject to capture the spin from all angles. The lighting is spotlit, creating lens flares and sparkles on her costume. The overall video presents a graceful artistic sports style.``` </details>|
|
| 275 |
+
|电影级美学|<video src="https://github.com/user-attachments/assets/4098cf72-357d-4b81-97df-6752064ce0c3" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```固定镜头,焦点在图片里的挂钟上,镜头轻微摇晃营造手持摄影感,wjw,filmphotos,Film Grain,Reversal film photography,Wong Kar-wai movies,cinematic photography, HK film style,neon lighting, in the style of Wong Kar Wai film``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Handheld lens shooting, the camera focuses on the wall clock hanging on the green-toned wall, shaking slightly. The second hand sweeps steadily across the clock face, and the shadow of the clock cast on the wall shifts subtly with the movement of the lens.``` </details>|<video src="https://github.com/user-attachments/assets/2b4575e5-79f1-4011-bed0-e8380198f7c9" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The leaves of calamus shine in the sunlight, dotted with dewdrops that trickle down to the ground with the breeze.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A macro shot focuses on long, slender calamus leaves, rendered in a cinematic photography realistic style. The main leaf, a vibrant, deep green, is positioned diagonally across the frame. Its surface is covered in tiny, glistening spherical dewdrops that catch and refract the bright morning sunlight, creating sparkling highlights. Initially, a larger, perfectly round dewdrop clings to the upper section of the leaf, its surface tension holding it in place. Then, as the leaf sways almost imperceptibly, the dewdrop begins to slowly dislodge. Next, it starts to trickle down the central vein of the leaf, its shape elongating slightly as it moves, leaving a subtle, glistening wet trail in its path. Finally, it reaches the pointed tip of the leaf, hangs for a brief moment, and falls out of the bottom of the frame. In the background, other leaves and blades of grass are softly blurred, creating a beautiful bokeh effect with soft, out-of-focus circles of light. The environment is bathed in the warm, golden glow of early morning sunlight, which streams in from behind the leaves, backlighting them and causing their wet edges to shine brilliantly. The overall impression is one of serene, natural beauty, captured in a highly realistic and detailed manner. This is a macro shot. The camera tilts down very slowly, following the path of the main dewdrop as it travels down the leaf. The lighting is soft and natural, with strong backlighting to create a radiant, glowing effect on the dewdrops and leaf edges, characteristic of professional nature photography. The atmosphere is peaceful and serene. The overall video presents a cinematic photography realistic style.``` </details>|
|
| 276 |
+
|文字渲染|<video src="https://github.com/user-attachments/assets/7c964fc5-c27e-4bd0-bf3f-eb8fca2caef6" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```赛博朋克风格的夜晚街角,一个巨大的招牌上, “Hunyuan Video 1.5”的霓虹灯管轮廓已经安装好。镜头推进,霓虹灯从“H”开始,伴随着‘滋滋’的电流声,每个字母依次亮起粉紫色的光芒,直到全部点亮,照亮了潮湿的街道。赛博朋克,城市美学``` </details> <details><summary>📋 Show rewrite prompt</summary> ```On a wet street corner in a cyberpunk city at night, a large neon sign reading "Hunyuan Video 1.5" lights up sequentially, illuminating the dark, rainy environment with a pinkish-purple glow. he scene is a dark, rain-slicked street corner in a futuristic, cinematic cyberpunk city. Mounted on the metallic, weathered facade of a building is a massive, unlit neon sign. The sign's glass tube framework clearly spells out the words "Hunyuan Video 1.5". Initially, the street is dimly lit, with ambient light from distant skyscrapers creating shimmering reflections on the wet asphalt below. Then, the camera zooms in slowly toward the sign. As it moves, a low electrical sizzling sound begins. In the background, the dense urban landscape of the cyberpunk metropolis is visible through a light atmospheric haze, with towering structures adorned with their own flickering advertisements. A complex web of cables and pipes crisscrosses between the buildings. The shot is at a low angle, looking up at the sign to emphasize its grand scale. The lighting is high-contrast and dramatic, dominated by the neon glow which creates sharp, specular reflections and deep shadows. The atmosphere is moody and tech-noir. The overall video presents a cinematic photography realistic style.,``` </details>|<video src="https://github.com/user-attachments/assets/94ce62d9-5788-4912-8e89-b7dc84d7bdc4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```黑色背景上展示着艺术字体"Hunyuan Video 1.5",每个字母都由不同的流体构成,持续缓慢流动。多种不同质地、不互溶的彩色液体(如金属、牛奶、透明凝胶)在无重力环境中漂浮、碰撞``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The artistic words "Hunyuan Video 1.5" are rendered in the center of the screen, with each character composed of a unique, slowly moving fluid, set against a deep black background, while colorful, immiscible liquid blobs float and collide around them in a zero-gravity environment. The main subject is the text "Hunyuan Video 1.5". The characters for "Hunyuan" are filled with a lustrous, molten gold liquid that swirls slowly. The letters for "Video" are composed of a creamy, opaque white fluid resembling milk, with gentle currents visible beneath its surface. The numbers "1.5" are made from a viscous, transparent blue gel that subtly undulates. Each fluid moves independently within the confines of its character's shape, creating a mesmerizing internal motion. This high-quality 3D CGI animation presents the fluids with photorealistic textures. In the surrounding space, several immiscible liquid blobs drift in zero gravity. A large, spherical blob of pearlescent liquid slowly floats from the upper left. A smaller, amorphous blob of shimmering, metallic silver drifts from the lower right, and a translucent, pink gelatinous mass wobbles nearby. Initially, these blobs drift aimlessly. Then, the silver blob slowly collides with the larger pearlescent one. As they make contact, their surfaces deform and ripple dynamically, but the liquids do not mix, pushing against each other before gently bouncing off and continuing their slow, separate paths in the pristine black void. The shot is at an eye-level angle, presenting a front view of the text. The camera remains static, ensuring the entire text "Hunyuan Video 1.5" is fully visible throughout the shot. The scene is lit by a soft, diffused light that highlights the brilliant reflections on the metallic fluids and the inner glow of the translucent gels, enhancing the high-quality 3D CGI animation. The atmosphere is quiet, abstract, and mesmerizing. The overall video has the polished look of a high-quality 3D CGI animation with a focus on abstract fluid dynamics.``` </details>|
|
| 277 |
+
|物理合理性|<video src="https://github.com/user-attachments/assets/07fa4dcd-0bd1-4935-bb89-323428cce6fc" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The wind blows through the shabby bookshelf, and the pages flutter on it. ``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a dimly lit, dusty room, a gentle wind causes the pages of old books on a shabby wooden bookshelf to flutter. The bookshelf, made of dark, weathered wood, shows signs of age with peeling varnish, scratches, and a fine layer of settled dust on its surfaces. Several old books with faded, worn covers are arranged on the shelves; some stand upright while others lie on their sides. Initially, the scene is quiet. Then, a soft breeze enters the frame from the left, disturbing the dust on the shelves. Next, the yellowed, brittle pages of an open book lying flat begin to lift and ripple delicately. As the breeze continues, the pages of other books also start to flutter, some turning over slowly and gracefully, revealing aged text and faint illustrations within. In the background, the wall has faded, peeling wallpaper, and the overall atmosphere is one of quiet neglect and the passage of time. The shot is at an eye-level angle with the main subject. The camera pans to the left slowly. Soft, diffused sunlight filters through a dusty, off-camera window, creating distinct beams of light that cut through the dimness. This lighting highlights the texture of the old wood and the floating dust particles in the air, enhancing the photorealistic detail of the scene. The mood is melancholic and peaceful. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/81065925-c008-421b-8cf0-b3cbf1e77eac" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```An intact soda can is slowly crushed by a hand.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a medium close-up, a hand slowly crushes an intact red and white soda can on a wooden table. A male hand with visible, realistic skin texture is wrapped firmly around the middle of an intact, pristine red and white aluminum soda can. The can, covered in glistening condensation droplets, rests on a dark, polished wooden surface. The cinematic realism captures every minute detail of the scene. Initially, the hand's grip is steady, with the can's cylindrical shape perfectly preserved. Then, the fingers begin to tighten slowly, the knuckles whitening slightly from the exertion. Next, the smooth aluminum surface starts to buckle under the controlled pressure, a sharp crease forming vertically down its side as the metallic sheen distorts. As the hand continues its deliberate squeeze, the can collapses inward progressively, the vibrant red paint wrinkling as the metal structure crumples. Finally, the can is left significantly crushed, its form now an irregular, crumpled shape held tightly in the fist. The scene takes place on a dark, polished wooden tabletop that catches soft, diffuse reflections. The grain of the wood is faintly discernible, adding a layer of texture to the foreground. The background is completely out of focus, rendered as a soft, dark, and non-descript blur, which isolates the main action and enhances the photorealistic quality of the shot. The shot is a medium close-up, presented in a cinematic photography realistic style. The camera remains static at a slightly high angle, looking down to provide a clear and unobstructed view of the can's deformation. Soft side lighting creates high contrast, sculpting the muscles and tendons of the hand while casting specular highlights on the metallic can and the water droplets. The atmosphere is focused and intense. The overall video presents a cinematic photography realistic style.``` </details>|
|
| 278 |
+
|摄像机运动|<video src="https://github.com/user-attachments/assets/6deacbfe-4cca-48d7-a2be-cb638a3e01cb" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```圣诞节的家中,小女孩靠着妈妈听妈妈读书,背景是下着雪的窗外,镜头缓慢下移,一只可爱的长毛小白猫戴着圣诞帽趴在温暖的地摊上``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a cozy home on Christmas, a young girl leans against her mother as they read a book, and the camera moves down to reveal a fluffy white cat in a Santa hat resting on a warm rug. In a warmly lit living room on a snowy Christmas evening, a young mother and her little daughter are sitting together on a comfortable sofa. The mother, with a gentle expression and wearing a cream-colored knitted sweater, holds an open storybook with colorful illustrations. Her daughter, a small girl with brown hair in pigtails and a red pajama set, leans her head affectionately on her mother's shoulder, her eyes fixed on the book. On the floor below them, a fluffy, long-haired white cat is curled up on a plush, beige wool rug. The cat wears a tiny red and white Santa hat perched between its ears. Initially, the shot focuses on the mother and daughter, capturing their quiet, shared moment. The mother’s finger gently rests on the page of the book. Then, the camera slowly moves downward, gliding past the book and their laps. Finally, the camera settles at a low angle, bringing the adorable white cat into sharp focus as the primary subject. The cat's chest gently rises and falls with each breath, its eyes peacefully closed. Through a large window in the background, large, soft snowflakes can be seen falling silently against the dark blue twilight sky, creating a peaceful and serene backdrop. Faint, out-of-focus golden Christmas lights twinkle in the corner of the room, adding to the warm, festive atmosphere. The scene is imbued with a sense of comfort and holiday warmth, creating a beautiful cinematic photography realistic image. The camera slowly moves downward. The shot uses soft, warm interior lighting that casts gentle shadows, creating a high-contrast, cinematic look. A shallow depth of field keeps the focus on the subjects while beautifully blurring the background elements. The mood is heartwarming, peaceful, and festive. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/8e72ed0f-f8ac-445b-97e5-eb4b16fbc121" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead, while the sun position shifts from afternoon to dusk, casting increasingly longer shadows across the terrain as the figure becomes smaller in the frame.``` </details>|
|
| 279 |
+
|多风格支持|<video src="https://github.com/user-attachments/assets/65b2c5a5-e6ba-43be-9462-a98b03b675f1" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```Have the cake man begin to take chunks out of himself and eat it.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The cake man sits on the chair, with his hands resting on his knees. Then, he slowly raises his right hand and breaks off a piece of cake from his left shoulder. Next, he brings the piece of cake to his mouth and begins to chew. At the same time, his eyes widen slightly, and his mouth parts gently. After that, he raises his right hand again, breaks off another piece of cake from his right arm, and repeats the action of bringing it to his mouth to chew.``` </details>|<video src="https://github.com/user-attachments/assets/de5f7480-b79c-4fc1-b345-c5880a3b5f9e" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A little girl, carrying a colorful handbag, skips through the garden. The video uses claymation style.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A little girl with a colorful handbag skips through a whimsical claymation garden. In a vibrant garden constructed entirely from clay, a young girl, meticulously crafted in a claymation style, skips joyfully. She has chunky, sculpted yellow clay hair tied in pigtails that bounce with a slight stiffness, simple black button eyes, and a wide, permanently etched smile. She wears a simple pink clay dress with a white collar. In her left hand, she carries a small handbag molded from bright red and blue clay, which swings in a slightly jerky arc as she moves. Initially, the girl lifts her right leg high, her body momentarily suspended in a classic stop-motion pose. Then, she hops forward, landing lightly as her left leg swings through for the next skip. Her arms move in an exaggerated, back-and-forth rhythm, characteristic of stop-motion animation. Her movements are intentionally not perfectly fluid, highlighting the frame-by-frame nature of the claymation technique. The garden around her is a whimsical, textured world. In the foreground and mid-ground, oversized flowers with swirled purple and orange petals stand on thick green stems. The ground is a textured mat of green clay, showing subtle fingerprints and tool marks that add to the handmade charm. In the background, a pale blue clay backdrop features a simplified, smiling sun molded from yellow clay. The shot is at an eye-level angle with the main subject. The camera follows the subject, moving smoothly to the right to keep her in the frame. The lighting is bright and even, casting soft shadows that emphasize the rounded, three-dimensional forms of the clay models. The overall video presents a charming and detailed claymation style.``` </details>|
|
| 280 |
+
|高图视一致性|<img src="https://github.com/user-attachments/assets/3bc8e55d-c211-454e-8067-128c0e215eb6"> <video src="https://github.com/user-attachments/assets/3e6b7ee9-ec66-4e46-a446-801b1c1a1c81" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女孩放下书,站起身,转身向屋内走去。镜头拉远。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女孩合上手中的书,将书放在身侧的窗台上。随后,她缓缓站起身,转身向屋内走去,身影逐渐没入门后的阴影中。镜头缓缓拉远,露出更多被绿植覆盖的屋檐和墙体。``` </details>|<img src="https://github.com/user-attachments/assets/7657ce60-90b5-4fdc-b713-0eaa55829b09"> <video src="https://github.com/user-attachments/assets/9ca24021-2353-40d5-8a4d-0f8e67d51826" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女人手上的鸟亲了女人一口``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女人手臂上的白色鹦鹉缓缓转过头,将喙轻轻触碰女人的脸颊,随后收回头部。女人嘴角微微上扬,目光温柔地注视着鹦鹉。背景中的绿植保持静止。``` </details>|
|
| 281 |
+
|
| 282 |
+
|
| 283 |
+
|
| 284 |
+
## 📊 性能评估
|
| 285 |
+
### 评分
|
| 286 |
+
我们使用全面的评分方法来评估文生视频生成,考虑了五个关键维度:文本-视频一致性、视觉质量、结构稳定性、运动效果以及单帧的美学质量。对于图生视频生成,评估包括图像-视频一致性、指令响应性、视觉质量、结构稳定性和运动效果。
|
| 287 |
+
|
| 288 |
+
<div align="center">
|
| 289 |
+
<img src="./assets/T2V_Rating.png" alt="rating result of t2v" width="800">
|
| 290 |
+
</div>
|
| 291 |
+
|
| 292 |
+
---
|
| 293 |
+
|
| 294 |
+
<div align="center">
|
| 295 |
+
<img src="./assets/I2V_Rating.png" alt="rating result of i2v" width="800">
|
| 296 |
+
</div>
|
| 297 |
+
|
| 298 |
+
|
| 299 |
+
### GSB
|
| 300 |
+
GSB(Good/Same/Bad)评估法被广泛用于基于整体视频感知质量来评估两个模型的相对性能。我们精心构建了300个多样化文本提示词和300个图像样本,以覆盖文本生成视频和图像生成视频任务的平衡应用场景。针对每个提示词或图像输入,各模型均在单次运行中生成同等数量的视频样本以确保可比性。为保持公平性,每个输入仅执行一次推理且不进行任何结果筛选。所有参与对比的模型均采用其默认配置进行评估,并由百余名专业评估员完成评测过程。
|
| 301 |
+
|
| 302 |
+
|
| 303 |
+
<div align="center">
|
| 304 |
+
<img src="./assets/T2V_GSB.png" alt="rating result of t2v" width="800">
|
| 305 |
+
</div>
|
| 306 |
+
|
| 307 |
+
---
|
| 308 |
+
|
| 309 |
+
<div align="center">
|
| 310 |
+
<img src="./assets/I2V_GSB.png" alt="gsb result of i2v" width="800">
|
| 311 |
+
</div>
|
| 312 |
+
|
| 313 |
+
|
| 314 |
+
## 📚 引用
|
| 315 |
+
```bibtex
|
| 316 |
+
@misc{hunyuanvideo2025,
|
| 317 |
+
title={HunyuanVideo 1.5 Technical Report},
|
| 318 |
+
author={Tencent Hunyuan Foundation Model Team},
|
| 319 |
+
year={2025},
|
| 320 |
+
publisher = {GitHub},
|
| 321 |
+
howpublished = {\url{https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5}},
|
| 322 |
+
}
|
| 323 |
+
```
|
| 324 |
+
|
| 325 |
+
## 🙏 致谢
|
| 326 |
+
我们要感谢 [Transformers](https://github.com/huggingface/transformers), [Diffusers](https://github.com/huggingface/diffusers) , [HuggingFace](https://huggingface.co/) 以及 [Qwen-VL](https://github.com/QwenLM/Qwen-VL)的贡献者,感谢他们的公开研究和探索。
|
| 327 |
+
|
| 328 |
+
## 🌟 GitHub Star 历史
|