Commit 2dc60eb by dpravinv · verified · 1 Parent(s): d2227f4

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
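In this config only `pooling_mode_mean_tokens` is true, so the sentence embedding is the average of the token embeddings. The sketch below illustrates that idea in plain Python, assuming padding positions are excluded through an attention mask. The function name `mean_pool` and the toy 3-dimensional vectors are illustrative and are not part of the library.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding rows are skipped."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: two real tokens followed by one padding row.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0, 4.0]
```

In the real model the vectors would be 768-dimensional, matching `word_embedding_dimension`.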
README.md ADDED
@@ -0,0 +1,616 @@
+ ---
+ base_model: nomic-ai/modernbert-embed-base
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:800
+ - loss:TripletLoss
+ widget:
+ - source_sentence: Logistics Services Manager
+   sentences:
+   - 'The Logistics Contracts Manager/Logistics Programme Manager is responsible for
+     managing multiple logistics programmes and related customer service activities.
+     He/She is also responsible for managing the contracts to ensure customer requirements
+     are met and managing overall programme resources, including manpower, internal
+     assets and external vendors.
+
+
+     Analytical and logical, he is required to manage resources and obtain buy-in from
+     internal and external stakeholders. He is also expected to lead programmes and
+     make business decisions independently.'
+   - "The Quality Engineer identifies user requirements and expectations to inform\
+     \ quality standards for end-products, and analyses product development processes\
+     \ to identify relevant quality standards. He/She incorporates relevant and suitable\
+     \ international standards into product development processes, quality standards\
+     \ and testing processes. He identifies quality-testing types and variations based\
+     \ on business needs and requirements and develops testing processes. He identifies\
+     \ suitable measures of quality for testing and contributes to the development\
+     \ of test scenarios and plans. He conducts various quality tests, and analyses\
+     \ data to identify operating and usage conditions in which performance of quality\
+     \ measures starts to decline. He also automates quality testing for applicable\
+     \ and suitable tests.\n\nHe works in a team setting and is proficient in programming\
+     \ languages required by the organisation. He is familiar with international quality\
+     \ standards, and uses test automation frameworks and tools, as well as applicable\
+     \ quality testing and analysis tools. \n\nThe Quality Engineer possesses strong\
+     \ analytical ability with excellent communication and interpersonal skills. He\
+     \ is highly meticulous in nature, curious and work dynamically."
+   - 'The Customer Services Agent provides assistance to customers at check-in counters.
+     He/She ensures that passengers details match the information on travel documents
+     and handles customer issues regarding flight operations and automated check-in
+     systems. To maintain a safe working environment, he complies with all safety and/or
+     security standards and reports safety and/or security breaches to officers and
+     supervisors.
+
+
+     The Customer Services Agent demonstrates professional behaviour when responding
+     to passenger complaints and acts as a service ambassador for the organisation.
+     He works in shifts to accommodate round-the-clock flight arrivals and departures.
+     He is physically strong to assist passengers with lifting of their baggage. Furthermore,
+     he is service-oriented, possesses good communication skills as well as handles
+     passengers with special needs in an appropriate manner.'
+ - source_sentence: Rail Facility Maintenance Technician
+   sentences:
+   - "A Patient Service Assistant Supervisor is responsible for supporting the frontline\
+     \ services provided. S/He assists supervisors in the management of department\
+     \ operations and the team involved in providing frontline services. S/He is required\
+     \ to assist in managing complaints. S/He assist supervisors in the performance\
+     \ of risk and quality management.\n\nS/He may work in various locations such as\
+     \ private and public hospitals, community and primary care settings. S/He may\
+     \ assist to manage different counters including reception for patient registration,\
+     \ billing and payment as well as patient care. \n\nS/He should be proactive and\
+     \ meticulous. S/He should possess interpersonal, leadership and problem-solving\
+     \ skills.\n"
+   - 'The Technician (Mechanical and Electrical) works in a team to perform preventive
+     and corrective maintenance of mechanical and electrical systems at various rail
+     premises. He/She assists in the preparation of maintenance work and performs routine
+     maintenance under supervision. He supports the team in conducting fault analysis
+     and testing to improve the reliability of mechanical and electrical systems as
+     well as supervises the work of contractors and external stakeholders in ensuring
+     compliance to safety requirements and operating standards.
+
+
+     He is required to work in shifts and carries out his duties in the workshops and
+     at various train stations. He is technically inclined and adept in the repair
+     and maintenance of mechanical and electrical systems. He is capable of communicating
+     effectively within the team, able to multi-task and prioritise his assigned maintenance
+     workload in supporting maintenance activities.'
+   - 'The Product Manager drives the conceptualisation, development, launch and ongoing
+     evolution of specific products for the organisation to deliver the intended customer
+     experience. He/She develops the strategic roadmap for the products in alignment
+     with the overall product strategy, and ensures that the product roadmap supports
+     business drivers by defining key success criteria for the product. He directs
+     market research for gathering product feedback and identifying improvement areas
+     and opportunities for the product and/or associated services. He also collaborates
+     with various teams to develop engaging marketing materials for integrated product
+     and content/service offerings.
+
+
+     The work involves collaboration with the organisation''s leadership for defining
+     the strategic direction for the product to drive the operational efficiency and
+     customer reach. He is expected to keep an eye on the market for tracking the evolution
+     of technologies, competitors and customer behaviour that could impact the product
+     and/or service.
+
+
+     He should be an effective leader, with a broad sense of perspective and strong
+     business acumen. He ought to possess the ability to inspire and influence key
+     internal and external stakeholders and should be able to build and manage wider
+     relationships. He should also be seen as a key industry expert in his domain.'
+ - source_sentence: Business Development Analyst
+   sentences:
+   - 'The Production Supervisor supervises production staff to ensure production targets
+     are met, in accordance with organisation policies and workplace safety and health
+     regulations. He/She is responsible for planning, assigning and directing work,
+     coordinating weekly meetings, addressing product and employee complaints, and
+     resolving problems. He also implements policies and procedures and recommends
+     improvements with a view to increase efficiency and productivity in production
+     methods, equipment, operating procedures and working conditions.
+
+
+     He works with his colleagues in a manufacturing plant setting. He possesses leadership
+     and communication skills to set direction to achieve organisational goals.'
+   - "The Business Analyst/Market Research Analyst/Market Analyst supports the operational\
+     \ insights for the development of business strategies. He/She identifies areas\
+     \ for new business development opportunities by gathering data, analysing information\
+     \ and generating reports based on industry and market trends. \n\nThe Business\
+     \ Analyst/Market Research Analyst/Market Analyst possesses good communication,\
+     \ planning and organisational skills. He is also able to manage stakeholders and\
+     \ work effectively in a team. He is a highly driven, motivated and confident individual,\
+     \ and is able to deliver results in a dynamic business environment."
+   - "The Lead Product Designer drives the design and development of the product line\
+     \ lifecycle, including the end-to-end iterative design process. He/She empowers\
+     \ the team to drive product development in the conceptualisation and design phase,\
+     \ including formulation of design strategies and achieving design solutions based\
+     \ on insights researched by the team.\n\nHe evaluates design concepts and drawings\
+     \ to determine the best product. He has a strong understanding on how product\
+     \ technologies and frameworks can formulate impactful design concepts, is well-versed\
+     \ in product development lifecycles and stays abreast of the latest emerging industry\
+     \ trends in terms of product design.\n\nThe Lead Product Designer translates market\
+     \ insights, emerging industry trends and feedback from teams, into novel product\
+     \ design strategies, with a clear view of how this sits within the product development\
+     \ lifecycle. He is articulate and a strong communicator with internal and external\
+     \ stakeholders and manages stakeholders' expectations as well as coach the\
+     \ team to build their competencies in product design. "
+ - source_sentence: In-Flight Catering Operations Manager
+   sentences:
+   - 'The Manager (Production/Catering-Cabin) leads collaborative efforts with other
+     departments and airlines to review catering operations and ensure compliance with
+     food hygiene and quality standards. He/She is responsible for driving continuous
+     improvement and business development initiatives to improve productivity and meet
+     customer needs. He develops Standard Operating Procedures (SOPs) and systems to
+     mitigate safety and/or security risks and oversees adherence to safety and/or
+     security standards. He also develops the teams technical capabilities through
+     coaching and maintains positive morale within the teams.
+
+
+     The Manager (Production/Catering-Cabin) has an in-depth knowledge of supply chain
+     operations, food handling and production processes in the airline industry. He
+     also possesses remarkable interpersonal and stakeholder management skills to build
+     and maintain relationships with internal and external stakeholders. In addition,
+     he has strong communication and people management skills to lead staff and teams
+     with extensive knowledge of policy requirements and quality and hygiene regulations
+     of the organisation and internationally.'
+   - "The Waste Recycling Executive/Waste Recovery Executive assists with the management\
+     \ of waste sorting and materials recovery operations. He/She consolidates relevant\
+     \ data to research on the existing and emerging trends on waste and recyclables\
+     \ sorting processes. He also recommends suitable equipment and/or technologies\
+     \ to improve waste and recyclables sorting operations. He is required to evaluate\
+     \ reported mechanical faults to rectify issues. In performing most of these functions,\
+     \ he recommends and facilitate the implementation of effective work processes,\
+     \ maintenance schedules of equipment and manage incidents related to waste sorting\
+     \ operations. \n\nHe works in a waste management facility where he is exposed\
+     \ to unpleasant sights and smells, and may at times be exposed to dangerous and/or\
+     \ toxic substances. He oversees the handling of potentially dangerous materials\
+     \ and ensures that all activities are completed in a safe and efficient manner.\
+     \ He is also required to manage teams and incidents relating to waste sorting\
+     \ and materials recovery operations and to communicate with relevant stakeholders\
+     \ and clients. \n\nHe is organised, responsive, approachable, able to multi-task\
+     \ and capable of interacting with stakeholders."
+   - The Associate Business Analyst assists in the identification and analysis of business
+     requirements and systems specifications. He/She conducts feasibility studies and
+     analysis on the risk and benefits of proposed solutions. He analyses systems and
+     processes to identify enhancement opportunities to resolve system gaps, evaluates
+     the ability of an existing system to support proposed changes, and identifies
+     systems deficiencies and performance gaps. He assists with translating business
+     requirements into functional specifications, and documents specifications and
+     interfaces between legacy and new systems, and systems enhancements and detailed
+     specifications. He supports users on change control and systems updates and User
+     Acceptance Testing and integration testing in accordance with the implementation
+     plan. He is knowledgeable of techniques to elicit and manage requirements, as
+     well as software development models including Agile methodologies. He is also
+     familiar with requirements life cycle management, analysis planning and monitoring,
+     requirements analysis and design definition. The Associate Business Analyst possesses
+     an analytical mind, and is able to see interlinkages with system solutions and
+     usability. He adopts a systematic approach in handling ambiguous or complex issues,
+     and actively discusses his perspectives to arrive at effective solutions.
+ - source_sentence: Senior Aerospace Quality Control Engineer
+   sentences:
+   - The Senior NDT Level 3 Engineer (Aircraft Maintenance) manages non-destructive
+     testing (NDT) operations for assessing the quality of aircraft structures. He/She
+     establishes new NDT techniques and qualifies new procedures. He drives compliance
+     of all NDT inspections with the requirements of customers, original equipment
+     manufacturer (OEM) and EN 4179, NAS 410, NADCAP as appropriate. He drives collaboration
+     with workshops and engineering teams for failure investigations and recommends
+     engineering solutions for structural flaws and defects. He develops special process
+     control plans and manages equipment maintenance and operator certification programmes.
+     He also monitors results of NDT for trends and corrective actions, and leads technical
+     audits to ensure compliance with relevant standards and NDT requirements. He reviews
+     compliance with airworthiness and legislative requirements, while proposing enhancements
+     to the organisation's standard operating procedures (SOPs), and safety, health
+     and quality systems. He proactively contributes to the development of lean and
+     sustainability practices, and conducts research and digital innovation in NDT
+     for continuous process improvements. As a team leader, he appraises staff performance
+     and conducts coaching and training for level 1 and level 2 NDT personnel. He is
+     able to work cross-functionally, employing critical reasoning, analytical thinking
+     and problem-solving skills to identify deviations and mitigate potential quality
+     risks in aircraft maintenance processes.
+   - "The Workplace Safety and Health (WSH) Coordinator is responsible for coordinating\
+     \ health and safety systems in the organisation, and conducting periodic inspections\
+     \ to ensure that the implemented risk control measures are being observed and\
+     \ practiced. He/she investigates and reports WSH incidents and coordinate implementation\
+     \ of emergency preparedness and response plans. \n\nHe/She is required to work\
+     \ on-site in his course of work. \n\nThe WSH Coordinator is practical and meticulous.\
+     \ He is required to observe safety hazards and deal with them in a prompt and\
+     \ decisive manner.\n"
+   - 'The Automation Coordinator/Robot Coordinator oversees automated equipment and
+     robots used in manufacturing processes. He/She is the primary responder, responsible
+     for troubleshooting automated production systems and performing preventive and
+     predictive maintenance on equipment. He also contributes to process optimisation
+     by managing data from automated manufacturing systems to facilitate real-time
+     insight gathering and decision-making.
+
+
+     He may be required to work on rotating shifts in a factory setting, and under
+     strict compliance to workplace safety and health requirements, organisational
+     quality control and other parameters.
+
+
+     He is able to work independently, and as part of a team, to achieve production
+     and quality targets, and interact effectively with others to ensure that all issues
+     are resolved appropriately and efficiently.'
+ model-index:
+ - name: modernbert-job-role-matcher
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9800000190734863
+       name: Cosine Accuracy
+ ---
+
+ # modernbert-job-role-matcher
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
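Because the last module is `Normalize()`, each output embedding should have unit L2 norm, in which case cosine similarity reduces to a plain dot product. The sketch below checks this on toy 2-dimensional vectors; `l2_normalize`, `dot`, and `cosine` are hypothetical helpers written for illustration, not library functions.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# After normalization the dot product matches the cosine of the raw vectors.
print(round(dot(a_unit, b_unit), 6), round(cosine(a, b), 6))
```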
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dpravinv/modernbert-job-role-matcher")
+ # Run inference
+ sentences = [
+     'Senior Aerospace Quality Control Engineer',
+ "The Senior NDT Level 3 Engineer (Aircraft Maintenance) manages non-destructive testing (NDT) operations for assessing the quality of aircraft structures. He/She establishes new NDT techniques and qualifies new procedures. He drives compliance of all NDT inspections with the requirements of customers, original equipment manufacturer (OEM) and EN 4179, NAS 410, NADCAP as appropriate. He drives collaboration with workshops and engineering teams for failure investigations and recommends engineering solutions for structural flaws and defects. He develops special process control plans and manages equipment maintenance and operator certification programmes. He also monitors results of NDT for trends and corrective actions, and leads technical audits to ensure compliance with relevant standards and NDT requirements. He reviews compliance with airworthiness and legislative requirements, while proposing enhancements to the organisation's standard operating procedures (SOPs), and safety, health and quality systems. He proactively contributes to the development of lean and sustainability practices, and conducts research and digital innovation in NDT for continuous process improvements. As a team leader, he appraises staff performance and conducts coaching and training for level 1 and level 2 NDT personnel. He is able to work cross-functionally, employing critical reasoning, analytical thinking and problem-solving skills to identify deviations and mitigate potential quality risks in aircraft maintenance processes.",
+ 'The Automation Coordinator/Robot Coordinator oversees automated equipment and robots used in manufacturing processes. He/She is the primary responder, responsible for troubleshooting automated production systems and performing preventive and predictive maintenance on equipment. He also contributes to process optimisation by managing data from automated manufacturing systems to facilitate real-time insight gathering and decision-making.\n\nHe may be required to work on rotating shifts in a factory setting, and under strict compliance to workplace safety and health requirements, organisational quality control and other parameters.\n\nHe is able to work independently, and as part of a team, to achieve production and quality targets, and interact effectively with others to ensure that all issues are resolved appropriately and efficiently.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
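For job-role matching, a typical pattern is to embed one job title and several role descriptions, then rank the descriptions by similarity to the title. The sketch below shows only the ranking step on toy 2-dimensional vectors; in practice the vectors would come from `model.encode(...)`. `rank_candidates` is a hypothetical helper, not part of the library.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def rank_candidates(query: List[float], candidates: List[List[float]]) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs sorted from most to least similar."""
    scored = [(idx, cosine(query, vec)) for idx, vec in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for a title embedding and three description embeddings.
title = [1.0, 0.0]
descriptions = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
ranking = rank_candidates(title, descriptions)
print([idx for idx, _ in ranking])  # [1, 2, 0]
```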
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:---------|
+ | **cosine_accuracy** | **0.98** |
+
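Cosine accuracy for triplets is commonly defined as the share of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below follows that definition on toy vectors. It is not the `TripletEvaluator` implementation, and `triplet_cosine_accuracy` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def triplet_cosine_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    correct = sum(
        1 for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)


toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: counted as correct
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),  # negative is closer: counted as wrong
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.5
```

Under this definition, a score of 0.98 on the 200-sample evaluation set would correspond to 196 correctly ordered triplets.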
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 800 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 800 samples:
+ | | anchor | positive | negative |
+ |:--------|:-------|:---------|:---------|
+ | type | string | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 6.82 tokens</li><li>max: 14 tokens</li></ul> | <ul><li>min: 73 tokens</li><li>mean: 180.48 tokens</li><li>max: 380 tokens</li></ul> | <ul><li>min: 73 tokens</li><li>mean: 181.7 tokens</li><li>max: 380 tokens</li></ul> |
380
+ * Samples:
381
+ | anchor | positive | negative |
382
+ |:----------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------|
383
+ | <code>Port Vessel Navigator</code> | <code>The Helmsman manoeuvres and handles boats or crafts operating within the Port Limit of Singapore Territorial Waters. He/She is able to use the craft's navigational, fire-fighting and safety equipment and appreciate weather conditions, tides and tidal currents. He also performs basic chartwork, monitors and anticipates potential problems that may arise during daily operations and alerts the relevant authorities to them. He must pass a colour vision test and fulfil the requirements of the Port Limit Helmsman Licence issued by the Maritime and Port Authority of Singapore (MPA).</code> | <code>The Associate Counsellor assists in providing counselling services and support to individuals and families experiencing socioemotional and mental health challenges. This includes case management for cases of low complexity and risk and provision of clinical services through various modes of counselling such as face-to-face or online counselling and group work.He/She may also support department research through data collection and coordinate internal and external training/programmes.<br><br>A patient and compassionate professional, the Associate Counsellor works in diverse settings across social services, including care homes, educational institutes, family service centres and healthcare facilities. He works under supervision as part of a collaborative team.</code> |
384
+ | <code>Room Reservations Manager</code> | <code>The Reservations Executive/Reservations Supervisor is responsible for supervising the operations of the department in selling rooms and managing room inventory to maximise sales. He/She ensures that all guest requests, concerns and feedback relating to rooms reservations are addressed in a timely and professional manner and collaborates with relevant departments on booking requirements and special guest requests to provide a seamless guest experience.<br><br>He performs checks to ensure the accuracy of reservation bookings and records, keeps track of room availability and inventory, monitors room sales and occupancy levels and analyses reservations forecast to maximise the property's occupancy potential. He assists to meet monthly revenue targets by identifying new contacts and proposing promotional packages to increase room sales and revenue. He is also responsible for monitoring the team's compliance with the property's policies and procedures for reservations operations. He guides and coa...</code> | <code>The Membership Director/Assistant Director assumes overall responsibility in driving member attraction, recruitment and retention. He/she develops membership development and engagement strategies with the support of member research. He works with multiple stakeholders to advocate for member needs and interests to the relevant government agencies. He also oversees the execution and delivery of membership activities and events. The Membership Director/Assistant Director is highly driven, detail-oriented and strategic in handling all aspects of member relations. He is articulate and has excellent communication and people management skills to develop and maintain strong relationships among various stakeholders. He is able to multi-task and rally his team to deliver excellent membership experiences.</code> |
385
+ | <code>Quality Assurance Engineer</code> | <code>The Quality Engineer identifies user requirements and expectations to inform quality standards for end-products, and analyses product development processes to identify relevant quality standards. He/She incorporates relevant and suitable international standards into product development processes, quality standards and testing processes. He identifies quality-testing types and variations based on business needs and requirements and develops testing processes. He identifies suitable measures of quality for testing and contributes to the development of test scenarios and plans. He conducts various quality tests, and analyses data to identify operating and usage conditions in which performance of quality measures starts to decline. He also automates quality testing for applicable and suitable tests.<br><br>He works in a team setting and is proficient in programming languages required by the organisation. He is familiar with international quality standards, and uses test automation frameworks and...</code> | <code>The Industry Development Director/Assistant Director plays a key role in collaborating with key government agencies and other organisations to drive industry development and transformation. He/she collaborates with multiple stakeholders to represent the industry needs and interests at relevant platforms. He drives industry innovation and adoption of technology, and oversees the execution and delivery of local industry projects and initiatives. He also builds effective relationships with strategic partners and stakeholders to identify growth opportunities for the industry.The Industry Development Director/Assistant Director is an effective communicator and presenter, able to develop strong working relationships with relevant stakeholders and strategic partners. He has good listening skills and is analytical and professional in addressing the concerns of the industry. He is forward-looking, able to set out a clear strategic direction and inspire the team towards achieving desired outcome...</code> |
386
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
387
+ ```json
388
+ {
389
+ "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
390
+ "triplet_margin": 5
391
+ }
392
+ ```
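To make the loss parameters above concrete, here is a minimal pure-Python sketch of a triplet loss with Euclidean distance and a margin of 5. It is an illustration, not the library's implementation: the helper names (`euclidean`, `triplet_loss`) and the toy vectors are assumptions introduced here. It also assumes the `Normalize` module listed in `modules.json` is active, so embeddings have unit length and any Euclidean distance lies between 0 and 2.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


# Toy unit-length vectors (hypothetical, for illustration only).
anchor = [1.0, 0.0]
positive = [1.0, 0.0]    # identical to the anchor: distance 0
negative = [-1.0, 0.0]   # opposite direction: distance 2, the maximum for unit vectors

# Best possible case for unit vectors: 0 - 2 + 5 = 3
best_case = triplet_loss(anchor, positive, negative)
print(best_case)
```

Under that unit-length assumption the per-triplet loss cannot drop below 3 when the margin is 5, which would be consistent with the validation losses of roughly 4.15 to 4.30 reported in the Training Logs.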
393
+
394
+ ### Evaluation Dataset
395
+
396
+ #### Unnamed Dataset
397
+
398
+ * Size: 200 evaluation samples
399
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
400
+ * Approximate statistics based on the first 200 samples:
401
+ | | anchor | positive | negative |
402
+ |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
403
+ | type | string | string | string |
404
+ | details | <ul><li>min: 4 tokens</li><li>mean: 6.95 tokens</li><li>max: 12 tokens</li></ul> | <ul><li>min: 73 tokens</li><li>mean: 178.67 tokens</li><li>max: 313 tokens</li></ul> | <ul><li>min: 73 tokens</li><li>mean: 173.81 tokens</li><li>max: 380 tokens</li></ul> |
405
+ * Samples:
406
+ | anchor | positive | negative |
407
+ |:-------------------------------------|:-------------------------------------|:-------------------------------------|
408
+ | <code>Room Sales Coordinator</code> | <code>The Reservations Executive/Reservations Supervisor is responsible for supervising the operations of the department in selling rooms and managing room inventory to maximise sales. He/She ensures that all guest requests, concerns and feedback relating to rooms reservations are addressed in a timely and professional manner and collaborates with relevant departments on booking requirements and special guest requests to provide a seamless guest experience.<br><br>He performs checks to ensure the accuracy of reservation bookings and records, keeps track of room availability and inventory, monitors room sales and occupancy levels and analyses reservations forecast to maximise the property's occupancy potential. He assists to meet monthly revenue targets by identifying new contacts and proposing promotional packages to increase room sales and revenue. He is also responsible for monitoring the team's compliance with the property's policies and procedures for reservations operations. He guides and coa...</code> | <code>The Technician supports the team to perform routine bus servicing and preventive corrective maintenance activities. His/Her duties include preparation of work activities, perform assigned servicing and maintenance tasks of different bus sub-systems, perform general housekeeping of workshop tools and equipment as well as adheres to Workplace Safety and Health (WSH) procedures. He may be deployed to support on-the-road bus breakdown assistance and recovery.<br><br>He is required to work in the bus workshop and/or depot environment based on rotating shifts. He is able to interact effectively with others when carrying out his duties and has the opportunity to gain experience, knowledge as well as deepen his technical and maintenance skills on various bus sub-systems.</code> |
409
+ | <code>Data Compliance Officer</code> | <code>The Data Protection Officer executes data governance policies and procedures. He/She ensures the Data Protection Act is implemented and enforced in the organisation, and amongst the respective teams and users. He collaborates with business and project teams in projects and ensures alignment and compliance with the organisation's data protection guidelines and policies, and with industry standards and guidelines. He also directs a team of professionals and third-party vendors or service providers to achieve organisational goals in accordance with the data governance and data protection policies. He manages risks and data breach incidents. The Data Protection Officer is knowledgeable in areas of data governance, compliance and data protection policies and frameworks, and works within and across teams to mitigate data breaches. He is expected to be proficient in the requirements under the Personal Data Protection Act 2012. The Data Protection Officer adopts a broad and global perspective ...</code> | <code>The Senior Process Safety Engineer provides technical advice and guidance on process safety-related activities. He/She leads the implementation of the Process Safety Management (PSM) framework in the organisation, and reviews plant safeguarding system requirements to ensure compliance with process safety standards. In addition, he provides technical input for the development and maintenance of the organisations Major Hazard Installation (MHI) Safety Case.<br><br>The Senior Process Safety Engineer administers the Workplace Safety and Health (WSH) and Environmental Management Systems (EMS) by advising on the development and improvement of Safe System of Work (SSoW) frameworks, and by ensuring proper closure of process safety incident investigations and their notification to relevant authorities. He provides support and advice for asset integrity assurance and compliance, and leads process safety reviews during new projects. In addition, he contributes to staff capability development by coachin...</code> |
410
+ | <code>Maritime Craft Handler</code> | <code>The Helmsman manoeuvres and handles boats or crafts operating within the Port Limit of Singapore Territorial Waters. He/She is able to use the craft's navigational, fire-fighting and safety equipment and appreciate weather conditions, tides and tidal currents. He also performs basic chartwork, monitors and anticipates potential problems that may arise during daily operations and alerts the relevant authorities to them. He must pass a colour vision test and fulfil the requirements of the Port Limit Helmsman Licence issued by the Maritime and Port Authority of Singapore (MPA).</code> | <code>The Head of Design strategises the design and development of the product line lifecycle, including the end-to-end iterative design process. He/She establishes design policy principles to drive product development in the conceptualisation and design phase, including endorsement of design strategies, and achieving design solutions based on insights researched by the team<br><br>He provides insightful directives based on the evaluation of design concepts and drawings by the team to determine the best product and ensure that it is aligned to the latest market trends. He has a strong understanding on how product technologies and frameworks can formulate impactful design concepts, is well-versed in product development lifecycles and stays abreast of the latest emerging industry trends in terms of product design. <br><br>The Head of Design adopts a global mindset while distilling market trends to incorporate them into novel product design strategies, with a clear view of how this sits within the product ...</code> |
411
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
412
+ ```json
413
+ {
414
+ "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
415
+ "triplet_margin": 5
416
+ }
417
+ ```
418
+
419
+ ### Training Hyperparameters
420
+ #### Non-Default Hyperparameters
421
+
422
+ - `eval_strategy`: epoch
423
+ - `per_device_train_batch_size`: 4
424
+ - `per_device_eval_batch_size`: 4
425
+ - `gradient_accumulation_steps`: 4
426
+ - `learning_rate`: 2e-05
427
+ - `lr_scheduler_type`: cosine
428
+ - `warmup_ratio`: 0.1
429
+ - `load_best_model_at_end`: True
430
+ - `batch_sampler`: no_duplicates
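As a rough illustration of how the non-default hyperparameters above interact, the sketch below computes the per-device effective batch size and a cosine learning-rate schedule with linear warmup. Some assumptions are made here: the total of 150 optimizer steps is taken from the Training Logs table, the 15 warmup steps are derived as 0.1 x 150, and the schedule is the common half-cosine decay to zero. The function name `lr_at` is hypothetical.

```python
import math

BASE_LR = 2e-05        # learning_rate
TOTAL_STEPS = 150      # assumption: last step in the Training Logs table
WARMUP_STEPS = 15      # warmup_ratio 0.1 applied to 150 steps

# per_device_train_batch_size * gradient_accumulation_steps
effective_batch = 4 * 4


def lr_at(step):
    """Linear warmup to BASE_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 15, 50, 100, 150):
    print(step, lr_at(step))
```

The learning rate starts at zero, peaks at `2e-05` when warmup ends, and decays back towards zero by the final step. The effective batch size of 16 is per device; the number of devices used is not stated in this card.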
431
+
432
+ #### All Hyperparameters
433
+ <details><summary>Click to expand</summary>
434
+
435
+ - `overwrite_output_dir`: False
436
+ - `do_predict`: False
437
+ - `eval_strategy`: epoch
438
+ - `prediction_loss_only`: True
439
+ - `per_device_train_batch_size`: 4
440
+ - `per_device_eval_batch_size`: 4
441
+ - `per_gpu_train_batch_size`: None
442
+ - `per_gpu_eval_batch_size`: None
443
+ - `gradient_accumulation_steps`: 4
444
+ - `eval_accumulation_steps`: None
445
+ - `torch_empty_cache_steps`: None
446
+ - `learning_rate`: 2e-05
447
+ - `weight_decay`: 0.0
448
+ - `adam_beta1`: 0.9
449
+ - `adam_beta2`: 0.999
450
+ - `adam_epsilon`: 1e-08
451
+ - `max_grad_norm`: 1.0
452
+ - `num_train_epochs`: 3
453
+ - `max_steps`: -1
454
+ - `lr_scheduler_type`: cosine
455
+ - `lr_scheduler_kwargs`: {}
456
+ - `warmup_ratio`: 0.1
457
+ - `warmup_steps`: 0
458
+ - `log_level`: passive
459
+ - `log_level_replica`: warning
460
+ - `log_on_each_node`: True
461
+ - `logging_nan_inf_filter`: True
462
+ - `save_safetensors`: True
463
+ - `save_on_each_node`: False
464
+ - `save_only_model`: False
465
+ - `restore_callback_states_from_checkpoint`: False
466
+ - `no_cuda`: False
467
+ - `use_cpu`: False
468
+ - `use_mps_device`: False
469
+ - `seed`: 42
470
+ - `data_seed`: None
471
+ - `jit_mode_eval`: False
472
+ - `use_ipex`: False
473
+ - `bf16`: False
474
+ - `fp16`: False
475
+ - `fp16_opt_level`: O1
476
+ - `half_precision_backend`: auto
477
+ - `bf16_full_eval`: False
478
+ - `fp16_full_eval`: False
479
+ - `tf32`: None
480
+ - `local_rank`: 0
481
+ - `ddp_backend`: None
482
+ - `tpu_num_cores`: None
483
+ - `tpu_metrics_debug`: False
484
+ - `debug`: []
485
+ - `dataloader_drop_last`: False
486
+ - `dataloader_num_workers`: 0
487
+ - `dataloader_prefetch_factor`: None
488
+ - `past_index`: -1
489
+ - `disable_tqdm`: False
490
+ - `remove_unused_columns`: True
491
+ - `label_names`: None
492
+ - `load_best_model_at_end`: True
493
+ - `ignore_data_skip`: False
494
+ - `fsdp`: []
495
+ - `fsdp_min_num_params`: 0
496
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
497
+ - `fsdp_transformer_layer_cls_to_wrap`: None
498
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
499
+ - `deepspeed`: None
500
+ - `label_smoothing_factor`: 0.0
501
+ - `optim`: adamw_torch
502
+ - `optim_args`: None
503
+ - `adafactor`: False
504
+ - `group_by_length`: False
505
+ - `length_column_name`: length
506
+ - `ddp_find_unused_parameters`: None
507
+ - `ddp_bucket_cap_mb`: None
508
+ - `ddp_broadcast_buffers`: False
509
+ - `dataloader_pin_memory`: True
510
+ - `dataloader_persistent_workers`: False
511
+ - `skip_memory_metrics`: True
512
+ - `use_legacy_prediction_loop`: False
513
+ - `push_to_hub`: False
514
+ - `resume_from_checkpoint`: None
515
+ - `hub_model_id`: None
516
+ - `hub_strategy`: every_save
517
+ - `hub_private_repo`: None
518
+ - `hub_always_push`: False
519
+ - `gradient_checkpointing`: False
520
+ - `gradient_checkpointing_kwargs`: None
521
+ - `include_inputs_for_metrics`: False
522
+ - `include_for_metrics`: []
523
+ - `eval_do_concat_batches`: True
524
+ - `fp16_backend`: auto
525
+ - `push_to_hub_model_id`: None
526
+ - `push_to_hub_organization`: None
527
+ - `mp_parameters`:
528
+ - `auto_find_batch_size`: False
529
+ - `full_determinism`: False
530
+ - `torchdynamo`: None
531
+ - `ray_scope`: last
532
+ - `ddp_timeout`: 1800
533
+ - `torch_compile`: False
534
+ - `torch_compile_backend`: None
535
+ - `torch_compile_mode`: None
536
+ - `dispatch_batches`: None
537
+ - `split_batches`: None
538
+ - `include_tokens_per_second`: False
539
+ - `include_num_input_tokens_seen`: False
540
+ - `neftune_noise_alpha`: None
541
+ - `optim_target_modules`: None
542
+ - `batch_eval_metrics`: False
543
+ - `eval_on_start`: False
544
+ - `use_liger_kernel`: False
545
+ - `eval_use_gather_object`: False
546
+ - `average_tokens_across_devices`: False
547
+ - `prompts`: None
548
+ - `batch_sampler`: no_duplicates
549
+ - `multi_dataset_batch_sampler`: proportional
550
+
551
+ </details>
552
+
553
+ ### Training Logs
554
+ | Epoch | Step | Training Loss | Validation Loss | cosine_accuracy |
555
+ |:-------:|:-------:|:-------------:|:---------------:|:---------------:|
556
+ | 1.0 | 50 | - | 4.3018 | 0.9800 |
557
+ | 2.0 | 100 | 4.3737 | 4.1720 | 0.9850 |
558
+ | **3.0** | **150** | **-** | **4.1523** | **0.9800** |
559
+
560
+ * The bold row denotes the saved checkpoint.
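The `cosine_accuracy` column can be read as the fraction of evaluation triplets in which the anchor is closer, by cosine similarity, to its positive than to its negative. The sketch below illustrates that reading on toy data. The function names and the vectors are assumptions for illustration, and the exact evaluator used to produce the numbers above is not shown in this card.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_accuracy(triplets):
    """Share of (anchor, positive, negative) triplets ranked correctly."""
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)


# Two hypothetical triplets: the first is ranked correctly, the second is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),
]
accuracy = cosine_accuracy(toy_triplets)
print(accuracy)
```

With the 200 evaluation samples listed above, a value of 0.9800 corresponds to 196 correctly ranked triplets and 0.9850 to 197.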
561
+
562
+ ### Framework Versions
563
+ - Python: 3.12.6
564
+ - Sentence Transformers: 4.1.0
565
+ - Transformers: 4.48.0.dev0
566
+ - PyTorch: 2.6.0+cu124
567
+ - Accelerate: 1.2.1
568
+ - Datasets: 3.1.0
569
+ - Tokenizers: 0.21.0
570
+
571
+ ## Citation
572
+
573
+ ### BibTeX
574
+
575
+ #### Sentence Transformers
576
+ ```bibtex
577
+ @inproceedings{reimers-2019-sentence-bert,
578
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
579
+ author = "Reimers, Nils and Gurevych, Iryna",
580
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
581
+ month = "11",
582
+ year = "2019",
583
+ publisher = "Association for Computational Linguistics",
584
+ url = "https://arxiv.org/abs/1908.10084",
585
+ }
586
+ ```
587
+
588
+ #### TripletLoss
589
+ ```bibtex
590
+ @misc{hermans2017defense,
591
+ title={In Defense of the Triplet Loss for Person Re-Identification},
592
+ author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
593
+ year={2017},
594
+ eprint={1703.07737},
595
+ archivePrefix={arXiv},
596
+ primaryClass={cs.CV}
597
+ }
598
+ ```
599
+
600
+ <!--
601
+ ## Glossary
602
+
603
+ *Clearly define terms in order to be accessible across audiences.*
604
+ -->
605
+
606
+ <!--
607
+ ## Model Card Authors
608
+
609
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
610
+ -->
611
+
612
+ <!--
613
+ ## Model Card Contact
614
+
615
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
616
+ -->
config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "_name_or_path": "nomic-ai/modernbert-embed-base",
3
+ "architectures": [
4
+ "ModernBertModel"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 50281,
9
+ "classifier_activation": "gelu",
10
+ "classifier_bias": false,
11
+ "classifier_dropout": 0.0,
12
+ "classifier_pooling": "mean",
13
+ "cls_token_id": 50281,
14
+ "decoder_bias": true,
15
+ "deterministic_flash_attn": false,
16
+ "embedding_dropout": 0.0,
17
+ "eos_token_id": 50282,
18
+ "global_attn_every_n_layers": 3,
19
+ "global_rope_theta": 160000.0,
20
+ "gradient_checkpointing": false,
21
+ "hidden_activation": "gelu",
22
+ "hidden_size": 768,
23
+ "initializer_cutoff_factor": 2.0,
24
+ "initializer_range": 0.02,
25
+ "intermediate_size": 1152,
26
+ "layer_norm_eps": 1e-05,
27
+ "local_attention": 128,
28
+ "local_rope_theta": 10000.0,
29
+ "max_position_embeddings": 8192,
30
+ "mlp_bias": false,
31
+ "mlp_dropout": 0.0,
32
+ "model_type": "modernbert",
33
+ "norm_bias": false,
34
+ "norm_eps": 1e-05,
35
+ "num_attention_heads": 12,
36
+ "num_hidden_layers": 22,
37
+ "pad_token_id": 50283,
38
+ "position_embedding_type": "absolute",
39
+ "reference_compile": true,
40
+ "sep_token_id": 50282,
41
+ "sparse_pred_ignore_index": -100,
42
+ "sparse_prediction": false,
43
+ "torch_dtype": "float32",
44
+ "transformers_version": "4.48.0.dev0",
45
+ "vocab_size": 50368
46
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "4.1.0",
4
+ "transformers": "4.48.0.dev0",
5
+ "pytorch": "2.6.0+cu124"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c9f6ebe6e62e7f09894ddae37e5350c67f3a83ad8f585cd9d4463cda3139c8df
3
+ size 596070136
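A quick back-of-the-envelope check, assuming the weights are stored as `float32` (as `torch_dtype` in `config.json` indicates) and ignoring the small safetensors header, gives an approximate parameter count from the file size above. The variable names are illustrative.

```python
FILE_SIZE_BYTES = 596070136   # "size" field of the Git LFS pointer
BYTES_PER_PARAM = 4           # float32 uses 4 bytes per value

# Upper bound: the file also contains a small metadata header.
approx_params = FILE_SIZE_BYTES // BYTES_PER_PARAM
approx_millions = round(approx_params / 1e6)
print(approx_params, approx_millions)
```

This suggests a model of roughly 149 million parameters.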
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
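The three modules above form a pipeline: token embeddings from the Transformer are pooled into one vector, which is then L2-normalized. The sketch below illustrates the last two stages in plain Python, assuming mean pooling over non-padding tokens (as `pooling_mode_mean_tokens` in `1_Pooling/config.json` is `true`). The function names and the tiny 2-dimensional vectors are assumptions for illustration; the real model uses 768 dimensions.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention mask is 1."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Hypothetical token embeddings; the last token is padding and is masked out.
tokens = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the first two vectors
sentence_embedding = l2_normalize(pooled)   # unit-length result
print(pooled, sentence_embedding)
```

Because the output has unit length, cosine similarity between two such embeddings equals their dot product, which fits the `"similarity_fn_name": "cosine"` setting in `config_sentence_transformers.json`.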
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 8192,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": true,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50301": {
+ "content": "[unused16]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50302": {
+ "content": "[unused17]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50303": {
+ "content": "[unused18]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50304": {
+ "content": "[unused19]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50305": {
+ "content": "[unused20]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50306": {
+ "content": "[unused21]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50307": {
+ "content": "[unused22]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50308": {
+ "content": "[unused23]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50309": {
+ "content": "[unused24]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50310": {
+ "content": "[unused25]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50311": {
+ "content": "[unused26]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50312": {
+ "content": "[unused27]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50313": {
+ "content": "[unused28]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50314": {
+ "content": "[unused29]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50315": {
+ "content": "[unused30]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50316": {
+ "content": "[unused31]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50317": {
+ "content": "[unused32]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50318": {
+ "content": "[unused33]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50319": {
+ "content": "[unused34]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50320": {
+ "content": "[unused35]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50321": {
+ "content": "[unused36]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50322": {
+ "content": "[unused37]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50323": {
+ "content": "[unused38]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50324": {
+ "content": "[unused39]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50325": {
+ "content": "[unused40]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50326": {
+ "content": "[unused41]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50327": {
+ "content": "[unused42]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50328": {
+ "content": "[unused43]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50329": {
+ "content": "[unused44]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50330": {
+ "content": "[unused45]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50331": {
+ "content": "[unused46]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50332": {
+ "content": "[unused47]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50333": {
+ "content": "[unused48]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50334": {
+ "content": "[unused49]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50335": {
+ "content": "[unused50]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50336": {
+ "content": "[unused51]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50337": {
+ "content": "[unused52]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50338": {
+ "content": "[unused53]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50339": {
+ "content": "[unused54]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50340": {
+ "content": "[unused55]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50341": {
+ "content": "[unused56]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50342": {
+ "content": "[unused57]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50343": {
+ "content": "[unused58]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50344": {
+ "content": "[unused59]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50345": {
+ "content": "[unused60]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50346": {
+ "content": "[unused61]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50347": {
+ "content": "[unused62]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50348": {
+ "content": "[unused63]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50349": {
+ "content": "[unused64]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50350": {
+ "content": "[unused65]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50351": {
+ "content": "[unused66]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50352": {
+ "content": "[unused67]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50353": {
+ "content": "[unused68]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50354": {
+ "content": "[unused69]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50355": {
+ "content": "[unused70]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50356": {
+ "content": "[unused71]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50357": {
+ "content": "[unused72]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50358": {
+ "content": "[unused73]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50359": {
+ "content": "[unused74]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50360": {
+ "content": "[unused75]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50361": {
+ "content": "[unused76]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50362": {
+ "content": "[unused77]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50363": {
+ "content": "[unused78]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50364": {
+ "content": "[unused79]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50365": {
+ "content": "[unused80]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50366": {
+ "content": "[unused81]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50367": {
+ "content": "[unused82]",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 8192,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "PreTrainedTokenizerFast",
+ "unk_token": "[UNK]"
+ }
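For readers who want to inspect a config of this shape, the following is a minimal sketch of listing the reserved `[unusedN]` entries and reading `model_max_length`. It uses only the standard `json` module on an inline excerpt, not the real file. The excerpt text, the helper name `reserved_tokens`, and the assumption that the ID map sits under the top-level key `added_tokens_decoder` (the key is not visible in this part of the diff) are illustrative, not taken from the commit.

```python
import json

# Hypothetical excerpt that mirrors the structure shown in the diff above.
# Only two of the reserved entries are reproduced; the enclosing key name
# "added_tokens_decoder" is an assumption, since it is not visible here.
CONFIG_TEXT = """
{
  "added_tokens_decoder": {
    "50301": {"content": "[unused16]", "lstrip": false, "normalized": true,
              "rstrip": false, "single_word": false, "special": false},
    "50367": {"content": "[unused82]", "lstrip": false, "normalized": true,
              "rstrip": false, "single_word": false, "special": false}
  },
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "model_max_length": 8192,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
"""


def reserved_tokens(config):
    """Map token ID -> content for non-special "[unused...]" placeholder entries."""
    decoder = config.get("added_tokens_decoder", {})
    return {
        int(token_id): entry["content"]
        for token_id, entry in decoder.items()
        if entry["content"].startswith("[unused") and not entry["special"]
    }


config = json.loads(CONFIG_TEXT)
reserved = reserved_tokens(config)

# Print the reserved placeholders in ID order, then the length limit.
for token_id, content in sorted(reserved.items()):
    print(token_id, content)
print("model_max_length:", config["model_max_length"])
```

To run the same check against the actual file, replace `json.loads(CONFIG_TEXT)` with `json.load(open(path))`, where `path` points at a local copy of `tokenizer_config.json`.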