New spaCy Turkish Models Overview
medium.com/google-developer-experts/brand-new-spacy-turkish-models-304da649eacc
AI-generated article summary
- Model Variants
- Three Turkish spaCy models are available: tr_core_news_md, tr_core_news_lg, and tr_core_news_trf
- tr_core_news_lg is CNN-based and balances accuracy with speed
- tr_core_news_md uses the same architecture with smaller vectors
- tr_core_news_trf is Transformer-based with high accuracy
- Key Components
- All models include a tokenizer, lemmatizer, POS tagger, dependency parser, morphologizer, and NER component
- The NER component is trained on the Turkish Wiki NER Dataset, which has 19 entity tags
- The parser, tagger, and morphologizer are trained on the Turkish BOUN Treebank
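The components listed above surface as attributes on spaCy tokens and entity spans. The following is a minimal sketch, not taken from the article: the helper names `load_turkish_pipeline`, `token_rows`, and `entity_rows` are hypothetical, and it assumes spaCy plus one of the Turkish model packages is already installed under the name you pass in.

```python
from typing import List, Tuple


def load_turkish_pipeline(name: str):
    """Load an installed pipeline by package name (assumption: it is installed)."""
    import spacy  # imported lazily so the helpers below work without spaCy

    return spacy.load(name)


def token_rows(doc) -> List[Tuple[str, str, str, str, str]]:
    """Collect lemma, POS tag, dependency label, and morphology for each token."""
    return [(t.text, t.lemma_, t.pos_, t.dep_, str(t.morph)) for t in doc]


def entity_rows(doc) -> List[Tuple[str, str]]:
    """Collect the text and label of each named entity found by the NER component."""
    return [(ent.text, ent.label_) for ent in doc.ents]
```

To use it, call `load_turkish_pipeline` with the name of the installed medium, large, or Transformer package, run the returned pipeline on a Turkish sentence, and pass the resulting `Doc` to `token_rows` and `entity_rows`.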
- Technical Implementation
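- Turkish is morphologically rich, so the models rely on subword strategies; the static-vector models use Floret vectors
- Transformer models split words into subword pieces, which gives better representations of inflected forms
- Training was done on Google Cloud using a c2-standard-16 instance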
- Models available for download on Hugging Face
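To illustrate the subword idea behind Floret-style vectors, here is a rough sketch using only the standard library. It extracts character n-grams from a word wrapped in boundary markers and hashes each piece into a fixed number of table rows with more than one hash function. The function names, the n-gram range, the bucket count, and the use of `hashlib.blake2b` are all assumptions made for illustration; floret itself uses its own hashing and training code, and this is not the article's implementation.

```python
import hashlib
from typing import List, Set


def char_ngrams(word: str, minn: int = 3, maxn: int = 5) -> List[str]:
    """Return character n-grams of the word wrapped in < and > boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def bucket_ids(piece: str, n_buckets: int = 50_000, n_hashes: int = 2) -> List[int]:
    """Map a word or n-gram to n_hashes row indices in a table of n_buckets rows."""
    ids = []
    for seed in range(n_hashes):
        # A different salt per seed stands in for independent hash functions.
        digest = hashlib.blake2b(
            piece.encode("utf-8"),
            digest_size=8,
            salt=seed.to_bytes(8, "little"),
        ).digest()
        ids.append(int.from_bytes(digest, "little") % n_buckets)
    return ids


def shared_ngrams(a: str, b: str) -> Set[str]:
    """N-grams two word forms have in common, i.e. rows they would share."""
    return set(char_ngrams(a)) & set(char_ngrams(b))
```

For example, `shared_ngrams("evler", "evlerimizde")` contains pieces such as `<ev` and `evl`, which is why an inflected form that never appeared in training can still receive a meaningful vector from the rows of its subwords.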