bert-base-multilingual-cased

BERT-base-multilingual-cased is Google's multilingual BERT trained on 104-language Wikipedia data with case preserved, making it better suited than the uncased variant for named entity recognition and tasks where capitalization carries semantic meaning. It shares the same 12-layer Transformer architecture and 768-dimensional embedding space as BERT-base-uncased. Despite its age, it remains a common transfer learning starting point for multilingual tasks.
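As a quick illustration of why the cased variant matters: the tokenizer preserves capitalization distinctions that an uncased model would collapse. A minimal sketch using the HuggingFace transformers tokenizer (the example word is arbitrary; weights for the tokenizer are downloaded on first use):

```python
# Minimal sketch: the cased tokenizer keeps capitalization, so "Paris" and
# "paris" map to different WordPiece sequences.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

print(tok.tokenize("Paris"))
print(tok.tokenize("paris"))
```

This is exactly the property that makes the cased checkpoint preferable for NER, where "Apple" (the company) and "apple" (the fruit) should not share a representation.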

Use cases

  • Multilingual named entity recognition where proper noun casing matters
  • Cross-lingual sequence labeling and part-of-speech tagging
  • Zero-shot classification across the 104 supported languages
  • Baseline transfer learning evaluation for low-resource language research

Pros

  • Preserves case information critical for NER performance across languages
  • Single model spans 104 languages with a shared vocabulary
  • Broadly supported across HuggingFace pipelines and downstream NLP libraries

Cons

  • Outperformed on nearly all tasks by XLM-RoBERTa-base and larger variants
  • Fixed 512-token limit is problematic for longer multilingual documents
  • Shared multilingual vocabulary dilutes effective token budget per language

FAQ

What is bert-base-multilingual-cased used for?

Common use cases include multilingual named entity recognition where proper noun casing matters, cross-lingual sequence labeling and part-of-speech tagging, zero-shot classification across the 104 supported languages, and baseline transfer learning evaluation for low-resource language research.

Is bert-base-multilingual-cased free to use?

bert-base-multilingual-cased is an open-source model published on HuggingFace under the Apache 2.0 license. Check the model card to confirm the current license terms before commercial use.

How do I run bert-base-multilingual-cased locally?

The model can be loaded with the HuggingFace transformers library in PyTorch, TensorFlow, or JAX/Flax. See the model card for framework-specific instructions and hardware requirements; the base model runs comfortably on CPU for inference.
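A minimal sketch of local inference using the transformers fill-mask pipeline (the model's native pretraining task; weights download on first use, and the example sentence is arbitrary):

```python
# Minimal sketch: run bert-base-multilingual-cased locally via the
# fill-mask pipeline. The model predicts the token behind [MASK];
# top_k controls how many candidates are returned.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")

predictions = unmasker("Paris is the capital of [MASK].", top_k=3)
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```

For downstream tasks like NER or classification, load the checkpoint with `AutoModelForTokenClassification` or `AutoModelForSequenceClassification` instead and fine-tune on labeled data.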

Tags

transformers, pytorch, tf, jax, safetensors, bert, fill-mask, multilingual, af, sq, ar, an, hy, ast, az, ba, eu, bar, be, bn