chariot
Deliver the ready-to-train data to your NLP model.
Introduction
- Prepare Dataset
  - You can prepare typical NLP datasets through chazutsu.
- Build & Run Preprocess
  - You can build a preprocessing pipeline in the same way as a scikit-learn Pipeline.
  - The preprocessing steps for each dataset column are executed in parallel by Joblib.
  - Multi-language text tokenization is supported by spaCy.
- Format Batch
  - Sample a batch from the preprocessed dataset and format it for training the model (padding, etc.).
  - You can use pre-trained word vectors through chakin.
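
To make the "Build & Run Preprocess" and "Format Batch" steps concrete, here is a minimal sketch of the idea in plain Python. It does not use chariot's API: every name in it (`make_pipeline`, `tokenize`, `build_vocab`, `make_indexer`, `pad_batch`) is hypothetical and chosen only for illustration, and whitespace splitting stands in for spaCy tokenization.

```python
# Illustrative sketch only: these names are hypothetical, not chariot's API.
from typing import Callable, Dict, List, Sequence


def make_pipeline(*steps: Callable) -> Callable:
    """Chain steps so the output of one becomes the input of the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def tokenize(texts: Sequence[str]) -> List[List[str]]:
    # Whitespace splitting stands in for a real tokenizer such as spaCy.
    return [text.lower().split() for text in texts]


def build_vocab(token_lists: Sequence[Sequence[str]]) -> Dict[str, int]:
    # Index 0 is reserved for padding and index 1 for unknown words.
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def make_indexer(vocab: Dict[str, int]) -> Callable:
    # Returns a pipeline step that maps tokens to integer ids.
    def index(token_lists):
        unk = vocab["<unk>"]
        return [[vocab.get(t, unk) for t in tokens] for tokens in token_lists]
    return index


def pad_batch(sequences, length: int, pad_id: int = 0) -> List[List[int]]:
    # Truncate long sequences and right-pad short ones to a fixed length.
    return [
        list(seq[:length]) + [pad_id] * max(0, length - len(seq))
        for seq in sequences
    ]


texts = ["I love NLP", "Deliver the ready-to-train data"]
vocab = build_vocab(tokenize(texts))
preprocess = make_pipeline(tokenize, make_indexer(vocab))
indexed = preprocess(texts)
batch = pad_batch(indexed, length=4)
print(batch)
```

Each step is an ordinary function, so steps can be reordered or swapped without touching the rest of the pipeline. Padding is kept as a separate final step because the target length is usually chosen per batch rather than once for the whole dataset.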
chariot enables you to concentrate on training your model!