chariot

Deliver the ready-to-train data to your NLP model.

Prepare Dataset
- You can prepare typical NLP datasets through chazutsu.
Build & Run Preprocess
- You can build a preprocessing pipeline in the same way as a scikit-learn Pipeline.
- The preprocessing for each dataset column runs in parallel using Joblib.
- Multi-language text tokenization is supported by spaCy.
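The idea of a per-column, scikit-learn-style pipeline can be illustrated with a small standalone sketch. This is not chariot's API: `ColumnPipeline`, `run_preprocess`, `lowercase` and `whitespace_tokenize` are hypothetical names. A standard-library thread pool (`concurrent.futures`) stands in for Joblib, and a plain whitespace split stands in for spaCy tokenization.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical step type: a callable that maps a list of column values to a new list.
Step = Callable[[List], List]


class ColumnPipeline:
    """Applies its steps in order, in the spirit of scikit-learn's Pipeline (hypothetical)."""

    def __init__(self, steps: Sequence[Step]):
        self.steps = list(steps)

    def transform(self, values: List) -> List:
        for step in self.steps:
            values = step(values)
        return values


def run_preprocess(dataset: Dict[str, List], pipelines: Dict[str, ColumnPipeline],
                   max_workers: int = 4) -> Dict[str, List]:
    """Run each column's pipeline in parallel; columns without a pipeline pass through unchanged."""
    result = {name: values for name, values in dataset.items() if name not in pipelines}
    # A thread pool stands in for Joblib here (assumption made for a self-contained example).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(pipelines[name].transform, values)
                   for name, values in dataset.items() if name in pipelines}
        for name, future in futures.items():
            result[name] = future.result()
    return result


def lowercase(texts: List[str]) -> List[str]:
    return [t.lower() for t in texts]


def whitespace_tokenize(texts: List[str]) -> List[List[str]]:
    # Stand-in for a spaCy tokenizer: a simple whitespace split.
    return [t.split() for t in texts]


dataset = {"review": ["Great Movie", "Too LONG"], "label": [1, 0]}
pipelines = {"review": ColumnPipeline([lowercase, whitespace_tokenize])}
processed = run_preprocess(dataset, pipelines)
print(processed["review"])  # [['great', 'movie'], ['too', 'long']]
```

Keeping each column's steps in its own pipeline object is what makes the columns independent, so they can be dispatched to workers without coordinating with one another.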
Format Batch
- Sample a batch from the preprocessed dataset and format it for training the model (padding, etc.).
- You can use pre-trained word vectors through chakin.
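The batch-formatting step can be sketched as follows, assuming the text has already been converted to integer token ids. `sample_batch`, `pad_batch` and `build_embedding_matrix` are hypothetical helpers, not chariot's API, and the `vectors` dict is a stand-in for pre-trained vectors (for example, ones downloaded with chakin and loaded into memory).

```python
import random
from typing import Dict, List, Optional, Sequence


def sample_batch(data: Sequence, batch_size: int, rng: random.Random) -> List:
    """Draw batch_size distinct examples at random (sampling without replacement)."""
    return rng.sample(list(data), batch_size)


def pad_batch(seqs: Sequence[Sequence[int]], pad_id: int = 0,
              length: Optional[int] = None) -> List[List[int]]:
    """Right-pad (or truncate) every sequence to a common length."""
    if length is None:
        length = max(len(s) for s in seqs)
    return [list(s[:length]) + [pad_id] * max(0, length - len(s)) for s in seqs]


def build_embedding_matrix(vocab: List[str], vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Row i holds the pre-trained vector for vocab[i]; words without a vector get zeros."""
    return [list(vectors.get(word, [0.0] * dim)) for word in vocab]


token_ids = [[4, 7, 2], [5], [9, 3], [8, 8, 8, 1]]
batch = pad_batch(sample_batch(token_ids, batch_size=2, rng=random.Random(0)))
print(pad_batch([[4, 7, 2], [5]]))  # [[4, 7, 2], [5, 0, 0]]
```

Padding to the longest sequence in the batch, rather than the longest in the whole dataset, keeps each batch as small as possible.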

chariot enables you to concentrate on training your model!

[Figure: chariot flow]