What is the structure of the sent.model file inside the corpusEN.bin zip archive?
It's a strikingly small file for something called corpus. Say I have a larger corpus, or a corpus in a different language, how would I go about building one of these sent.model files with more data?
The corpusEN.bin file is the training data provided by OpenNLP which I used to split sentences (http://opennlp.sourceforge.net/models-1.5/). It's not the training data used for summarization.
What is the structure of the sent.model file inside the corpusEN.bin zip archive?
It's a strikingly small file for something called corpus. Say I have a larger corpus, or a corpus in a different language, how would I go about building one of these sent.model files with more data?