Creating a Tokenizer Using OpenNLP

Here is a small snippet for tokenization using OpenNLP. First, you need to download a tokenization model and place it in the data folder under resources.
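You can check that the downloaded model works before using it in code by running it through OpenNLP's command-line tool, which uses the same learnable tokenizer (TokenizerME). The sketch below assumes three things. First, the `opennlp` launcher script from the `bin` directory of the OpenNLP distribution is on your PATH. Second, the model is the English `en-token.bin`. Third, the model is stored in a Maven-style `src/main/resources/data` folder. The file name and path are assumptions, so adjust them to your project. `tokenize_file` is a hypothetical helper name.

```shell
# Assumption: English model "en-token.bin" in the project's resources/data
# folder (Maven layout). Change this path to match your project.
OPENNLP_TOKEN_MODEL="src/main/resources/data/en-token.bin"

# Tokenize a plain-text file with OpenNLP's TokenizerME.
# Assumes the "opennlp" launcher script (bin/ of the OpenNLP distribution)
# is on PATH. The tool reads text from standard input and prints the
# tokens of each line separated by spaces.
tokenize_file() {
  opennlp TokenizerME "$OPENNLP_TOKEN_MODEL" < "$1"
}
```

For example, `tokenize_file article.txt > article-tokens.txt` writes the tokens of a (hypothetical) `article.txt` to a new file.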


How to Configure Solr


Download the latest version of Solr (the paths used later in this guide assume version 6.6.0):

Or download it from the terminal using wget:
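Here is a minimal sketch of the download. It assumes Solr 6.6.0, which is the version in the /opt/solr-6.6.0 paths used later in this guide. Older releases are kept on the Apache archive server. The command is wrapped in a function only for convenience, and `download_solr` is a hypothetical helper name.

```shell
# Assumption: Solr 6.6.0, matching the /opt/solr-6.6.0 paths in this guide.
SOLR_VERSION="6.6.0"
SOLR_TGZ="solr-${SOLR_VERSION}.tgz"
# Older Solr releases are kept on the Apache archive server.
SOLR_URL="https://archive.apache.org/dist/lucene/solr/${SOLR_VERSION}/${SOLR_TGZ}"

# Download the release archive into the current directory.
download_solr() {
  wget "$SOLR_URL"
}
```

Running `download_solr` leaves `solr-6.6.0.tgz` in the current directory.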

Extract the shell script from the downloaded archive by executing the following command:
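This step most likely refers to the service installer, `install_solr_service.sh`, which ships inside the Solr archive. The sketch below extracts only that script into the current directory. It assumes the archive is named `solr-6.6.0.tgz` and sits in the current directory. `extract_installer` is a hypothetical helper name.

```shell
SOLR_VERSION="6.6.0"
SOLR_TGZ="solr-${SOLR_VERSION}.tgz"
# Location of the installer inside the archive.
INSTALLER_IN_TGZ="solr-${SOLR_VERSION}/bin/install_solr_service.sh"

# Extract only the installer. --strip-components=2 drops the leading
# "solr-6.6.0/bin/" so the script lands in the current directory.
extract_installer() {
  tar --strip-components=2 -xzf "$SOLR_TGZ" "$INSTALLER_IN_TGZ"
}
```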

Give the shell script execute permission by typing the following command:
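Here is a sketch of the permission step. It assumes the installer was extracted into the current directory as `install_solr_service.sh`. `allow_execute` is a hypothetical helper name. The `ls -l` call is there only so you can see the added `x` permission bits.

```shell
INSTALLER="./install_solr_service.sh"

# Add execute permission (chmod +x), then list the file so the added
# x permission bits are visible.
allow_execute() {
  chmod +x "$INSTALLER" && ls -l "$INSTALLER"
}
```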

Now type the following command to run the Solr service:
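Here is a sketch of this step. It assumes the installer and `solr-6.6.0.tgz` are both in the current directory. The installer must run as root. With its default options it does the following:

- installs Solr in `/opt/solr-6.6.0` with a `/opt/solr` symlink,
- keeps data under `/var/solr`,
- creates a `solr` user,
- registers and starts a `solr` service on port 8983.

`run_solr_installer` and `solr_status` are hypothetical helper names.

```shell
SOLR_VERSION="6.6.0"
SOLR_TGZ="solr-${SOLR_VERSION}.tgz"

# Install Solr as a service from the downloaded archive; the installer
# must run as root and starts the service when it finishes.
run_solr_installer() {
  sudo ./install_solr_service.sh "$SOLR_TGZ"
}

# Ask the init script whether the "solr" service is running.
solr_status() {
  sudo service solr status
}
```

After installation, `sudo service solr stop` and `sudo service solr start` control the service. The admin UI should answer at http://localhost:8983/solr.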

Creating a New Core and Indexing Data

You can create your own core by opening a terminal and typing the following command:


collection_name = the name of the new core

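data_driven_schema_configs = the name of the configset (a directory of configuration files) to copy for the new core. The configsets shipped with Solr are located in /opt/solr-6.6.0/server/solr/configsets (for example, basic_configs and data_driven_schema_configs).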
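Here is a sketch of the create command. It assumes Solr was installed as a service with the installer defaults, so its files are under `/opt/solr` and owned by the `solr` user. Running `bin/solr` as that user avoids creating core files owned by root. In the command, `-c` names the core and `-d` selects the configset to copy. `collection_name` is the placeholder from the text, and `create_core` is a hypothetical helper name.

```shell
SOLR_DIR="/opt/solr"
# Placeholders from the text: replace collection_name with your core name.
CORE_NAME="collection_name"
CONFIGSET="data_driven_schema_configs"

# Create the core as the "solr" user; -c is the core name and -d is the
# configset copied from $SOLR_DIR/server/solr/configsets.
create_core() {
  sudo su - solr -c "$SOLR_DIR/bin/solr create -c $CORE_NAME -d $CONFIGSET"
}
```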

Now it's time to index data into the new collection. To add documents to the index, use bin/post. For example:

directory_name = the name of the directory that contains the data files. Indexing time depends on the number of documents, so be patient.
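Here is a sketch of the indexing call. It assumes the service install under `/opt/solr` and Solr listening on the default port 8983. `bin/post` is a client-side tool that sends the files it finds in the directory to the core's update handler, so it can run as an ordinary user. `collection_name` and `directory_name` are the placeholders from the text, and `index_directory` is a hypothetical helper name.

```shell
SOLR_DIR="/opt/solr"
# Placeholders from the text.
CORE_NAME="collection_name"
DATA_DIR="directory_name"

# Post the supported files found in DATA_DIR to the core; bin/post
# targets http://localhost:8983/solr/<core>/update by default.
index_directory() {
  "$SOLR_DIR/bin/post" -c "$CORE_NAME" "$DATA_DIR"
}
```

Once posting finishes, `curl "http://localhost:8983/solr/collection_name/select?q=*:*"` is a quick way to confirm that documents were indexed.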