This text splitter is based on the LangChain RecursiveCharacterTextSplitter, which is the recommended splitter for generic text. It is parameterized by a list of separator characters and tries to split on them in order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""]. This has the effect of keeping paragraphs (and then sentences, and then words) together as long as possible, since those tend to be the most strongly semantically related pieces of text.
Here's a brief overview of how it typically works:
Input Text: You provide the algorithm with a piece of text that you want to split.
Chunk Size: You provide the chunk size limit, the maximum size (measured in characters by default) that each chunk may have.
Splitting Criteria: The text is split on the separators ["\n\n", "\n", " ", ""], tried in that order.
Recursive Process: The algorithm applies the separators recursively. Any segment that is still larger than the chunk size is split again with the next separator in the list, until every segment fits within the limit.
Output: Once the splitting process is complete, you get the split segments of the text as the output.
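The steps above can be sketched in plain Python. This is a simplified illustration, not the LangChain implementation: the function name recursive_split is hypothetical, chunk overlap is ignored, and small pieces are only merged within one recursion level.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters (simplified sketch)."""
    if len(text) <= chunk_size:
        return [text] if text else []

    # Pick the first separator that occurs in the text; "" always matches.
    sep, rest = "", []
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, rest = candidate, list(separators[i + 1:])
            break

    # The empty separator means: fall back to fixed-size character slices.
    if sep == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue
        merged = piece if not current else current + sep + piece
        if len(merged) <= chunk_size:
            current = merged  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece is still too large: recurse with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in recursive_split("aaa bbb ccc\n\nddd eee", chunk_size=7):
        print(repr(chunk))
```

With a chunk size of 7, the first paragraph is too long and gets split again on spaces, while the second paragraph fits and is kept whole.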
For example, if the chunk size is small enough that only single words fit, the algorithm will recursively divide the text at each space character until no further splitting is possible.
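As a minimal sketch of that word-level case, the recursion can be written with str.partition. The helper name split_words is an assumption made for illustration and is not part of LangChain; it handles only the space separator and is meant for short inputs.

```python
from typing import List


def split_words(text: str) -> List[str]:
    """Recursively split text at each space until no space remains."""
    head, sep, tail = text.partition(" ")
    words = [head] if head else []  # drop empty pieces from repeated spaces
    if not sep:
        # No space left, so no further splitting is possible.
        return words
    return words + split_words(tail)


if __name__ == "__main__":
    print(split_words("split this text please"))
```

Each call peels off the text before the first space and recurses on the remainder, so the recursion ends once a segment contains no space.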