Write some pre-processing code that steps through the source text file and works out the offsets of the strings in the raw output data. The output data has a header containing, say, the number of strings, followed by a list of 32-bit offsets, each pointing to the first character of a string in the data heap. Depending upon the language, you could store the size of the string at this offset followed by the raw characters, or store the string null terminated. Below I've assumed they're null terminated.
Here's how such a structure might look when it contains 2 strings.
Header (0) = Number of Strings
StringOffset(4) = Offset of String 1 (20 bytes from start of memblock to first character of string)
StringOffset(8) = Offset of String 2 (26 bytes from start of memblock to first character of string)
StringData(20) = "Hello"
StringData(26) = "World"
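To make the layout concrete, here's a rough sketch of the pre-processing step in Python (not DBpro), using the standard struct module. The function names pack_strings and read_string are made up for illustration, and I've assumed little-endian 32-bit values and plain ASCII text. This version packs the heap straight after the offset table, so with 2 strings the first one lands at offset 12 rather than the 20 used in the illustration above.

```python
import struct

def pack_strings(strings):
    """Build one block: count, then 32-bit offsets, then null-terminated strings."""
    count = len(strings)
    heap_start = 4 + 4 * count          # header (4 bytes) + one offset per string
    offsets = []
    heap = bytearray()
    for s in strings:
        # Offset is measured from the start of the whole block.
        offsets.append(heap_start + len(heap))
        heap += s.encode("ascii") + b"\x00"
    header = struct.pack("<I", count)
    table = struct.pack("<%dI" % count, *offsets)
    return header + table + bytes(heap)

def read_string(block, index):
    """Look up string number `index` (zero-based) without scanning the heap."""
    (count,) = struct.unpack_from("<I", block, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (offset,) = struct.unpack_from("<I", block, 4 + 4 * index)
    end = block.index(b"\x00", offset)  # find the null terminator
    return block[offset:end].decode("ascii")

block = pack_strings(["Hello", "World"])
first = read_string(block, 0)
second = read_string(block, 1)
```

The lookup is the point of the offset table: fetching string N is two reads plus a scan for the terminator, no matter how many strings come before it.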
When converting the text data you can have it remove duplicates, for example. Or tokenize the text and store it as a list of tokens, but that's a bit more stuffing around with the data up front. You could do the string operations in memblocks also, as DBpro string operations aren't that quick.
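Removing duplicates could be sketched like this, again in Python rather than DBpro. The name pack_unique is hypothetical, and the same assumptions apply (little-endian 32-bit offsets, ASCII, null-terminated strings). Each distinct string is written to the heap once, and repeated entries just reuse the earlier offset.

```python
import struct

def pack_unique(strings):
    """Pack strings as count + offsets + heap, storing each distinct string once."""
    count = len(strings)
    heap_start = 4 + 4 * count
    heap = bytearray()
    seen = {}                            # string -> offset already assigned
    offsets = []
    for s in strings:
        if s not in seen:
            seen[s] = heap_start + len(heap)
            heap += s.encode("ascii") + b"\x00"
        offsets.append(seen[s])          # duplicates point at the first copy
    header = struct.pack("<I", count)
    table = struct.pack("<%dI" % count, *offsets)
    return header + table + bytes(heap)

packed = pack_unique(["Hello", "World", "Hello"])
```

The offset table still has one entry per original string, so indexing works the same as before; only the heap gets smaller.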