NOTE: The code for the functions is at the bottom of this post. Examples are in the next post.
Hi all,
Long time no see!
I know this looks long and tedious, but trust me, it's not so bad. If you want to skip the theory just read the bits in bold and then look at the simple example in my next post. You should be ready to brainstorm ideas for your game in no time. In fact, here's a super quick sample demonstrating how little needs to be done/understood:
` This program will tell you what usually comes after "Hello"!
` NB_DATATYPE must be defined before the library code is included.
#Constant NB_DATATYPE string
NB_DefaultSetup()
id = NB_NewClassifier()
` Train on three two-word examples.
DIM Data$(2)
Data$(1) = "Hello"
Data$(2) = "World!"
NB_AddTrainData(id, get arrayptr(Data$()))
Data$(1) = "Hello"
Data$(2) = "World!"
NB_AddTrainData(id, get arrayptr(Data$()))
Data$(1) = "Hello"
Data$(2) = "Moon"
NB_AddTrainData(id, get arrayptr(Data$()))
NB_Train(id)
` We want to predict parameter 2 (the second word).
NB_AddClassifyParam(id, 2)
NB_PrepareClassify(id)
` Given "Hello" as the first word, what is the second word most likely to be?
Data$(1) = "Hello"
Data$(2) = "???"
NB_SetClassifyData(id, get arrayptr(Data$()))
second_word$ = NB_Classify(id, 2)
print "Hello ", second_word$
wait key
end
Before I begin, I'd like to say that everything in this post is based on my own understanding of the topics at hand. If anyone spots a flaw, please do point it out.
I've been reading a bit about machine learning lately and decided to implement a Naive Bayes Classifier in DBP. This algorithm is a Supervised Learning Algorithm, meaning you need to provide it with some data to use as a base to learn from. Later I may implement an Unsupervised Learning Algorithm (Genetic Algorithm), whereby the computer learns via trial and error, which would probably be more useful and fun for gaming because even the game's creator wouldn't know what the computer would do next. But for now, here's a simple and popular supervised algorithm.


What is a Naive Bayes Classifier?

In simplest terms, a Naive Bayes Classifier is something which, when given some features, will produce what it expects another feature to be based on data it has already seen. It does this by seeing how likely each option for the feature in question is given the known features, and then picking the most likely case. This process is entirely based on Bayes' Theorem, a formula describing the probability of some event A given that an event B has occurred. We can also calculate it for A given B, C, D, etc., which is what the code in this post does. Anyway, it's Naive because it assumes that B, C, D, etc. do not depend on each other in any way. That is, B happening has nothing to do with C or D happening. This could be false (ex. if B is "lightning strikes a power line" and C is "power goes out", then these two events are clearly related). But we're naive and assume everything is independent because it makes the math nicer. This assumption can cause some problems (ex. if everything in your dataset depends completely on everything else), but usually it works out well.
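For reference, here's Bayes' Theorem for a single known event B, along with the "naive" multi-feature version that the code in this post effectively computes (A is the value being predicted; B, C, D are the known features, assumed independent):

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

$$P(A \mid B, C, D) = \frac{P(B \mid A)\,P(C \mid A)\,P(D \mid A)\,P(A)}{P(B, C, D)}$$

The classifier evaluates this for every possible value of A and picks the one with the highest result.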
You "train" the Classifier on some data, it "learns" from the data and then interprets any new data you give it based on what the old data said.
But enough technical stuff. Here's a concrete example:
Suppose we want to know what spell the player will use next in a turn-based RPG game. We know that some spells are commonly used together, but we don't know which, and we don't want to sit there and think of all the possible combos the player might have come up with. What to do? Well, here are a few creatively named spell sequences that the player has used on us in the past.

Freeze Feet ---> Hover Boulder Over Head ---> Make Boulder Fall

Hover Boulder Over Head ---> Make Boulder Fall ---> Heal Self

Freeze Feet ---> Shatter Ice ---> Reassemble Ice

Freeze Feet ---> Hover Boulder Over Head ---> Heal Self

Freeze Feet ---> Hover Boulder Over Head ---> Make Boulder Fall

Freeze Feet ---> Shatter Ice ---> Freeze Feet

Freeze Feet ---> Hover Boulder Over Head ---> Make Boulder Fall

Freeze Feet ---> Hover Boulder Over Head ---> Shatter Ice

Freeze Feet ---> Hover Boulder Over Head ---> Make Boulder Fall

Make Boulder Fall ---> Heal Self ---> Heal Self
Hmmm. Our player really likes freezing things.
Pretend our player just used Freeze Feet followed by Hover Boulder Over Head. What will he do next? Well, in the past when he froze our feet and hovered a boulder over our head, he made it fall on us. But he also used Heal Self or Shatter Ice. So which will he do this time? Well, in the 6 cases that he's used Freeze Feet followed by Hover Boulder Over Head, 4/6 of those times he used Make Boulder Fall afterwards. The other two times he used either Heal Self or Shatter Ice. But most of the time he just makes the boulder fall. So the AI assumes that's what he's doing this time and prepares for it by using Deflect Boulder.
Bam.
The boulder goes flying back at the player, he loses health and curses the AI for figuring out his favorite combo.
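To make that concrete, here's a rough, untested sketch of how this scenario could be wired up with the functions at the bottom of this post (spell names are stored as plain strings; the DATA/READ juggling is just one way to feed the sequences in):

` Sketch: train on the ten spell sequences above, then predict the third spell.
#Constant NB_DATATYPE string
NB_DefaultSetup()
id = NB_NewClassifier()
DIM Spells$(3)
` Each DATA triple is one observed sequence (first, second, third spell).
DATA "Freeze Feet", "Hover Boulder Over Head", "Make Boulder Fall"
DATA "Hover Boulder Over Head", "Make Boulder Fall", "Heal Self"
DATA "Freeze Feet", "Shatter Ice", "Reassemble Ice"
DATA "Freeze Feet", "Hover Boulder Over Head", "Heal Self"
DATA "Freeze Feet", "Hover Boulder Over Head", "Make Boulder Fall"
DATA "Freeze Feet", "Shatter Ice", "Freeze Feet"
DATA "Freeze Feet", "Hover Boulder Over Head", "Make Boulder Fall"
DATA "Freeze Feet", "Hover Boulder Over Head", "Shatter Ice"
DATA "Freeze Feet", "Hover Boulder Over Head", "Make Boulder Fall"
DATA "Make Boulder Fall", "Heal Self", "Heal Self"
for seq = 1 to 10
read a$ : read b$ : read c$
Spells$(1) = a$ : Spells$(2) = b$ : Spells$(3) = c$
NB_AddTrainData(id, get arrayptr(Spells$()))
next seq
NB_Train(id)
` We want to predict parameter 3 (the third spell in a sequence).
NB_AddClassifyParam(id, 3)
NB_PrepareClassify(id)
` The player just cast Freeze Feet then Hover Boulder Over Head. What's next?
Spells$(1) = "Freeze Feet"
Spells$(2) = "Hover Boulder Over Head"
Spells$(3) = "???"
NB_SetClassifyData(id, get arrayptr(Spells$()))
next_spell$ = NB_Classify(id, 3)
print "Predicted third spell: ", next_spell$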
But what if the player used Heal Self followed by Hover Boulder Over Head? Well now we're in trouble. We've never seen that combo before. But here's where the probability element comes in. In the past when the second move was Hover Boulder Over Head, the player usually used Make Boulder Fall afterwards. Because of this, the Naive Bayes Classifier will still conclude that the player is using Make Boulder Fall next.
The problem arises in a case where the player uses, say, Freeze Feet ---> Heal Self. What will he do next? We haven't seen that before either. Since Freeze Feet was used first, the AI will still think the player is doing the boulder combo. This goes back to our assumption of independence. The AI doesn't know that using Make Boulder Fall depends on whether or not Hover Boulder Over Head was used, so it doesn't realize that its guess can't be accurate.
To get around this you might do a second classification. Given the player used Freeze Feet ---> ??? ---> Make Boulder Fall, what is ??? most likely to be? It'll probably say Hover Boulder Over Head. Since he didn't do that and used Heal Self instead, you know the combo is not exactly what the computer thinks it is. This is not a tried and true solution though; I just thought of it...
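In code, that second pass might look something like this (continuing from the spell sketch above; completely untested, just the idea):

` The player actually cast Freeze Feet ---> Heal Self, but the classifier
` predicted Make Boulder Fall as the next move. Ask what it *expected*
` the middle spell to be, given the first and (predicted) third spells.
NB_ClearClassifyParams(id)
NB_AddClassifyParam(id, 2)
NB_PrepareClassify(id)
Spells$(1) = "Freeze Feet"
Spells$(2) = "???"
Spells$(3) = "Make Boulder Fall"
NB_SetClassifyData(id, get arrayptr(Spells$()))
expected_middle$ = NB_Classify(id, 2)
` The player really used Heal Self, so if the expectation differs,
` the combo is not the one the classifier thinks it is.
if expected_middle$ <> "Heal Self" then print "Unfamiliar combo!"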
There is also a function for retrieving the actual probability obtained from the calculation, which might come in handy.
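For instance, right after a call to NB_Classify() you could check how sure the classifier actually was (the 0.5 cutoff here is arbitrary, purely for illustration):

next_spell$ = NB_Classify(id, 3)
confidence# = NB_GetLastProbability(id)
if confidence# > 0.5
` Confident enough to act on the prediction.
else
` Too uncertain; play it safe instead.
endif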


What can I use this for?

Examples in gaming that I can think of:

Given that the player's party consists of a Mage and 3 Clerics, what team composition should the AI use against it?

The player has selected a Warrior, added lots of Agility and equipped a sword. Based on things that I did while testing the game, what abilities will the player most likely select?

The player uses a Rocket Launcher as his main weapon, likes to spin around in circles and spends more than half of his time jumping in the air. Should the AI use landmines?

The AI has access to the same spells as the player and wants to use his combos against him. What spell does the player usually start with? What's his favorite combo from there?

How is the player going to defend against the AI's next attack? What does he think the AI is about to do?

The tic-tac-toe board looks just so. What should the AI do next? (I believe you could train an AI to play tic-tac-toe without even knowing the rules this way if you wanted to. I may try this later).
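Incidentally, here's a rough, untested sketch of how that tic-tac-toe idea might be encoded (parameters 1-9 are the board cells, parameter 10 is the move the eventual winner played from that position; the setup numbers are guesses):

` Sketch: learn tic-tac-toe moves from recorded games, no rules required.
#Constant NB_DATATYPE string
NB_Setup(1, 1000, 10, 10, 1)
ttt = NB_NewClassifier()
DIM Board$(10)
` For every snapshot from your recorded games: Board$(1)-Board$(9) hold
` "X", "O" or "-" for each cell, Board$(10) holds the winner's next move
` ("1"-"9"). Then: NB_AddTrainData(ttt, get arrayptr(Board$()))
NB_Train(ttt)
NB_AddClassifyParam(ttt, 10)
NB_PrepareClassify(ttt)
` At play time, copy the current board into Board$(1)-Board$(9) and ask
` what a winner would usually do from here.
NB_SetClassifyData(ttt, get arrayptr(Board$()))
move$ = NB_Classify(ttt, 10)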
Examples outside of gaming that I can think of:

Spam filter. Given an email contains the phrases "FREE!", "BUY NOW!", "check out my page" and a tinyurl link, is it spam?

Determining the gender of a name based on its endings (see the sketch after this list).

Determining whether or not a picture contains a fruit.

Determining the type of bird based on its wingspan, flight speed, and rest period duration.
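For example, the name idea might be encoded like this (a hypothetical sketch: parameter 1 is the name's last letter, parameter 2 its second-to-last letter, parameter 3 the gender):

` Sketch: guess the gender of a name from its ending letters.
#Constant NB_DATATYPE string
NB_DefaultSetup()
names = NB_NewClassifier()
DIM Feat$(3)
` One training point per known name, e.g. "Anna" is female:
name$ = "Anna"
Feat$(1) = right$(name$, 1)
Feat$(2) = mid$(name$, len(name$) - 1)
Feat$(3) = "female"
NB_AddTrainData(names, get arrayptr(Feat$()))
` ...add many more names the same way...
NB_Train(names)
NB_AddClassifyParam(names, 3)
NB_PrepareClassify(names)
` Now guess the gender of a name we haven't seen:
name$ = "Maria"
Feat$(1) = right$(name$, 1)
Feat$(2) = mid$(name$, len(name$) - 1)
NB_SetClassifyData(names, get arrayptr(Feat$()))
gender$ = NB_Classify(names, 3)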

Basically anything that requires the classification of something based on the features it has. That's why it's called a Classifier.
The power here lies in your ability to pick good features! Be more creative than I am!


Actual Code Stuff

TODO

Better documentation

More examples

Clean up the code/comment it better

Fix any issues you guys discover

More error handling? Maybe?
How fast is it for a realtime game? Not sure. Only one way to find out.
Commands (I would go through the heavily commented simple example in my next post for a good overview of what's going on)
NB_Setup(num_classifiers, num_training_points, num_training_params, num_param_values, num_classify_params) -- allocates the arrays; the arguments are the maximum capacities you'll need.
NB_DefaultSetup() -- calls NB_Setup() with some reasonable defaults (2 classifiers, 100 training points, 5 parameters, 20 values per parameter, 5 classify parameters).
id = NB_NewClassifier() -- reserves a classifier and returns its id.
NB_SetPriors(id, m, p) -- optionally sets the m-estimate values used to smooth away zero probabilities (sensible defaults are used otherwise).
NB_AddTrainData(id, ptr) -- adds one training example from an array (pass get arrayptr()).
NB_ClearTrainData(id) -- discards all training examples.
NB_AddClassifyParam(id, param) -- registers a parameter you intend to classify.
NB_ClearClassifyParams(id) -- unregisters all classify parameters.
NB_SetClassifyData(id, ptr) -- sets the known feature values to classify from.
NB_Train(id) -- indexes the possible values of each parameter in the training data.
NB_SaveTrainData(id, filename as string, write_func_name as string) -- saves the training data using your write function.
NB_LoadTrainData(id, filename as string, read_func_name as string) -- loads training data saved with NB_SaveTrainData().
NB_PrepareClassify(id) -- precomputes class counts for the registered classify parameters.
NB_Classify(id, param_to_classify) -- returns the most likely value for the given parameter.
NB_GetLastProbability(id) -- returns the probability estimate behind the last classification.
Code as of August 15, 2013 (Alpha) (Requires IanM's Matrix1 Utils) (Examples in next post). Just include the code with your project to use the functions.
Rem ***** Included Source File *****
remstart
******************************************************
Supervised Machine Learning: Naive Bayes Classifier
-Coded by Sixty Squares, 2013
******************************************************
Uses Bayes' Theorem with naive assumptions of independence to determine the most likely outcome, A, given
parameters/features B, C, D, E, etc.
Define #Constant NB_DATATYPE (e.g. string) in your project before including this file.
remend
#Constant NB_INACTIVE -2
#Constant NB_ACTIVE -1
#Constant NB_MIN_INT -2147483648
Type NB_Classifiers_T
active as integer
m as integer
p as float
train_point as integer ` # of training data points
train_param as integer ` # of train params
classify_param as integer ` # of classify params
last_prob_product as float ` The probability product from the last most likely classification.
Endtype
Type NB_TrainingData_T
value as NB_DATATYPE
Endtype
Type NB_ClassifyParams_T
train_param as integer
Endtype
Type NB_PossibleParamValues_T
value as NB_DATATYPE
times_param_is_class as integer
Endtype
Type NB_ParamData_T
num_values as integer ` Number of possible parameter values
Endtype
function NB_DefaultSetup()
num_classifiers = 2
num_training_points = 100
num_training_params = 5
num_param_values = 20
num_classify_params = 5
NB_Setup(num_classifiers, num_training_points, num_training_params, num_param_values, num_classify_params)
endfunction
function NB_Setup(num_classifiers, num_training_points, num_training_params, num_param_values, num_classify_params)
Global NB_NUM_POSSIBLE_PARAM_VALUES as integer : NB_NUM_POSSIBLE_PARAM_VALUES = num_param_values
Global NB_NUM_CLASSIFIERS as integer : NB_NUM_CLASSIFIERS = num_classifiers
Global NB_NUM_CLASSIFY_PARAMS as integer : NB_NUM_CLASSIFY_PARAMS = num_classify_params
DIM NB_Classifiers(num_classifiers) as NB_Classifiers_T
for c = 1 to num_classifiers
NB_Classifiers(c).active = NB_INACTIVE
next c
DIM NB_TrainingData(num_classifiers, num_training_points, num_training_params) as NB_TrainingData_T
DIM NB_PossibleParamValues(num_classifiers, num_training_params, num_param_values) as NB_PossibleParamValues_T
DIM NB_ClassifyParams(num_classifiers, num_classify_params) as NB_ClassifyParams_T
DIM NB_ParamData(num_classifiers, num_training_params) as NB_ParamData_T
endfunction
function NB_NewClassifier()
id = NB_GetFreeClassifier()
NB_Classifiers(id).active = NB_ACTIVE
NB_Classifiers(id).train_point = 0
NB_Classifiers(id).classify_param = 0
NB_Classifiers(id).m = -1
NB_Classifiers(id).p = -1
endfunction id
function NB_SetPriors(id, m as integer, p as float)
NB_Classifiers(id).m = m
NB_Classifiers(id).p = p
endfunction
function NB_AddTrainData(id, ptr)
inc NB_Classifiers(id).train_point, 1
DIM NB_Arr() as NB_DATATYPE
link array NB_Arr(), ptr
ac = array count(NB_Arr())
NB_Classifiers(id).train_param = ac
for i = 1 to ac
NB_TrainingData(id, NB_Classifiers(id).train_point, i).value = NB_Arr(i)
next i
unlink array NB_Arr()
UNDIM NB_Arr()
endfunction
function NB_ClearTrainData(id)
NB_Classifiers(id).train_point = 0
endfunction
function NB_AddClassifyParam(id, param)
` Add a parameter from the training params to classify with
inc NB_Classifiers(id).classify_param, 1
NB_ClassifyParams(id, NB_Classifiers(id).classify_param).train_param = param
endfunction
function NB_ClearClassifyParams(id)
NB_Classifiers(id).classify_param = 0
Endfunction
function NB_SetClassifyData(id, ptr)
`Classify data is stored in the 0th index of training data
DIM NB_Arr() as NB_DATATYPE
link array NB_Arr(), ptr
ac = array count(NB_Arr())
for i = 1 to ac
NB_TrainingData(id, 0, i).value = NB_Arr(i)
next i
unlink array NB_Arr()
UNDIM NB_Arr()
endfunction
` Indexes all of the possible values for each training parameter.
function NB_Train(id)
local class as NB_DATATYPE
for tpoint = 1 to NB_Classifiers(id).train_point
for param = 1 to NB_Classifiers(id).train_param
` Get param value for this data point
class = NB_TrainingData(id, tpoint, param).value
` Check if we already know about it.
found = 0
max_class = NB_ParamData(id, param).num_values
for idx = 1 to max_class
if NB_PossibleParamValues(id, param, idx).value = class
found = 1
exit
endif
next idx
` If not, add it.
if found = 0
inc max_class, 1
NB_PossibleParamValues(id, param, max_class).value = class
NB_ParamData(id, param).num_values = max_class
endif
next param
next tpoint
endfunction
function NB_SaveTrainData(id, filename as string, write_func_name as string)
local write_ptr as dword
local file as integer
write_ptr = get ptr to function(write_func_name)
file = find free datafile()
if file exist(filename) then delete file filename
open datafile to write file, filename
`Write Metadata
write datafile string file, "TRAIN_POINT"
write datafile integer file, NB_Classifiers(id).train_point
write datafile string file, "TRAIN_PARAM"
write datafile integer file, NB_Classifiers(id).train_param
write datafile string file, "METADATA_END"
`Write actual training data
for tpoint = 1 to NB_Classifiers(id).train_point
for param = 1 to NB_Classifiers(id).train_param
call function ptr write_ptr, file, NB_TrainingData(id, tpoint, param).value
next param
next tpoint
close datafile file
endfunction
function NB_LoadTrainData(id, filename as string, read_func_name as string)
read_ptr = get ptr to function(read_func_name)
file = find free datafile()
open datafile to read file, filename
` Read metadata
repeat
s$ = datafile string$(file)
if s$ = "TRAIN_POINT"
train_point = datafile integer(file)
endif
if s$ = "TRAIN_PARAM"
train_param = datafile integer(file)
endif
until s$ = "METADATA_END"
local dim NB_TEMP_DATA(train_param) as NB_DATATYPE
arr_ptr = get arrayptr(NB_TEMP_DATA())
local temp_dat as NB_DATATYPE
` Read in training data. Store in temp local array for storage.
for tpoint = 1 to train_point
for param = 1 to train_param
temp_dat = call function ptr(read_ptr, file)
NB_TEMP_DATA(param) = temp_dat
next param
NB_AddTrainData(id, arr_ptr)
next tpoint
close datafile file
endfunction
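remstart
The write/read functions are supplied by you, since only your project knows
what NB_DATATYPE is. For NB_DATATYPE = string they might look like this
(hypothetical names; pass the names in as strings):

function MyWriteFunc(file, value as string)
write datafile string file, value
endfunction

function MyReadFunc(file)
value$ = datafile string$(file)
endfunction value$

NB_SaveTrainData(id, "training.dat", "MyWriteFunc")
NB_LoadTrainData(id, "training.dat", "MyReadFunc")
remend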
function NB_PrepareClassify(id)
for c = 1 to NB_Classifiers(id).classify_param
` Get the parameter we are calculating for (i.e. Given our data to classify, what will PARAM be?)
param = NB_ClassifyParams(id, c).train_param
` Iterate over each possible classification for this parameter
local class as NB_DATATYPE
for idx = 1 to NB_ParamData(id, param).num_values
class = NB_PossibleParamValues(id, param, idx).value
` Calculate the number of times the parameter = this class
times_param_is_class = 0
for tpoint = 1 to NB_Classifiers(id).train_point
if NB_TrainingData(id, tpoint, param).value = class
inc times_param_is_class
endif
next tpoint
` Store it for easy access later when classifying
NB_PossibleParamValues(id, param, idx).times_param_is_class = times_param_is_class
next idx
next c
endfunction
function NB_Classify(id, param_to_classify)
local other_param_is_our_values_and as integer
local prob_product as float
local best_prob as float : best_prob = NB_MIN_INT * 1.0
local times_param_is_class as integer
local our_specific_case as integer
local specific_case_product as float
local m# as float : local p# as float
` Get the parameter we are calculating for. We want to know what it will be given our data.
param = param_to_classify
` Get m-estimate values to take care of 0 probabilities.
p# = NB_Classifiers(id).p
m# = NB_Classifiers(id).m
` Default values for m and p if none specified.
if p# < 0 : p# = 1.0 / (NB_ParamData(id, param).num_values*1.0) : endif
if m# < 0 : m# = 2 : endif
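` (About the m-estimate: every raw ratio count/total below is replaced with
` (count + m#*p#) / (total + m#), so a feature/class combination that never
` appeared in the training data gets a small nonzero probability instead of
` zeroing out the entire product.)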
` Iterate over each possible classification for this parameter
local class as NB_DATATYPE
local best_class as NB_DATATYPE
for idx = 1 to NB_ParamData(id, param).num_values
class = NB_PossibleParamValues(id, param, idx).value
` We want to calculate P(param_to_classify_value=class | [classification_data]). We can do this by asking ourselves-- if our data did belong
` to this particular class, how likely is it that our data would look the way it does? The formula looks like this:
` P(dataparam1=our_value | param_to_classify=class) * P(dataparam2=our_value | param_to_classify=class)
` * ... * P(dataparam_n=our_value | param_to_classify_value=class). We then have to multiply by the likelihood that
` this class even shows up at all, so we multiply by P(param_to_classify_value=class) at the end. We can also divide by the probability that our specific case
` shows up exactly to give us a better guess at how correct our answer is (Bayes' Formula includes this as part of its definition, but since it's constant
` for a given set of classification data many people ignore it in their implementation of this algorithm.)
` This process will tell us how likely it is that this class is the class that our [classification data] belongs to.
` To find an individual P(dataparam_n=its_value | param_to_classify=class), we can use the training set like so:
` P(dataparam_n=our_value | param_to_classify=class) =
`     (# of cases in training set where dataparam_n=our_value AND param_to_classify=class)
`     / (# of cases in training set where param_to_classify=class)
` This takes care of the (# of cases in training set where param_to_classify=class).
` We calculated it in PrepareClassify().
times_param_is_class = NB_PossibleParamValues(id, param, idx).times_param_is_class
` Now calculate the number of times parameter = this class AND the other parameters = our values. This is the numerator above.
prob_product = 0
specific_case_product = 0
for other_param = 1 to NB_Classifiers(id).train_param
other_param_is_our_values_and = 0
our_specific_case = 0
` Do not calculate it for the parameter we're looking at.
if other_param <> param
` Iterate over all data points in training set
for tpoint = 1 to NB_Classifiers(id).train_point
` Increase counter if other_parameter = our value for other_parameter AND parameter = this class
if NB_TrainingData(id, tpoint, other_param).value = NB_TrainingData(id, 0, other_param).value
if NB_TrainingData(id, tpoint, param).value = class
inc other_param_is_our_values_and, 1
endif
` Increase counter for probability that our specific case shows up for this parameter
inc our_specific_case, 1
endif
next tpoint
` Continue to build our probability product for this particular class.
`print "Adding to prob product: ", (other_param_is_our_values_and * 1.0 + m# * p#)/(times_param_is_class * 1.0 + m#)
prob_product = prob_product + log((other_param_is_our_values_and * 1.0 + m# * p#)/(times_param_is_class * 1.0 + m#))
` Continue to build the probability that our specific case shows up for all parameters
specific_case_product = specific_case_product + log((our_specific_case * 1.0 + m# * p#) / (NB_Classifiers(id).train_point * 1.0 + m#))
`print "SPECIFIC CASE FOR ", NB_TrainingData(id, 0, other_param).value,": ", our_specific_case
endif
next other_param
` Save the result for this potential classification. Finish off the probability calculation by multiplying by
` P(param_to_classify=class) = times_param_is_class / (total number of data points)
` Use logarithms to avoid integer underflow (#'s might get too tiny if we keep multiplying them).
` This is why we add/subtract rather than multiply/divide.
prob_product = prob_product + log(times_param_is_class * 1.0 / (NB_Classifiers(id).train_point * 1.0))
prob_product = prob_product - specific_case_product
if prob_product >= best_prob
best_prob = prob_product
best_class = class
NB_Classifiers(id).last_prob_product = best_prob
endif
next idx
endfunction best_class
function NB_GetLastProbability(id)
prob# = exp( NB_Classifiers(id).last_prob_product)
endfunction prob#
function NB_GetFreeClassifier()
id = 0
repeat
id = id + 1
until NB_Classifiers(id).active = NB_INACTIVE
endfunction id
Anyway, I hope this code is helpful somehow, and if it isn't, I hope the post was at least informative. Thanks for reading!
Hmm. It's been a while.