As described in our approach, our program uses circular definitions to let the computer manipulate the symbols (words) that make up a basic vocabulary.
Our basic algorithm is described on our "Our Approach" page; this section focuses on specific elements of our code.
Full C++ source code, including Visual Studio 6.0 and 2005 project and solution files, will be posted later, pending a finalized version.
The CWord object represents the most basic element in our machine's vocabulary: a single word. It contains fields for how the word is spelled, an integer uniquely identifying it, and vectors of CWord pointers.
It also contains a field describing its usage in grammar ("noun" versus "verb", etc.). This allows our vocabulary to have separate entries for "bark", as in "what a dog does" and "bark" as in "what a tree has."
The vectors of CWord pointers define how each word is related to others. For example, pointers in one vector represent words that this one innately "is"; another vector stores which words innately compose this one.
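As a rough illustration of the fields described above, a CWord node might look something like the sketch below. This is not our actual source: the member names (spelling, id, usage, isA, composedOf), the Usage enum, and the two link helpers are assumptions made for the example, and it is written in C++11, which Visual Studio 6.0 would not compile.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical grammar tag; the text only says the field tells
// "noun" from "verb", etc.
enum class Usage { Noun, Verb, Adjective, Other };

// Sketch of a single vocabulary entry (all member names are assumed).
struct CWord {
    std::string spelling;            // how the word is spelled
    int id;                          // integer uniquely identifying this entry
    Usage usage;                     // usage in grammar
    std::vector<CWord*> isA;         // words this one innately "is"
    std::vector<CWord*> composedOf;  // words that innately compose this one

    CWord(std::string s, int i, Usage u)
        : spelling(std::move(s)), id(i), usage(u) {}
};

// Assumed helper: record that `word` innately "is" `category`.
inline void linkIs(CWord& word, CWord& category) {
    word.isA.push_back(&category);
}

// Assumed helper: record that `part` is one of the words composing `whole`.
inline void linkComposedOf(CWord& whole, CWord& part) {
    whole.composedOf.push_back(&part);
}
```

With a layout like this, "bark" the verb and "bark" the noun are simply two CWord objects that share a spelling but differ in id and usage.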
This arrangement would allow an AI program to determine the context of words in a sentence. For example, it is impossible to determine the exact meaning of "bark" simply from its spelling - we humans read the rest of the sentence to figure out if it is talking about the habitually annoying canine ritual or that mass of cellulose that tends to form around trees. Our AI would determine which meaning of "bark" is intended by examining the other words in the sentence and seeing which definition of bark is "closest" - the fewest links away from the other CWord nodes in the sentence.
The CWord structure is meant mostly to store data; it contains little code for operations on that data.
The CIndexedWeb object represents the vocabulary itself. It is composed of multiple CWord nodes, and simplifies their creation and deletion at runtime. Although technically this arbitrarily-linked aggregate of nodes is called a "graph" (thanks, Dr. McVey, for help with the proper nomenclature), the object was christened before I knew that this willy-nilly structure even had a proper name, and I didn't feel like replacing every reference to "CIndexedWeb" with "CIndexedGraph."
This ill-named object is responsible for most of the actions performed on the data, such as checking if a word is already in the vocabulary and adding new words. Future versions would contain "degrees of separation" algorithms to find out how many links apart two arbitrary CWords are. (This would be used to determine the context of a particular spelling of a word, as described in the CWord section above.)
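Since the "degrees of separation" algorithm is not written yet, the following is only one way it might be sketched: a breadth-first search that counts links between two nodes, plus a small function that picks whichever sense of a word is closest to a context word. The CWord here is a reduced stand-in that merges all relation vectors into a single links vector and assumes every link is stored in both directions; degreesOfSeparation, closestSense, and link are hypothetical names, and the code assumes C++11.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Reduced stand-in for CWord: one merged `links` vector (an assumption
// made to keep the sketch short).
struct CWord {
    std::string spelling;
    std::vector<CWord*> links;
};

// Assumed helper: store the link in both directions so the search can
// walk it from either end.
void link(CWord& a, CWord& b) {
    a.links.push_back(&b);
    b.links.push_back(&a);
}

// Breadth-first search. Returns the number of links on the shortest path
// from `from` to `to`, or -1 if the two nodes are not connected.
int degreesOfSeparation(const CWord* from, const CWord* to) {
    std::unordered_map<const CWord*, int> distance;
    std::queue<const CWord*> frontier;
    distance[from] = 0;
    frontier.push(from);
    while (!frontier.empty()) {
        const CWord* current = frontier.front();
        frontier.pop();
        const int d = distance[current];
        if (current == to) return d;
        for (const CWord* next : current->links) {
            if (distance.find(next) == distance.end()) {
                distance[next] = d + 1;  // first visit is the shortest route
                frontier.push(next);
            }
        }
    }
    return -1;
}

// Picks the candidate sense that is the fewest links from `context`.
// Returns nullptr if no candidate is reachable.
const CWord* closestSense(const std::vector<const CWord*>& senses,
                          const CWord* context) {
    const CWord* best = nullptr;
    int bestDistance = -1;
    for (const CWord* sense : senses) {
        const int d = degreesOfSeparation(sense, context);
        if (d >= 0 && (best == nullptr || d < bestDistance)) {
            best = sense;
            bestDistance = d;
        }
    }
    return best;
}
```

A real sentence has several context words, so a fuller version would probably sum or average the distances to all of them rather than use just one.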
This class also maintains a linear table of pointers to every CWord node added to / removed from the list. Although this table itself is worthless from a symbol analysis point of view, it makes it possible to guarantee traversal of the entire structure with a single "for" loop, which is useful for search and file I/O algorithms.
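To show why the linear table is handy, here is a minimal sketch of a CIndexedWeb-like class whose find, add, and remove operations all rely on a single loop over that table. It is an illustration under assumptions, not our posted code: the method names, the reduced CWord with one links vector, and the use of C++11 std::unique_ptr for ownership are all choices made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Reduced stand-in for CWord (member names are assumed).
struct CWord {
    std::string spelling;
    int id;
    std::vector<CWord*> links;
};

class CIndexedWeb {
public:
    // Linear search of the table; returns nullptr if the spelling is unknown.
    // A fuller version would also compare grammatical usage.
    CWord* find(const std::string& spelling) const {
        for (const auto& node : table_) {
            if (node->spelling == spelling) return node.get();
        }
        return nullptr;
    }

    // Returns the existing node for this spelling, or creates an unlinked one.
    CWord* add(const std::string& spelling) {
        if (CWord* existing = find(spelling)) return existing;
        table_.push_back(
            std::unique_ptr<CWord>(new CWord{spelling, nextId_++, {}}));
        return table_.back().get();
    }

    // Deletes a node and every link pointing at it; false if it was absent.
    bool remove(const std::string& spelling) {
        CWord* target = find(spelling);
        if (!target) return false;
        // One pass over the table reaches every node, linked or not.
        for (auto& node : table_) {
            auto& links = node->links;
            links.erase(std::remove(links.begin(), links.end(), target),
                        links.end());
        }
        table_.erase(
            std::remove_if(table_.begin(), table_.end(),
                           [target](const std::unique_ptr<CWord>& n) {
                               return n.get() == target;
                           }),
            table_.end());
        return true;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::vector<std::unique_ptr<CWord>> table_;  // the linear table
    int nextId_ = 0;                             // next unique identifier
};
```

Because each node lives on the heap, pointers handed out by add stay valid when the table grows; only remove invalidates them.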
Although depth-first and breadth-first searches that hop from node to node by their links are more elegant, these would miss the unlinked "orphan" nodes our data structure is designed to accommodate. When an AI encounters a word it does not "know" (i.e., one that has no CWord entry in its CIndexedWeb), it would add an empty CWord node to the CIndexedWeb, but not link it to anything. CIndexedWeb has functions for using linear search to find these orphan nodes, which can be "filled in" later by the computer interrogating the human as to their meaning. This process is recursive; if the user describes unknown word x to the computer in terms of another unknown word y, the CIndexedWeb links x and y to each other anyway and then asks for the definition of word y, until the user eventually describes some part of the fork in terms of existing vocabulary.
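The orphan handling described above could be sketched roughly as follows. Everything named here is an assumption for illustration: the Table alias stands in for the CIndexedWeb's linear table, AskUser is a callback standing in for the human being interrogated, and findOrphans, addOrphan, and fillIn are hypothetical function names. The code assumes C++11.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Reduced stand-in for CWord with one merged `links` vector (assumed).
struct CWord {
    std::string spelling;
    std::vector<CWord*> links;
};

// Stand-in for the CIndexedWeb's linear table of nodes.
using Table = std::vector<std::unique_ptr<CWord>>;

// Stand-in for asking the human: given a word, returns the words the
// user chose to describe it with.
using AskUser = std::function<std::vector<std::string>(const std::string&)>;

// Linear search by spelling; nullptr if the word is unknown.
CWord* findWord(const Table& table, const std::string& spelling) {
    for (const auto& node : table) {
        if (node->spelling == spelling) return node.get();
    }
    return nullptr;
}

// Adds an empty, unlinked node for a word the program does not know.
CWord* addOrphan(Table& table, const std::string& spelling) {
    table.push_back(std::unique_ptr<CWord>(new CWord{spelling, {}}));
    return table.back().get();
}

// Linear scan for nodes with no links; a link-following search would
// never reach these.
std::vector<CWord*> findOrphans(const Table& table) {
    std::vector<CWord*> orphans;
    for (const auto& node : table) {
        if (node->links.empty()) orphans.push_back(node.get());
    }
    return orphans;
}

// Asks for the meaning of `word` and links it to each word in the answer.
// Any word in the answer that is itself unknown is added, linked anyway,
// and then asked about in turn.
void fillIn(Table& table, CWord* word, const AskUser& ask) {
    for (const std::string& spelling : ask(word->spelling)) {
        CWord* other = findWord(table, spelling);
        const bool isNew = (other == nullptr);
        if (isNew) other = addOrphan(table, spelling);
        word->links.push_back(other);
        other->links.push_back(word);
        if (isNew) fillIn(table, other, ask);  // recurse on the unknown word
    }
}
```

The recursion stops on any branch where the user answers only with words already in the table, which matches the "existing vocabulary" stopping condition described above; it does not guard against a user who keeps introducing new words forever.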