XML Parser for AGK v.2 - GameCreators Forum

Author

Message

Phaelax

DBPro Master

22

Years of Service

Joined: 16th Apr 2003

Location: Metropia

Posted: 4th Mar 2017 22:56 Edited at: 4th Mar 2017 22:56

While working on my updated TMX loader, I had to revisit my XML parser. After I greatly improved the speed of my Base64 code, XML parsing turned out to be the other major bottleneck in the loader. I'm happy to say the new parser is fast enough to make this very usable.

If you've used my old parser (I'm only now realizing I apparently never posted the AppGameKit adaptation of my old DBP code), this one works entirely differently. The old version required you to grab the values you wanted from the XML in a linear fashion as it read the document, so you took whatever data you needed from the file as it loaded line by line. Basically, it was more like a SAX parser. My new approach, rewritten entirely from scratch, is more like a DOM parser. That requires more memory because the entire document is loaded into a tree-like structure, but data in the XML tree can be accessed at any time and the structure of the document is preserved.
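To illustrate the "accessed at any time" point, here is a minimal sketch of reading values out of the loaded tree in whatever order you like. It assumes the types and xml_* functions from the listing in this post, is meant to stand in for the demo section of that listing (after the Type declarations), and assumes a TMX file whose map tag has width and height attributes and whose layer tags have a name attribute.

```agk
// Sketch only: relies on the XML_Element type and xml_* functions from the listing.
dom as XML_Element[]
dom = xml_loadDocument("overworld22.tmx")

// Jump straight to the <map> element instead of reading line by line
mapId = xml_FindFirstTag(dom, "map")
if mapId > -1
	w$ = xml_GetAttributeValueByName(dom, mapId, "width")
	h$ = xml_GetAttributeValueByName(dom, mapId, "height")

	// Walk the direct children of <map> and remember the last layer's name
	for c = 0 to xml_GetChildCount(dom, mapId) - 1
		childId = xml_GetChildIdById(dom, mapId, c)
		if xml_GetTagName(dom, childId) = "layer"
			layerName$ = xml_GetAttributeValueByName(dom, childId, "name")
		endif
	next c
endif

do
	print("map size: " + w$ + " x " + h$)
	print("last layer found: " + layerName$)
	sync()
loop
```

Because the whole document stays in memory, the same lookups can be repeated later without reopening the file.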

Attached is a TMX map file containing a 120k-character string of Base64-encoded map data. This enormous string caused major headaches in my old parser. The file itself is about 400 lines, and load time is virtually instant. My unencoded Zelda map file, which is 867KB and 45,455 lines of text, loaded in 1.4 seconds. (The old version took 2 minutes!)

In my example, the print element function is recursive so that it prints each element's children in a tabbed layout, letting you visualize the tree structure.
Each printed line shows the element's global id (its index in the XML_Element array), followed by the element's tag name, the attributes listed in square brackets, the number of children the element contains, and finally the id of its parent.

The function list is not yet complete, but it should be enough to show how it works. This structure also makes it very easy to write an XML writer.
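As a rough idea of what such a writer could look like, the sketch below turns one element and its descendants back into text by recursing over the tree. The function name xml_ElementToString is hypothetical and not part of the listing. It assumes element values hold only inner text, does no escaping of special characters, and would write the ?xml declaration as an ordinary self-closing tag, so treat it as a starting point rather than a finished writer.

```agk
// Hypothetical helper, not part of the original listing.
// Relies on the XML_Element type and xml_* functions from the listing.
function xml_ElementToString(dom ref as XML_Element[], elementId as integer, indent as string)
	// opening tag with its attributes, values wrapped in double quotes (chr(34))
	s$ = indent + "<" + xml_GetTagName(dom, elementId)
	for k = 0 to xml_GetAttributeCount(dom, elementId) - 1
		key$ = xml_GetAttributeKeyById(dom, elementId, k)
		val$ = xml_GetAttributeValueById(dom, elementId, k)
		s$ = s$ + " " + key$ + "=" + chr(34) + val$ + chr(34)
	next k

	value$ = xml_GetElementValue(dom, elementId)
	childCount = xml_GetChildCount(dom, elementId)

	// no content at all: emit a self-closing tag
	if childCount = 0 and len(value$) = 0
		s$ = s$ + "/>" + chr(10)
		exitfunction s$
	endif

	s$ = s$ + ">" + chr(10)
	if len(value$) > 0 then s$ = s$ + indent + "  " + value$ + chr(10)

	// children are written one indent level deeper
	for j = 0 to childCount - 1
		childId = xml_GetChildIdById(dom, elementId, j)
		child$ = xml_ElementToString(dom, childId, indent + "  ")
		s$ = s$ + child$
	next j

	s$ = s$ + indent + "</" + xml_GetTagName(dom, elementId) + ">" + chr(10)
endfunction s$
```

Calling it once per root element and passing each result to WriteLine on a file opened with OpenToWrite would produce a complete document.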

One final thing to note is a slight limitation: the loader assumes only one tag is present per line. For most well-structured XML files this shouldn't be an issue. However, something you download from the web, such as the result of an XML-RPC server request, may not be so neatly formatted. Hopefully I can fix that limitation in the future. For now it is what it is, and it serves the primary purpose I intended: reading TMX files.
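One possible way around the one-tag-per-line assumption is to split each line into pieces before parsing, so that every piece holds a single tag or a single run of text. The helper below is only a sketch of that idea. The name xml_SplitTags is hypothetical, it is not wired into the loader, and it does not cope with a ">" character inside a quoted attribute value.

```agk
// Hypothetical pre-processing helper, not part of the original listing.
// Splits one line into pieces that each contain a single tag or a run of text.
function xml_SplitTags(s as string)
	pieces as string[]
	start = 1
	L = len(s)
	for i = 1 to L
		c$ = mid(s, i, 1)
		// text sitting in front of a tag becomes its own piece
		if c$ = "<" and i > start
			pieces.insert(mid(s, start, i - start))
			start = i
		endif
		// a closing bracket ends the current tag
		if c$ = ">"
			pieces.insert(mid(s, start, i - start + 1))
			start = i + 1
		endif
	next i
	// anything left after the last tag
	if start <= L then pieces.insert(mid(s, start, L - start + 1))
endfunction pieces
```

Each piece could then be fed through the same per-line logic the loader already uses. Pieces that contain only whitespace are harmless, since the loader skips characters at or below ASCII 32.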

//////////////////////////////////////////////////////////////////////
// Title: XML Parser
// Author: Phaelax
// Date: March 4, 2017
//
// Commands:
//    arr[]  = xml_loadDocument(file as string)
//    int    = xml_GetParentId(dom ref as XML_Element[], elementId as integer)
//    int    = xml_GetAttributeCount(dom ref as XML_Element[], elementId as integer)
//    string = xml_GetAttributeKeyById(dom ref as XML_Element[], elementId as integer, attributeId as integer)
//    string = xml_GetAttributeValueById(dom ref as XML_Element[], elementId as integer, attributeId as integer)
//    string = xml_GetTagName(dom ref as XML_Element[], elementId as integer)
//    int    = xml_GetChildCount(dom ref as XML_Element[], elementId as integer)
//    string = xml_GetAttributeValueByName(dom ref as XML_Element[], elementId as integer, att as string)
//    int    = xml_GetAttributeIdByName(dom ref as XML_Element[], elementId as integer, att as string)
//    arr[]  = xml_GetAttributesArray(dom ref as XML_Element[], elementId as integer)
//    string = xml_GetElementValue(dom ref as XML_Element[], elementId as integer)
//    int    = xml_GetChildIdById(dom ref as XML_Element[], elementId as integer, childId as integer)
//    arr[]  = xml_GetRootElementsArray(dom ref as XML_Element[])

SetVirtualResolution(1920,1200)
SetWindowSize(1920, 1200, 0)
UseNewDefaultFonts(1)
SetSyncRate(60, 0 ) 
setPrintSize(20)

Type XML_Attribute
	key as string
	value as string
EndType

Type XML_Tag
	parent as integer
	name as string
	attributes as XML_Attribute[]
	value as string
EndType

Type XML_Element
	self as XML_Tag
	children as integer[-1]
EndType

dom as XML_Element[]

start = GetMilliseconds()
dom = xml_loadDocument("overworld22.tmx")
finish = getMilliseconds()

// Get the root elements of the dom tree
roots as integer[]
roots = xml_GetRootElementsArray(dom)

do
	print(finish-start)
	print("")

	for i = 0 to roots.length
		printElement(dom, roots[i], "")
	next i

	sync()
loop

function printElement(dom ref as XML_Element[], i as integer, space as string)
	name$  = xml_GetTagName(dom, i)
	parent = xml_GetParentId(dom, i)
	childCount = xml_GetChildCount(dom, i) 
	att as XML_Attribute[]
	att = xml_GetAttributesArray(dom, i)
	value$ = xml_GetElementValue(dom, i)
	printc(space+str(i)+")   "+name$)
	printc("    [")
	for k = 0 to att.length
		printc(att[k].key+"  =  "+att[k].value+" , ")
	next k
	printc("]")
	printc("    (childCount:  "+str(childCount)+")")
	print("        parentID:  "+str(parent))

	for j = 0 to childCount-1
		c = xml_GetChildIdById(dom, i, j)
		printElement(dom, c, space+"            ")
	next j
	print("")
endfunction

//////////////////////////////////////////////////////////////////////
// Return an array of indices of each element that is part of the dom root.
// Root elements have a parent of 0, so this assumes element 0 is a
// childless tag such as the <?xml ...?> declaration.
//////////////////////////////////////////////////////////////////////
function xml_GetRootElementsArray(dom ref as XML_Element[])
	arr as integer[]
	for i = 0 to dom.length
		if dom[i].self.parent = 0 then arr.insert(i)
	next i
endfunction arr

//////////////////////////////////////////////////////////////////////
// Return the element id of the specified child element,
// or -1 if childId is out of range
//////////////////////////////////////////////////////////////////////
function xml_GetChildIdById(dom ref as XML_Element[], elementId as integer, childId as integer)
	if childId < 0 or childId > dom[elementId].children.length then exitfunction -1
endfunction dom[elementId].children[childId]

//////////////////////////////////////////////////////////////////////
// Return the element's value (inner content)
//////////////////////////////////////////////////////////////////////
function xml_GetElementValue(dom ref as XML_Element[], elementId as integer)
endfunction dom[elementId].self.value

//////////////////////////////////////////////////////////////////////
// Return an array of XML_Attribute
//////////////////////////////////////////////////////////////////////
function xml_GetAttributesArray(dom ref as XML_Element[], elementId as integer)
endfunction dom[elementId].self.attributes

//////////////////////////////////////////////////////////////////////
// Returns the parent ID of this element
//////////////////////////////////////////////////////////////////////
function xml_GetParentId(dom ref as XML_Element[], elementId as integer)
endfunction dom[elementId].self.parent

//////////////////////////////////////////////////////////////////////
// Returns the key/name of an attribute
//////////////////////////////////////////////////////////////////////
function xml_GetAttributeKeyById(dom ref as XML_Element[], elementId as integer, attributeId as integer)
endfunction dom[elementId].self.attributes[attributeId].key

//////////////////////////////////////////////////////////////////////
// Returns the value of an attribute given its ID
//////////////////////////////////////////////////////////////////////
function xml_GetAttributeValueById(dom ref as XML_Element[], elementId as integer, attributeId as integer)
endfunction dom[elementId].self.attributes[attributeId].value

//////////////////////////////////////////////////////////////////////
// Returns the value of an attribute, looked up by the attribute name
//////////////////////////////////////////////////////////////////////
function xml_GetAttributeValueByName(dom ref as XML_Element[], elementId as integer, att as string)
	for i = 0 to dom[elementId].self.attributes.length
		if dom[elementId].self.attributes[i].key = att then exitfunction dom[elementId].self.attributes[i].value
	next i
endfunction ""

//////////////////////////////////////////////////////////////////////
// Returns the ID of an attribute of specified element
//////////////////////////////////////////////////////////////////////
function xml_GetAttributeIdByName(dom ref as XML_Element[], elementId as integer, att as string)
	for i = 0 to dom[elementId].self.attributes.length
		if dom[elementId].self.attributes[i].key = att then exitfunction i
	next i
endfunction -1

//////////////////////////////////////////////////////////////////////
// Return the tag name of this element
//////////////////////////////////////////////////////////////////////
function xml_GetTagName(dom ref as XML_Element[], elementId as integer)
endfunction dom[elementId].self.name

//////////////////////////////////////////////////////////////////////
// Returns the number of attributes for this element
//////////////////////////////////////////////////////////////////////
function xml_GetAttributeCount(dom ref as XML_Element[], elementId as integer)
endfunction dom[elementId].self.attributes.length+1

//////////////////////////////////////////////////////////////////////
// Returns the number of direct children under the specified element
//////////////////////////////////////////////////////////////////////
function xml_GetChildCount(dom ref as XML_Element[], elementId as integer)
endfunction dom[elementId].children.length+1

//////////////////////////////////////////////////
// Returns the index in xml array of the first
// occurrence of 'tag'
// Returns -1 if no match found
//////////////////////////////////////////////////
function xml_FindFirstTag(dom ref as XML_Element[], tag as string)
	for i = 0 to dom.length
		if dom[i].self.name  = tag then exitfunction i
	next i
endfunction -1

//////////////////////////////////////////////////
// Loads an XML file into a DOM-like structure
// and returns it as an array of XML_Element
//////////////////////////////////////////////////
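function xml_loadDocument(file as string)
	elements as XML_Element[]

	// Stack of currently open elements; index 0 holds 0 as the "no parent" marker
	openTags as integer[0]

	// Return an empty array if the file is missing
	if GetFileExists(file) = 0 then exitfunction elements

	f = openToRead(file)

	q = 0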
	repeat
		inc q
		// find tag
		s$ = readLine(f)
		L = len(s$)
		a1 = 0
		findAtts = 0
		tagFound = 0
		for i = 1 to L
			// tag opening
			b$ = mid(s$, i, 1)
			if b$ = "<"
				if mid(s$, i+1, 1) = "/"
					openTags.remove()
					exit
				else
					for j = i+1 to L
						c$ = mid(s$, j, 1)
						if c$ = " " or c$ = ">"
							tagFound = 1
							tag$ = mid(s$, i+1, j-i-1)
							e as XML_Element
							e.self.name = tag$
							e.self.parent = openTags[openTags.length]
							elements.insert(e)
							insertedElement = elements.length
							
							if e.self.parent > 0 then elements[e.self.parent].children.insert(insertedElement)
							
							a1 = j+1
							if c$ = " "
								findAtts = 1
							else
								// add this tag to open stack
								openTags.insert(elements.length)
							endif
							exit
						endif
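					next j
					// the tag has been handled; stop scanning so its own text is not stored as an element value
					if tagFound = 1 then exit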
				endif
			else
				if asc(b$) > 32 and asc(b$) < 127
					elements[openTags[openTags.length]].self.value = s$
					exit
				endif
			endif
		next i

		if findAtts = 1
			repeat
				n1 = 0 : n2 = 0
				v1 = 0 : v2 = 0
				insert = 0
				c1$ = ''
				// get attributes
				for i = a1 to L
					c$ = mid(s$, i, 1)
					// start of attribute name
					if c$ <> " " and n1=0 then n1 = i
					if c$ = "=" then n2 = i
					
					if c1$ <> ''
						if c$ = c1$
							v2 = i
							a1 = i+1
							insert = 1
							exit
						endif
					else
						if c$ = "'" or c$ = '"'
							c1$ = c$
							v1 = i+1
						endif
					endif
				next i
				
				if insert = 1
					a as XML_Attribute
					a.key = mid(s$, n1, n2-n1)
					a.value = mid(s$, v1, v2-v1)
					
					elements[elements.length].self.attributes.insert(a)
				endif
			until i >= L
			
			// Check next to last character to see if this is a singleton tag
			c$ = mid(s$, L-1, 1)
			if c$ = "/" or c$ = "?"
				// if singleton, do nothing
			else
				// opening tag, add to stack
				openTags.insert(elements.length)
			endif
			
		endif
	until fileEOF(f)
	closeFile(f)
endfunction elements

"I like offending people, because I think people who get offended should be offended." - Linus Torvalds
