Java LinkedList / Stack problem

I’ve bolded the part of the code which I think is causing the problem. The code is meant to search through a String and identify any links, and that part appears to work as expected.

However, my problem starts when the program searches through the LinkedList “visited” to check whether a link has been found previously. If the link is not found, the program adds it to the stack “search”. But the search does not seem to work properly, and as a result the same links are added several times.

Has anyone got any ideas on what’s causing the problem?
Thanks.

public void findLinks(String page, String domain) {
	//parts of this method have been modified from the Webcrawler,
	//PageVisitor.java lines 133-167
		
	int lastPosition = 0; //position of "http:" substring in page
	int endOfURL; //pos of end of http://........
	String link; //the link we're after
		
	while(lastPosition != -1 ) {
		boolean found = false;
		lastPosition = page.indexOf("http://" + domain, lastPosition);
		if (lastPosition != -1) {
			endOfURL = page.indexOf("\"", lastPosition + 1 );

			//extract found hypertext link
			link = page.substring(lastPosition, endOfURL);
			link = link.trim();
			if (link.endsWith("\"")) {
				link = link.substring(0, link.length() - 1 );
			}

			//ignore references
			if(link.indexOf("#") != -1) {
				link = link.substring(0, link.indexOf("#"));
			}
				
			//ignore properties
			if(link.indexOf("?") != -1) {
				link = link.substring(0, link.indexOf("?"));
			}
				
			//discard links which point explicitly to images
			if(link.endsWith(".gif") || link.endsWith(".jpg")
			|| link.endsWith(".png") || link.endsWith(".ico")
			|| link.endsWith(".bmp") || link.endsWith(".ief")
			|| link.endsWith(".jpeg") || link.endsWith(".tiff")
			|| link.endsWith(".css")) {
				;
			}
				
			else { //collect all others
				//my code to trim domain name and http:// from string
				link = link.substring(domain.length()+7, link.length());
					
				[b]//search "visited"
				int size = visited.size();
				for(int i = 0; i < size; i++) {
					if(visited.get(i) == link) {
						found = true;
						break;
					}
					else {
						found = false;
					}
				}[/b]
				
				if(found == false) {
					search.push(new String(link));
					visited.add(new String(link));
				}
			}

			lastPosition++; //skip current link
		}
	}
} //end findLinks method

Is size not a built-in function?
I can’t remember a lot of Java, I only skimmed the surface of it… but maybe try another name for the variable?

OK, I stopped coding after my IT studies, so I haven’t a clue what language it’s in
or how the for loops are processed.

But to me (I could be wrong)

your for loop appears to be setting a local/global variable to zero
at the start. Hopefully I’m wrong, but if it is, then surely the i < size test will always
be 0 < size?

I could be talking tripe though :confused:

It’s written in Java 5 (or 1.5). For loops can be constructed like this; the basic structure is:
for(initialisation; condition to keep looping; counter increment).
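
To make the three parts of the for header concrete, here is a minimal sketch of the same loop shape over a small list. The class name ForLoopDemo, the helper countIterations and the sample links are made up for illustration and are not part of the original program. The point is that i is set to zero only once, and then incremented after every pass, so the test does not stay at 0 < size.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class, not part of the original crawler.
public class ForLoopDemo {

    // Returns how many times the loop body runs for the given list.
    static int countIterations(List<String> items) {
        int count = 0;
        int size = items.size(); // evaluated once, before the loop starts

        // int i = 0 : initialisation, runs exactly once
        // i < size  : condition, checked before every pass
        // i++       : increment, runs after every pass
        for (int i = 0; i < size; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> items = new LinkedList<String>();
        items.add("/index.html");
        items.add("/about.html");
        items.add("/contact.html");
        System.out.println(countIterations(items)); // prints 3
    }
}
```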

Sorry, but I don’t think it’s that, as I’ve been constructing for loops in this manner since I started working with Java (Java only feature I think).

Nope, C/C++ does it the same way, and I would assume C# does too.

though you can shave some speed off by changing the loop slightly


found = false;
for(int i = 0; i < size; i++) 
{
    if(visited.get(i).compareTo(link) == 0) 
    {
        found = true;
        break;
    }
}

You get fewer assignments this way, because found is set to false once before the loop instead of on every pass.

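EDIT: ahh, there’s an error.

if(visited.get(i) == link) will not work.

String is an object, not a primitive, so == compares object references rather than the characters. Your visited list stores new String(link), which is always a different object from link, so the test never matches.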

if(visited.get(i).compareTo(link) == 0)

or

if(visited.get(i).equals(link))

is what you need
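
Here is a small sketch of the difference, in case it helps. The class name StringCompareDemo and the two helper methods are invented for the example; only ==, equals and compareTo come from the code above. The stored string is created with new String(...), the same way the original adds entries to visited.

```java
// Hypothetical demo class, not part of the original crawler.
public class StringCompareDemo {

    // True only if both variables point at the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // True if both strings hold the same sequence of characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String link = "/index.html";
        // new String(...) always creates a distinct object,
        // as visited.add(new String(link)) does in the original code.
        String stored = new String(link);

        System.out.println(sameReference(stored, link));  // prints false
        System.out.println(sameContents(stored, link));   // prints true
        System.out.println(stored.compareTo(link) == 0);  // prints true
    }
}
```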

Thanks, I’d not seen the method before. It seems to have fixed my problem. :thumbsup:
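
As a side note, the whole search loop could probably be handed to the list’s own contains method, which compares elements with equals. This is only a sketch under the assumption that visited holds Strings and search is a java.util.Stack of Strings, as the original post suggests; the class VisitedDemo and the helper addIfNew are hypothetical names.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Stack;

// Hypothetical demo class, not part of the original crawler.
public class VisitedDemo {

    // Pushes the link onto search only if it has not been seen before.
    // Returns true if the link was new and has been added.
    static boolean addIfNew(List<String> visited, Stack<String> search, String link) {
        if (visited.contains(link)) { // contains() uses equals(), not ==
            return false;
        }
        search.push(link);
        visited.add(link);
        return true;
    }

    public static void main(String[] args) {
        List<String> visited = new LinkedList<String>();
        Stack<String> search = new Stack<String>();

        System.out.println(addIfNew(visited, search, "/index.html"));             // prints true
        System.out.println(addIfNew(visited, search, new String("/index.html"))); // prints false
        System.out.println(search.size());                                        // prints 1
    }
}
```

contains still walks the list from the start, so with a large number of links a java.util.HashSet for visited would likely be quicker, but that is a separate change.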

The API documentation is a Java programmer’s best friend.

http://java.sun.com/j2se/1.5.0/docs/api/
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html

Yeah, I already knew that existed, but I’m still getting to grips with it, which is probably why I couldn’t find it in the first place :wink:

But thanks anyway.