I’ve bolded the part of the code that I think is causing problems. The code has to search through a String and identify any links, and that part appears to work as expected.
My problem starts when the program searches the LinkedList “visited” for links that have been found previously. If a link is not found there, the program adds it to the stack “search”. But the search does not seem to work properly, and as a result the same links are added several times.
Has anyone got any ideas on what’s causing the problem?
Thanks.
public void findLinks(String page, String domain) {
    //parts of this method have been modified from the Webcrawler,
    //PageVisitor.java lines 133-167
    int lastPosition = 0; //position of "http://" + domain substring in page
    int endOfURL; //position of the end of http://........
    String link; //the link we're after
    while (lastPosition != -1) {
        boolean found = false;
        lastPosition = page.indexOf("http://" + domain, lastPosition);
        if (lastPosition != -1) {
            endOfURL = page.indexOf("\"", lastPosition + 1);
            //extract found hypertext link
            link = page.substring(lastPosition, endOfURL);
            link = link.trim();
            if (link.endsWith("\"")) {
                link = link.substring(0, link.length() - 1);
            }
            //ignore references
            if (link.indexOf("#") != -1) {
                link = link.substring(0, link.indexOf("#"));
            }
            //ignore properties
            if (link.indexOf("?") != -1) {
                link = link.substring(0, link.indexOf("?"));
            }
            //discard links which point explicitly to images
            if (link.endsWith(".gif") || link.endsWith(".jpg")
                    || link.endsWith(".png") || link.endsWith(".ico")
                    || link.endsWith(".bmp") || link.endsWith(".ief")
                    || link.endsWith(".jpeg") || link.endsWith(".tiff")
                    || link.endsWith(".css")) {
                ;
            }
            else { //collect all others
                //my code to trim domain name and http:// from string
                link = link.substring(domain.length() + 7, link.length());
                [b]//search "visited"
                int size = visited.size();
                for (int i = 0; i < size; i++) {
                    if (visited.get(i) == link) {
                        found = true;
                        break;
                    }
                    else {
                        found = false;
                    }
                }[/b]
                if (found == false) {
                    search.push(new String(link));
                    visited.add(new String(link));
                }
            }
            lastPosition++; //skip current link
        }
    }
} //end findLinks method
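One likely cause, for what it's worth: in Java, == on two Strings checks whether they are the same object, not whether they hold the same characters. Every link here comes out of substring() and is then copied again with new String(link), so two identical URLs are normally different objects and the comparison in the bolded loop never succeeds. Keeping the LinkedList, comparing with visited.get(i).equals(link), or replacing the whole loop with visited.contains(link), should behave as intended. The sketch below shows the same idea with a HashSet instead. The class LinkTracker and its methods offer and pending are made up for illustration, and the types of “visited” and “search” are assumptions, since their declarations are not shown in the post.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: the field names mirror "visited" and "search" from the
// post, but the types used here are assumptions, not the original declarations.
class LinkTracker {
    private final Set<String> visited = new HashSet<>();
    private final Deque<String> search = new ArrayDeque<>();

    // Queues the link for crawling if it has not been seen before.
    // Returns true when the link was new and has been pushed onto "search".
    boolean offer(String link) {
        // Set.add compares with equals()/hashCode(), not ==, and returns
        // false when an equal element is already present.
        if (visited.add(link)) {
            search.push(link);
            return true;
        }
        return false;
    }

    // Number of links still waiting on the "search" stack.
    int pending() {
        return search.size();
    }

    public static void main(String[] args) {
        String page = "/index.html /index.html";
        String first = page.substring(0, 11);   // "/index.html"
        String second = page.substring(12);     // "/index.html", a separate object

        System.out.println(first == second);      // false: different objects
        System.out.println(first.equals(second)); // true: same characters

        LinkTracker tracker = new LinkTracker();
        System.out.println(tracker.offer(first));  // true: first time seen
        System.out.println(tracker.offer(second)); // false: duplicate rejected
        System.out.println(tracker.pending());     // 1
    }
}
```

A set also avoids walking the whole list for every link, which starts to matter once “visited” holds a few thousand entries.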