# checker for broken links on html pages



## bigearsbilly (Jul 27, 2009)

Can anyone recommend an open source checker to find
stale, broken links on html pages?

I've tried:
* linkcheck-1.4: Checks a web site for bad links
* linkchecker-5.0.1: Check HTML documents for broken links
* linklint-2.3.6.d: Perl script that checks links on web sites

and _none_ of them work.


----------



## bb (Jul 27, 2009)

That depends on how intelligent it needs to be. Absolute links are very easy to find, and you can check them with curl (this prints every failing link):


```
# extract absolute URLs, dedupe, then request each one:
# -s silent, -f fail on HTTP errors, -I fetch headers only
grep -Eo -e 'https?://[^"[:space:]]*' input.html | sort -u |
while read -r u; do curl -sfI "$u" > /dev/null || echo "$u"; done
```

Or with csh:


```
foreach u (`grep -Eo -e 'https?://[^"[:space:]]*' input.html | sort -u`)
    curl -sfI "$u" > /dev/null || echo "$u"
end
```

If you want to check relative links as well, you'll need a more sophisticated tool that you can point at a page's URL, so that it can resolve the relative links the way a browser does.
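For example, a small helper could resolve each extracted link against the page's URL before feeding it into the curl loop above. Just a sketch: the function name and the page URL are made up, it doesn't normalize "../" segments (most servers do that anyway), and it assumes the page URL includes a path:

```shell
#!/bin/sh
# resolve PAGE_URL LINK -- print LINK as an absolute URL, roughly like a browser would
resolve() {
    page=$1 link=$2
    # scheme + host of the page, e.g. http://example.com
    origin=$(printf '%s\n' "$page" | sed -E 's#^(https?://[^/]+).*#\1#')
    # directory part of the page URL, e.g. http://example.com/docs/
    dir=${page%/*}/
    case $link in
        http://*|https://*) printf '%s\n' "$link" ;;   # already absolute
        //*) printf 'http:%s\n' "$link" ;;             # protocol-relative
        /*)  printf '%s%s\n' "$origin" "$link" ;;      # site-root relative
        *)   printf '%s%s\n' "$dir" "$link" ;;         # document relative
    esac
}

page='http://example.com/docs/index.html'   # hypothetical page URL
resolve "$page" 'a.html'                    # → http://example.com/docs/a.html
resolve "$page" '/img/logo.png'             # → http://example.com/img/logo.png
```

You'd extract the hrefs with something like `grep -Eo 'href="[^"]*"' input.html | sed 's/^href="//; s/"$//'`, run each through resolve, and check the result with the same curl command as before.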


----------



## bigearsbilly (Jul 27, 2009)

Yes, well.
I downloaded a web site using wget, so the links are
rewritten as relative.
But, strangely, wget missed some links; dunno why, it looks simple enough.
So I do actually need more of a relative link checker.

I'm writing one myself now.
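In case it helps anyone: since the site is a local wget mirror, one rough sketch of the idea is to treat each relative link as a file path and just check it exists on disk. The function name, skip list, and fragment/query stripping below are my own guesses at what's needed, not a finished tool:

```shell
#!/bin/sh
# check_page FILE -- report relative links in FILE whose targets are missing
check_page() {
    page=$1
    dir=$(dirname "$page")
    grep -Eo 'href="[^"]*"' "$page" | sed 's/^href="//; s/"$//' |
    while read -r link; do
        case $link in
            http://*|https://*|mailto:*|'#'*) continue ;;  # not local files
            /*) continue ;;  # site-root links would need the mirror root; skipped here
        esac
        target=${link%%#*}       # drop any #fragment
        target=${target%%\?*}    # drop any ?query string
        [ -e "$dir/$target" ] || echo "$page: broken link: $link"
    done
}
# usage: check_page index.html   (or loop it over `find . -name '*.html'`)
```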


----------

