Mochabot log - CommonJS IRC channel: #commonjs on irc.freenode.net

2010-07-11:

[1:15] <Wes-> Dantman: How expensive is bandwidth over there? I want to mirror the wiki ("functional backup") - but without Last-Modified headers, it's going to be a VERY arduous re-crawl
[1:15] <Wes-> and the one I started this morning was mysteriously killed, grrrr
[1:16] <Dantman> Wes-, rsync xml dumps?
[1:16] <Wes-> Dantman: xml dumps aren't helpful, very hard to read
[1:16] <Wes-> Just want HTML content
[1:20] <Dantman> Hmm... now that I think about it... isn't the import script intelligent?
[1:32] <Dantman> Wes-, sounds like MediaWiki's import functionality should be smart enough... You should be able to create a simple local MW install, and on a daily basis rsync the .xml dump from it, import that into the install, and dump html from there.
[1:34] <Dantman> I wonder which would be more bw efficient though... .xml rsync, or rsync of a .html dump I make
[1:39] <Dantman> n/m... perhaps it's not intelligent
[1:39] <Dantman> *sigh* Maybe I should do a .html dump alongside the .xml dumps
[1:44] <Dantman> Ok, n/m again... it IS intelligent, importing .xml dumps will work
[1:48] <Dantman> Ugh... zumbrunn you know that change in dns for just commonjs.org made the rsync line I gave to everyone invalid
[1:54] <Dantman> Wes-, rsync -r wiki.commonjs.org::commonjs/htmldump/ .
[1:59] <Dantman> Wes-, I added html dumping to the cron job, you can rsync it daily... you should probably rsync a full.xml with it too just for backup purposes...
[2:01] <Dantman> ^_^ rsync uses an efficient diff algo, so last-modifieds don't matter
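The "efficient diff algo" mentioned above is rsync's delta transfer, which relies on a weak checksum that can be rolled one byte at a time, confirmed by a stronger hash. The sketch below is a simplified illustration of that idea and not rsync's actual code: the function names are made up for this example, SHA-256 stands in for the strong hash, and every match is reported instead of building a delta stream.

```python
import hashlib

M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(block):
    """Weak checksum of a block: a = plain byte sum, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right without rescanning the block."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def matching_offsets(old, new, block_len):
    """Return (offset_in_new, block_index_in_old) for old blocks found in new."""
    table = {}
    for start in range(0, len(old) - block_len + 1, block_len):
        block = old[start:start + block_len]
        entry = (start // block_len, hashlib.sha256(block).digest())
        table.setdefault(weak_checksum(block), []).append(entry)

    matches = []
    if len(new) < block_len:
        return matches

    a, b = weak_checksum(new[:block_len])
    pos = 0
    while True:
        # Only compute the strong hash when the cheap checksum already matches.
        for block_index, strong in table.get((a, b), []):
            if hashlib.sha256(new[pos:pos + block_len]).digest() == strong:
                matches.append((pos, block_index))
                break
        if pos + block_len >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + block_len], block_len)
        pos += 1
    return matches
```

Calling matching_offsets(b"abcdefgh", b"XXabcdefgh", 4) finds both old blocks at their shifted positions, so only the two inserted bytes would need to travel over the wire.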
[4:09] * Dantman gives up on a random idea that probably wouldn't be worth implementing...
[4:09] <Dantman> ... heh, I was thinking of making it so that searching on the wiki would return results from both the wiki and the mailing list.
[5:19] <- *Wyverald* h
[5:19] <- *Wyverald* help
[12:45] <Wes-> dantman: skipping non-regular file "images/thumb/4/4d/AP_CommonJS.jpg/120px-AP_CommonJS.jpg"
[12:46] <Wes-> dantman: for all the images, etc, during rsync -- any idea what's up with that? Are they maybe links on your end?
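rsync prints "skipping non-regular file" for symlinks when it runs with -r alone, because symlinks are only copied with -l (which -a includes). Whether that is the cause here is not confirmed in the log. A minimal sketch for checking the sending side follows; the function name is hypothetical, and it simply lists everything under a directory that is neither a regular file nor a directory.

```python
import os
import stat


def non_regular_files(root):
    """List entries under root that are not regular files or directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories show up in dirnames, so check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            rel = os.path.relpath(path, root)
            if stat.S_ISLNK(mode):
                found.append((rel, "symlink -> " + os.readlink(path)))
            elif not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
                found.append((rel, "special"))
    return sorted(found)
```

If the thumbnails do turn out to be symlinks, adding -l to the rsync line would copy them as links.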
[12:49] <Wes-> dantman: That htmldump is good stuff, though!!!!
[12:50] <Wes-> very few links need re-writing locally to work
[12:50] <Wes-> Oh, hm, local re-writes won't necessarily be kind to rsync
[12:51] <Wes-> Oh scratch the "very few" part
[12:54] * Wes- gets scripting
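Rewriting links in place is unkind to rsync because the edited files no longer match the source, so the next run transfers them again and overwrites the edits. One way around that, sketched here under assumptions, is to keep the rsync target untouched and write rewritten copies to a separate directory. This is not the update_cjs_wiki.sh script linked below; the function name, directory layout, and prefixes are hypothetical.

```python
from pathlib import Path


def publish_rewritten(mirror, public, old, new):
    """Copy *.html from mirror to public, replacing old with new in each file.

    The mirror directory is only read, so it stays byte-identical to the
    rsync source. Returns the number of files written.
    """
    mirror, public = Path(mirror), Path(public)
    count = 0
    for src in mirror.rglob("*.html"):
        if not src.is_file():
            continue
        dst = public / src.relative_to(mirror)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Work on bytes so pages in any encoding pass through unchanged.
        dst.write_bytes(src.read_bytes().replace(old.encode(), new.encode()))
        count += 1
    return count
```

The daily job would then be: rsync into the mirror directory, followed by one call to publish_rewritten to refresh the served copy.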
[13:54] <Wes-> dantman: http://wiki.commonjs.org.mirrors.page.ca/
[13:55] <Wes-> dantman: http://wiki.commonjs.org.mirrors.page.ca/update_cjs_wiki.sh
[13:59] <Wes-> WTF? Where did dan go?