Mochabot log - CommonJS IRC channel: #commonjs on irc.freenode.net

2010-02-27:

[2:06] <inimino> ashb: the streaming version does that, but there's some work yet remaining before that is usable
[2:07] <inimino> ashb: the problem is that because a PEG allows unlimited lookahead, it is in the general case undecidable whether you can emit parse nodes safely as you go or not
[2:09] <inimino> ashb: so you either have to have a "whoops that whole branch failed, back up to pos 42" event that you can emit, which makes it hard for the consumer, or you have to do some hairy analysis on the grammar to find out which parts are safe to emit, or you have to have the user provide 'commit points' at which events can be emitted
[2:09] <inimino> naturally I chose the "hairy analysis" path
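
A minimal sketch of the "commit points" alternative described above, not the approach inimino took: events are buffered while a branch is speculative and reach the consumer only once a commit point is passed. The `EventBuffer` class and its method names are hypothetical, invented for illustration.

```python
class EventBuffer:
    """Holds parse events until a commit point makes them safe to emit."""

    def __init__(self, sink):
        self.sink = sink      # callable that receives committed events
        self.pending = []     # events produced but not yet safe to emit

    def emit(self, event):
        self.pending.append(event)

    def mark(self):
        # Remember where a speculative branch started.
        return len(self.pending)

    def rollback(self, mark):
        # The branch failed: drop everything it produced.
        del self.pending[mark:]

    def commit(self):
        # Backtracking cannot cross this point, so flush to the consumer.
        for event in self.pending:
            self.sink(event)
        self.pending.clear()


received = []
buf = EventBuffer(received.append)
buf.emit("start")
m = buf.mark()
buf.emit("maybe-number")  # speculative branch
buf.rollback(m)           # branch failed; the consumer never sees it
buf.emit("identifier")
buf.commit()
```

The consumer never has to handle a "back up to pos 42" event; the cost is that the grammar author has to say where committing is safe.
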
[2:10] <MisterN> inimino: this reminds me of one of my earlier programming projects where i had a misguided attempt at an "object-oriented" backtracking parser :D
[2:10] <inimino> I'm not sure my feeble mind can fathom what that would be :)
[2:11] <inimino> productions are objects?
[2:11] <MisterN> if i remember correctly
[2:13] <MisterN> damn i have no clue what my code from back then means
[2:14] <inimino> hehe
[2:16] <inimino> WTFs per minute
[2:30] <MisterN> inimino: it's worse than that.
[2:31] <MisterN> ooh i think i understand part of the code now
[2:32] <MisterN> inimino: i think that's the code that made me wary of event-driven designs
[2:33] <inimino> ah, I can see how that would happen
[2:33] <inimino> it's probably dangerous to extrapolate much from parsers though, they are kind of a special case
[2:34] <MisterN> inimino: well the thing is much of event-driven stuff is often parsing
[2:35] <inimino> people do like SAX and such
[2:35] <inimino> but the parser itself isn't sending around a lot of events internally, just emitting them, is that the way yours works?
[2:35] <inimino> or was it events all the way down?
[2:36] <MisterN> inimino: it was receiving events from a lower-level parser
[2:36] <MisterN> essentially it was a high-level parser on top of something similar to SAX
[2:36] <inimino> oh, like a separate lexer?
[2:36] <MisterN> yes
[2:36] <inimino> ah, ok
[2:37] <MisterN> maybe that was a mistake i don't know :D
[2:37] <inimino> you could build a classic shift-reduce LR parser on top of that
[2:38] <inimino> would probably work fine
[2:38] <MisterN> i just hacked it. i was also younger back then :P
[2:38] <inimino> hehe :)
[2:38] <MisterN> the code is here if you care: http://github.com/MrN/GOTT/tree/net.sourceforge.gott/gott/tdl/schema/
[2:40] <inimino> epic name :)
[2:40] <MisterN> heh. we used to have an irc channel and once in a while somebody would come ask about god
[2:41] <MisterN> the whole project was an epic failure. i hope i learned from it..
[2:41] <inimino> I'm sure
[2:42] <inimino> never trust a programmer without a few flaming failures under their belt
[2:42] <MisterN> hah
[2:44] <MisterN> inimino: hmm i guess the parser was "designed" to be more powerful than shift-reduce. it was fully backtracking
[2:46] <inimino> MisterN: ah, sounds cool, kind of PEG-like then
[2:46] <MisterN> i remember printing out tens of pages with printf-debugging output
[2:47] <MisterN> i guess you can imagine how this looks for a parser
[2:48] <inimino> oh, I don't have to
[2:48] <inimino> I was just doing that :-)
[2:48] <inimino> http://inimino.org/~inimino/images/screenshots/2010-02-19_3.png
[2:48] <inimino> a few days ago :P
[2:49] <inimino> it works best on small inputs...
[2:49] <MisterN> i feel your pain!
[2:50] <MisterN> no idea what the weird numbers mean :D
[2:51] <MisterN> also: tiling wm?
[2:51] <inimino> I do use a tiling wm
[2:51] <inimino> but that's all one browser window
[2:52] <inimino> code is on the left, output on the right
[2:52] <MisterN> oh you run it in the browser. why not
[2:52] <MisterN> you could also run it in a v8 shell i suppose
[2:52] <inimino> yeah, it is running on node :-)
[2:52] <inimino> well, the parsing happens in the browser, but the server is node
[2:53] <MisterN> don't trust the client to do anything that matters to anybody else :)
[2:54] <inimino> I'll try Chromium for a while if I can ever get it to compile...
[2:54] <MisterN> i have chrome beta :D
[2:54] <inimino> on Linux?
[2:55] <MisterN> yup
[2:55] <MisterN> but firefox is the main browser
[2:55] <inimino> hm...
[2:55] <inimino> yeah, that screenshot is Fx
[2:55] <inimino> but I need to use some others for a while
[2:56] <inimino> to make sure I am catching various bugs
[2:56] <inimino> parsers, and generated code, have a way of finding edge cases
[2:56] <MisterN> sure, use all engines
[2:57] <MisterN> how about testing it on node too?
[2:57] <inimino> yeah, that would be good
[2:59] <inimino> time for some food, bbl
[9:39] <Dantman> ^_^ MongoDB's tailable cursors look fun...
[10:00] <Dantman> Heh, love MongoDB's idea of "limits" to the system... You're "limited" by default to about 24,000 namespaces per-database (namespace = # of collections + # of indexes including implied _id index)... ^_^ And if that's not high enough you can use --nssize to make that any higher value you want... *snicker*
[10:03] <Dantman> I suppose it's a limitation if you write a bad application that creates new collections for separate user data instead of grouping data by type of data and using an id to separate data.
[10:12] <Dantman> O_O WHaaat!!!... MongoDB supports indexing and querying geospatial data... it's new and alpha... but, still, wow!
[10:43] <Dantman> Too bad browsers don't send a md5 of files they post. If they did I could use that in my upload system to avoid re-uploading data we already have.
[10:49] <ondras> use the File api to compute md5 of file in inputfield and compare before starting upload :)
[10:50] <ondras> http://hacks.mozilla.org/2009/12/w3c-fileapi-in-firefox-3-6/
[10:53] <Dantman> Meh, I don't feel like starting an alternate file upload just yet...
[13:39] * Dantman wishes there was a tool to do code repo calculation like what ohloh does directly on git repos instead of needing to point the service to those repositories...
[13:41] <ashb> you can download ohcount
[13:41] <ashb> http://github.com/mxcl/homebrew/blob/master/Library/Formula/ohcount.rb
[13:41] <Dantman> Ohh, ^^ nice...
[13:41] <Dantman> I wanted to see what kind of numbers I got from the private repo at work for fun...
[13:42] <Dantman> The code is getting massive...
[13:42] <Dantman> Hmm... then again, I'm going to end up getting false numbers... we have 3rd party libraries included in the repo.
[13:44] <Dantman> I'm on yet another fun task... I'm taking colliding filenames and scanning for files in mongo based on a regex...; You know, the old "filename (1).ext" pattern.
[13:45] <Dantman> ^_^ I'm basically going to be creating a JavaScript RegExp with my RegExp.escape and sending it to MongoDB instead of using it myself.
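
A rough sketch of the idea above, with Python's `re.escape` standing in for Dantman's own `RegExp.escape`. The `collision_pattern` helper and the `filename` field name are assumptions made for illustration; the pattern string would be handed to MongoDB's `$regex` operator rather than evaluated locally.

```python
import re

def collision_pattern(filename):
    """Regex source matching `name.ext` and numbered copies like `name (1).ext`."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all
        stem = filename
    suffix = re.escape("." + ext) if dot else ""
    # Escape the user-supplied parts so "(" or "." in the name stay literal.
    return "^" + re.escape(stem) + r"( \(\d+\))?" + suffix + "$"

pattern = collision_pattern("report (final).txt")

# Hypothetical query document; "filename" is an assumed field name.
query = {"filename": {"$regex": pattern}}
```
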
[13:46] <ashb> you're using gridfs?
[13:46] <Dantman> No, just mongodb
[13:46] <Dantman> We have file uploads in the system so file metadata for sites is inside the database... File data gets stored on the filesystem.
[13:47] <ashb> any redundancy on the files?
[13:47] <ashb> i.e. backups, replicants?
[13:47] <Dantman> Will in production
[13:47] <ashb> good good.
[13:49] <Dantman> I was considering using gridfs, however I want to avoid using sharding for as long as possible (I'm using a feature or two currently that doesn't work with sharding enabled, and I feel like sticking with master-slave for awhile)... And since files are exponentially larger than the plain docs used for site,user,etc... data, using gridfs will bring the day we'd need to use sharding to be able to scale exponentially closer, so I decided to opt instead for storing files inside of a distributed filesystem.
[13:49] <ashb> you might want to look at mogileFS in that case
[13:51] <Dantman> Initially I discarded that one as an option because I didn't like the idea of it using HTTP and MySQL...
[13:52] <Dantman> Though, I'm currently looking at XtreemFS which I believe technically does too use HTTP a bit, so I might reevaluate that...
[13:52] <ashb> i've used mogileFS on a project for canon europe before
[13:52] <ashb> it works really well
[13:53] <Dantman> I kind of like the idea of using a system that is actually mounted on the filesystem so I can use normal fileio instead of being forced to write extra code just to support the filesystem.
[13:54] <Dantman> CDN integration is already going to be extra work.
[13:54] <ashb> depends what you are doing with the files
[13:56] <Dantman> Heck, I might want to include the data in rsync based backups.
[13:56] <ashb> mogilefs 1) backs itself up, and 2) the data is on the disk anyway
[13:56] <ashb> (backs itself up = replicates stuff)
[13:57] <ashb> or was that done by the layer we stuck on top of it
[13:57] <Dantman> Local backups are always good even when you have data stored remotely in multiple places.
[13:57] <ashb> i can't remember
[13:57] <ashb> shargind/hot-failover = local backup
[13:57] <ashb> *sharding
[14:03] <Dantman> Hmmm, http://groups.google.com/group/xtreemfs/browse_thread/thread/b3fc7efef186900b/83303c10fcdf2947
[14:05] <ashb> we certainly didn't have those sort of problems
[14:09] <Dantman> Hmm... I think I do have a need for non-application level stuff now that I think about it...
[14:13] <Dantman> Sure, right now the current stuff I'm doing is nothing but storing data associated with a md5 hash based filename (ie: Storing e/ea/ea9fe18503c35c35f3c485b8e61540 with the file data we calculated the hash from)
[14:13] <ashb> you want to be careful about collisions
[14:13] <ashb> use the file length too
[14:13] <ashb> tho even then i'm not sure thats enough
[14:14] <ashb> (if someone wants to be malicious
[14:15] <Dantman> However on the roadmap is special systems doing image thumbnailing, breaking images up, and even more transcoding video files is planned.
[14:15] <Dantman> And those will require external programs that only work with the filesystem to access that data.
[14:15] <Dantman> Hmm... md5 is really that bad...
[14:16] <ashb> we did all of that on top of mogileFS :)
[14:17] <Dantman> I just started using md5 because I was debating using GridFS and GridFS is using md5...
[14:20] <ashb> you should check if md5+length is still vuln to those attacks then
[14:20] <ashb> (if you care about the attacks at all. depending on what you do it might not matter)
[14:22] <Dantman> In Kommonwealth v2 we actually used sha1 instead.
[14:23] <ashb> yeah sha is better
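
A minimal sketch of the hash-based layout discussed above (`e/ea/ea9f...`), using SHA-1 as suggested and keeping the file length next to the digest. The function names are hypothetical, and this is not Kommonwealth's actual code.

```python
import hashlib

def storage_path(data, algorithm="sha1"):
    """Relative path of the form <1 hex char>/<2 hex chars>/<full digest>."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return "/".join([digest[:1], digest[:2], digest])

def storage_key(data):
    """Digest plus length, so two files must match on both to be treated as one."""
    return "{}-{}".format(hashlib.sha1(data).hexdigest(), len(data))

path = storage_path(b"hello")
key = storage_key(b"hello")
```

Including the length is cheap, but as ashb notes it may not be enough against a deliberate attacker; whether that matters depends on who can upload.
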
[14:24] <Dantman> ^_^ And (partially because I found the thought fun)... our salted passwords are being hashed with SHA-512 in Kommonwealth v3.
[14:24] <Dantman> *snicker*
[14:25] <Dantman> ;) We're not going to need to update our password hashing system for ages... heh
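
For illustration only, a sketch of salted password hashing built on SHA-512. It uses PBKDF2-HMAC-SHA512 from Python's standard library, which is a swapped-in technique and not necessarily what Kommonwealth v3 does; the iteration count and function names are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest) for the password; a fresh salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse")
```
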
[14:25] <Dantman> ((The ironic part though, is username/password logins are going to end up deprecated in a way...))
[14:26] <Dantman> We'll support them... but eventually preference will be for users to login using OpenId, Facebook, Google, Twitter, or any of many other 3rd party accounts supporting OpenId or OAuth.
[14:27] <ashb> facebook and twitter aren't openid providers are they?
[14:27] <Dantman> Facebook has Facebook Connect, and Twitter uses OAuth.
[14:29] <Dantman> We intend to integrate 3rd party services so that special jits can integrate external functionality... so using the OAuth that would allow that integration as a login identity makes sense.
[14:29] <Dantman> It's almost like what Disqus is doing.
[14:32] <Dantman> Think, logging into Kommonwealth using your Yahoo/Flickr account and using that same identity to make your Flickr photos available to use as a photo gallery on your website.
[14:57] <Dantman> ashb, How does MogileFS handle adding new storage nodes? ie: What kind of stuff is involved in adding a new server, tasks, latency, things that might need to be restarted.
[14:59] <Dantman> Also, that no-single-point-of-failure... Is that just master-slave replication, or does it support something so that writes will still work even if the masters die?
[15:00] <ashb> can't recall exactly
[15:00] <ashb> i'm fairly sure its hot replication
[15:02] <Dantman> *sigh* YouTube is absolutely broken on my laptop...
[15:02] <ashb> s/ on my.*/
[15:03] <ashb> its YouTube. it is the epitome of retardation on the internet
[15:03] <Dantman> Stupid Google... Crunchyroll's custom Flash video player has better support for my 64bit Linux Flash than your video player.
[15:05] <Dantman> Heh, whoops, I just started typing xtreem with my Japanese keyboard on my iPhone.
[16:15] <Dantman> I've been reading through the whole docs for both XtreemFS and MogileFS... I think I'm going to go with XtreemFS.
[16:16] <ashb> its your call. I've never used Xtreemfs so can't comment on it
[16:38] <Dantman> Looking at MogileFS I do see database and file replication... However I don't see failover. Trackers are configured to use static master SQL databases, and you have to make sure that all trackers point to the same database.
[16:38] <ashb> wonder what i did before then
[16:38] <ashb> cos i'm fairly certain we had hot failover
[16:38] <ashb> but if Xtreemfs does it out of the box great
[16:39] <Dantman> XtreemFS doesn't do it yet, atm the Directory server and Metadata servers are single points of failure
[16:39] <Dantman> However eliminating that is planned to be finished within Q1/2010
[16:40] <ashb> so it'll be done by September if you're lucky then :)
[16:40] <Dantman> ;) I won't be in production for a good amount of time, and won't be needing something that reliable for awhile too.
[16:41] <Dantman> Heck, I'm relying on alpha MongoDB features, and can't wait for replica sets
[16:42] <Dantman> But yes, the planned setup for XtreemFS has automatic failover... MRC and DIR servers will work as master-slave, and if the master dies a new master will be elected.
[16:43] <Dantman> And that kind of thing already exists for the OSD servers
[16:45] <Dantman> It also has file striping and selection of OSDs based on geographic location
[16:47] <Dantman> And because of the way I store files I have no need for read/write replication, so the fact that read/write replication isn't planned till 2.x doesn't matter to me... I only need the read-only replication (ie: Files are replicated after you mark them read-only).
[16:48] <Dantman> I just use normal filesystem commands to stream the data into a file in the mounted volume and when finished mark it read-only
[16:50] <Dantman> The web based console (with maps even) looks very useful too.
[18:31] <Dantman> ashb, btw, using a SQL database to store file metadata also smells to me... XtreemFS is using a key/value store they built optimized for storing filesystem related metadata.
[18:31] <ashb> don't know what you mean. mysql isn't a database ;)
[18:31] <ashb> its a perfectly good key/value store tho
[18:31] <ashb> that's about all it *is* good at
[18:31] <ashb> cos if you treat it like its a database it will jump up and bite you in the ass
[18:32] <ashb> Dantman: slightly more seriously
[18:32] <ashb> livejournal runs on mogilefs
[18:33] <ashb> i think it can cope with the load
[18:33] <ashb> the same guy(s) are also responsible for memcached
[18:33] <ashb> I think they know a thing or two about scalability and stability
[18:34] <Dantman> Meh, still different than what there is to know about filesystems.
[18:34] <ashb> sure - its not right for everyone
[18:34] <ashb> but it certainly is fast and scalable
[18:37] <Dantman> Heh, the XtreemFS guys are actually also developing an OS... Well, a Linux-based OS optimized for grid usage.
[18:38] <ashb> see i'd trust that *less* personally
[18:38] <Dantman> Meh, least they have good docs.
[18:39] <ashb> :)
[18:41] <Dantman> Looked through basically the entire MogileFS wiki... Scattered info, stuff obviously written by someone who didn't actually develop the system just making guesses on how it works from an end user perspective, and a strong sense that there was a good deal of information missing.
[18:41] <ashb> no they developed it, it was just opensourced as an afterthought
[18:42] <Dantman> Nah, I mean half the docs were likely not written by someone who knew the actual internals of the system.
[18:42] <ashb> who knows. perhaps
[18:43] <Dantman> ((heh, not that it's a high priority for any open-source project... ;) just look at MediaWiki's docs... devs don't like writing documentation))
[18:43] <ashb> yeah but they are special - they like writing php ;)
[18:43] <Dantman> heh... ;) Don't be too sure about that...
[18:44] <Dantman> I was a core dev.
[18:45] <Dantman> Programming MediaWiki made me grow into a PHP Guru..... Where PHP Guru = Person who has programmed in PHP enough to understand it almost inside out and realize how bad it is and... In other words PHP Guru = Expert at PHP who has grown to hate the language...
[18:45] <ashb> heh
[18:45] <Dantman> ;) And there are a fair bit of other "PHP Gurus" developing MW. *snicker*
[18:47] <Dantman> And one of the horrible parts of the language is you are absolutely and unavoidably locked into a cgi like model where you have to do init on every request... And MediaWiki's init is very heavy... There's a huge amount of work just trying to avoid all that init.
[18:48] <ashb> surely it has socket method?
[18:48] <ashb> you could in theory write a FastCGI daemon in php, no?
[18:48] <ashb> i mean it wouldn't be as easy to deploy then
[18:48] <Dantman> And unfortunately one of the key reasons to use php is so you can deploy it just about anywhere
[18:49] <ashb> yup
[18:49] <Dantman> If MW didn't use PHP and instead used a language with something like Rack/WSGI/JSGI/PSGI the message system could be loaded just once... Completely eliminating a huge pile of overhead, and also eliminating a number of ugly parts of MW caused by the need to optimize the message system.
[18:50] <Dantman> ;) It also wouldn't need to implement the job queue as something that works in requests... it would instead be a nice dedicated thread.
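
A small sketch of the load-once point made above, written as a plain WSGI callable. The message table and the `load_messages` helper are made up for illustration; the only claim is that module-level work in a long-running WSGI process happens once per process, not once per request.

```python
LOAD_COUNT = 0  # counts how often the expensive init actually runs

def load_messages():
    """Stand-in for a heavy init step such as loading a message catalogue."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"greeting": "Hello"}

# Runs when the module is imported, i.e. once per server process.
MESSAGES = load_messages()

def app(environ, start_response):
    """WSGI application: each request reuses the already-loaded MESSAGES."""
    body = MESSAGES["greeting"].encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```
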
[18:50] <ashb> :)
[18:51] <Dantman> ;) Like my gc worker in Kommonwealth v3, the snapshot queue, and so on...
[18:51] <Dantman> I could combine the queues... but it's just easier to separate them.
[18:51] <Dantman> And I can dedicate servers to certain tasks.
[18:52] <Dantman> ie: Dedicate a high ram/cpu server to doing video transcoding while leaving simple snapshots to lighter servers.
[18:53] <Dantman> And gc has no need to be run on any more than one server... while queues could be useful running in multiple instances to allow multiple jobs to be done simultaneously.
[18:54] <Dantman> Now if Java only had an isReadOnly method
[18:54] <Dantman> canRead isn't quite what I want... I want to know if I marked a file with setReadOnly()
[18:55] * ashb hands Dantman JNI ;)
[18:56] <Dantman> ^_^ I don't even know what Java does on a POSIX system with setReadOnly()
[18:59] <Dantman> I suppose I shouldn't worry about it when I don't even know if setReadOnly() sets a file readonly in a way that it'll make XtreemFS replicate it.
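
A sketch of what an "is read-only" check could look like on a POSIX filesystem, in Python rather than Java. It assumes that "read-only" means no write permission bits are set; whether that is what Java's `setReadOnly()` produces, or what XtreemFS looks at before replicating, is not established in this log.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def mark_read_only(path):
    """Clear the write bits for owner, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~WRITE_BITS)

def is_read_only(path):
    """True when no write permission bit is set on the file."""
    return not os.stat(path).st_mode & WRITE_BITS

# Stream data into a file, then mark it read-only when finished.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"file data")
before = is_read_only(demo_path)
mark_read_only(demo_path)
after = is_read_only(demo_path)
os.remove(demo_path)
```

Checking the mode bits directly answers "did I mark this file?" independently of whether the current user happens to be able to write to it.
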

 

 
