Saturday, May 24, 2008

Erlang, distribution & bspawner

Sriram Krishnan was enquiring about distributed programming tools and all things distributed. Naturally, the first thing that came to my mind was Erlang. Given that he was inviting feedback, and given his recent work on cacheman and on server/app performance, I thought it would be a good time to talk about bspawner, my own pet project, among other things erly.

Load Balancing & Introducing bspawner

Erlang nodes can communicate over the network once they find each other. You can send a message to any process (an Erlang process, which is different from an OS process) through its Pid, even if the process is on another node.

Although you don't have to know where a process lives in order to message it, it is still up to the Erlang programmer to decide which node a task gets spawned on in the first place. Nodes that recognize each other can pass messages between them, but nothing in the language automatically spreads new work across those nodes.
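To make that concrete, here is a minimal sketch of spawning a process on a chosen node and talking to it through its Pid. The module and function names (remote_echo, start/1, ping/1) are made up for illustration, and it assumes the module is already loaded on whichever node you pass in.

```erlang
-module(remote_echo).
-export([start/1, ping/1, loop/0]).

%% Spawn the echo loop on Node. Node may be node() itself or any
%% connected node that has this module on its code path (assumption).
start(Node) ->
    spawn(Node, ?MODULE, loop, []).

%% Send a message to Pid and wait for the reply. The code is the same
%% whether Pid lives on this node or on a remote one.
ping(Pid) ->
    Pid ! {self(), ping},
    receive
        {Pid, pong} -> pong
    after 5000 ->
        timeout
    end.

%% Reply to every ping until told to stop.
loop() ->
    receive
        {From, ping} ->
            From ! {self(), pong},
            loop();
        stop ->
            ok
    end.
```

Calling remote_echo:start(node()) tries it on the local node; passing the name of another connected node is the only change needed to run the loop elsewhere, which is exactly the decision the programmer is left to make.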

bspawner is a project I've open sourced that attempts to load-balance the task of spawning across multiple nodes. The work involved can be split into three distinct problems.

  1. deciding which node needs to spawn a task
  2. communicating across these nodes
  3. maintaining a record of nodes, including nodes that are added or removed

In essence, this project deals with the first problem. The message-passing implementation started from the "messenger.erl" sample program, which was then modified to handle the intended message passing, load balancing and node information.
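As a rough illustration of the first problem (and not bspawner's actual code), one simple way to decide which node should spawn a task is to ask every known node for its run-queue length and pick the smallest. The module name least_loaded and its functions are hypothetical; using erlang:statistics(run_queue) as the load measure is an assumption made just for this sketch.

```erlang
-module(least_loaded).
-export([pick/0, pick/1, spawn_balanced/3]).

%% Pick among this node and every node it is currently connected to.
pick() ->
    pick([node() | nodes()]).

%% Return the node with the shortest run queue, falling back to the
%% local node when no node answers.
pick(Nodes) ->
    Loads = [{load(N), N} || N <- Nodes],
    case [{L, N} || {L, N} <- Loads, is_integer(L)] of
        [] ->
            node();
        Reachable ->
            {_Load, Node} = lists:min(Reachable),
            Node
    end.

%% Ask Node how many processes are waiting to run (2 second timeout).
load(Node) ->
    case rpc:call(Node, erlang, statistics, [run_queue], 2000) of
        Load when is_integer(Load) -> Load;
        _Error -> unreachable
    end.

%% Spawn M:F(A) on whichever node currently looks least busy.
%% Assumes module M is loaded on that node.
spawn_balanced(M, F, A) ->
    spawn(pick(), M, F, A).
```

Polling every node on each spawn obviously doesn't scale; having nodes report their load periodically, as the pool module described further down does, avoids that.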

Cheers for the feedback, suggestions, comments and further changes inspired by the growing involvement of the Erlang community, and for the encouragement from #erlang in particular.

The project is still in its early stages, and although it has been a fantastic learning experience, I have since learned that Erlang already has an inbuilt load-balancing module, pool. I quote:

pool can be used to run a set of Erlang nodes as a pool of computational processors. It is organized as a master and a set of slave nodes and includes the following features:

  • The slave nodes send regular reports to the master about their current load.
  • Queries can be sent to the master to determine which node will have the least load.

How cool can it get! Since it's even built on a master-slave basis, restart strategies for when a worker goes down can be configured courtesy of Erlang's OTP supervisors and behaviours. This basically allows you to scale horizontally and distribute processing among boxes. I'm even thinking of testing this setup at hover.in on weekends by maybe running some bizarrely wild clustered processing job like ...
  • finding the largest prime number in Pune maybe ? :D
  • any thing from distributing cron jobs to non-blocking or blocking I/O to ... free your mind!
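
For a feel of how the pool module might be used, here is a small sketch. It assumes the calling node is started as a distributed node, that a .hosts.erlang file lists the hosts on which pool:start/1 can launch workers, and that this (hypothetical) pool_demo module can be loaded on those workers. I have not run it on a real cluster.

```erlang
-module(pool_demo).
-export([run/0, work/2]).

%% Start a pool, farm out ten small jobs, collect the answers.
run() ->
    %% Starts worker nodes named after 'worker' on the hosts listed
    %% in .hosts.erlang and attaches them to the pool.
    Nodes = pool:start(worker),
    io:format("pool nodes: ~p~n", [Nodes]),
    Parent = self(),
    %% pspawn/3 spawns each job on the node the master expects
    %% to have the least load.
    Pids = [pool:pspawn(?MODULE, work, [Parent, N])
            || N <- lists:seq(1, 10)],
    Results = [receive {Pid, Result} -> Result end || Pid <- Pids],
    pool:stop(),
    Results.

%% The job itself: report which node did the work and the answer.
work(Parent, N) ->
    Parent ! {self(), {node(), N * N}}.
```

pool:get_node/0 can be used instead of pspawn/3 when you only want the name of the least-loaded node and would rather do the spawning yourself.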

Testing & Test suites

I'm not much into the whole test suite setup as yet, but some good, widely adopted starting points would be

  • EUnit - a Lightweight Unit Testing Framework for Erlang
  • An excellent paper on error report evaluation, testing and debugging can also be found here (although a little old, from '92, it still gives valuable info on general practices)

Distributed Monitoring & Debugging

Regarding utilities for debugging, profiling, etc.: apart from user-contributed packages, and there are loads of them on process-one, trapexit, jungerl, Google Code and so on, I thought I would list a few of the interesting utilities. (See more on the left side of the documentation at erlang.org under Tool Applications.)

  • appmon — a graphical utility to observe and manipulate supervision trees.
  • debugger — an Erlang source code debugger.
  • erl_interface — a set of libraries for communicating with distributed Erlang nodes.
  • et — the event tracer and tools to record and give a graphical presentation of event data.
  • eva — the “event and alarm” handling application.
  • observer — tools for tracing and observing the behaviour of a distributed system.
  • os_mon — a tool to monitor resource usage in the external operating system.
  • pman — a graphic tool to inspect the state of the system, at local or remote Erlang nodes.
  • runtime_tools — miscellaneous small routines needed in the runtime system.
  • toolbar — a graphical toolbar from which applications can be started.
  • webtool — a system for managing web-based tools (such as inets)
  • tools — a package of stand-alone applications for analysing and monitoring Erlang programs. This includes tools for profiling, coverage analysis, cross reference analysis etc.

These go hand in hand with other utilities like Mnesia, the distributed database also written in Erlang, or other open-source Erlang implementations of everything from bloom filters and decision trees to Bayeux protocol comet servers and cron jobs.

Preserving State in Data Structures, Processes or Servers

Behaviours are templates or formalizations of common design patterns. The three inbuilt generic behaviours are gen_server (client-server paradigms), gen_event (event-driven paradigms) and gen_fsm (finite state machine paradigms), and they sit alongside the supervisor behaviour used for supervision trees. In addition, you can create your own behaviours and have modules implement them. (Emacs in Erlang mode even gives nice skeletons for all behaviours and common design patterns.)
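
As a small example of a behaviour holding state, here is a minimal gen_server sketch that keeps a counter. The module name counter_server and its API (increment/0, value/0) are made up for illustration; the callbacks are the standard gen_server ones.

```erlang
-module(counter_server).
-behaviour(gen_server).

%% Client API (hypothetical names for this sketch)
-export([start_link/0, increment/0, value/0]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start the server, registered locally, with the counter at 0.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).

%% Asynchronous: bump the counter without waiting for a reply.
increment() ->
    gen_server:cast(?MODULE, increment).

%% Synchronous: ask the server for the current count.
value() ->
    gen_server:call(?MODULE, value).

%% The state is simply the current count.
init(Initial) ->
    {ok, Initial}.

handle_call(value, _From, Count) ->
    {reply, Count, Count};
handle_call(_Other, _From, Count) ->
    {reply, {error, unknown_call}, Count}.

handle_cast(increment, Count) ->
    {noreply, Count + 1};
handle_cast(_Other, Count) ->
    {noreply, Count}.

handle_info(_Msg, Count) ->
    {noreply, Count}.

terminate(_Reason, _Count) ->
    ok.

code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

The state never leaves the server process; clients only ever see it through messages, which is why the same server could be registered globally and called from another node with very little change.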

You can also have stateless or stateful processes which run indefinitely, block while waiting, or exit after a timeout. That really opens up a whole lot of possibilities, and since the key is message passing, it all works the same regardless of whether the processes are on your local node or halfway across the globe.


To think about ....

  • load-balancing two or more yaws servers, each of which is capable of handling 80,000 parallel connections.

  • Having Erlang communicate via a port with, say, a Python or Perl program to abstract cross-language functionality (Facebook Chat seems to be the largest Erlang-based chat application, and communicates with C++ for logging)
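
A minimal sketch of the port idea, assuming a python3 executable is on the PATH. The module name py_port and the tiny inline Python script are purely illustrative: the script upper-cases each line it reads, and the Erlang side sends one short line (no embedded newlines, under 1024 bytes) and waits for the answer.

```erlang
-module(py_port).
-export([upcase/1]).

%% Illustrative Python program: echo every input line in upper case.
-define(SCRIPT,
        "import sys\n"
        "for line in sys.stdin:\n"
        "    sys.stdout.write(line.upper())\n"
        "    sys.stdout.flush()\n").

%% Send one line of text to Python through a port and return the reply.
upcase(Text) when is_binary(Text) ->
    case os:find_executable("python3") of
        false ->
            {error, python3_not_found};
        Python ->
            Port = open_port({spawn_executable, Python},
                             [{args, ["-u", "-c", ?SCRIPT]},
                              binary, {line, 1024}, exit_status]),
            port_command(Port, [Text, <<"\n">>]),
            Reply = receive
                        %% With {line, N}, complete lines arrive as eol tuples.
                        {Port, {data, {eol, Line}}} -> {ok, Line};
                        {Port, {exit_status, Status}} -> {error, {exit, Status}}
                    after 5000 ->
                        {error, timeout}
                    end,
            %% The port may already be closed if Python exited.
            catch port_close(Port),
            Reply
    end.
```

From the Erlang side the external program looks like just another process you send messages to, which is what makes mixing languages this way feel natural.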

So there you have it. I'm still a beginner in Erlang, but it's pretty evident how much flexibility there is to mix and match with the OS and other languages, along with the inherent encouragement towards distributed and concurrent programming across multiple nodes, the facets of a functional programming language, and an active and growing developer community.


Keep Clicking,
Bhasker V Kode

Tuesday, May 20, 2008

Updates on Identities on the web

Some thoughts on identities and their related data, online accounts and related news over the past few weeks.

  1. The Data Portability group, amidst trying to get the bigwigs to come together and discuss how to share user data, gets warned twice about infringement-related issues with its own logo. Otherwise they've been pretty busy, with several announcements made with the right intentions but left at the mercy of too many influential market dynamics and vested interests (as illustrated in the points below).

  2. OpenID seems to keep itself out of trouble, and with a bunch of active adopters and hackers working together, there seems to be hope after all. I still don't know how long it will be before the authentication itself can be made an asynchronous call, rather than the multi-step process it is right now. A Y Combinator startup called clickPass seems to have some traction in its favour.

  3. Google's Orkut makes the news with the Indian who was jailed for commenting on a particular politician (I'm not touching this with a 10-foot pole! :D). Rumours from the local Pune papers also suggest that they arrested the wrong guy! (That's right, newspapers in Pune report more rumours and Bollywood gossip than actual news.)

  4. And as if offering your product for FREE wasn't good enough, early adopters are ploughing away in a bid to yank their data out as well. Thoughts on who owns your data within the Facebook network have brought on highly engrossing discussions too. It's your id, your profile, your pics, on their servers, powered by their cash and their VCs' credibility. If Beacon showed what 3rd-party developers could do with your data, the latest trend of 'yanking the data out' could give results that are just as unpredictable, with no one strategy suiting everyone. (It might also be interesting to check out the school talk / DHH talk on how to make money online - charging your customers works!)

  5. What do you do when a startup's product becomes so popular that people get inspired enough to want to take their data elsewhere, or distribute it as well? That's been the case with Twitter and the several comments on how to decentralise Twitter.

  6. Facebook and Google are not hitting it off over the launch of Google FriendConnect, Scoble shares his insights into how Microsoft wants to keep the web closed, and FriendFeed is trying to filter signal from the noise. Most users are used to the noise, btw (when was the last time you went to CNN or the BBC versus Twitter or Valleywag?). Facebook also makes news in the #erlang channels, btw, for possibly becoming the largest Erlang-powered chat app. It will be interesting to see what technique they're using to handle Unicode, something that hover.in is working to integrate as well.

  7. Zoho announces that you can now log in to Zoho with your Google or Yahoo accounts. Very inspiring to see the pace at which Zoho takes ideas and implements them. Keep 'em coming!


  8. And amidst all this, a friend pointed out to me that someone was posting comments under the alias kode, without a full name, giving several blogs and people the impression that it was in fact me who was commenting. What happened to the days when stalkers believed in linkbacks :D . Either way, I hope it's nothing serious. On the contrary, it does bring up several unique advantages that a sezwho, a disqus, friendfeed or an OpenID-enabled commenting system can provide.

    But it helps to remember that denial isn't just a river in Africa.

    8 )

Keep Clicking,
Bhasker V Kode


PS : Btw, if you're a fresher looking to join a startup in Pune , send in a mail to kode at hover dot in