Archive for the ‘compute grid’ Category

I believe many of you are aware of Eric Brewer's famous theorem, which says that any networked shared-data system can have only two of three desirable properties: Consistency, high Availability, or tolerance to network Partitions (i.e. the network may lose any packet/message). It is usually called the CAP theorem. The theorem is quite fundamental for many distributed workloads such as computational and data grids, and the "2 of 3" principle underlies many architectural decisions in the cloud world. Below I've combined a couple of great links on the topic which should shed some light on the principle and give you more insight into how this basic rule is being rethought nowadays. And yes, by the way, the CAP theorem is a key thing for most NoSQL and other distributed data solutions – below you'll find good reading on the topic, including some new views on it.

Easy explanation of CAP theorem

Cloud computing humor

The EC2 guys announced micro instances – they cost 2 (two) cents per hour for Linux, so EC2 now costs less than traditional dedicated hosting with root access: the monthly payment for an EC2 micro instance is about 15 USD, while root/Linux on dedicated hosting runs about 30 USD/month. It's really good news – you can have a 100-box cluster for just two USD per hour! The bad thing is that micro instances don't have their own disk space – EBS only, so it looks like the best use case for this instance type will be a highly distributed computational grid with all data stored in RAM. And don't forget that EBS will cost you some money: $0.10 per allocated GB per month, plus $0.10 per 1 million I/O requests to your volume. Fredrick Poller has already checked micro instance performance with sysbench: Amazon EC2 Micro instance, how fast is it?.
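As a quick sanity check, the arithmetic above can be sketched like this (rates as quoted in the post; the 730-hour month and the 10 GB / 5M-I/O EBS figures are just illustrative assumptions):

```shell
# Back-of-the-envelope check of the micro-instance numbers above.
hourly=0.02          # USD per hour, Linux micro instance (as quoted)
ebs_gb_month=0.10    # USD per allocated GB-month of EBS
ebs_io_million=0.10  # USD per 1 million I/O requests

awk -v h="$hourly" -v e="$ebs_gb_month" -v io="$ebs_io_million" 'BEGIN {
    printf "one micro instance:  $%.2f/month\n", h * 730
    printf "100-node cluster:    $%.2f/hour\n",  h * 100
    printf "10 GB EBS, 5M IOs:   $%.2f/month\n", e * 10 + io * 5
}'
```

Which lands right around the ~15 USD/month figure mentioned above.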

AWS-related links:

Updates in Amazon Web Services:

  • 2 new high-memory instance types: 64-bit Double Extra Large with 34.2 GB RAM and 13 ECU (4 virtual cores × 3.25 EC2 Compute Units (ECU)),
    and Quadruple Extra Large with 68.4 GB RAM and 26 ECU (8 virtual cores × 3.25 ECU): New EC2 High-Memory Instances
  • Instance price changes (us-east is still cheaper than eu-west): Amazon EC2 – Now an Even Better Value
  • New service for relational databases (provisioning, scaling and other nice things): Introducing Amazon RDS – The Amazon Relational Database Service
  • Security stuff: Vulnerability identified in Amazon’s cloud computing
  • Amazon EC2 – Ubuntu at Google Groups
  • 5 years ago Amazon announced the Amazon Simple Queue Service – highlights of AWS over the last 5 years
  • If anyone is interested – here’s a new update for Sun Grid Engine: 6.2 update 4. It’s almost all bug fixes and man page changes – the list of changes is here. The CVS sources tag is V62u4_TAG (relevant for Grid Engine, ARCo and SGE Inspect); by the way, as far as I know Hedeby is still at 1.0u3.

    If a recent Evans Data study is any indication, private cloud deployments could rise sharply over the next year.

    Sun Grid Engine’s top engineer Richard Hierlmeier wrote an article (with some bash scripts that implement it – btw, why not put them into your CVS?) about using SDM in a compute cloud (EC2 is used as the example; I suppose GoGrid could be used as well without too many changes) – Using SDM Cloud Adapter to Manage Solaris Zones.

    Sun released a new version of Sun Grid Engine – 6.2 Update 3. Here’s what’s new:

    upd. Also, the new Sun Studio 12 Update 1 is available now too.

    The central idea we were working on was this idea of de-localized information — information for which I didn’t care what computer it was stored on. It didn’t depend on any particular computer. I didn’t know the identities of other computers in the ensemble that I was working on. I just knew myself and the cybersphere, or sometimes we called it the tuplesphere, or just a bunch of information floating around. We used the analogy — we talked about helium balloons. We used a million ways to try and explain this idea – John Markoff and Clay Shirky talk to David Gelernter – Lord of the Cloud

    Stallman dismisses cloud computing as industry bluster. “It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign,” he said. Huh – I agree that anything containing the “cloud” keyword carries too much marketing, but it’s not really stupid. There is a lot of marketing noise in this area (and the GoGrid guys are the very first in the ‘too much marketing’ category), but look at Amazon EC2 – it’s a really great, amazing thing. For the last 7 years my work has involved clusters of various sizes, and for the last year my “server provider” has been Amazon – and I can say that Amazon is much more convenient than any company-owned datacenter. For me it’s a big deal that I can get 100 servers in 10 minutes and run a job on them. There’s too much marketing noise in the cloud industry, but it works, and it works almost fine.
    ps. Another point for cloud computing – Steve Ballmer on defining the cloud.

    As the base AMI I used ami-7db75014 – it’s OpenSolaris supported by Sun; general information about installing and using OpenSolaris in EC2 is also available in Sun’s Amazon EC2 Getting Started guide. In this post I will focus mostly on using SGE in Amazon EC2. As the SGE distribution I used the all-in-one tar package – I chose “All supported platforms” on the Grid Engine download page. It weighs about 350 MB, but I don’t have to worry about platform architecture: if Sun supports a platform, it’s in this package. This ge62u2_1.tar.gz contains a bunch of other tar.gz’s (and even hedeby’s core package) and can be unpacked with:

    root@ec2-server:~/tools/archive# gzip -dc ge62u2_1.tar.gz | tar xvpf -

    So I just went into ge6.2u2_1 and unpacked them all using something like this:

    for myfile in *.tar.gz
    do
        gzip -dc "$myfile" | tar xvpf -
    done

    One important thing: hedeby-1.0u2-core.tar.gz contains old versions of some files from ge-6.2u2_1-common.tar.gz – there are conflicts in the files common/util/arch and common/util/arch_variables – here’s the diff for them. Maybe it can be useful sometimes, but with my configuration it caused very strange errors when I tried to install an executor host:

    value == NULL for attribute “mailer” in configuration list of “ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com”
    ./inst_sge[261]: Translate: not found [No such file or directory]
    ./inst_sge[263]: Translate: not found [No such file or directory]
    ./inst_sge[264]: Translate: not found [No such file or directory]
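The fix turned out to be re-extracting the two conflicting files from ge-6.2u2_1-common.tar.gz. Since tar can extract individual members, the repair can be scripted; the sketch below demonstrates the idea on a throwaway tarball built on the spot (the member names common/util/arch and common/util/arch_variables are the real ones, the file contents here are stand-ins):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for ge-6.2u2_1-common.tar.gz containing the two newer files.
mkdir -p common/util
echo 'newer arch script' > common/util/arch
echo 'newer arch_variables' > common/util/arch_variables
tar czf common-standin.tar.gz common/util/arch common/util/arch_variables
rm -r common

# Simulate hedeby's tarball having overwritten them with stale copies.
mkdir -p common/util
echo 'stale arch script' > common/util/arch
echo 'stale arch_variables' > common/util/arch_variables

# The actual repair: re-extract only the two conflicting members,
# overwriting the stale copies in place.
gzip -dc common-standin.tar.gz | tar xpf - common/util/arch common/util/arch_variables
```

With the real package you would run the same tar invocation against ge-6.2u2_1-common.tar.gz from inside $SGE_ROOT.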

    When I replaced these files with the ones from ge-6.2u2_1-common.tar.gz, the installation worked as expected. The next point is DNS configuration – SGE is very picky about DNS, and this will cause some problems when running SGE on Amazon EC2 instances. It can be fixed using SGE’s host_aliases file, or alternatively via the /etc/hosts file – a technique of this kind is used in the Hedeby-SGE on Amazon EC2 demo. For example, with one master and 2 executor hosts, I put these lines into /etc/hosts:

    #internal_ip external_full_name external_short_name internal_full_name internal_short_name
    10.yyy.xyz.zzz ec2-RRR-TTT-ZZZ-YYY.compute-1.amazonaws.com ec2-RRR-TTT-ZZZ-YYY domU-mm-ww-PPP-WWW-FFF-GGG.compute-1.internal domU-mm-ww-PPP-WWW-FFF-GGG
    10.yyy.qwe.ttt ec2-aaa-bbb-ccc-ddd.compute-1.amazonaws.com ec2-aaa-bbb-ccc-ddd domU-mm-ww-JJJ-HHH-DDD-SSS.compute-1.internal domU-mm-ww-JJJ-HHH-DDD-SSS
    10.yyy.pre.ppp ec2-yyy-rrr-eee-qqq.compute-1.amazonaws.com ec2-yyy-rrr-eee-qqq domU-mm-ww-UUU-III-OOO-PPP.compute-1.internal domU-mm-ww-UUU-III-OOO-PPP

    I also use the hostname ec2-RRR-TTT-ZZZ-YYY (external_short_name) to set the instance hostname – these are the names I use when configuring SGE.
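Writing those five-column lines by hand is error-prone, so here's a small helper sketch that derives the short names by stripping the domain part. The IP and hostnames below are made-up samples; on a real instance they would come from the EC2 metadata service (local-ipv4, public-hostname, local-hostname):

```shell
# Build one /etc/hosts line in the format
#   internal_ip external_full external_short internal_full internal_short
hosts_line() {
    internal_ip=$1
    external_full=$2
    internal_full=$3
    # short name = full name with everything after the first dot stripped
    printf '%s %s %s %s %s\n' \
        "$internal_ip" \
        "$external_full" "${external_full%%.*}" \
        "$internal_full" "${internal_full%%.*}"
}

# Sample (hypothetical) values; append the output to /etc/hosts as root.
hosts_line 10.1.2.3 \
    ec2-75-101-1-1.compute-1.amazonaws.com \
    domU-12-31-39-00-00-01.compute-1.internal
```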

    Below I try to summarize my experience with SGE and its use on various platforms (Solaris 10, Ubuntu, OpenSolaris, etc.). If you use Solaris – check out my “Solaris – common questions and its differences from Linux”; maybe your problems are caused by Solaris rather than SGE.
    So let’s go:

    • when installing SGE, after export SGE_ROOT=<my_sge_path>, running util/setfileperm.sh gave me a ‘can’t find script /util/arch’ error, as shown below:
      root@domU-12-31-39-03-CC-95:/opt/ge6.2u2_1# util/setfileperm.sh $SGE_ROOT
      can’t find script /util/arch
      this error can be fixed by setting the SDM_DIST environment variable:
      export SDM_DIST=$SGE_ROOT
    • I got a commlib error:
      error: commlib error: access denied (client IP resolved to host name “”. This is not identical to clients host name “”)ERROR: unable to contact qmaster using port 10500 on host “solaris-master.devnet.int.corp”
      rebooting the SGE master host helps – see Sun Grid Engine: execution host can’t connect to master host with “commlib error: access denied”
    • to be continued..

    Shravan Kumar shared with me a demo by Lew Tucker (Sun’s CTO) where he demonstrates a Virtual Data Center – it’s not the usual marketing speech, it’s quite interesting. As I understand, Lew Tucker is talking about this one – The APIs for the Sun Cloud – a RESTful API for creating and managing cloud resources, including compute, storage, and networking components. It looks very attractive and interesting, so I’m going to check it out shortly 🙂

    Here’s another Lew Tucker demo – cloud APIs and how to create and deploy a MySQL Virtual Machine

    Lew Tucker and Dave Douglas demonstrate a web developer application for the cloud storage service

    All of these videos (and other Sun cloud-related info) are available at Sun’s Community One East web event home page; you may also check out Sun’s A Guide to Getting Started with Cloud Computing.

    One way to do it is to use queues – you can create a unique queue for each host in your SGE grid (using qconf -aq) and specify that queue name in the submit parameters:

    qsub -q <queue_name> $SGE_ROOT/examples/jobs/simple.sh
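Creating one queue per host by hand gets tedious on a big grid, so the qconf calls can be scripted. Here's a sketch – the hostnames are hypothetical and the queue template is trimmed to the relevant fields; in practice you'd start from a full dump of an existing queue (qconf -sq all.q) and use qconf -Aq, the non-interactive counterpart of qconf -aq, to register each file:

```shell
# Generate a minimal queue config file per host and print the qconf
# command that would register it (the command itself is not executed here).
for host in dev-host1 dev-host2; do
    qfile="q_${host}.conf"
    cat > "$qfile" <<EOF
qname     q_${host}
hostlist  ${host}
slots     4
EOF
    echo "qconf -Aq $qfile"
done
```

After adding the queues, qsub -q q_dev-host1 then routes a job to that particular host.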

    If you would like to submit jobs to the grid from an application (C or Java), SGE supports a special API – DRMAA, the Distributed Resource Management Application API. Here are some examples in C and Java which may help you figure this stuff out. There are SGE DRMAA Javadocs, drmaa package JavaDocs and general help – the C library functions are listed in section 3. To specify a queue name, the drmaa_set_attribute function should be used as shown below:

    drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-q queue_name", error, DRMAA_ERROR_STRING_BUFFER - 1);

    Another way to route a job onto a specific host is to specify request attributes in qsub: qsub -l <request_attr_name> – for a Java example, please see below. You may also add a “soft” or “hard” resource requirement modifier (for more, see the SGE glossary – hard/soft resource requirements).

    drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-hard -q queue_name", error, DRMAA_ERROR_STRING_BUFFER - 1);

    Here’s the listing of the DRMAA C example which runs a job on a specified queue – to build it you may use the simple bash script listed below. It works on Solaris 10; on Linux I suppose it’s better to use the g++ compiler:

    INC=-I$SGE_ROOT/include
    LIB=-L$SGE_ROOT/lib/sol-x86/
    LIB_NAME=-ldrmaa
    cc $INC $LIB $LIB_NAME sge_drmaa_test_example.c -o sge_drmaa_test_example.out

    If you get the error below when you run this example:

    ld.so.1: sge_drmaa_test_example.out: fatal: libdrmaa.so.1.0: open failed: No such file or directory
    Killed

    check your LD_LIBRARY_PATH environment variable – it should be set like this (Solaris 10 x86):

    export LD_LIBRARY_PATH=$SGE_ROOT/lib/sol-x86/

    The Java implementation also uses DRMAA, but it looks a little different from the C version: instead of drmaa_set_attribute, it’s called JobTemplate.setNativeSpecification:

    job_template.setNativeSpecification("-hard -q " + queue_name);

    Another way to run a job on a desired host is to specify the hostname as a request attribute – it looks like:

    jt.setNativeSpecification("-l hostname=dev-host1");

    Here’s the Java source for the SGE DRMAA example, or the Java DRMAA example archive – the zip contains the source file, an Eclipse project and compiled binaries. To create the jar you may use Eclipse’s export, or run inside the bin folder:

    jar cf SgeDrmaaJobRunner.jar net/bokov/sge/*.class

    To run this jar (and run /tools/job.sh, which is already deployed on all executors) on Solaris 10, I use this command:

    java -cp $SGE_ROOT/lib/drmaa.jar:SgeDrmaaJobRunner.jar -Djava.library.path=$LD_LIBRARY_PATH net.bokov.sge.SgeDrmaaJobRunner soft host  not_wait  /tools/job.sh host2-dev-net

    You can also specify not just one queue name but a list of queue names as the parameter:

    qsub -q queue_1,queue_2 $SGE_ROOT/examples/jobs/simple.sh

    At least qsub allows this syntax 🙂

    When I run the GigaSpaces Service container (gsc.exe) on my Sony Vaio notebook with Windows Vista, the application doesn’t work well – it prints “counter is 0??counter is 0??counter is 0??” in an endless loop. You can fix it by passing a config file on the command line which overrides the defaults. For the .NET version of XAP it looks like:
    gsc "C:\GigaSpaces\XAP.NET 6.6.0\Runtime\config\overrides\windows.xml"