Posts tagged ‘sun’

For reasons unknown to me, the Sun JDK was moved to the partner repository, so to use the Sun JDK you need to follow these steps:

sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
sudo apt-get update
sudo apt-get upgrade

and then enjoy:

sudo apt-cache search jdk | grep sun
sun-java6-source - Sun Java(TM) Development Kit (JDK) 6 source files
sun-java6-jre - Sun Java(TM) Runtime Environment (JRE) 6 (architecture independent files)
sun-java6-jdk - Sun Java(TM) Development Kit (JDK) 6
sun-java6-javadb - Java(TM) DB, Sun Microsystems' distribution of Apache Derby
sun-java6-demo - Sun Java(TM) Development Kit (JDK) 6 demos and examples
sun-java6-bin - Sun Java(TM) Runtime Environment (JRE) 6 (architecture dependent files)

If anyone is interested, here’s a new update for Sun Grid Engine 6.2: update 4. It is mostly bug fixes and man page changes; the list of changes is here. The CVS source tag is V62u4_TAG (it applies to Grid Engine, ARCo and SGE Inspect). By the way, as far as I know Hedeby is still at 1.0u3.

Sun Grid Engine’s top engineer Richard Hierlmeier wrote an article (and some bash scripts that implement it; btw, why not put them into your CVS?) about using SDM in a compute cloud (EC2 is the example, but I suppose GoGrid could be used too without many changes): Using SDM Cloud Adapter to Manage Solaris Zones.

Sun released a new version of Sun Grid Engine: 6.2 Update 3. What’s new:

upd. A new Sun Studio 12 Update 1 is available too.

Stallman dismisses cloud computing as industry bluster. “It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign,” he said. Huh, I agree that anything containing the “cloud” keyword carries too much marketing, but it’s not really stupid. There’s too much marketing in this area (and the GoGrid guys are the very first at this ‘too much marketing’), but look at Amazon EC2: it’s a really great thing. For the last 7 years my work has been related to clusters of various sizes, and for the last year my “server provider” has been Amazon, and I can say that Amazon is much more convenient than any company-owned datacenter. For me it’s a big deal that I can get 100 servers in 10 minutes and run some job on them. There’s too much marketing noise in the cloud industry, but it works, and it works almost fine.
ps. Another point on cloud computing: Steve Ballmer on defining the cloud.

As the base AMI I used ami-7db75014, an OpenSolaris image supported by Sun. General information about installing and using OpenSolaris in EC2 is available in Sun’s Amazon EC2 Getting started guide; in this post I will focus mostly on using SGE in Amazon EC2. As the SGE distribution I use the all-in-one tar package: I chose “All supported platforms” on the Grid Engine download page. It takes about 350 MB, but I don’t have to worry about platform architecture: if Sun supports it, it will be in this package. This ge62u2_1.tar.gz contains a bunch of other tar.gz’s (and even Hedeby’s core package) and can be unpacked with:

root@ec2-server:~/tools/archive# gzip -dc ge62u2_1.tar.gz | tar xvpf -

So I just go inside ge6.2u2_1 and unpack them all using something like this:

for myfile in *.tar.gz
do
gzip -dc "$myfile" | tar xvpf -
done
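
The loop above can also be wrapped in a small function that takes the directory as an argument and quotes the file names. This is only a sketch: the name unpack_all is made up for illustration and is not part of the SGE distribution.

```shell
# unpack_all DIR: unpack every *.tar.gz found directly inside DIR, in place.
# Hypothetical helper, not shipped with SGE.
unpack_all() {
    dir=${1:-.}
    for archive in "$dir"/*.tar.gz; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        [ -f "$archive" ] || continue
        echo "unpacking $archive"
        # gzip | tar also works on Solaris, where tar has no -z option.
        # The function stops as soon as tar reports a failure.
        gzip -dc "$archive" | (cd "$dir" && tar xpf -) || return 1
    done
}
```

For the layout described in this post it would be called as unpack_all ~/tools/archive/ge6.2u2_1.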

One important thing: hedeby-1.0u2-core.tar.gz contains old versions of some files from ge-6.2u2_1-common.tar.gz. The conflicting files are common/util/arch and common/util/arch_variables (here’s a diff for them). Maybe it is useful sometimes, but with my configuration it causes very strange errors when I try to install an executor host:

value == NULL for attribute “mailer” in configuration list of “ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com”
./inst_sge[261]: Translate: not found [No such file or directory]
./inst_sge[263]: Translate: not found [No such file or directory]
./inst_sge[264]: Translate: not found [No such file or directory]

When I replace these files with the ones from ge-6.2u2_1-common.tar.gz, installation works as expected. The next point is DNS configuration: SGE is very picky about DNS, and this causes problems when running SGE on Amazon EC2 instances. It can be fixed with the host_aliases file in SGE, or, the other way, with the /etc/hosts file; a technique of this kind is used in the Hedeby-SGE on Amazon EC2 demo. For example, for a master host and 2 executor hosts I put these lines into /etc/hosts:

#internal_ip external_full_name external_short_name internal_full_name internal_short_name
10.yyy.xyz.zzz ec2-RRR-TTT-ZZZ-YYY.compute-1.amazonaws.com ec2-RRR-TTT-ZZZ-YYY domU-mm-ww-PPP-WWW-FFF-GGG.compute-1.internal domU-mm-ww-PPP-WWW-FFF-GGG
10.yyy.qwe.ttt ec2-aaa-bbb-ccc-ddd.compute-1.amazonaws.com ec2-aaa-bbb-ccc-ddd domU-mm-ww-JJJ-HHH-DDD-SSS.compute-1.internal domU-mm-ww-JJJ-HHH-DDD-SSS
10.yyy.pre.ppp ec2-yyy-rrr-eee-qqq.compute-1.amazonaws.com ec2-yyy-rrr-eee-qqq domU-mm-ww-UUU-III-OOO-PPP.compute-1.internal domU-mm-ww-UUU-III-OOO-PPP

I also use hostname ec2-RRR-TTT-ZZZ-YYY (external_short_name) to set the instance hostname; these are the names I use as hostnames when I configure SGE.
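
Writing these five-column lines by hand gets tedious, so here is a minimal sketch of a helper that prints one line in the same column order. The function name hosts_line is an assumption of mine, and it needs a POSIX shell such as bash or ksh (the legacy Solaris /bin/sh does not understand the ${var%%.*} expansion).

```shell
# hosts_line INTERNAL_IP EXTERNAL_FQDN INTERNAL_FQDN
# Prints one /etc/hosts line in this column order:
# internal_ip external_full_name external_short_name internal_full_name internal_short_name
# Hypothetical helper; needs a POSIX shell (bash, ksh).
hosts_line() {
    ip=$1
    ext_full=$2
    int_full=$3
    # The short name is everything before the first dot.
    ext_short=${ext_full%%.*}
    int_short=${int_full%%.*}
    printf '%s %s %s %s %s\n' "$ip" "$ext_full" "$ext_short" "$int_full" "$int_short"
}
```

Call it once per instance with the internal IP, the external full name and the internal full name, and append the output to /etc/hosts as root on every host of the grid.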

Below I try to summarize my experience with SGE and its use on various platforms (Solaris 10, Ubuntu, OpenSolaris, etc.). If you use Solaris, check out my Solaris – common questions and it’s differences from Linux: maybe your problems are with Solaris, not SGE.
So let’s go:

  • when installing SGE, after export SGE_ROOT=<my_sge_path> I try to run util/setfileperm.sh and get a ‘can’t find script /util/arch‘ error, as shown below:
    root@domU-12-31-39-03-CC-95:/opt/ge6.2u2_1# util/setfileperm.sh $SGE_ROOT
    can’t find script /util/arch
    this error can be fixed by setting the SDM_DIST environment variable:
    export SDM_DIST=$SGE_ROOT
  • I got a commlib error:
    error: commlib error: access denied (client IP resolved to host name “”. This is not identical to clients host name “”)
    ERROR: unable to contact qmaster using port 10500 on host “solaris-master.devnet.int.corp”
    rebooting the SGE master host helps; see Sun Grid Engine : execution host can’t connect to master host with “commlib error: access denied”
  • to be continued..

Shravan Kumar shared with me a demo by Lew Tucker (Sun’s CTO) where he demonstrates the Virtual Data Center. It’s not the usual marketing speech, it’s quite interesting. As I understand it, Lew Tucker is talking about this one: The APIs for the Sun Cloud, a RESTful API for creating and managing cloud resources, including compute, storage, and networking components. It looks very attractive and interesting, so I’m goin’ to check it out shortly 🙂

Here are more Lew Tucker demos: cloud APIs and how to create and deploy a MySQL Virtual Machine

Lew Tucker and Dave Douglas demonstrate a web developer application for the cloud storage service

All of these videos (and other Sun cloud-related info) are available at Sun’s CommunityOne East web event home page; you may also check out Sun’s A Guide to Getting Started with Cloud Computing.

Newcomers to Solaris (and experienced Linux users :-)) often have trouble with some everyday routines that work differently in Solaris than in most Linux distributions like Ubuntu. Below I try to list the most “popular” problems and questions about the differences between Solaris and Linux and to figure them out.

  • grep doesn’t have the -r switch, so it can’t search directories recursively; here are alternatives for recursive grep on Solaris:
    grep 'somestring' `find . -name '*'`
    find . | xargs grep 'somestring'
  • tar doesn’t support the -z option, so tar xfz my_archive.tar.gz will fail with the error “tar: z: unknown function modifier“. To unpack a tar.gz archive on Solaris you may use this one:
    gzip -dc my_archive.tar.gz | tar xvpf -
  • how to set environment variables in Solaris (I set them in ~/.bashrc, but it doesn’t work): to make Solaris read ~/.bashrc and apply it to the user environment, create the file ~/.bash_profile in your home directory and put in it the lines listed below:
    if [ -f ~/.bashrc ]; then
    . ~/.bashrc
    fi
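
The two recursive grep alternatives from the list above can trip over directories and over file names with spaces. A slightly more careful variant, as a sketch: the function name rgrep_s is made up, and it assumes your find supports the POSIX "-exec ... {} +" form.

```shell
# rgrep_s PATTERN [DIR]: recursive grep without relying on grep -r.
# Hypothetical helper; assumes find supports the POSIX "-exec ... {} +" form.
rgrep_s() {
    pattern=$1
    dir=${2:-.}
    # -type f skips directories; the extra /dev/null argument makes grep
    # print file names even when find hands it a single file.
    find "$dir" -type f -exec grep "$pattern" /dev/null {} +
}
```

Usage: rgrep_s 'somestring' . prints every matching line prefixed with the path of the file it was found in.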

To be continued…

Useful links: Solaris Infrequently Asked and Obscure Questions

I had a problem with my SGE cluster. I have a number of Solaris 10 servers running under virtualization; all servers are configured the same way and have the same environment. On one machine I installed the SGE master, on the others SGE execution hosts. Some execution hosts work well, but on others I get a strange error from “install_execd”:

Checking hostname resolving
—————————
Cannot contact qmaster. The command failed:
./bin/sol-x86/qconf -sh
The error message was:
error: commlib error: access denied (client IP resolved to host name “”. This is not identical to clients host name “”)
ERROR: unable to contact qmaster using port 10500 on host “solaris-master.devnet.int.corp”

When I run “qconf -sh” I get:

bash-3.00# qconf -sh
error: commlib error: access denied (client IP resolved to host name “”. This is not identical to clients host name “”)
ERROR: unable to contact qmaster using port 10500 on host “solaris-master.devnet.int.corp”

I checked the connection: ping works, the hostname resolves, a telnet connection to port 10500 works. Then I checked the connection from the master host: no problems there either. I compared the environment on the execution hosts that work well with the hosts that show the error: they have the same environment, and the master host configuration has nothing suspicious-looking either. I tried to find something useful on the web: no results. Some guys have the same problem, but no one knows what happens or how to fix it. Then I tried rebooting the execution hosts: no effect.

But when I ran “reboot” on the master host, wow, it helped! So, guys, if you get the same errors with SGE, try a “reboot” of your master host; it may help.


One way to run a job on a specific host is to use queues: you may create a unique queue for each host in your SGE grid (using qconf -aq) and specify this queue name in the submit parameters:

qsub -q <queue_name> $SGE_ROOT/examples/jobs/simple.sh

If you would like to deploy jobs onto the grid from an application (C or Java), SGE supports a special API, the Distributed Resource Management Application API (DRMAA); here are some examples in C++ and Java which may help to figure this stuff out. There are the SGE DRMAA Javadocs, the drmaa package JavaDocs and the common help (C library functions are listed in section 3). To specify the queue name, the drmaa_set_attribute function should be used as shown below:

drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-q queue_name", error, DRMAA_ERROR_STRING_BUFFER - 1);

Another way to route a job onto a specific host is to specify request attributes in qsub: qsub -l <request_attr_name>; for a Java example please see below. You may also add the “soft” or “hard” resource requirements modifier (for more, see the SGE glossary: hard/soft resource requirements).

drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-hard -q queue_name", error, DRMAA_ERROR_STRING_BUFFER - 1);

Here’s a listing of the drmaa C++ example which runs a job on a specified queue. To build it you may use the simple bash script listed below; it works on Solaris 10, and for Linux I suppose it’s better to use the g++ compiler:

INC=-I$SGE_ROOT/include
LIB=-L$SGE_ROOT/lib/sol-x86/
LIB_NAME=-ldrmaa
cc $INC $LIB $LIB_NAME sge_drmaa_test_example.c -o sge_drmaa_test_example.out

If you get the error below when you run this example

ld.so.1: sge_drmaa_test_example.out: fatal: libdrmaa.so.1.0: open failed: No such file or directory
Killed

please check the LD_LIBRARY_PATH environment variable; it should be set like this (Solaris 10 x86)

export LD_LIBRARY_PATH=$SGE_ROOT/lib/sol-x86/
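
The sol-x86 part differs between platforms. As a sketch, the directory can be derived from the util/arch script mentioned earlier on this page, assuming SGE_ROOT is set and that $SGE_ROOT/util/arch prints the architecture string when run without arguments; the function name sge_lib_dir is made up for illustration.

```shell
# sge_lib_dir: print the SGE library directory for the current platform.
# Hypothetical helper. Assumes SGE_ROOT is set and that $SGE_ROOT/util/arch
# prints the architecture string (sol-x86 in the example above).
sge_lib_dir() {
    arch=$("$SGE_ROOT/util/arch") || return 1
    printf '%s\n' "$SGE_ROOT/lib/$arch"
}
```

It can then be used as export LD_LIBRARY_PATH="$(sge_lib_dir)${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}", which keeps whatever was already in the variable.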

The Java implementation also uses DRMAA, but it looks a little different from C++: instead of drmaa_set_attribute you call JobTemplate::setNativeSpecification:

job_template.setNativeSpecification("-hard -q " + queue_name);

Another way to run a job on the needed host is to specify the hostname as a request attribute; it looks like

jt.setNativeSpecification("-l hostname=dev-host1");

Here’s the Java source for the SGE drmaa example, or the Java drmaa example archive: the zip contains the source file, an Eclipse project and compiled binaries. To create the jar you may use Eclipse export or run this inside the bin folder

jar cf SgeDrmaaJobRunner.jar net/bokov/sge/*.class

To run this jar (and run /tools/job.sh, which is already deployed on all executors) on Solaris 10 I use this command

java -cp $SGE_ROOT/lib/drmaa.jar:SgeDrmaaJobRunner.jar -Djava.library.path=$LD_LIBRARY_PATH net.bokov.sge.SgeDrmaaJobRunner soft host  not_wait  /tools/job.sh host2-dev-net

You can also specify not just one queue name but a list of queue names as the parameter:

qsub -q queue_1,queue_2 $SGE_ROOT/examples/jobs/simple.sh

At least qsub allows this syntax 🙂
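
When the queue names come from a script, it helps to build the list so that it reaches qsub as a single argument: a space inside the list would make the shell split it into two words. A minimal sketch follows; the function name join_queues is made up.

```shell
# join_queues Q1 [Q2 ...]: join queue names with commas and no spaces,
# so the shell passes the whole list to qsub as a single -q argument.
# Hypothetical helper.
join_queues() {
    list=
    for q in "$@"; do
        # Add a comma only when the list already has an entry.
        list=${list:+$list,}$q
    done
    printf '%s\n' "$list"
}
```

Usage: qsub -q "$(join_queues queue_1 queue_2)" $SGE_ROOT/examples/jobs/simple.sh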