I believe many of you are aware of Eric Brewer’s great theorem, which says that any networked shared-data system can have only two of three desirable properties: Consistency, high Availability, and tolerance to network Partitions (i.e. the system keeps working even though the network may lose any packet/message). This theorem is usually called the CAP theorem. It is quite important (or even fundamental) for many distributed workloads like computational and data grids, and the “2 of 3” principle is the basis for many architectural decisions in the cloud world. Below I combine a couple of great links on that topic which may shed some light on the principle and give you more insight into how this basic rule has changed nowadays. And yes, by the way, the CAP theorem is a key thing for most NoSQL and other distributed data solutions; below you might find good reading about that, including some new views on the topic.
Archive for the ‘compute grid’ Category
The guys from EC2 announced micro instances: they cost 2 (two) cents per hour for Linux, so running one now costs less than traditional dedicated hosting with root access. The monthly payment for an EC2 micro instance will be about 15 USD, while root/Linux on dedicated hosting is about 30 USD/month. It’s really good news: you can have a 100-box cluster for just two USD per hour! The bad thing is that micro instances don’t have their own local disk space (EBS only), so the best use case for this type of instance looks like a highly distributed computational grid with all data stored in RAM. And don’t forget that EBS will cost you some money: $0.10 per allocated GB per month, and Amazon EBS also charges $0.10 per 1 million I/O requests you make to your volume. Fredrick Poller has already checked micro instance performance with sysbench: Amazon EC2 Micro instance, how fast is it?
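The arithmetic above can be checked with a few lines of C. This is only a sketch: the prices are the ones quoted in this post, the helper names are hypothetical, and the 720-hour month (30 days × 24 hours) is my assumption. With it, one micro instance costs 0.02 × 720 = 14.40 USD per month, and 100 of them cost 2 USD per hour.

```c
/* Prices quoted in the post (2010); treat them as assumptions. */
#define MICRO_PRICE_PER_HOUR     0.02 /* USD per hour, Linux micro instance */
#define EBS_PRICE_PER_GB_MONTH   0.10 /* USD per allocated GB per month */
#define EBS_PRICE_PER_MILLION_IO 0.10 /* USD per 1 million I/O requests */

/* Hourly cost of a cluster of `instances` identical boxes. */
double cluster_cost_per_hour(int instances, double price_per_hour)
{
    return instances * price_per_hour;
}

/* Monthly cost of one always-on instance; hours_per_month is an
   assumption (720 = 30 days x 24 hours). */
double instance_cost_per_month(double price_per_hour, double hours_per_month)
{
    return price_per_hour * hours_per_month;
}

/* Monthly EBS cost: allocated storage plus I/O requests. */
double ebs_cost_per_month(double allocated_gb, double io_requests)
{
    return allocated_gb * EBS_PRICE_PER_GB_MONTH
         + (io_requests / 1e6) * EBS_PRICE_PER_MILLION_IO;
}
```

For example, 10 GB of EBS with 5 million I/O requests in a month comes to 10 × 0.10 + 5 × 0.10 = 1.50 USD on top of the instance price.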
- Shanghai releases 3-year cloud computing plan – Shanghai plans to use cloud computing technologies in urban management, industry development, e-government and small and medium-sized enterprise services
AWS related links :
- State of the cloud – 2010 August
- Amazon’s EC2 Generating 220M+ Annually – really interesting post: “at 40,000 servers evenly distributed across 6 availability zones we know, ___ 6,700 servers per zone..Most of the servers are likely in the US availability zones vs. the EU zones, maybe 75-80% of total capacity”. Also take a look at Amazon EC2 instance usage rates
- Anatomy of an Amazon EC2 Resource ID, and EC2 usage estimates based on this anatomy
- Rumor Mill: Google EC2 Competitor Coming in 2010?
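The per-zone figure in the EC2 revenue post above is simple division; applying the quoted 75-80% US share to the same 40,000 servers gives a rough US count. Both numbers are only as good as that post’s assumptions:

```latex
\[
\frac{40\,000}{6} \approx 6\,667 \approx 6\,700 \ \text{servers per zone},
\qquad
(0.75 \ldots 0.80) \times 40\,000 = 30\,000 \ldots 32\,000 \ \text{servers in US zones}
\]
```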
Update in Amazon Web Services:
and Quadruple Extra Large – 68.4 GB of RAM / 26 ECU (8 virtual cores × 3.25 ECU): New EC2 High-Memory Instances
If anyone is interested: here’s a new update for Sun Grid Engine 6.2, Update 4. It is mostly about bug fixes and man page changes; the list of changes is here. The CVS source tag is V62u4_TAG (it applies to Grid Engine, ARCo and SGE Inspect). By the way, as far as I know Hedeby is still at 1.0u3.
- Cloud Hosting Performance
- Measuring EC2 system performance
- Comparing Amazon EC2 performance with other cloud/VPS hosting options… and real hardware
- Monitoring Cloud Computing Performance with PRTG: CPU, Disk, Memory Speed Comparison of Amazon EC2 Instance Types
- Posted by Alexey Bokov on August 18, 2009 at 5:47 pm under administration, compute grid, internet, useful links.
Tags: administration, Amazon EC2, gogrid, work
- Performance Check – EC2 Instance Types vs Virtual Server
- 10 reasons why cloud computing is bad idea
- Dulance returns back and Google’s Special Snippets for Price Comparison Sites
- Running Terracotta on Amazon EC2 ( terracotta.org )
- Using amazon EC2 security groups as tags for instances
- Cloud Hosting Performance – comparing popular cloud providers in real time for last 48 hours
- GoGrid Management Tools and Library – by the way, today GoGrid is officially out of beta
Sun Grid Engine’s top engineer Richard Hierlmeier wrote an article (and some bash scripts which implement it; btw, why not put them into your CVS?) about using SDM in a compute cloud (EC2 is the example here; I suppose GoGrid could be used too without many changes): Using SDM Cloud Adapter to Manage Solaris Zones.
Sun released a new version of Sun Grid Engine, 6.2 Update 3. Here’s what’s new:
- Amazon EC2 adapter: the SDM Cloud Service Adapter is now available to scale an SGE cluster in EC2. Instances only need OpenVPN installed and a special configuration for IP addresses (SGE is still very picky about DNS); more about the restrictions is here, and here are the bash scripts used for EC2 deployment. By the way, I see that OpenSolaris is still the only OS available for SGE on EC2. If you’re looking for SGE/EC2 solutions, you may check out the Convergence project, which deals with SGE/GemFire cluster installation on EC2 (see Installing SGE on EC2), or the Hedeby installation on EC2.
- Only one JVM now needs to run on the master or a managed host; the previous version ran 3 JVMs on every host: cs_vm (configuration service), executor_vm (executor component) and rp_vm (resource provider). In SGE terminology this is called the SDM simple install.
- SGE now has Exclusive Scheduling: this helps guarantee predictable performance and avoid interference when a job is not using all of the slots available on a host.
- Sun declared that SGE now supports Microsoft Vista (I don’t think there are many SGE installations on Vista), plus the usual marketing speech about Power Saving features 🙂
upd. Sun Studio 12 Update 1 is also available.
The central idea we were working on was this idea of de-localized information — information for which I didn’t care what computer it was stored on. It didn’t depend on any particular computer. I didn’t know the identities of other computers in the ensemble that I was working on. I just knew myself and the cybersphere, or sometimes we called it the tuplesphere, or just a bunch of information floating around. We used the analogy — we talked about helium balloons. We used a million ways to try and explain this idea – John Markoff and Clay Shirky talk to David Gelernter – Lord of the Cloud
Stallman dismisses cloud computing as industry bluster. “It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign,” he said. Huh, I agree that anything containing the “cloud” keyword comes with too much marketing (and GoGrid’s guys are the very first in this ‘too much marketing’), but it’s not really stupid. Let’s look at Amazon EC2: it’s a really amazing thing. For the last 7 years my work has been related to clusters of various sizes, and for the last year my “server provider” has been Amazon, and I can say that Amazon is much more convenient than any company-owned datacenter. For me it’s a big deal that I can get 100 servers for 10 minutes and run some job on them. There’s too much marketing noise in the cloud industry, but it works, and it works almost fine.
ps. Another point for cloud computing: Steve Ballmer on defining the cloud.
- cloud platforms
- cloud vendors
- Amazon EC2 vs GoGrid vs Mosso
- How LinkedIn grew Bumpersticker to be the “Biggest Ruby on Rails app in the World” and Web Scalability Practices: Bumper Sticker on Rails
- The Private Cloud: Enterprise-ready on and off premise by Stephen Herrod
- Defining private clouds – Part 1, Part 2 – by Bernard Golden
As the base AMI I used ami-7db75014, which is OpenSolaris supported by Sun; general information about installing and using OpenSolaris in EC2 is also available in Sun’s Amazon EC2 Getting Started Guide. In this post I will focus mostly on using SGE in Amazon EC2. As the SGE distribution I use the all-in-one tar package: I chose “All supported platforms” on the Grid Engine download page. It takes about 350 MB, but I don’t have to worry about platform architecture: if Sun supports it, it will be in this package. This ge62u2_1.tar.gz contains a bunch of other tar.gz files (even Hedeby’s core package) and can be unpacked with:
root@ec2-server:~/tools/archive# gzip -dc ge62u2_1.tar.gz | tar xvpf -
So I just go into ge6.2u2_1 and unpack them all using something like this:
for myfile in *.tar.gz; do gzip -dc "$myfile" | tar xvpf -; done
One important thing: hedeby-1.0u2-core.tar.gz contains old versions of some files from ge-6.2u2_1-common.tar.gz. There are conflicts in the files common/util/arch and common/util/arch_variables (here’s the diff for them). Since the glob expands in alphabetical order, the hedeby archive is unpacked after the common one, so its older files overwrite the newer ones. The old versions may sometimes be useful, but in my configuration they caused very strange errors when I tried to install an execution host:
value == NULL for attribute "mailer" in configuration list of "ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com"
./inst_sge[261]: Translate: not found [No such file or directory]
./inst_sge[263]: Translate: not found [No such file or directory]
./inst_sge[264]: Translate: not found [No such file or directory]
When I replaced these files with the ones from ge-6.2u2_1-common.tar.gz, the installation worked as expected. The next point is DNS configuration: SGE is very picky about DNS, and this causes problems when running SGE on Amazon EC2 instances. It can be fixed using the host_aliases file in SGE, or by using the /etc/hosts file; a similar technique is used in the Hedeby-SGE on Amazon EC2 demo. For example, with one master and 2 execution hosts I put these lines into /etc/hosts:
#internal_ip external_full_name external_short_name internal_full_name internal_short_name
10.yyy.xyz.zzz ec2-RRR-TTT-ZZZ-YYY.compute-1.amazonaws.com ec2-RRR-TTT-ZZZ-YYY domU-mm-ww-PPP-WWW-FFF-GGG.compute-1.internal domU-mm-ww-PPP-WWW-FFF-GGG
10.yyy.qwe.ttt ec2-aaa-bbb-ccc-ddd.compute-1.amazonaws.com ec2-aaa-bbb-ccc-ddd domU-mm-ww-JJJ-HHH-DDD-SSS.compute-1.internal domU-mm-ww-JJJ-HHH-DDD-SSS
10.yyy.pre.ppp ec2-yyy-rrr-eee-qqq.compute-1.amazonaws.com ec2-yyy-rrr-eee-qqq domU-mm-ww-UUU-III-OOO-PPP.compute-1.internal domU-mm-ww-UUU-III-OOO-PPP
I also run hostname ec2-RRR-TTT-ZZZ-YYY (the external_short_name) to set the instance hostname; these are the names I use as hostnames when I configure SGE.
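A minimal C sketch of the /etc/hosts layout used above: given the internal IP, the external (public) DNS name and the internal DNS name, it derives each short name by cutting at the first dot and prints the five columns in the order of the comment line. The function names are hypothetical, and the addresses in the usage note are made up.

```c
#include <stdio.h>
#include <string.h>

/* Copy the part of a fully qualified name before the first dot
   ("ec2-1-2-3-4.compute-1.amazonaws.com" -> "ec2-1-2-3-4").
   out_len must be at least 1. */
static void short_name(const char *fqdn, char *out, size_t out_len)
{
    size_t n = strcspn(fqdn, ".");
    if (n >= out_len)
        n = out_len - 1;          /* truncate rather than overflow */
    memcpy(out, fqdn, n);
    out[n] = '\0';
}

/* Build one /etc/hosts line in the column order used above:
   internal_ip external_full_name external_short_name
   internal_full_name internal_short_name.
   Returns the snprintf result (characters needed, excluding the NUL). */
int make_hosts_line(const char *internal_ip, const char *external_fqdn,
                    const char *internal_fqdn, char *buf, size_t buf_len)
{
    char ext_short[256], int_short[256];
    short_name(external_fqdn, ext_short, sizeof ext_short);
    short_name(internal_fqdn, int_short, sizeof int_short);
    return snprintf(buf, buf_len, "%s %s %s %s %s", internal_ip,
                    external_fqdn, ext_short, internal_fqdn, int_short);
}
```

For instance, make_hosts_line("10.0.0.5", "ec2-1-2-3-4.compute-1.amazonaws.com", "domU-12-31-39-03-CC-95.compute-1.internal", buf, sizeof buf) yields one line for /etc/hosts. Generating the lines for every node from the same data is a simple way to keep name resolution identical on all hosts.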
Below I try to summarize my experience with SGE and its use on various platforms (Solaris 10, Ubuntu, OpenSolaris, etc.). If you use Solaris, check out my Solaris – common questions and its differences from Linux; maybe your problems are caused by Solaris, not SGE.
So let’s go:
- When installing SGE, after export SGE_ROOT=<my_sge_path> I tried to run util/setfileperm.sh and got a ‘can’t find script /util/arch’ error, as shown below:
root@domU-12-31-39-03-CC-95:/opt/ge6.2u2_1# util/setfileperm.sh $SGE_ROOT
can’t find script /util/arch
This error can be fixed by setting the SDM_DIST environment variable:
export SDM_DIST=$SGE_ROOT
- I got a commlib error:
error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
ERROR: unable to contact qmaster using port 10500 on host "solaris-master.devnet.int.corp"
Rebooting the SGE master host helps; see Sun Grid Engine: execution host can’t connect to master host with “commlib error: access denied”.
- to be continued..
Shravan Kumar shared with me a Lew Tucker (Sun’s CTO) demo where he demonstrates a Virtual Data Center. It’s not the usual marketing speech; it’s quite interesting. As I understand it, Lew Tucker is talking about The APIs for the Sun Cloud: a RESTful API for creating and managing cloud resources, including compute, storage, and networking components. It looks very attractive, so I’m going to check it out shortly 🙂
Here are more Lew Tucker demos: cloud APIs, and how to create and deploy a MySQL virtual machine.
Lew Tucker and Dave Douglas demonstrate a web developer application for a cloud storage service.
All of these videos (and other Sun cloud-related info) are available on Sun’s CommunityOne East web event home page; you may also check out Sun’s A Guide to Getting Started with Cloud Computing.
One way to do it is to use queues: you may create a unique queue for each host in your SGE grid (using qconf -aq) and specify that queue name in the submit parameters:
qsub -q <queue_name> $SGE_ROOT/examples/jobs/simple.sh
If you would like to submit jobs to the grid from an application (C or Java), SGE supports a special API, the Distributed Resource Management Application API (DRMAA); here are some examples in C++ and Java which may help to figure this out. There are the SGE DRMAA Javadocs, the drmaa package JavaDocs, and general help: the C library functions are documented in man section 3. To specify a queue name, the drmaa_set_attribute function should be used as shown below:
drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-q queue_name", error, DRMAA_ERROR_STRING_BUFFER - 1);
Another way to route a job to a specific host is to specify request attributes in qsub: qsub -l <request_attr_name>=<value> (for a Java example, see below). You may also add the “soft” or “hard” resource requirement modifier (for more, see the SGE glossary: hard/soft resource requirements).
drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-hard -q queue_name", error, DRMAA_ERROR_STRING_BUFFER - 1);
Here’s a listing of the DRMAA C example which runs a job in the specified queue. To build it you may use the simple bash script below; it works on Solaris 10, and on Linux I suppose it’s better to use the gcc compiler:
INC=-I$SGE_ROOT/include
LIB=-L$SGE_ROOT/lib/sol-x86/
LIB_NAME=-ldrmaa
cc $INC $LIB sge_drmaa_test_example.c $LIB_NAME -o sge_drmaa_test_example.out
If you get the error below when you run this example:
ld.so.1: sge_drmaa_test_example.out: fatal: libdrmaa.so.1.0: open failed: No such file or directory
Killed
Please check the LD_LIBRARY_PATH environment variable; it should be set like this (Solaris 10 x86):
export LD_LIBRARY_PATH=$SGE_ROOT/lib/sol-x86/
The Java implementation also uses DRMAA, but it looks a little different from C: instead of drmaa_set_attribute it uses JobTemplate.setNativeSpecification:
job_template.setNativeSpecification("-hard -q " + queue_name);
Another way to run a job on a particular host is to specify the hostname as a request attribute, like this:
jt.setNativeSpecification("-l hostname=dev-host1");
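Both the C call and the Java calls above just pass a qsub-style option string as the DRMAA native specification. Here is a small C sketch of a helper that builds that string for the two cases discussed (a queue via -q, or a host via -l hostname=), with the -hard/-soft modifier. The helper and its enum are hypothetical and not part of the DRMAA API; its result would be passed as the value in drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, spec, error, ...). Rejecting whitespace in the target is my own assumption of a sensible safety check, since the specification is parsed like a qsub command line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: what the job should be pinned to. */
typedef enum { TARGET_QUEUE, TARGET_HOST } target_kind;

/* hard != 0 -> "-hard", otherwise "-soft".
   TARGET_QUEUE -> "-q <target>", TARGET_HOST -> "-l hostname=<target>".
   Returns -1 for an empty target or one containing whitespace,
   otherwise the snprintf result (characters needed, excluding the NUL). */
int build_native_spec(int hard, target_kind kind, const char *target,
                      char *buf, size_t buf_len)
{
    const char *modifier = hard ? "-hard" : "-soft";

    if (target[0] == '\0' || strpbrk(target, " \t") != NULL)
        return -1;
    if (kind == TARGET_QUEUE)
        return snprintf(buf, buf_len, "%s -q %s", modifier, target);
    return snprintf(buf, buf_len, "%s -l hostname=%s", modifier, target);
}
```

Keeping the string building in one place makes it easy to switch a job between a hard and a soft request without touching the DRMAA calls themselves.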
Here’s the Java source for the SGE DRMAA example, or the Java DRMAA example archive (the zip contains the source file, an Eclipse project and compiled binaries). To create the jar, you may use Eclipse export or run this inside the bin folder:
jar cf SgeDrmaaJobRunner.jar net/bokov/sge/*.class
To run this jar (which runs /tools/job.sh, already deployed on all executors) on Solaris 10 I use this command:
java -cp $SGE_ROOT/lib/drmaa.jar:SgeDrmaaJobRunner.jar -Djava.library.path=$LD_LIBRARY_PATH net.bokov.sge.SgeDrmaaJobRunner soft host not_wait /tools/job.sh host2-dev-net
You can also specify not just one queue name but a list of queue names as the parameter (comma-separated with no spaces, so the shell passes the list as one argument):
qsub -q queue_1,queue_2 $SGE_ROOT/examples/jobs/simple.sh
At least qsub allows this syntax 🙂
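Because the list after -q has to be a single word, here is a hypothetical C helper that joins queue names with commas and no spaces, for example when building the qsub command line or a DRMAA native specification from a program. It is only a sketch, not part of any SGE API.

```c
#include <string.h>

/* Join queue names into "queue_1,queue_2,..." with no spaces.
   Returns 0 on success, -1 if the result (plus its NUL) does not fit. */
int join_queue_list(const char *queues[], size_t count,
                    char *buf, size_t buf_len)
{
    size_t used = 0;

    if (buf_len == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(queues[i]);
        size_t need = len + (i > 0 ? 1 : 0);      /* name plus separating comma */
        if (used + need + 1 > buf_len)            /* +1 for the NUL */
            return -1;
        if (i > 0)
            buf[used++] = ',';
        memcpy(buf + used, queues[i], len);
        used += len;
        buf[used] = '\0';
    }
    return 0;
}
```

With {"queue_1", "queue_2"} the buffer ends up as queue_1,queue_2, which is the form used in the qsub example above.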
When I run the GigaSpaces Service Container (gsc.exe) on my Sony Vaio notebook with Windows Vista, it doesn’t work well: it prints “counter is 0??counter is 0??counter is 0??counter is 0??counter is 0??” in an endless loop. You can fix it by passing a config file on the command line that overrides the defaults. For the .NET version of XAP it looks like this:
gsc "C:\GigaSpaces\XAP.NET 6.6.0\Runtime\config\overrides\windows.xml"