July 28th, 2010 by Peter
6 years (!) after my initial Debian ITP (see here) and all the hard work on a custom Condor Debian package, they finally made it too:
http://www.cs.wisc.edu/condor/debian/
The Condor people are now offering their own Debian repository for the (longer existing) Condor DEB files. This allows you to keep a Debian-based cluster up to date with the latest available version. Great thing. Use it.
If you were using my Condor Debian packages in the past, you should stop that and do a careful migration. The user names are identical, and the Wisconsin version of the installation scripts seems to be a little bit picky. I recommend purging my package (for example with apt-get remove --purge condor) and checking for any leftovers before you start with the new repository.
The installation works in personal mode by default, which demands no user interaction during installation. The package looks like a completely new development, which is sad, since some people (check the ITP) spent some time on things such as debconf support.
November 4th, 2009 by Peter
HPCwire explains in a recent article how NVIDIA wants to offer access to a remote rendering cluster called ‘RealityServer’. The current description sounds more like a typical remote software offer (‘software as a service’, if you prefer), and not like remotely accessible raw GPU cores, as the title suggests. Anyway, worth a look…
http://www.nvidia.com/object/realityserver.html
October 5th, 2009 by Peter
My home standardization body OGF published the first list of requirements for an Open Cloud Computing Interface. One of the more interesting parts of this spec is the feature matrix for existing cloud APIs to be considered:
- Amazon Elastic Compute Cloud
- ElasticHosts
- Flexiscale
- GoGrid
- Sun Cloud API
- Rackspace Cloud Servers
- VMware vSphere
Their analysis shows some interesting facts:
- Amazon does not support persistent compute resources, most others do.
- Only Amazon has support for ephemeral (real local) storage resources, which are huge and performant, but not resilient to hardware faults. Everybody supports virtual persistent storage.
- Static IPs and firewall features are about to become “cloud mainstream.”
September 24th, 2009 by Peter
In my middleware course, I asked the question of how to realize a stateful SingleCall remoting server. I expected to hear something about external state storage, e.g. in a database. To my surprise, most students proposed to use a static variable in the server implementation for keeping the state between calls.
This idea results from a purely programming-language way of thinking, and (for most students) from successful experimentation. Looking under the hood, it is not completely obvious that this will work in all cases. Static variables are managed by the CLR as entities bound to a loaded class, in this case the server class. According to most sources, static variables are only garbage collected when the class is garbage collected. This can only happen if the surrounding application domain is unloaded. For a standard remoting server, this is very unlikely to happen ever.
Things become more interesting with a different runtime host than the operating system. If you use IIS, it could decide to unload your whole server application if it is not triggered often enough, or if memory runs low.
All in all, relying on static variables in a virtual runtime environment is bad style.
BTW: I know that .NET Remoting is deprecated in favour of WCF, but it is still a very helpful teaching tool.
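The effect can be imitated outside of .NET. The following Python sketch is only an analogy, not Remoting code: load_server_class() stands in for loading the server class into a fresh application domain, a class attribute plays the role of the static variable, and a plain dict is a hypothetical stand-in for an external database. All names are made up for illustration.

```python
def load_server_class():
    """Stand-in for loading the server class into a fresh application domain."""
    class CounterServer:
        calls = 0  # class-level state, the analogue of a static variable

        def increment(self):
            type(self).calls += 1
            return type(self).calls

    return CounterServer


def single_call(server_class):
    """SingleCall-style activation: a new server object handles every request."""
    return server_class().increment()


class ExternalCounterServer:
    """Same service, but the state lives outside the server class."""

    def __init__(self, store):
        self.store = store  # hypothetical stand-in for a database

    def increment(self):
        self.store["calls"] = self.store.get("calls", 0) + 1
        return self.store["calls"]


def demo_static():
    server_class = load_server_class()
    before = [single_call(server_class) for _ in range(3)]
    # The host recycles the application: the class is loaded again from scratch.
    server_class = load_server_class()
    after = single_call(server_class)
    return before, after


def demo_external():
    store = {}
    before = [ExternalCounterServer(store).increment() for _ in range(3)]
    # A recycle does not touch the external store, so counting continues.
    after = ExternalCounterServer(store).increment()
    return before, after


if __name__ == "__main__":
    print("static state:  ", demo_static())
    print("external state:", demo_external())
```

As long as the class stays loaded, the static counter behaves exactly as the students observed. The difference only shows up once the host reloads the class, which is why successful experiments prove little here.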
March 27th, 2009 by Peter
I attended the MRSC 2009 conference, taking place at the Zuse institute in Berlin. Some impressions:
- The conference claims to be a “many-core and reconfigurable supercomputing conference”. In fact, it was more about FPGA-based hardware accelerators than about many-core technologies. The mixture of industry and academia talks was quite interesting. And the catering was really good.
- Prof. Reinefeld, the host of the conference, gave a nice introduction about his interest in FPGA accelerators. One of the main reasons is power consumption: the ZIB facilities increased their power consumption from around 90 kW in 1997, through 260 kW in 2002, to 660 kW in 2008. He also told us that roughly 40% of today’s power consumption goes into cooling alone. Specialized FPGA accelerators could make it possible to keep up the speedup pace AND reduce the power consumption significantly.
- I learned that FPGA boards have some general properties that are relevant for their programming. They have a comparatively low frequency (hundreds of MHz), can support tailored data types very well (if programmed accordingly), and normally provide thousands of ‘cores’ in a SIMD-like fashion. FPGAs mostly have a memory bandwidth problem, meaning that they produce results too fast to store them in time. All algorithms simply run in pure hardware. The latest (very impressive) trend is FPGAs that fit into a standard x86 CPU socket. The FPGA tool vendor provides an FSB or HyperTransport implementation, which allows you to add a specialized CPU with full RAM access to your SMP system.
- All speakers agreed that standard n-core processors, FPGAs, and GPUs all have their right to exist. Standard processors are best at control-flow and sequential activities, GPUs are perfect for floating-point work, and FPGAs are well suited for the optimized processing of application-specific data types.
- Since I am a fan of standardization, the corresponding activities OpenFPGA and OpenAccelerator must be mentioned.
- Martin Herbordt gave a nice talk about the advantages of FPGA acceleration. I found the slides in a 2006 version. He explained programming models for FPGAs, showing that such applications need heavy restructuring, and that performance is highly sensitive to implementation quality, even more than with MPP programming. His presentation showed that the tailoring of FPGAs can yield fast implementations of things that would be hard on a standard processor (e.g. random number generation or coordinate transformation). This doesn’t come for free: FPGA-enabled algorithms need to fit the ‘vector-like’ architecture. Stream processing through a series of ALUs seems to be one favourite approach. He explained an example where BLAST (an indexing problem) was accelerated by changing it to a streaming problem.
- Phillip Maar from University of Potsdam presented a combined approach to designing multiprocessor system-on-chip (MPSoC) solutions. They take a C program and first parallelize it automatically by clan partitioning to an MPI application (yes, this was the weak part). In the next step, they perform a functional cycle-accurate simulation of this program to design an optimal FPGA layout. In order to bring the MPI program logic to the chip, they created a hardware version of an MPI subset (SocMPI).
- CAPS presented the HMPP workbench, which allows you to parallelize C and Fortran software with preprocessor directives. The most interesting aspect is the support for hybrid systems with GPUs, FPGAs, multiple cores and other execution engines. The software has the concept of “codelets”, which are functions to be executed on a remote device or specialized core. The source code always remains independent of the target accelerator.
- Mitrionics was also an interesting company, mentioned by nearly everybody. Their major product is a virtual processor implementation for FPGAs, which can be stripped down according to the application needs before it goes on chip. It reminded me of the operating system concepts in embedded systems, where you compile your own version by putting together only the relevant modules (e.g. Windows CE). In the Mitrionics case, you program against the virtual processor function set (so-called tiles), and not the FPGA chip itself. The highly optimized set of standard ‘tiles’ is Turing-complete. All tiles in the virtual processor can be compiled with the bit width you need. This saves precious space on the FPGA chip. Mitrionics has its own parallel programming language for the virtual processor solution. The speaker spent some time explaining why automated parallelization of sequential code can never work. His main argument was that the compiler would need to know all possible parallelization strategies for sequential control flow patterns in advance. (BTW, I completely agree that today’s parallelizing compilers perform only intelligent algorithm pattern matching, sometimes supported by annotations such as OpenMP.) The Mitrionics language is based on data dependencies only, without any execution order description. Somebody from the audience identified similarities to data-flow languages from the 80s, so they might be interesting again.
- Somebody from HP Labs talked about their view on the world, very generic. HP seems to count on heterogeneous system integration because it sees the power wall as a near-future problem.
- Microsoft showed the recent parallel computing extensions in Windows 7 and .NET. Nothing new, but nicely explained. For Windows 7, they claim better NUMA support (some lightly extended WIN32 scheduling APIs) and user-mode scheduling (UMS) as major things. The first thing is not really impressive: they basically provide group affinity support and extended information about the core-cache relationships. If you know Linux CPU sets, it’s more or less the same. UMS is interesting, since it immediately reminded me of NT4 fibers. The Microsoft guy (from Redmond, not Cambridge!) confirmed that it’s more or less the same, but Dave Probert states it in a different way. Everything will come with Visual Studio 2010, and no, this will not be based on Phoenix.
- Silicon Graphics had a nice presentation with some internal details. The speaker explained how SG was surprised by the rise of GPUs in recent months. Hardware vendors (ClearSpeed, NVIDIA, XDI) and tool providers (RapidMind, Mitrionics, Allinea) together provide a lot of alternatives to their solutions. He showed how SG is now providing tailored hybrid clusters, and talked a little bit about the main problems with them: power-on sequence, non-atomic resource allocation, SW incompatibilities and online diagnosis flaws. He also showed a really cool research prototype with 180 x 2 Atom processor nodes in one (!) 3U chassis. Plans go up to 10,000 cores per rack, all running under CentOS. Here is a picture.
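To make the “data dependencies only” idea from the Mitrionics talk a bit more concrete, here is a small Python sketch. It is not the Mitrionics language and says nothing about how their toolchain works; it only shows a program given as a set of values plus their dependencies, with the execution order derived afterwards. The names run_dataflow and NODES are invented for this example, and graphlib requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter
from operator import add, mul, sub

# Each value is described only by the operation that produces it and the
# values it depends on. The dictionary order carries no meaning: "result"
# is listed first although it must be computed last.
NODES = {
    "result": (sub, ["prod", "sum"]),
    "sum": (add, ["a", "b"]),
    "prod": (mul, ["a", "b"]),
}


def run_dataflow(nodes, inputs):
    """Evaluate all nodes in an order derived purely from data dependencies."""
    dependencies = {name: set(args) for name, (_, args) in nodes.items()}
    values = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        if name in values:
            continue  # external input, nothing to compute
        func, args = nodes[name]
        values[name] = func(*(values[arg] for arg in args))
    return values


if __name__ == "__main__":
    print(run_dataflow(NODES, {"a": 3, "b": 4}))
```

Here “sum” and “prod” do not depend on each other, so a runtime (or a hardware layout) would be free to compute them in parallel. The sequential loop above is just the simplest valid schedule.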
February 24th, 2009 by Peter
For some measurement task, we needed to isolate one CPU core on an Intel Quad Core machine under Linux 2.6. With the background of SMP and NUMA system support, Linux provides the following options:
- isolcpus=[]: This is an old kernel boot parameter, which allows you to isolate one or more CPUs from the scheduler. Processes can still be placed on these processors by explicit affinity system calls. The processors are still used for interrupt processing, since they are only taken away from the OS scheduler. Modern kernels implement this feature with the scheduling domain support (which is present anyway), a part of the CPU sets functionality. It was therefore recently requested to remove the kernel boot option.
- CPU sets: CPU sets constrain the CPU and memory placement of tasks to only the resources within a task’s current set. Specifically, the sched_setaffinity() and the mbind() system calls are intercepted accordingly. The scheduler also has support for partitioning the system CPUs into a number of scheduling domains. By default, there is one scheduling domain covering all CPUs, except those marked with isolcpus (see above). After mounting the corresponding device, you can create custom CPU sets and assign tasks and processors to them. In our experiments, it was not possible to remove a core from the default CPU set. A quick check of the sources showed that this is intentional, even though the error code is misleading:
mount -t cgroup -o cpuset cpuset /dev/cpuset
root@intel:/dev/cpuset# cat cpuset.cpus
0-3
root@intel:/dev/cpuset# /bin/echo 0,1,2 > cpuset.cpus
/bin/echo: write error: Permission denied
- CPU hotplug: The CPU hotplug implementation provides exhaustive support for CPU management within the SYSFS file system tree. The implementation also frees the CPU from interrupts and timers. A simple echo 0 > /sys/devices/system/cpu/cpuX/online puts the particular CPU off-line. With a very recent kernel, it also works for single cores in our system:
root@intel:/sys/devices/system/cpu/cpu0# ls -l
drwxr-xr-x 5 root root 0 24. Feb 18:59 cache
drwxr-xr-x 4 root root 0 24. Feb 18:59 cpufreq
drwxr-xr-x 2 root root 0 24. Feb 18:59 microcode
drwxr-xr-x 2 root root 0 24. Feb 18:59 thermal_throttle
drwxr-xr-x 2 root root 0 24. Feb 18:59 topology
To check whether taking the CPU off-line was successful, look at the output of /proc/interrupts.
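For repeated measurements, the checks above can be scripted. The following Python sketch makes a few assumptions: the helper names are invented, the parser expects the usual /proc/interrupts layout with a header row of CPU names, and os.sched_setaffinity() is only available on Linux. It parses CPU lists in the “0-3” notation shown for cpuset.cpus, sums the interrupt counters per CPU, and pins the current process to one core.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3' or '0,2-3' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def interrupts_per_cpu(text):
    """Sum the per-CPU counters found in /proc/interrupts-style text."""
    lines = text.splitlines()
    cpu_names = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = {name: 0 for name in cpu_names}
    for line in lines[1:]:
        counts = line.split()[1:1 + len(cpu_names)]
        # Rows without one counter per CPU (for example ERR) are skipped.
        if len(counts) == len(cpu_names) and all(c.isdigit() for c in counts):
            for name, value in zip(cpu_names, counts):
                totals[name] += int(value)
    return totals


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


if __name__ == "__main__":
    if os.path.exists("/proc/interrupts"):
        with open("/proc/interrupts") as handle:
            print(interrupts_per_cpu(handle.read()))
    if hasattr(os, "sched_getaffinity"):
        print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
```

Calling interrupts_per_cpu() twice, before and after the measurement, and comparing the totals for the isolated core gives a rough idea of how much interrupt load still arrived there.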
January 12th, 2009 by Peter
A presentation at the CCC 2008 congress showed how to create a rogue CA certificate, based on the well-known flaws in the MD5 hashing algorithm. There is also an exhaustive explanation on the web. Verisign already reacted and switched to SHA-1. For students it might be interesting to see that a very basic crypto algorithm flaw can possibly harm a whole Internet security infrastructure. What happens if SHA-1 is broken tomorrow?
December 15th, 2008 by Peter
In my middleware course, we discussed the true meaning of oneway in CORBA IDL. The standard and most other sources agree that oneway has at-most-once delivery behaviour, meaning that such operation calls might not be processed by the server, but if they are, then only once. But some students (and many sources) then automatically equate oneway with asynchronous procedure calls, which is wrong (check CORBA AMI). With oneway, no result value is ever returned; the client calls and continues immediately.
It also turned out that the detailed oneway semantic is not as ORB-dependent as you might think. Since GIOP 1.2, there is the SyncScopePolicy that allows you to specify ‘how reliably’ the client ORB should deliver the message. Besides the standard (non-regulated) behavior, you can demand at least acceptance by the servant-side TCP stack (SYNC_WITH_TRANSPORT), reception by some servant (SYNC_WITH_SERVER), or even the processing as with a normal synchronous method (SYNC_WITH_TARGET).
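A toy model may help to keep the two aspects apart: at-most-once delivery on the one hand, and how long the client waits on the other. The Python sketch below is not ORB code and uses no CORBA API. The SyncScope names mirror the policy values mentioned above, while the three-stage pipeline and the oneway_call() function are simplifying assumptions made up for this illustration.

```python
from enum import IntEnum


class SyncScope(IntEnum):
    """Ordered from the weakest to the strongest guarantee for the client."""
    SYNC_NONE = 0
    SYNC_WITH_TRANSPORT = 1
    SYNC_WITH_SERVER = 2
    SYNC_WITH_TARGET = 3


# Simplified path of one request: each stage is passed at most once.
STAGES = ["transport", "server", "target"]


def oneway_call(scope, lost_at=None):
    """Model a single oneway invocation that is never retried.

    lost_at names the stage at which the message gets lost (None = no loss).
    Returns (client_notices_loss, times_processed_by_target).
    """
    reached = 0
    for stage in STAGES:
        if stage == lost_at:
            break
        reached += 1
    # No retry ever happens, so the target runs the operation once or not at all.
    times_processed = 1 if reached == len(STAGES) else 0
    # The client only waits for as many stages as the sync scope demands.
    client_notices_loss = reached < int(scope)
    return client_notices_loss, times_processed


if __name__ == "__main__":
    for scope in SyncScope:
        print(scope.name, oneway_call(scope, lost_at="server"))
```

In every case the operation is processed at most once and no result is returned. What changes with the policy is only whether the client gets a chance to learn that the message never arrived.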
November 11th, 2008 by Peter
Some authors from Berkeley published a paper about their infiltration of the largest known spam bot network. The article is an interesting example of carefully interpreted statistical data, but also a good analysis of anti-spam technologies and their effectiveness. Of course, the usual suspects could not resist drawing generalized (and wrong) conclusions from it …
November 7th, 2008 by Peter
Time is running so fast …
Sun is about to release a new beta version of their application server GlassFish, which will implement the next J2EE / Java EE version 6. As usual, Java EE 6 is based on a JCP document.
One new focus is on profiles, which allow vendors to take a subset of the (huge) Java EE API set and build a corresponding “compliant” application server. The major use case is, of course, web applications. So the only profile under discussion so far is the “Web Profile”. This includes the unavoidable inclusion of REST support.
The more interesting part is called “pruning”. The Sun people aim at some cleanup of the historically grown API set, which is really a good idea. Most of the currently discussed removal candidates have more powerful replacements since EE 5, so this is not extremely painful. The early review draft document of the EE 6 spec says:
“Technologies that may be pruned in a future release are marked Proposed Optional below. Technologies that have been pruned are marked Optional below. There are no Optional technologies for Java EE 6.”
The “proposed optional” marking is so far only given for JAX-RPC and JAXR (search for “POPT” in the JCP document). So you can see that Sun remains extremely conservative with non-backward-compatible changes. This is somewhat bad, because the burden of small, nearly unused APIs is still there. Who ever used JavaMail?
GlassFish will also add support for several JVM-based scripting languages such as JRuby. This smells like a reaction to the .NET idea, and is a good step anyway. The Java language / component model still contains huge design mistakes from the past (e.g. call-by-value vs. call-by-reference, package structure as directories, naming conventions as component layout, …), so it is wise to open up for alternatives. The realization strategy is nebulous, and the David Wheeler argument strikes again. JRuby wraps Ruby code in Java classes, which are instantiated by a Java application server, which is run by the virtual machine, which relies on operating system libraries, which rely on the operating system core functions, which …. Layers over layers over layers.