Posts by Peter:
July 28th, 2010 by Peter
6 years (!) after my initial Debian ITP (see here) and all the hard work on a custom Condor Debian package, they finally made it too:
http://www.cs.wisc.edu/condor/debian/
The Condor people are now offering their own Debian repository for the (longer existing) Condor DEB files. This allows you to keep a Debian-based cluster reasonably up to date with the latest available version. Great thing. Use it.
If you were using my Condor Debian packages in the past, you should stop doing so and migrate carefully. The user names are identical, and the Wisconsin version of the installation scripts seems to be a little bit picky. I recommend purging my package (for example with apt-get remove --purge condor) and checking for any leftovers before you start with the new repository.
By default, the installation works in personal mode, which requires no user interaction during installation. The package looks like a completely new development, which is sad – some people (check the ITP) spent some time on things such as debconf support.
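The check for leftovers could look roughly like the following sketch. The function name find_condor_leftovers and the directories in the usage comment are assumptions made for illustration, not part of either package; the function simply lists everything whose name contains "condor" below the directories you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files and directories whose name contains
# "condor" (case-insensitive) below the given directories.
# Unreadable paths are silently skipped.
find_condor_leftovers() {
  find "$@" -iname '*condor*' 2>/dev/null | LC_ALL=C sort
}

# Possible use after purging the old package (run as root; the directory
# list is an assumption, adjust it to your installation):
#   apt-get remove --purge condor
#   find_condor_leftovers /etc /var/lib /var/log /home
#   getent passwd condor    # is the old user account still present?
```

An empty result does not prove that nothing is left, since files with unrelated names are not found this way, but it catches the obvious configuration and spool directories.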
March 25th, 2010 by Peter
I am currently collecting the options for getting EFI support in both physical and virtual machines. Here are the results I have so far. The list is constantly updated. EFI vendor and revision numbers are taken from the “ver” command in the EFI shell. Feel free to contribute!
| Virtualization Technology | Processor Platform | EFI Vendor | EFI Specification | EFI Revision |
|---|---|---|---|---|
| Virtual Box >= 3.1 | IA32, X64 | tbd | tbd | tbd |
| HPVM B.04.00.00 BL7 under HP-UX 11 | IA64 | HP | 1.10 | 14.62 |
November 4th, 2009 by Peter
HPCwire explains in a recent article how NVIDIA wants to offer access to a remote rendering cluster called ‘RealityServer’. The current description sounds more like a typical remote software offering (‘software as a service’, if you prefer) than like remotely accessible raw GPU cores, as the title suggests. Anyway, it is worth a look…
http://www.nvidia.com/object/realityserver.html
October 5th, 2009 by Peter
My home standardization body OGF published the first list of requirements for an Open Cloud Computing Interface. One of the more interesting parts of this spec is the feature matrix for existing cloud APIs to be considered:
- Amazon Elastic Compute Cloud
- ElasticHosts
- Flexiscale
- GoGrid
- Sun Cloud API
- Rackspace Cloud Servers
- VMware vSphere
Their analysis shows some interesting facts:
- Amazon does not support persistent computer resources, most others do.
- Only Amazon has support for ephemeral (real local) storage resources, which are huge and performant, but not resilient to hardware faults. Everybody supports virtual persistent storage.
- Static IPs and firewall features are about to become “cloud mainstream.”
September 24th, 2009 by Peter
In my middleware course, I asked the question of how to realize a stateful SingleCall remoting server. I expected to hear something about external state storage, e.g. in a database. To my surprise, most students proposed using a static variable in the server implementation to keep the state between calls.
This idea results from purely programming-language-oriented thinking, and (for most students) from successful experimentation. Looking under the hood, it is not completely obvious that this will work in all cases. Static variables are managed by the CLR as entities bound to a loaded class, in this case the server class. According to most sources, static variables are only garbage collected when the class is garbage collected. This can only happen if the surrounding application domain is unloaded. For a standard remoting server, this is very unlikely to ever happen.
Things become more interesting with a different runtime host than the operating system. If you use IIS, it could decide to unload your whole server application if it is not triggered often enough, or if memory runs low.
All in all, relying on static variables in a virtual runtime environment is bad style.
BTW: I know that .NET Remoting is deprecated in favour of WCF, but it is still a very helpful teaching tool.
September 24th, 2009 by Peter
I was completely unaware of this, but since Visual Studio 2008, you can debug into the .NET class library sources. Shawn Burke has a good explanation of how it works.
April 7th, 2009 by Peter
If an existing website is moved to a new URL scheme or domain, most people are unsure about their precious Google page ranking. Most sources agree on the following rules:
- Redirect every single URL by answering with HTTP status code 301 (Moved Permanently). A 301 permanent redirect is not considered webspam by Google. Make absolutely sure that no 404 error (page not found) occurs anywhere. Do not perform wildcard redirects.
- Try to keep the content of the old and new pages reasonably similar, in order to stick with the indexed keywords from the old page.
- Invite GoogleBot with an updated sitemap file. If you have a sitemap for your old page, resubmit it, so that GoogleBot finds the 301s earlier.
- It’s okay to have multiple pages that perform 301 redirects, but you should try to avoid multiple redirects on one URL (e.g. A -> B -> C -> D).
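One way to avoid redirect chains such as A -> B -> C -> D is to flatten the redirect map before it is deployed. The following is only a sketch under assumptions: it expects a hypothetical plain-text map file with one "old-path new-path" pair per line, and the function name flatten_redirects is made up for this example. It prints every old path together with its final target, so each old URL needs a single 301 hop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "old new" pairs from the file given as $1 and
# print each old path with the final target of its redirect chain.
flatten_redirects() {
  awk '
    { target[$1] = $2 }              # remember the direct target of each path
    END {
      for (src in target) {
        dst = target[src]
        hops = 0
        # Follow the chain; the hop limit guards against redirect loops.
        while ((dst in target) && hops < 20) {
          dst = target[dst]
          hops++
        }
        print src, dst
      }
    }
  ' "$1" | LC_ALL=C sort
}

# Example: a map file containing the lines
#   /a /b
#   /b /c
#   /c /d
# is flattened so that /a, /b and /c all point directly to /d.
```

The flattened pairs can then be turned into the redirect rules of whatever web server is in use; that step is left out here because it depends on the server.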
Sources:
http://www.mattcutts.com/blog/seo-advice-discussing-302-redirects/
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=83105
http://groups.google.com/group/Google_Webmaster_Help/web/q-a-from-the-junetune-live-chat
March 27th, 2009 by Peter
I attended the MRSC 2009 conference, taking place at the Zuse institute in Berlin. Some impressions:
- The conference claims to be a “many-core and reconfigurable supercomputing conference”. In fact, it was more about FPGA-based hardware accelerators than about many-core technologies. The mixture of industry and academia talks was quite interesting. And the catering was really good.
- Prof. Reinefeld, the host of the conference, gave a nice introduction to his interest in FPGA accelerators. One of the main reasons is power consumption: the ZIB facilities went from around 90 kW in 1997 to 260 kW in 2002 and 660 kW in 2008. He also told us that roughly 40% of today’s power consumption goes to cooling alone. Specialized FPGA accelerators could make it possible to keep up the speedup pace AND reduce power consumption significantly.
- I learned that FPGA boards have some general properties that are relevant for their programming. They run at a comparatively low frequency (hundreds of MHz), can support tailored data types very well (if programmed accordingly), and normally provide thousands of ‘cores’ in a SIMD-like fashion. FPGAs mostly have a memory bandwidth problem, meaning that they produce results faster than they can be written back to memory. All algorithms simply run in pure hardware. The latest (very impressive) trend is FPGAs that fit into a standard X86 CPU socket. The FPGA tool vendor provides an FSB or HyperTransport implementation, which allows you to add a specialized CPU with full RAM access to your SMP system.
- All speakers agreed that standard n-core processors, FPGAs, and GPUs all have their right to exist. Standard processors are best at control flow and sequential activities, GPUs are perfect for floating point work, and FPGAs are well-suited for the optimized processing of application-specific data types.
- Since I am a fan of standardization, the corresponding activities OpenFPGA and OpenAccelerator must be mentioned.
- Martin Herbordt gave a nice talk about the advantages of FPGA acceleration. I found the slides in a 2006 version. He explained programming models for FPGAs, showing that such applications need heavy restructuring, and that performance is highly sensitive to implementation quality, even more than with MPP programming. His presentation showed that tailoring FPGAs can yield fast implementations of things that would be hard on a standard processor (e.g. random number generation or coordinate transformation). This doesn’t come for free: FPGA-enabled algorithms need to fit the ‘vector-like’ architecture. Stream processing through a series of ALUs seems to be one favourite approach. He explained an example where BLAST (an indexing problem) was accelerated by changing it to a streaming problem.
- Phillip Maar from the University of Potsdam presented a combined approach to designing multiprocessor system-on-chip (MPSoC) solutions. They take a C program and first parallelize it automatically by clan partitioning into an MPI application (yes, this was the weak part). In the next step, they perform a functional cycle-accurate simulation of this program to design an optimal FPGA layout. In order to bring the MPI program logic to the chip, they created a hardware version of an MPI subset (SocMPI).
- CAPS presented the HMPP workbench, which allows you to parallelize C and Fortran software with preprocessor directives. The most interesting aspect is the support for hybrid systems with GPUs, FPGAs, multiple cores and other execution engines. The software has the concept of “codelets”, which are functions to be executed on a remote device or specialized core. The source code always remains independent of the target accelerator.
- Mitrionics was also an interesting company, mentioned by nearly everybody. Their major product is a virtual processor implementation for FPGAs, which can be stripped down according to the application’s needs before it goes on chip. It reminded me of the operating system concepts in embedded systems, where you compile your own version by putting together only the relevant modules (e.g. Windows CE). In the Mitrionics case, you program against the virtual processor function set (so-called tiles), and not the FPGA chip itself. The highly optimized set of standard ‘tiles’ is Turing-complete. All tiles in the virtual processor can be compiled with the bit width you need. This saves precious space on the FPGA chip. Mitrionics has its own parallel programming language for the virtual processor solution. The speaker spent some time explaining why automated parallelization of sequential code can never work. His main argument was that the compiler would need to know all possible parallelization strategies for sequential control flow patterns in advance. (BTW, I completely agree that today’s parallelizing compilers perform only intelligent algorithm pattern matching, sometimes supported by annotations such as OpenMP.) The Mitrionics language is based on data dependencies only, without any execution order description. Somebody from the audience identified similarities to data-flow languages from the 80’s, so they might be interesting again.
- Somebody from HP Labs talked about their view of the world – very generic. HP seems to count on heterogeneous system integration because it sees the power wall as a near-future problem.
- Microsoft showed the recent parallel computing extensions in Windows 7 and .NET. Nothing new, but nicely explained. For Windows 7, they claim better NUMA support (some lightly extended Win32 scheduling APIs) and user-mode scheduling (UMS) as the major things. The first is not really impressive: they basically provide group affinity support and extended information about the core-cache relationships. If you know Linux CPU sets, it’s more or less the same. UMS is interesting, since it immediately reminded me of NT4 fibers. The Microsoft guy (from Redmond, not Cambridge!) confirmed that it’s more or less the same, but Dave Probert states it in a different way. Everything will come with Visual Studio 2010, and no, this will not be based on Phoenix.
- SiliconGraphics had a nice presentation with some internal details. The speaker explained how SG was surprised by the rise of GPUs in the last months. Hardware vendors (ClearSpeed, NVidia, XDI) and tool providers (RapidMind, Mitrionics, Alinea) together provide a lot of alternatives to their solutions. He showed how SG is now providing tailored hybrid clusters, and talked a little bit about the main problems with them: power-on sequence, non-atomic resource allocation, SW incompatibilities and online diagnosis flaws. He also showed a really cool research prototype with 180 x 2 Atom processor nodes in one (!) 3U chassis. Plans go up to 10,000 cores per rack, all running under CentOS. Here is a picture.
March 23rd, 2009 by Peter
If you need a nice geek present, check this one.
http://www.domain-karte.de/
The German ISP company United Domains offers this map for sale, or even for free if you write a short notice about them in your blog. Just as I do …

February 24th, 2009 by Peter
For some measurement task, we needed to isolate one CPU core on an Intel Quad Core machine under Linux 2.6. With the background of SMP and NUMA system support, Linux provides the following options:
- isolcpus=[]: This is an old kernel boot parameter, which allows you to isolate one or more CPUs from the scheduler. Processes can still be placed on these processors by explicit affinity system calls. The processors are still used for interrupt processing, since they are only taken away from the OS scheduler. Modern kernels implement this feature with the scheduling domain support (which is there anyway), a part of the CPU sets functionality. Removal of the kernel boot option was therefore recently requested.
- CPU sets: CPU sets constrain the CPU and memory placement of tasks to only the resources within a task’s current set. Specifically, the sched_setaffinity() and mbind() system calls are restricted accordingly. The scheduler also supports partitioning the system’s CPUs into a number of scheduling domains. By default, there is one scheduling domain covering all CPUs, except those marked with isolcpus (see above). After mounting the corresponding file system, you can create custom CPU sets and assign tasks and processors to them. In our experiments, it was not possible to remove a core from the default CPU set. A quick check of the sources showed that this is intentional, even though the error code is misleading:
mount -t cgroup -o cpuset cpuset /dev/cpuset
root@intel:/dev/cpuset# cat cpuset.cpus
0-3
root@intel:/dev/cpuset# /bin/echo 0,1,2 > cpuset.cpus
/bin/echo: write error: Permission denied
- CPU hotplug: The CPU hotplug implementation provides extensive support for CPU management within the sysfs file system tree. The implementation also frees the CPU from interrupts and timers. A simple echo 0 > /sys/devices/system/cpu/cpuX/online takes the particular CPU offline. With a very recent kernel, it also works for single cores in our system:
root@intel:/sys/devices/system/cpu/cpu0# ls -l
drwxr-xr-x 5 root root 0 24. Feb 18:59 cache
drwxr-xr-x 4 root root 0 24. Feb 18:59 cpufreq
drwxr-xr-x 2 root root 0 24. Feb 18:59 microcode
drwxr-xr-x 2 root root 0 24. Feb 18:59 thermal_throttle
drwxr-xr-x 2 root root 0 24. Feb 18:59 topology
To check whether disabling the CPU was successful, look at the output of /proc/interrupts.
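The check against /proc/interrupts can be made a little more convenient by summing the counters per CPU. The following is a minimal sketch, assuming the usual layout of that file (a header row of CPU names, then one row per interrupt source with one counter per listed CPU). The function name irq_totals is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the total interrupt count per CPU column.
# Reads /proc/interrupts by default, or the file given as $1.
irq_totals() {
  awk '
    NR == 1 {                        # header row: CPU0 CPU1 ...
      n = NF
      for (i = 1; i <= n; i++) name[i] = $i
      next
    }
    NF < n + 1 { next }              # rows without per-CPU counters (e.g. ERR)
    {
      for (i = 1; i <= n; i++) {
        if ($(i + 1) ~ /^[0-9]+$/) total[i] += $(i + 1)
      }
    }
    END { for (i = 1; i <= n; i++) print name[i], total[i] + 0 }
  ' "${1:-/proc/interrupts}"
}

# Possible use: take two snapshots a few seconds apart and compare them.
#   irq_totals > before.txt; sleep 10; irq_totals > after.txt
#   diff before.txt after.txt
```

A core that has been taken offline should no longer show up as a column at all, while a core that was only isolated from the scheduler still appears and its counter may keep growing, which matches the behaviour described for isolcpus above.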