
Core i7 (Nehalem) reviews

Started by GSH, November 03, 2008, 12:06:07 PM


GSH

They're starting to trickle out. See http://techreport.com/articles.x/15818, for example. Pricey, but powerful.

-- GSH

Avatar

Ouch...  pricey, yes...  it'll be a while before I see one of these (probably used) sitting on my desk...  :)

When will Intel and AMD 'merge' with some of the software giants, such that when a new processor comes out there's software around to take full advantage of it?  Imagine the sales boost from kicking these out and having a Photoshop version already optimized for it?  Or a video production suite?

Why do they continue to leave it up to the OS to make some use of the new features and power of new CPUs?

-Av-

CmptrWz

Because the majority of software doesn't talk directly to the CPU, but rather has the OS talk for it? At least for managing various things, like coexisting with other software.

GSH

The Nehalem reviews are showing speed boosts in existing software, no special optimizations needed. This happens almost every round of new CPUs from any manufacturer -- they're just plain faster, out of the box, than the old models. (We'll ignore the disaster that was the P4 for now.) This is because the CPU makers want to be able to show a speedup on day 1. (And, as Core i7/Nehalem hasn't officially *launched*, we're technically still before day 1.)

As to why there isn't specially-optimized software on day 1, it's usually a case that the resources aren't there to support it. To have Photoshop (to take an example) optimized, my guess is that it'd take at least the following:

1. First round of Nehalem HW arrives in Adobe's offices ~6 months before launch, about 4-8 systems.
2. 3-5 mid-level engineers benchmark the heck out of operations, identify the biggest hotspots.
3. If using the Intel compiler (which costs $$$), recompile with beta versions; otherwise, consider using assembly (senior engineers).
4. ~3 months before launch, QA gets involved, needing 20-40 systems, and they hammer on things to ensure there are no differences.
5. Betas released at launch.

All that is at least a million $ of cost. Is that a good use of resources? Maybe, maybe not.

What Nehalem and other systems these days benefit most from is this: parallel programs. A single-threaded app is going to gain *nothing* from leaving 1-7 cores unused. A lot of software is not multithreaded at all, or uses at most 2 threads, because writing multithreaded apps is not easy, despite decades of research. (Running multiple 1-thread apps does show a benefit on modern systems, because process-level parallelism is trivial for the OS.) There are some apps -- rendering, video encoding -- that are trivial to parallelize, as they do the same operation to distinct data blocks, and there are very few dependencies between the operations. Making everything else parallel is not so easy.
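
For the trivially-parallel case, the whole pattern fits in a few lines. Here's a rough Erlang sketch (module and function names are made up for the example) that runs the same operation on each data block in its own process and gathers the results:

    -module(pmap_sketch).
    -export([pmap/2]).

    %% Apply F to every chunk in its own process, then gather the
    %% results in spawn order. No shared state: each worker sends its
    %% answer back as a message tagged with its own pid.
    pmap(F, Chunks) ->
        Parent = self(),
        Pids = [spawn(fun() -> Parent ! {self(), F(Chunk)} end)
                || Chunk <- Chunks],
        [receive {Pid, Result} -> Result end || Pid <- Pids].

For instance, pmap_sketch:pmap(fun(B) -> B * B end, [1,2,3]) returns [1,4,9], with each square computed in its own process.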

-- GSH

OvermindDL1

Which is where my new language will come in very useful (gah, I wish I did not have classes, without them the language could be 'usable' right now... grr... classes seem like such a waste of time... freaking biology and such...).  Naturally parallelized in the Actor model (like Erlang and such), but it compiles down to machine code with such optimizations that it even outperforms equivalent C programs compiled through VS with high optimizations. :P

Gah, I wish it was usable now... things I need to do in it.  At least it will scale up with multiple cores pretty arbitrarily unlike other 'modern' languages...

With how old Erlang is though, just in the past few years it has started to see a rather massive surge of new programmers, for rather obvious reasons, so it is just now coming into its own I guess.

Generated by OvermindDL1's Signature Auto-Add Script that OvermindDL1 did manually since Greasemonkey does not work in Firefox 3.1 yet...


GSH

OM- while I don't disagree that other languages may make parallel processing easier, they're still going to run afoul of Amdahl's law (http://en.wikipedia.org/wiki/Amdahl's_law). Communication overhead will take a big chunk of time when more CPUs are thrown at problems. (Less so for extremely data-parallel operations like video encoding.) I wish I could quickly find the links in the realworldtech.com discussion boards (http://www.realworldtech.com/index.cfm) w/ posts from Linus Torvalds (yes, that guy) noting that bus contention for memory locks is going to hurt message passing and other techniques used by these parallel languages more than their proponents realize. Sharing of mutex/lock memory across cores causes more slowdowns than multiple things hitting separate chunks of memory.
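
For reference, that law puts a hard number on it: if a fraction p of a program's work can be spread across N cores, the best overall speedup is

    S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

so even a program that is 95% parallel can never run more than 1/(1 - 0.95) = 20x faster, no matter how many cores you throw at it.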

-- GSH

Angstromicus

Hopefully 3D processors will circumvent that :-).

Nielk1

Even quantum computers would have a maximum limit and diminishing returns. All they can do is do the work in the fewest steps possible.


Angstromicus

We've got a lot of time before we start reaching the theoretical limit of computational power. We simplified things by taking the randomness out of events. The more advanced CPUs get, the more random the events seem, because there are a lot more logically defined events going on. If we can continue the trend, then perhaps neutron stars will be the topic of CPU powerhouses :P

OvermindDL1

That is the thing about the Actor model, though: there is no locking, no mutexes, no global state. It is a *very* different way to program; I hated it at first, but the more I used it in Erlang the more addicted to the model I became.

If I had to come up with a way to describe it: imagine a single program written in Erlang as lots of tiny little programs, each running on its own computer, each with its own event loop (if it needs one) and so forth, all communicating by sending messages to one another. A message can be reliable or unreliable, depending on what it needs (in Erlang sending is always reliable, although even that does not mean the other Actor got it, e.g. if it dies before receiving the message). And just as there is no global state on the Internet, the only state is whatever is kept inside the individual computers -- the individual Actors.

When an Actor passes a message to another Actor, one of two things happens. If the receiving Actor is on another core or another computer, the sender fires the message off and continues: the message is handed to a routing Actor, which either passes it over to the other core (just a pointer pass) or serializes it to send across the network, and the receiving Actor will 'eventually' get it. That is the big thing about the Actor model -- it is defined in terms of "unbounded nondeterminism", which is a completely foreign concept in nearly all programming languages. If instead the receiving Actor is on the local core/thread, it runs immediately to handle the message; to put it bluntly, this is quite literally compiled as a function call (in reality, usually a long jump in machine code), and it is faster than it sounds. When that Actor finishes processing, it either passes messages on to other Actors (control continues to be passed around), or it waits to receive something, or it dies; in the latter two cases, the next Actor in the queue for that core/thread is resumed.
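
A tiny example in plain Erlang (made-up module and messages, but it shows the shape): a counter Actor whose only state is the argument of its own loop, and the only way to touch that state is to send it a message:

    -module(counter).
    -export([start/0]).

    %% Spawn the actor; its entire state is the argument of loop/1.
    start() -> spawn(fun() -> loop(0) end).

    %% Messages are handled one at a time from the mailbox;
    %% there is no shared memory and no lock anywhere.
    loop(Count) ->
        receive
            increment ->
                loop(Count + 1);
            {get, From} ->
                From ! {count, Count},
                loop(Count);
            stop ->
                ok   %% the actor dies; its state disappears with it
        end.

Usage from a shell: Pid = counter:start(), Pid ! increment, Pid ! {get, self()}, then receive {count, N} -> N end gives back 1.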

In a more direct way, you can think of it as cooperative multi-tasking as well: things can take a huge CPU slice if they so wish, and you trust the programmer not to do anything stupid (just like in C/C++/etc., where the programmer can grind his program to a halt, or make it work very fast if he does it well). If they are doing a very long loop, for example, they can just pause every hundred or thousand iterations with a receive() that has a timeout of zero, so the Actor will be called again instantly once anything else that needs to run has had its turn.
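
In Erlang, that zero-timeout pause looks like this (busy_loop and do_work are invented names for the sketch):

    -module(busy).
    -export([busy_loop/1]).

    %% Stand-in for one iteration of real work.
    do_work() -> ok.

    busy_loop(0) ->
        done;
    busy_loop(N) when N rem 1000 =:= 0 ->
        %% Every 1000 iterations: a receive with an `after 0` timeout
        %% checks the mailbox (here, for a stop request) and yields
        %% briefly without ever blocking.
        receive
            stop -> stopped
        after 0 ->
            do_work(),
            busy_loop(N - 1)
        end;
    busy_loop(N) ->
        do_work(),
        busy_loop(N - 1).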

Usually the system also starts up a lot of utility Actors (everything is an Actor; even the message queues for other Actors are implemented as Actors, although most of the annoying code is hidden by the compiler).  For example, one utility Actor the programmer may want the system to always start up is one that just watches other Actors to see if they have built up messages, in which case it can cause them to switch cores. This works because every Actor always has a Queue actor, which you can query from your core to see whether there are any pending messages; there can also be versions that track the time since last access, and so forth.
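
A rough sketch of such a watcher in plain Erlang (the threshold and the report action are placeholders; erlang:process_info/2 and timer:sleep/1 are standard):

    -module(watcher).
    -export([watch/1]).

    %% Poll each actor's mailbox length once a second and report any
    %% that are backing up. A real version would rebalance cores
    %% instead of just printing.
    watch(Pids) ->
        lists:foreach(
          fun(Pid) ->
                  case erlang:process_info(Pid, message_queue_len) of
                      {message_queue_len, Len} when Len > 1000 ->
                          io:format("~p is backed up: ~p pending~n",
                                    [Pid, Len]);
                      _ ->   %% short queue, or the process has died
                          ok
                  end
          end, Pids),
        timer:sleep(1000),
        watch(Pids).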

In most Actor programs there is actually no 'real' message passing except between computers; everything is generally compiled down to a long-jump/function-call, and most of that optimizes away to get better-than-C speed (since the compiler does not have to worry about aliasing and other annoyances, it is free to optimize a great deal more code than most C/C++ compilers could ever dream of).  Even Erlang, a recently JIT'd language (it used to be fully interpreted), still beats C++ in many ways, especially on problems that are heavy in parallelism. (The Yaws web server, built in Erlang, is consistently the fastest web server out there under any kind of decent load; it beats Apache by far, and it beats even the faster lightweights without breaking stride.)

Because state is kept strictly separate between Actors -- no global state, no worrying about aliasing, etc. -- the compiler is able to compact message passes together. For example, if an Actor creates a second Actor to do utility work, and that second Actor is only ever accessed by the Actor that created it (a very common thing in Actor programs), the compiler does a lot of inlining, compresses data structures, gets rid of all the address indirections, and generally optimizes the helper out, usually by putting the entire Actor on the first Actor's 'stack'. If the address is ever passed outside of it, the compiler is still free to do most of that optimization; only some things, like inlining *between* Actors, cannot be done (not that that is any kind of a big deal).
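
The creator-only helper pattern looks like this (made-up module and messages); since Helper's pid never escapes start/0, a compiler doing the analysis above could flatten the whole round-trip into plain calls:

    -module(private_helper).
    -export([start/0]).

    %% The helper actor is spawned by start/0 and its pid is never
    %% passed to anyone else, so only its creator can talk to it.
    start() ->
        Helper = spawn(fun helper_loop/0),
        Helper ! {work, self(), 21},
        receive
            {result, R} -> R    %% 42
        end.

    helper_loop() ->
        receive
            {work, From, X} ->
                From ! {result, X * 2},
                helper_loop()
        end.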

It is designed to be well optimizable while being able to expand arbitrarily, but it does require a vast difference in how you 'think' when you program.  You have to take into account that a message may come in thirty seconds after it was sent, or it may arrive instantly (as when the pass is optimized out); you may suddenly get a barrage, or you may be doing nothing for a long time.  It is generally considered that if a single Actor is so big that it does not fit on your screen, you should probably break it up into more Actors.

Honestly I did a horrible job explaining it, and I would supply a lot of descriptive code examples, but I do not have the time to devote right now, although I can later if you so wish.  The Actor model has been well studied for multiple decades, and it did have problems at the start (mind you, this was before even the OO model existed; Actor is one of the oldest models out there), but it was pretty well 'perfected' roughly two decades ago, as I recall.

But yes (rereading your post), the Actor model is the embodiment of multiple things hitting separate chunks of memory; that is practically one of its defining properties.

Generated by OvermindDL1's Signature Auto-Add Script that OvermindDL1 did manually since Greasemonkey does not work in Firefox 3.1 yet...


Avatar

Um...  hmmm...

Sounds a bit like the Internet, with a bunch of interconnected 'systems' of various types doing various things by themselves and together in clumps, talking over a semi-reliable mess of a network...   :)

-Av-

cheesepuffly

So what makes this thing so special?
I liek chz



Chaka-Chaka-Pata-Pon!

VSMIT

It's a new microarchitecture, but aside from the possibility of having 8-core processors, the difference is not as great as the Conroe microarchitecture jump.

VSMIT.
I find that if I don't have a signature, some people disregard the last couple of lines of a long post.
Quote from: Lizard
IQ's have really dropped around here just recently, must be something in the water.

Red Devil

Optical processors will dwarf anything.
What box???

TheJamsh

Didn't they (broad term :P) make a working single-electron transistor gate? Basically a single electron opened and closed the transistor.

Imagine if they get those into processors; we are talking ridiculous speeds. However, I don't think that'll happen for a LONG time yet.


BZII Expansion Pack Development Leader. Coming Soon.