My first real use of Scala Actors

I have written lots of multi-threaded code in C++ and, more recently, Java. I have always found it a bit tricky to get right, but more importantly it was easy for someone else maintaining the code to trip up on some of the subtleties, with all those mutable data structures shared between threads. You want to minimize locks for performance, but that can cause havoc when reorganizing code later (possibly years later).

Actors are another approach to concurrency. They are much more lightweight than threads (some use the term ‘fibres’). I heard about Actors in Scala first, then backtracked to Erlang. It felt like a much better way to go. No locks. No need to write fancy queues or other concurrency constructs. I love the way Erlang structures are always immutable, making it *always* safe to share data structures. (Scala does not guarantee this, but it’s easy to just obey the rules.) But it had all been theoretical.

I hit a problem where I needed to get a big batch of data from a database and then do lots of accesses to it in memory. It was an interactive application (a user is waiting for responses). So what I really wanted was one thread in the background populating the data structure, while another thread servicing the user could look up values, returning immediately if the results had been loaded already but blocking if the required data was not yet available. I started thinking about the various concurrency constructs to do this in Java. I needed a condition variable to check if all the data was available, a background thread, and so on. For fun, I tried writing the core class for keeping all the results in Scala with Actors instead to see how it came out. For simplicity I assumed the data was an array of integers, where I had the index of the value I wanted to look up.
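For comparison, the lock-and-condition design described above might look roughly like the following sketch. It is written in Scala on top of java.util.concurrent rather than in Java, and it is an illustration only: the class and method names (BlockingIntStore, addValues, get) are made up for this example.

```scala
import java.util.concurrent.locks.ReentrantLock

// Hypothetical class: a lock-and-condition version of the blocking lookup.
class BlockingIntStore {
  private val lock = new ReentrantLock()
  private val grown = lock.newCondition() // signalled whenever the array grows
  private var array = new Array[Int](0)

  // Called by the loader thread as each batch arrives.
  def addValues(values: Array[Int]): Unit = {
    lock.lock()
    try {
      array ++= values
      grown.signalAll()
    } finally lock.unlock()
  }

  // Called by the thread serving the user; blocks until the index has been loaded.
  def get(index: Int): Int = {
    lock.lock()
    try {
      while (index >= array.length) grown.await()
      array(index)
    } finally lock.unlock()
  }
}

object BlockingIntStoreDemo {
  def main(args: Array[String]): Unit = {
    val store = new BlockingIntStore
    val loader = new Thread(new Runnable {
      def run(): Unit = {
        store.addValues(Array(1, 2, 3))
        Thread.sleep(100)
        store.addValues(Array(4, 5, 6))
      }
    })
    loader.start()
    println(store.get(4)) // waits for the second batch, then prints 5
  }
}
```

Every access to the array has to go through the lock, and the wait loop has to re-check its condition after each wake-up. Those are exactly the details that are easy to break when the code is reorganized later.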

My Scala Solution

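import scala.actors.Actor
import Actor._

// Messages the actor understands.
case class MoreValues(values: Array[Int]) // append a batch of loaded values
case class GetValue(index: Int)           // ask for the value at an index

object MyArrayActor extends Actor {
  // Data loaded so far; only touched from inside act, so no locking is needed.
  private var array = new Array[Int](0)

  def act = loop {
    react {
      case MoreValues(values) => array ++= values
      // The guard leaves the request in the mailbox until the index has been loaded.
      case GetValue(index) if index < array.length => reply(array(index))
    }
  }

  this.start
}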

That was it! No condition variables or any locking constructs in sight! I tested it with the following scrappy code.

object Runner {
  // Run the given block on a new thread.
  def runThread(f: => Unit) =
    (new Thread(new Runnable { def run(): Unit = f })).start
}

// Ask for index 10 before any data exists; !? blocks until the actor replies.
Runner.runThread {
  println("GetValue: " + (MyArrayActor !? GetValue(10)))
}
Thread.sleep(1000)
MyArrayActor ! MoreValues(Array(1,2,3))
Thread.sleep(1000)
MyArrayActor ! MoreValues(Array(4,5,6))
Thread.sleep(1000)
MyArrayActor ! MoreValues(Array(7,8,9))
Thread.sleep(1000)
MyArrayActor ! MoreValues(Array(10,11,12)) // index 10 now exists, so the pending GetValue can be answered
Thread.sleep(1000)
MyArrayActor ! MoreValues(Array(13,14,15))
Thread.sleep(1000)

The above test code spawns a separate thread to fetch a value from the array in parallel to new values being added. I snuck some extra println() calls into the react loop to prove when messages were processed. In the test code, ‘!’ means send a message and ‘!?’ means send a message and wait for a reply. So to get a value, ‘!?’ was used (and the Actor contains a reply() call to send the value back), whereas to just add values (with no feedback) ‘!’ could be used. Using ‘!’ instead of ‘!?’ has the advantage that code fetching rows from a database (for example) can go full speed. There is no need to wait for the Actor to respond.

How it Works

First let me say this is my first Scala code doing anything more than a single-line expression, so I am not claiming it’s perfect Scala. But let’s break it down.

The ‘case class’ declarations I think of like structs in C. You define a series of fields and Scala automatically creates a constructor, toString(), equality comparison, etc for you. They are particularly useful in pattern matching ‘case’ clauses as will be seen (which is where I assume the name came from). I have read that using them instead of tuples (as you would in Erlang) gives you more type safety when sending messages. If you change what an Actor can receive, and change the case class, calling code will break. If you just used an anonymous tuple as you would in Erlang, then a caller who was not updated would not fail at compile time. Hence code maintenance is easier.
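As a small illustration of what a case class gives you for free, here is a minimal sketch using the two message types from this post. The describe helper and the CaseClassDemo object are hypothetical names added only for the example.

```scala
// The two message types from the post.
case class MoreValues(values: Array[Int])
case class GetValue(index: Int)

object CaseClassDemo {
  // Pattern matching pulls the fields back out of a message.
  // (describe is a hypothetical helper, not part of the actor code.)
  def describe(msg: Any): String = msg match {
    case MoreValues(values) => "append " + values.length + " values"
    case GetValue(index)    => "look up index " + index
    case other              => "unknown message: " + other
  }

  def main(args: Array[String]): Unit = {
    println(GetValue(10))                 // generated toString: GetValue(10)
    println(GetValue(10) == GetValue(10)) // generated field-by-field equality: true
    println(describe(MoreValues(Array(1, 2, 3))))
  }
}
```

If GetValue later gained a second field, every pattern and constructor call written as GetValue(x) would stop compiling, which is the maintenance benefit over an anonymous tuple.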

The ‘object’ is like a class declaration, but it immediately creates a single instance of the class as well. In production code I would use a class (not an object), but object was easier for my play test.
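The difference between the two can be sketched as follows. SharedStore and Store are hypothetical names used only to show the contrast; they are not part of the actor code.

```scala
// `object` declares a class and its single instance in one go.
object SharedStore {
  private var values = Vector.empty[Int]
  def add(v: Int): Unit = values :+= v
  def size: Int = values.length
}

// A `class` can be instantiated as many times as needed.
class Store {
  private var values = Vector.empty[Int]
  def add(v: Int): Unit = values :+= v
  def size: Int = values.length
}

object ObjectVsClassDemo {
  def main(args: Array[String]): Unit = {
    SharedStore.add(1)             // no `new`: the one instance already exists
    val a = new Store
    val b = new Store              // independent of a
    a.add(1); a.add(2)
    println(SharedStore.size)      // 1
    println(a.size + " " + b.size) // 2 0
  }
}
```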

Inside the MyArrayActor there is a private variable ‘array’ holding an array of integers representing data loaded so far. Then there is an ‘act’ method which has a loop processing react calls. I vaguely remember reading why ‘loop’ is used here. It is a special Actor construct, not a ‘while(true)’ as you may expect. The ‘react’ statement waits for a message then dispatches it based on the ‘case’ clauses (the messages the Actor expects to receive). I don’t fully recall all the semantics here just now, but it seems pretty easy to remember to always have ‘def act = loop { react { case … case … } }’.

Oh, if you are not familiar with Actors, the idea is that an Actor has a mailbox (a queue of messages) which it reads from. An Actor typically loops around receiving messages, doing some work, then waiting for the next request. Actors behave much like synchronized blocks in Java: there can be at most one thread running an Actor at a time. Actors are not bound to threads. You can have thousands or even millions of Actors and, say, one thread per core. Threads jump around and run bits of various Actors. It can actually work out quite well, as the thread that runs an Actor that sends a message to another Actor may then jump over to the recipient Actor and do the work, meaning the CPU caches are nicely primed. So don’t think Actor = Thread. Actors are much lighter weight, and you can have as many as you can fit in memory. Threads are heavier weight in that they also need stack space, so there is a much lower ceiling on how many you can have.

So let’s look at what messages the Actor can receive and process.

The first message is a MoreValues request, which passes in an array of integers to be added to the array. ‘array ++= values’ creates a new array which is the concatenation of the existing array and the values array. Much easier than allocating a new array and copying the old array across as you would in Java. ‘values’ here in ‘array ++= values’ is a variable automatically declared by the ‘case’ clause and bound to the value in the MoreValues case class – funky pattern matching stuff that is very useful with Actors. The MoreValues message is not responded to. The Actor just swallows it up. (Note that I chose an array here as I will have potentially millions of values, so the memory overhead of a linked list seemed too high. But I have not checked if Scala will turn Array[Int] into a Java int[]. Something I need to look into.)
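The behaviour of ‘++=’ on an array variable can be checked with a few lines of plain Scala. This is a minimal sketch; ArrayAppendDemo and append are hypothetical names. The last line also answers the open question, at least on the JVM, where Array[Int] is represented as a primitive int[].

```scala
object ArrayAppendDemo {
  // Concatenating two arrays allocates a fresh array; neither input is modified.
  // (append is a hypothetical helper for illustration.)
  def append(existing: Array[Int], more: Array[Int]): Array[Int] = existing ++ more

  def main(args: Array[String]): Unit = {
    var array = new Array[Int](0)
    val before = array
    array ++= Array(1, 2, 3)              // shorthand for: array = array ++ Array(1, 2, 3)
    println(array.mkString(","))          // 1,2,3
    println(before eq array)              // false: the variable now points at a new array
    println(array.getClass.getSimpleName) // int[]
  }
}
```

Note that each ‘++=’ copies everything loaded so far into the new array, so appending many small batches costs more than appending a few large ones.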

The second message is a GetValue request to return a value. It replies with the value in the array at the requested index. One of the funky things with Actors in Scala is that the ‘case’ statement can specify a pattern (the case class message) and a ‘guard’ condition. In my case I just added ‘if index < array.length’ which told it to only match if the requested index was within the array bounds. This avoided the need for condition variables. Messages in the mailbox that don’t match a pattern stay there. In my case, it will stay there until enough MoreValues requests have arrived to fill out the array to be big enough.
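To make the "unmatched messages stay in the mailbox" behaviour concrete, here is a toy, single-threaded model of it. It is a sketch of the idea only, not how scala.actors is implemented: ToyMailbox, send, waiting and replies are all hypothetical names, and replies simply stands in for reply().

```scala
import scala.collection.mutable.ArrayBuffer

// The two message types from the post.
case class MoreValues(values: Array[Int])
case class GetValue(index: Int)

// A toy model of a mailbox whose handler has a guard (hypothetical class).
class ToyMailbox {
  private var array = new Array[Int](0)
  private val pending = ArrayBuffer.empty[Any] // messages not yet matched
  val replies = ArrayBuffer.empty[Int]         // stands in for reply(...)

  // Same shape as the react block: a partial function with a guard.
  private val handler: PartialFunction[Any, Unit] = {
    case MoreValues(values) => array ++= values
    case GetValue(index) if index < array.length => replies += array(index)
  }

  def send(msg: Any): Unit = {
    pending += msg
    // Handle the oldest message that currently matches; anything else stays queued.
    var i = pending.indexWhere(handler.isDefinedAt)
    while (i >= 0) {
      handler(pending.remove(i))
      i = pending.indexWhere(handler.isDefinedAt)
    }
  }

  def waiting: Int = pending.length
}

object ToyMailboxDemo {
  def main(args: Array[String]): Unit = {
    val box = new ToyMailbox
    box.send(GetValue(4))                // index 4 is not loaded yet, so the request waits
    box.send(MoreValues(Array(1, 2, 3)))
    println(box.waiting)                 // 1
    box.send(MoreValues(Array(4, 5, 6))) // the array now has 6 elements
    println(box.replies)                 // ArrayBuffer(5)
  }
}
```

The guard is re-evaluated each time the mailbox is scanned, which is what replaces the explicit condition variable.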

The last little bit ‘this.start’ was to make sure the Actor was started up automatically.

Conclusion

For my particular case, I truly believe the Scala code using Actors is much shorter, simpler, and more robust than a Java implementation that I could come up with. I really like the hierarchical design breakdown too. You have machines with services (say a web server) with various code units (say a servlet), but now there is another layer which is an Actor. I prefer Actors to threads when thinking about concurrency, as a thread is not limited to one portion of code: a thread may run lots of different code all around the place (depending on how you write your code). Actors on the other hand are isolated pockets of code. It’s just easier to think about.

I still have some question marks in my mind about Actors, however. For example, what would happen if the array never grew to the required size? Would there be a memory leak, as the GetValue request would sit in the mailbox of the Actor forever? I also wonder if there is any need for throttling controls on message queues. What if the producer creating new MoreValues messages was much faster than the Actor could receive and process them? The inbox would potentially overflow. Does Scala have built-in controls to avoid such problems? Hence I think Actors are a step forwards for writing robust concurrent applications, but not a silver bullet.
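For what it is worth, a generic way to get both kinds of control outside of Actors is a bounded queue with timeouts. The sketch below uses java.util.concurrent.ArrayBlockingQueue purely to illustrate the idea. It says nothing about what the Scala Actors library itself provides, and BoundedMailboxDemo and accepted are hypothetical names.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

object BoundedMailboxDemo {
  // How many of `count` messages fit into a mailbox of the given capacity
  // when nobody is consuming (hypothetical helper for illustration).
  def accepted(capacity: Int, count: Int): Int = {
    val mailbox = new ArrayBlockingQueue[Int](capacity)
    // offer returns false when the queue is full instead of letting it grow.
    (1 to count).count(i => mailbox.offer(i))
  }

  def main(args: Array[String]): Unit = {
    println(accepted(capacity = 2, count = 5)) // 2

    // A reader can also give up after a timeout instead of waiting forever.
    val empty = new ArrayBlockingQueue[String](1)
    println(Option(empty.poll(100, TimeUnit.MILLISECONDS))) // None
  }
}
```

A producer that calls put instead of offer would block until space is available, which is the simplest form of throttling.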

Tags: Actors, Concurrency, Java, Scala, Threading

Alan Kent's Blog
