Monday, November 12, 2012

Lazy Evaluation in C++ & Scala

Target audience: Intermediate
Estimated reading time: 3'



Overview
Lazy evaluation is a not a new concept in software engineering. The concept is rather simple: initialize a variable or an object once it is needed. A typical use case is the instantiating of a large object in memory from the content of database: the lifecycle of the object is shorter than the application so it is convenient to allocate the memory when and if needed.
Both C++ and Scala support lazy evaluation using two very different approach: C++ relies on a class constructor to initialize iteratively its attribute member of a class while Scala supports lazy initialization natively.
 

Note: For the sake of readability of the implementation of algorithms, all non-essential code such as error checking, comments, exception, validation of class and method arguments, scoping qualifiers or import is omitted.


C++
Lazy evaluation or fetching can be implemented in C++ using a design pattern. Let consider a large object which state is stored in a remote database or initialized through a request REST API. In many cases, only some of the attributes of the object need to be initialized from the remote data source (table for instance): load the content of the entire table would not make much sense.

How C++ address the problem? Let's consider a class that contains all the information regarding the visit of a user in a user.
 
class SiteVisit {
private:
  int  id;
  mutable string*  url;
  mutable long  visitDate;
  mutable std::list* targets;
  mutable char[4] sources; 
  void loadFromDB(std::list*)

public:
  SiteVisit(int id) : id(id), url(null), visitDate(-1L), targets(null), sources(null) { }    
  const string& getName() const;
  long          getVisitDate() const;
  std::list&    getTargets() const;
 const char[4]& getSources();
}


It is conceivable that not all the data describing the visit has to be retrieved for analysis or reporting. For instance, there is no need to load the targeted URLs, for all the visits from the database until a report needs to be created. In this case, the role of the constructor in C++ is to initialize the unique identifier for the object (or table unique primary key) and nothing else. The getTargets method retrieves the list of pages visited at a later stage in the application lifecycle. Therefore, this list is generated only once; the first time it is invoked. The method retrieveAllVisitedPages, iterates through all the pages of the site that have been visited.

std::list<string> SiteVisit::getTargets() {
  if( targets == null) {
     targets = new std::list
     loadTargetsFromDB(targets);
  }
  return targets;
}   

int retrieveAllVisitedPages(std::list&amp; visits, std::map* visitsMap) const {                           
  std::list::const_iterator visitsIt;
  std::list::const_iterator urlsIt;

  int counter = 0, totalPages = 0;
  for( visitsIt =visits.beging()l visitsIt !=visits.end(); visitsIt++) {
    std::list* urls = visitsIt.getTargets();   
    for( urlsIt = urls.begin(); urlsIt != urls.end(); urlsIt++) {
       counter = visitsMap[urlsIt];
       visitsMap-&gt;insert(std::make_pair(urlsIt, ++counter));
    }           
    totalPages += urls.size();
  }
  return totalPages;
}

Although feasible, the implementation of the lazy initialization is quite cumbersome as it involves at a minimum, a constructor and a method (2 step initialization). 


Scala
Scala supports lazy values and lazy local functions. Under some circumstances, the user may want to initialize a value only when it is required. A database developer would qualify this pattern as a write once, read many times pattern.

Note: The developer needs to keep in mind that lazy values or expressions are checked whether the value has been indeed already initialized, each time it is accessed. Such validation is to be thread-safe, and consequently adds overhead to the run-time execution.

Let's evaluate or experience the concept by writing a simple example in Scala. The SiteVisits class implements the basic function to retrieve the list of URLs/pages a user identified by its id access on a specific web site.

class SiteVisits(userId: Int, webSite: String) {
  lazy val urls =  {
    val urlsList = new ArrayBuffer[String]
   
   // Initialize the urls list from the database
    query(userId, webSite)
    urlsList.toArray
  }
  ... 
  def isValidWebsite; Boolean = ....
}

The list of URLs is not loaded from the database when an Instance of SiteVisits is created but when the value urls is read by the client code. For example, the methods of the class SiteVisits such as isValidWebsite can be executed without having to access the list of URLs the user visited on this particular web site. Let's look how lazy evaluation can be applied.

val siteVisit = new SiteVisits(91841, "www.yahoo.com")
  ... 
  if( siteVisit.isValidWebSite )  
     Console.println(siteVisit.urls.mkString(", ") 
  ..
}

In the example above, the list of URLS is not loaded if the site is not valid.


References
  • Effective C++   - S. Meyers  - Addison Wesley 1991
  • More Effective C++  - S. Meyers  - Addison Wesley 1993
  • Programming in Scala  - M. Odersky, L. Spoon, B. Venners  - Artima 2007
  • https://github.com/prnicolas