CHAPTER 31


				INTRODUCTION TO MEMORY MANAGEMENT



	  This chapter focuses on ~memory ~management.	Throughout the preceding
	  chapters, we assumed we could acquire a new block of memory just by
	  wishing for it.	 Memory management is a subsystem which accomodates
	  such wishes.  (Chapter 9 introduced some uses of blocks of memory).

	  This chapter introduces no new language processing techniques.
	  However, all our language processing techniques and numerous other
	  computer applications require automatic memory management.  Automatic
	  memory management is often referred to as ~garbage ~collection.

	  There are a variety of memory management techniques, all of which have
	  some drawbacks of varying severity.  We begin with the simplest
	  possible memory management technique.


31.1
Simplest Memory Management

	pic

	  The simplest form of memory management allocates each desired block
	  from the top of memory.  Figure 31.1(a) shows all of memory.  New
	  blocks are allocated from the top of the figure.  Figure 31.1(b) shows
	  the situation at any point in time:  Previously allocated blocks appear
	  at the top, and the remaining bottom part is free space.	It is
	  unused presently.

	  We maintain a pointer to the ~beginning of free space, and hold it in
	  a variable called FREE_SPACE.  In the figure, everything above
	  FREE_SPACE is used.  FREE_SPACE itself points to the very first bin in
	  free space.

	  We allocate a block via the procedure:

		    ANSWER:=  FREE_SPACE ;
		    FREE_SPACE:= FREE_SPACE + size_of_the_desired_block ;

	  This procedure leaves in ANSWER the address of the first bin of the new
	  block.  That is presently the first bin in free space.  This program
	  then moves FREE_SPACE downward so as to point to the first bin
	  following our new block.  That is, after allocating the new block,
	  free space decreases in size, and FREE_SPACE is left pointing to the
	  now first free bin.

	  The drawback with this simple method is that eventually we will run
	  out of memory.	Free space will reduce to zero size, making it
	  impossible to allocate any new blocks.	At this point, the entire
	  system crashes.

	  The system crashes even if only ~some of the allocated blocks are
	  still needed.  It is easily possible that 90% of the blocks allocated
	  are no longer used.  If we could discover blocks no longer in use,
	  we could reuse them and hence avoid the system crash.


31.1.1
Recycling Of Used Blocks

	  Suppose the computer program knows when it is done using a block.
	  Can the program give that block back, i.e., ~free it?  Yes.  If that
	  block were the last block allocated, we could give it back simply by
	  restoring FREE_SPACE to its value prior to that block's allocation.
	Just subtract the block's size from FREE_SPACE.

	  It is very rare that the program wants to free the
	  ~last block.  Most often, the block lies somewhere in the middle of
	  the used portion of memory.	 We must therefore provide a way to
	  represent free blocks that reside in the middle of used space.


31.1.2
The Free List

	  The nice thing about a free block is that we, the memory management
	  subsystem, can store ~anything we want into the free block.

	pic

	  Since all free blocks contain ~at ~least one bin, let's make use of
	the first bin in all free blocks.  We will have the first bin in each
	point to another free block.  Figure 31.2(a) shows a set of
	four free blocks, with one pointing to the next.  Such a list of
	free blocks is called the ~free ~list.  The ~order of the blocks on
	the free list is irrelevant.  All we need from the free list is a way
	to get to ~another free block.


31.1.2.1
Allocation From The Free List

	When no more memory is available, i.e., free space is empty, we look
	instead at the free list.  The variable FREE_LIST always points to the
	first block in the free list.  We take that first block and deliver it
	as the newly allocated block.

	The allocation procedure follows:

		ANSWER:= FREE_LIST ;

		FREE_LIST := the contents of the ~first ~bin of the block
			     to which FREE_LIST points ;

	The new block (ANSWER) is the same block that FREE_LIST pointed to.
	FREE_LIST is then updated so as to point now to the ~second block
	on the free list.  The contents of the first block can now be
	dictated by the user program.


31.1.2.2
Freeing Blocks By Putting Them Back Onto The Free List

	When the user program knows that it won't be using a given block
	  anymore, it can recycle it by putting it back onto the free list,
	  as in the following.	Assume X points to the given block:

		    the_first_bin_of X := FREE_LIST ;

		    FREE_LIST:= X ;

	  FREE_LIST is made to point to the block X, but first, block X's first
	bin is made to point to the old block referenced by FREE_LIST.  This
	transforms figure 31.2(d) back into figure 31.2(a).  The free list is
	lengthend by one.


31.2
Fixed-Sized Versus Variable-Sized Blocks

	This scenario of taking blocks from and putting blocks back onto the
	free list works this easily only if all blocks are the same size.
	When blocks can differ in size, it may be that the first block on the
	free list is ~too ~small to serve as the newly desired block.  It
	might be 8 bins long whereas the newly desired block may require 9
	bins.

	When block sizes may differ, we refer to the set of all blocks as
	~variable ~sized.  This does not mean that a given block can grow or
	shrink in size.  It means merely that different blocks can have
	different sizes.  In contrast, when all blocks are the same size, we
	refer to the set of blocks as ~fixed ~sized.

	If we want to accomodate variable sized blocks, we must be prepared
	to search the freelist to find a block that is big enough to hold the
	newly desired block, as the first blocks on the free list might be
	too small.

	The time consumed by and the complexity of the support for variable
	sized blocks can be enormous relative to that for fixed sized blocks.
	We will return to consider variable sized memory management in Chapter
	34.  We now concentrate on the simpler, and more efficient, fixed
	sized block memory management.

		BOX:	Where does our simplest form of memory management find
		BOX:	memory for new blocks?
		BOX:
		BOX:	What is the ~free_list for?
		BOX:
		BOX:	Can a block change size within variable-sized
		BOX:	memory management?


31.3
Why Automatic Memory Management?

	Memory management provides for the ~allocation of new blocks and the
	~freeing of blocks no longer in use.  Allocation and freeing sound
	symmetrical, but the complexity of the computation required to perform
	each differs dramatically.

	The allocation of blocks is particularly easy for a computer program
	to perform because the program knows most definitely when it needs a
	new block of memory.  For example, in ICL when one writes:

		P :=  [ REAL_PART: 1   IMAGINARY_PART: 2 ]  ;

	pic

	ICL knows to allocate a block of memory.  ICL might allocate a two-
	bin block, and then fill in the two bins with the two components, as
	shown in figure 31.3.  Any time one or more data are to be associated
	with one another, a new block of memory is allocated to hold those
	data.  Thus, it is always easy to know when to ~allocate blocks.

	In contrast, knowing when to free a block may be next to impossible.
	There is no notation that corresponds to freeing a block.

	Automatic memory management (garbage collection) never requires
	the program to specify what blocks to free, or when to free them.
	~Block ~freeing is done automatically, behind the scenes.  Thus, all
	that remains to do is the easy part, the allocation of blocks.

	Non-automatic memory management requires the user program to know
	when to free what blocks.


31.3.1
Manual Block Freeing Can Give Rise To The Most Severe Of Bugs

	You might believe that it is easy for a program to know when it is
	finished using any given block.  In fact for nontrivial computer
	applications, it may be next to impossible to decide whether a given
	block is in use or not.

	The computation of when to free what blocks can be arbitrarily complex.
	For example, after writing:

		X:= [ REAL_PART: 1  IMAGINARY_PART: 2 ] ;

	we know that the newly allocated block (figure 31.3) is in use, because
	X points to it.  If the very next thing written were:

		X:= [ REAL_PART: 3  IMAGINARY_PART: 4 ] ;

	pic

	then the old block that X used to point to is no longer in use.  We
	see here one way a block ceases to be used.  The one variable that
	pointed to it, X, is made to point elsewhere (figure 31.4(a)).

	This example of a block ceasing to be used might lead one to believe
	that whenever you assign into a variable (X), the block to which it
	~used to point ceases to be in use.  However, this is often incorrect.
	Consider the following program:

		X:= [ REAL_PART: 1  IMAGINARY_PART: 2 ] ;

		Y:= X;

		X:= [ REAL_PART: 3  IMAGINARY_PART: 4 ] ;

	The second assignment makes Y point to the same block to which X
	points.  By the time we arrive at the last statement, Y
	continues to point to the old block.  Upon executing the last
	statement, the block to which X used to point ~is ~still ~in ~use,
	from Y's point of view!	 (See figure 31.4(b)).	Therefore, we must not
	  free it just because X alone ceases to reference it.

	  The difficulty in knowing when to free a block arises because blocks
	  may be ~shared from multiple points of view.	We just saw a block
	  referenced from two points of view, X and Y.	A block might be
	  referenced also from other blocks, as the X block is referenced as a
	  result of the following:

		    Z := [ FIELD1: X  FIELD2: 3 ] ;

	  The new Z block comes to reference the old X block.	 Now, even if X
	  and Y were assigned new values, so as to cease referencing the old
	  X block, Z still requires that block, as it remains accessible from
	  the point of view of the Z block.


31.3.1.1
Freeing Too Late

	  Without automatic memory management, where explicit freeing is
	  required, you can commit errors by freeing a block too soon or too
	  late.  If you free a block too late, your program will still run
	  correctly, although it will crash when memory is used up.

	  By freeing too late (like never), memory eventually fills up with
	  unused blocks that can't be recycled because they are thought to still
	be in use.  Even if there is only one place in your program where you
	forget to free a block, repeated executions of that program fragment
	will leave unfreed many blocks.  Thus, freeing too late or never can
	crash a program in short order.


31.3.1.2
Freeing Too Soon

	While freeing too late eventually crashes the user program, the
	program's behavior up until that point is still correct.  In contrast,
	  freeing too soon can quickly cause severe program errors.

	pic

	  When a block is freed too soon, it comes to reside on the free list
	  ~and continues to be in use from some point of view (figure 31.5(b)).
	  This is exceedingly dangerous.

	  Eventually, that block will be allocated (off the free list) to some
	  other need (figure 31.5(c)).  At this moment, the old point of view (Y)
	  believes that that block belongs to it, while simultaneously this new
	  point of view (W) believes that the block belongs to it.	Whenever one
	  or the other points of view writes into that block, the other point of
	  view will be profoundly fooled.  They are using the ~same memory for
	  two unrelated purposes.  This is analgous to two cars occupying the
	  same spot an an intersection - a collision.


		    BOX:	Which is harder to know: when to ~free a block or
		    BOX:	when to ~allocate a block?
		    BOX:
		    BOX:	What can happen if you free a block too late?
		    BOX:
		    BOX:	What can happen if you free too soon?


31.3.2
Debugging Premature Freeing

	  Suppose you commit this error of freeing a block too soon.  How hard
	  is it to find this bug and correct it?	This kind of bug is by far the
	  hardest to find of all kinds of bugs.

	  Imagine you are one of the contexts that believes this block is yours.
	  It may be the POINT datum in your variable P, for example.  The symptom
	  of this kind of bug is the mysterious overwriting of data in the block.
	  For example, the X-coordinate of your point is overwritten.  It might
	  even be overwritten by data of a foreign type!

	  Suppose you discover the two contexts involved.  One is victimized by
	  having its data overwritten.  The other context does the writing.
	  Even knowing these two contexts, neither is at fault!  ~Someone ~else
	  ~commited ~the ~error by freeing a block prematurely.  That prematurely
	  freed block has evolved into a conflicted usage.

	  Tracking down such a bug requires consideration of the entire program.
	  Unrelated parts of the program may be interacting unexpectedly by
	  chance, by sharing the same memory inadvertently.  What's so bad about
	this bug is that the symptom can be understood only by considering
	interactions in computer memory.  There is no straightforward
	way to get to the cause from the symptom.


31.3.2.1
The Symptom Is Always Foreign To The Application Domain

	Such mysterious overwriting corresponds to nothing in the application
	domain.  Of all the ways objects in the application domain may
	interact, such conflicted memory usage can introduce other,
	~meaningless interactions foreign to the domain.  Unfortunately, the
	programmer has to actually consider such meaningless interactions in
	order to make sense of the situation!

	There often is a loss of datatype integrity.  The same block
	could be used to represent simultaneously two objects of distinct
	types.  Even in strongly typed languages like ICL, ADA, and PASCAL,
	premature freeing undoes that type checking.  In PASCAL, and ADA
	without garbage collection, type integrity is vulnerable, despite the
	thorough compile-time type-checking.

	For example, one context may use the block to represent an apple,
	whereas the other uses it for an orange.  It is most likely that in the
	computer application, no apple ever spontaneously becomes an orange.
	However, if the apple and orange use the same block, then "taking a
	bite" of the apple could have adverse effects on the context that
	expects an orange.

	If we're very lucky, the act of taking a bite of the ~apple will be so
	  civilized as to correspond to taking a bite from the ~orange (but
	  we'd still be puzzled why the orange had a byte taken from it).  This
	would be a fortunate situation because the operation of taking a bite
	would be common in both contexts, and is somewhat understandable by the
	programmer.

	In fact, it is very unusual for an operation on one type
	to make any sense whatsoever within another type.  For example, if
	instead of apples and oranges we were dealing with INTegers and REALs,
	the operation of incrementing a REAL (from one of the points of view)
	would make no sense from the other point of view, the context that
	expects a INT.  Incrementing an INT and a REAL are very different
	operations.

	It is the loss of datatype integrity that really mystifies the
	situation.  Such errors cannot be understood in terms of the
	application's universe.	 They can be understood only in terms of the
	  details of computer memory, including the notions of pointers and
	  addresses.  This can be daunting if thousands or millions of blocks
	  exist.

	  To find such a bug, the poor programmer is forced to understand
	  his or her application in a wierd universe where apples and oranges can
	  spontaneously interact in meaningless ways.

	  Thus, the manual freeing of blocks has a thin tightrope to walk indeed.
	  Freeing too late eventually crashes the system, and freeing too soon
	  causes incomprehensible errors and the loss of datatype integrity.
	  These kinds of bugs are so severe that introducing a schedule for
	  fixing such a bug becomes impossible.  We've seen errors of this type
	take at least three months each to find, which is very possible in the
	C programming language and in assembly language.

	By not providing automatic memory management, you're forcing down the
	  programmer's throat a domain foreign to his or her application, a
	domain that is fraught with details and involves meaningless
	interaction among vastly unrelated parts of the entire program.

	Automatic memory management is a must within complex applications.
	The single greatest contribution of the programming language LISP
	may have been its automatic memory management.  That gave the biggest
	boost to the field of artificial intelligence, which began with and
	continues to use LISP.

		BOX:	Can compile-time type-checking be rendered ineffective
		BOX:	by freeing blocks too soon?  Why?
		BOX:
		BOX:	Does freeing too soon cause a symptom that is always
		BOX:	meaningful in the application domain?
		BOX:
		BOX:	Why is it difficult to debug premature freeing, even if
		BOX:	you've identified the two sections of your program that
		BOX:	are inadvertantly sharing the same block?