BST and Iteration vs. Recursion (Recap on Exercise 4)

With all the focus on recursion in the course, I found it suitable for the last exercise to code something we had previously learnt in a binary search tree but using iteration. In the code for insert, the objective is to first find the point of insertion through a while loop. This is done through the use of a tuple. By storing the current value as well as its parent, you can use the current item to stop the loop when there are no more children for the parent that its associated with. Once the spot you want to insert you value in, you can continue by making changes on the parent. A very similar approach is used for counting the number of a values greater/ less than a certain value.

Unlike the recursion method where you have a base case, and the function is called over and over again, until a condition isnt reached (thus reaches the base case), the while loop continues until a certain condition is met. Thus, despite not having the recall the function again, the amount of steps it takes to check a condition is quite similar, thus I would predict that the recursive formula is no less inefficient than the iterative one. One advantage of the recursive code over iteration, is that its slightly more elegant and shorter as you can see from the two codes provided from class website.

This site provides a good overview of iteration and recursion for the Binary Seach Tree: http://www.sparknotes.com/cs/recursion/examples/section5.rhtml

Efficiency and Sorting

What does Efficiency mean?

It could mean how long a process would take; how much a process would be reused; how much would it consume in the process. But in this class at least, it refers to algorithm efficiency – how much resources (like processing time) is required for a given size of input. So far, we have discussed aspects of it like runtime.

Binary Search Trees

Binary search trees are an example of improving the efficiency of searching a set of numbers and a conventional list if you want to search for the smallest value you would need to compare the numbers with every other number in the list. On the other hand, with a binary search tree, the tree has nodes with the left child being less than the value of the node while the right child is greater than the node. This potentially reduces the search runtime by half because you would only need to search one side of the tree to find the smallest value. WIth this said, a degenerate tree may take the same amount of time as with linear search.

Sorting and Efficiency (Lab 10)

In class we also focused heavily this week on performances of different sorting algorithms. Which type of sort is most efficient? Having once asked that question, the answer I received was that the efficiency would depend specifically on the situation and data set you’re given to work with. Analyze the code and you can adapt it to specific situations.

Lab 10 in particular, let’s us explore and draw a few conclusions when dealing with generated sets with inputs of 400 – 2400 range. At the point in the lab, we had already learned about the Big O’s for Selection Sort O(n^2), Bubble Sort O(n^2), Insertion Sort (O^2), Merge Sort O (nlogn), and Quick Sort O(n^2) and can observe how the worst case scenarios are reflected in the data.

Differences in Bubble Sort  Efficiency in Different Ordered Lists

Bubble sort is incredibly slow and inefficient for large sets of data, as we can already predicted from Big O compared to rest of the sorting methods.  We can extend that further to draw a conclusion that bubble sort completes sorting faster for a randomized set of characters or objects and that bubble sort on a randomized set is faster than on a reversed sorted list. From the code for bubble sort we can see that Bubble sort will compare each character to every single other character no matter what the order is on the first pass. Afterwards it will make n-1 passes through every single other character that is unsorted.Thus, the only difference between going through a sorted list, unsorted list and reverse sorted is the number of times, that the numbers need to be flipped with each other during the passes. It makes sense that the reversed sorted list will have the longest runtime since every element must be flipped during the entire process of each pass. From that, it’s a reasonable guess to also assume for small sets that are close to being sorted, bubble sort may be a good choice.

Which has better efficiency: Insertion Sort, Merge Sort and Quick Sort 

Something I came across on a forum was a discussion on when is a good situation to use Insertion Sort – sorting from the front and inserting each successive integer into the sorted list. In class, we have learned that for large data sets, quick sort and merge sort have better average and worst case performance (unless the pivot point in Quick sort is far too the right or left of the set and becomes at worst n^2 / same as insertion sort’s worst case). However, insertion sort can also be used as the base case for merge sort / quick sort.

The discussion is found here: http://stackoverflow.com/questions/736920/is-there-ever-a-good-reason-to-use-insertion-sort

Timsort:

Timsort is a topic which we were briefly introduced to but not explained in class. It is part of python as the built in sort and was tested in lab 10 to be the most efficient type of sort when implemented in python having the worst case runtime of nlogn. The first question I had after completing lab 10 was “why is the built in sort so fast?”. Reading a little into it, it turns out that Timsort falls in the category of using the merge sort / quick sort and insertion sort hybrid to highly optimize its process. (or perhaps the forum above was a reference to Timsort in disguise)

Comments on other People’s Blogs:

Default Values:

Something interesting I read about from one of the slog posts that I’ve never come across before is a special behaviour for how python deals with default values of mutable objects. We use default parameters when coding all the time (when you have no children in your tree, when you don’t have a certain variable and assign it None). However, when you have a default value as a list or a dictionary, you may come across the default parameter picking up values when a function was used on it previously. This is because

http://yuanslog148.blogspot.ca/2014/03/slog-10.html
http://effbot.org/zone/default-values.htm

 

Review of Linked Lists and Queues

After studying for the term test, here’s a bit on Linked lists as a summary of thing’s I’ve learned. Linked lists make it easy to adjust interior nodes by redirecting the .next of one link to point to a different node head. Linked lists are easy to work with since if you are only switching out one value, you do not need to read through the entire list to do so. You can easily change the pointer of the front of the list or the back of the list without affecting what’s in the middle.  Deletion isn’t required as eventually the garbage collection will clean unused parts of the list ( when nothing points to a node). To delete something in a linked list, all you need to do is change the .next of the node before the node you are deleting to point to the head of the node after the item to be deleted.

On the other hand, when you create enqueue in a queue using linked lists you can’t simply “insert” the back item, you need to work around it by creating a gap at the back of the linked list. This is done in the code by first making a copy of the linked list as self.back, then making the value of self.back as None and prepending before it.

Recursion

Learning Recursion:

When I initially started learning about recursion at the beginning of the term, to write a recursive function I would attempt to retrace what would happen the same way a computer would to solve a recursive problem. When a list has only one layer of nested lists inside, this was possible. However, it soon occurred that neither was this a very pragmatic approach for more complicated scenarios (when there were multiple layers of embedded lists) or the right mindset of a designer. While computers are great at solving complicated paths, we’re best to leave break down problems into simple scenarios and piece them together.

In my first post on OOP, I wrote about the ease of top down approach. Designing a program was to first layout the objects and figure out the implementation later – details and the exact implementation didn’t matter because there would be a way to make it work – similar for recursive cases. Here’s a simple example of one of the questions on an earlier test – defining the function rec_len which returns the number of non-list elements in a list:

def rec_len(L):
    return sum([rec_len(x) if isinstance(x, list) else 1 for x in L])

If we were to write the function above, we would divide it up by the two possibilities (when it is an empty list or when its has a value), write the two scenarios. If individually the parts work and are confirmed through tracing, then it is assumed that larger and more complicated scenarios work as well, thus there is no need to mentally trace the entire thing. If in the implementing the code, it follows a logic that makes sense, then the code will likely work in more complicated “difficult to trace” scenarios as well.

L = [] –> 0
L = [2] -> 1
L = [2,3] -> 2
L = [2,[]] -> 1
L = [2,[2,3]] -> 3
L = [2,[2,3],4] -> 4

What are recursive functions?

A recursive function is basically a function or part of a function that calls itself at least once. Let’s break down the function list_count as an example:

def max_list(L):
    return max([max_list(n) if isinstance(n, list) else n for n in L])

If o = [1,[3,4],[5,6,]], the function would be start with returning [1] and inside the list, it would then go onto the second object in the list and do max_list ([3,4]) which would return 4 and append it to [1] giving [1,4]. Then it would move onto the last item in the list and do max_list([5,6]) returning 6 and appending that to the list: [1,4,6]. Finally the max of the final list would be taking producing 6. Here in this example, the number of recursion depths is 2. If the example were taking a list with a list nested three times, it would take a depth of 3 to return its corresponding result when run in the max_list function.

Why and when should you use recursion / iteration?

Simply, you’d probably know when to use it through practice and preference. A suggestion Danny had was to compare an example of a code using recursion and the same using iteration. For instance, how does the code look like for performing tree traversals in python?

From: Method: Recursion vs iteration for tree traversal in python by Damian Kao, the author has written about the comparison of a function for printing out all the children in a tree list through recursion, and has written the same function using iteration. In order to rewrite the recursive function (6 lines) using iteration( 13 lines), he had to use multiple conditionals. In the recursive function, the line:

For iteration, you need to create an extra variable to append something to (in this case, newNodes = []). Every time Python goes through iterating (for node in nodes) it must change nodes = newNodes so that it changes the subsequent “for loop” to continue the function for the rest of the children. By overwriting the original nodes to newNodes each time, it will iterate through all the children without recalling the function again.This is more complicated than using recursion because when writing iteration, you need to add an extra step where you need to specify “for node in nodes”. Recursion simply recalls the function and does this automatically for you.If you are using recursion for a small set, and it is easier and faster to write with recursion. Readability may be a better choice.

However, since for recursion, there is always a set recursion depth limit, author of this blogpost (http://blog.nextgenetics.net/?e=64), Damian, suggests that unless your recursion would need a depth greater than 1000 (or whatever the limit for python is), iteration over recursion is never a “necessity”.

Memory and recursion:

Digging a little deeper on the web, are several opinions that recursion in python is considered “incredibly inefficient” (compared to an iteration counterpart of the same function). Recursive functions copy itself into the stack and requires a new stack frame for every iteration it runs. While an iteration would return to the beginning of the loop, recursion, goes back to the beginning of the function. “creating and destroying stack frames is more expensive than a simple jump”

Trees and Comments on Other People’s Blogs:

http://yamsoupslog.blogspot.ca/ on Trees
A good point the author nicely sums up is on Search Lists, which we had exposure to in lab 4. Recursion through trees can be made more efficient in the case of structured lists. In a non structured list example, in the worst case scenario, you will have to traverse through the entire tree to determine whether a certain value exists in the tree list. On the other hand, structured lists have the potential to “shorten” the task by checking one node, and traversing only the path of the child that’s specific condition is satisfied. (ex. checking if the value is larger or smaller than n (a specified value you are looking for))

References:

http://blog.nextgenetics.net/?e=64
http://www-inst.eecs.berkeley.edu/~selfpace/cs9honline/Q2/scope.html

Object Oriented Programming (OOP)

Object Oriented Programming (OOP):  All concepts in OOP are represented as objects that have attributes and methods. Since everything is termed as objects, this simple organization is easy to interpret and modify.

The shift towards OOP has its obvious benefits. Unlike procedural programming, OOP does not strictly follow a specific set of orders from beginning to end to reach a desired goal. Rather, objects are given specific characteristics with data that determines how it functions and relates to other objects. Thus the focus in OOP is placed on objects that can have multiple applications instead of being restricted to a single task.

For example, in OOP, an object “oven” can reference a class containing its functions with the method “heat product” in which the “oven” will cook any object placed inside it. Whether the object inside is a”chicken breast”, “fish filet”, or a “shoe”, they will be cooked regardless. On the other hand, procedural programming requires specific instructions for each object to be cooked. The “oven” would need a different set of steps to cook the “chicken breast”, “fish filet”, and “shoe”, thus increasing the complexity of the program. OOP does not have this form of complexity, which is one of the reasons it is much easier to understand.

Further more, OOP has closer similarities with natural languages and can be analogized with examples from real life, it makes the language easier to understand as a first programming language.

A Top Down Approach (designing from a loose idea to refinement):

Let’s say we intend to design an active school building in the same fashion one would develop a program in OOP. We would start by brainstorming aspects of the building (what it can do, what it has, its characteristics, etc.):

– library, classroom, teach, cafeteria, learn, eat, chairs, read, table, student, teacher

We then isolate each one and define each “noun” or “verb” as an instance of a class or method respectively. We can also define the specific characteristics of each or its actions. We can now also find the relationship between each and categorize them.

Rooms – Library: a room to read, Classroom: a room to teach and learn, Cafeteria: a room to eat

Furniture: Chair: an object for for sitting, Table: an object for placing other objects

People – Student: a person that learns, eats and reads, Teacher: a person that teaches, eats and reads

Actions – Eat: to eat food, Read: to read a book, Learn: to learn a subject, Teach: to teach a subject

Using only these, we are able to construct a basic school facility with actions for the people inside it. The same “Chair” can be applied to the Library, Classroom or Cafeteria, and the same “Student” can sit in the “Chair” in each of those rooms. in OOP, as long as we have an initial definition of an object, we can establish the relationship between each object instead of having to redefine the same object for each relationship.

Other aspects of OOP:

  • Reuse: You can reuse parts of the code / an object for more than one task. Presumably this setup also makes it difficult to develop “spaghetti code” – where code references an arbitrary location when a modular organization is favored. Ex. If I want a class car, then I would use this class for all the cars I create.
  • Encapsulation/ Information Hiding: In general, outside of Python, OOP claims to make it easier for encapsulation. It is said that encapsulation also implies that it provides information hiding and protecting it from change. On the other hand, Python does not support encapsulation. Information hiding is used solely as a convention through the use of underscores. This means you can access any part of the code without restriction.
  • Change:  If you want to focus and improve a single function which is reused throughout your code, you can easily do so and it will affect several parts of your program where it is used.

references:

http://net.tutsplus.com/tutorials/python-tutorials/python-from-scratch-object-oriented-programming/

http://www.learnpythonthehardway.org/book/ex43.html