README 8.7 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289
  1. ------------------------------------------------------------------
  2. ECE 264 - FALL 2013 - PROGRAMMING ASSIGNMENT 09
  3. ------------------------------------------------------------------
  4. (Derived from an assignment by Professor Vijay Raghunathan)
  5. INTRODUCTION
  6. ============
  7. You will base your utilities on the widely used algorithmic technique
  8. of Huffman coding, which is used in JPEG compression as well as in MP3
  9. audio compression. For more information, please see the attached PDF file from last semester.
  10. ASSIGNMENT
  11. ==========
  12. Write a program that takes a compressed file, reads the Huffman tree
  13. specified by the header, modify the provided postOrderPrint function and print the traverse of the Huffman Tree to a output file. Hint: the header can be either character-based representation or bit-based representation. See the input files to understand what that means.
  14. SPECIFICATIONS
  15. ==============
  16. There will not be a skeleton provided for this assignment. It will be
  17. up to you to define all functions. In the directory PA09, there will be different files for you to start. pa09.c is for you to write your main function. utility.c is for you to write your utility functions(all the functions except main funtion). pa09.h is for you to declare all the functions and structures. A Makefile will be provided later on for your testing convenience. Even without the Makefile, you should be able to write a simple Makefile for testing purpose.
  18. *The program takes two input arguments:
  19. 1. the input filename of the compressed file; and
  20. 2. the output filename for the uncompressed file.
  21. - A suite of inputs and their expected outputs.
  22. Here are some details for this program:
  23. In pa09.c you will need to do:
  24. 1. Check argc, print error message if the number of arguments are not correct
  25. 2. Open the input file, check if the file has been opened correctly, if not, print error message;
  26. 3. Call a function to create Huffman Tree, return the head of the tree;
  27. 4. Open an output file, use your own modified function postOrderPrint to print the traverse of the tree
  28. In utility.c you will need to do:
  29. 1. Write a function that can create Huffman Tree based on input(You will need to check if the input is char-based or bit-based, they are
  30. different, so your create Huffman Tree should be able to handle both cases);
  31. 2. Write functions to perform push and pop to maintain the Stack;
  32. 3. Write any other functions you think are useful;
  33. In pa09.h you will need to do:
  34. 1. Create two structures, one is for Huffman Tree, one is for Stack(This has been created for you, so don't worry);
  35. 2. Declare all the functions and constants you use in your code(Do not use mysterious constants! That's very bad coding style);
  36. *Reminder:
  37. 1. Be aware of memory leaks!
  38. 2. You must use stack to do this assignment and you need to use linked list to implement the stack.
  39. 3. All you need to do is to read the header and generate the Huffman Tree. The char-based header looks like:
  40. 1g1o01s1 01e1h01p1r0000013\n
  41. The '\n' is the new line. Every '1' indicates the next character you read is the data of a leaf node, every '0' indicates the next character you read is data for non-leaf node.
  42. 4. Bit-representation is a little bit tricky, so start with char-representation.
  43. 5. Start early. You won't be able to finish the code in one day.
  44. SAMPLE OUTPUT
  45. =============
  46. The output is just the printed traverse of nodes you have visited.
  47. Consider the compressed file for string "streets are stone stars are
  48. not", the output file of the program should read as follows:
  49. Left
  50. Left
  51. Left
  52. Back
  53. Right
  54. Back
  55. Leaf: t
  56. Back
  57. Right
  58. Left
  59. Left
  60. Back
  61. Right
  62. Back
  63. Leaf: a
  64. Back
  65. Right
  66. Left
  67. Back
  68. Right
  69. Back
  70. Leaf: r
  71. Back
  72. Back
  73. Back
  74. Right
  75. Left
  76. Left
  77. Left
  78. Left
  79. Back
  80. Right
  81. Back
  82. Leaf: n
  83. Back
  84. Right
  85. Left
  86. Back
  87. Right
  88. Back
  89. Leaf: o
  90. Back
  91. Back
  92. Right
  93. Left
  94. Back
  95. Right
  96. Back
  97. Leaf:
  98. Back
  99. Back
  100. Right
  101. Left
  102. Left
  103. Back
  104. Right
  105. Back
  106. Leaf: e
  107. Back
  108. Right
  109. Left
  110. Back
  111. Right
  112. Back
  113. Leaf: s
  114. Back
  115. Back
  116. Back
  117. Back
  118. Consider the compressed file for string "go go gophers", the output
  119. file of the program should read as follows:
  120. Left
  121. Left
  122. Left
  123. Back
  124. Right
  125. Back
  126. Leaf: g
  127. Back
  128. Right
  129. Left
  130. Back
  131. Right
  132. Back
  133. Leaf: o
  134. Back
  135. Back
  136. Right
  137. Left
  138. Left
  139. Left
  140. Back
  141. Right
  142. Back
  143. Leaf: s
  144. Back
  145. Right
  146. Left
  147. Back
  148. Right
  149. Back
  150. Leaf:
  151. Back
  152. Back
  153. Right
  154. Left
  155. Left
  156. Left
  157. Back
  158. Right
  159. Back
  160. Leaf: e
  161. Back
  162. Right
  163. Left
  164. Back
  165. Right
  166. Back
  167. Leaf: h
  168. Back
  169. Back
  170. Right
  171. Left
  172. Left
  173. Back
  174. Right
  175. Back
  176. Leaf: p
  177. Back
  178. Right
  179. Left
  180. Back
  181. Right
  182. Back
  183. Leaf: r
  184. Back
  185. Back
  186. Back
  187. Back
  188. Back
  189. HINTS
  190. =====
  191. * You need to use structures for this assignment. See tree.h for the
  192. Huffman node structure.
  193. * A stack is a last-in first-out (LIFO) data structure that is very
  194. similar to a linked list, but it has three special operations:
  195. - push: place a new item (a Huffman subtree?) on the top of the
  196. stack, making this the new top.
  197. - peek: access the item (the Huffman subtree?) at the top of the
  198. stack, but does not change the stack.
  199. - pop: removes (and deallocates) the topmost element of the stack,
  200. making the next element the new top.
  201. It is also highly likely that you need a function to check the
  202. number of elements currently on the stack.
  203. * We will check whether your program initializes all pointers to NULL.
  204. This can be accomplished in the following ways.
  205. - int * ptr = NULL; /* initialized to NULL, required */
  206. - int * ptr;
  207. - ptr = NULL; /* immediately after declaring ptr */
  208. - int * ptr = malloc(numberElement * sizeof(int));
  209. If you write the following code without initialization, you will
  210. lose points.
  211. - int * ptr; /* uninitialized pointer. NOT ALLOWED */
  212. Why do we check this? Uninitializing pointers make a program's
  213. behavior unpredictable.
  214. (You are also allowed to declare and allocate memory for a pointer
  215. immediately using malloc() or calloc().)
  216. WHAT TO SUBMIT?
  217. ===============
  218. Submit a .zip file containing the following files:
  219. 1. pa09.c pa09.h utility.c.
  220. 3. A README file that describes the algorithm used in your assignment.
  221. The README should say how the functions accomplish their goals, not
  222. just the goals themselves.
  223. HOW TO SUBMIT?
  224. ==============
  225. Your submission should be submitted through Blackboard. You can
  226. submit as many times on Blackboard as you need before the deadline. We
  227. will grade the final submission, so please make sure that it is your
  228. FINAL version with everything included.
  229. HOW WILL WE GRADE YOUR SUBMISSION?
  230. ==================================
  231. We will grade your assignments on algorithm correctness through DDD,
  232. memory allocation and deletions through valgrind, proper coding
  233. standards, content of README, a proper Makefile, and commenting.
  234. ADDTIONAL FAQ
  235. =============
  236. 1. What should I write in pa09.c?
  237. You should write your main function in pa09.c. This main function should call the utility functions you write in utility.c to open the input, read the header, generate the Huffman Tree, and use post order print to an output file(See Makefile).
  238. 2. What are the tree.c and tree.h?
  239. The tree.h is the header file for tree.c. The tree.c contains a function void Huff_postOrderPrint(HuffNode *tree), You should refer to this and write your own post order print function since you are writing your output to a file. In short, you should open an output file and use fprintf to do that. You should not submit tree.c and tree.h.
  240. 3. What is a bit-representation header?
  241. The header information can be made more economical if we use bits 0 and 1 to distinguish between non-leaf and leaf nodes, and also to indicate the end of a topology. In these two examples, there will be a total of 10 bytes (8 bytes for the leaf nodes and 2 bytes for all the 0's and 1's). The challenge here is that both the compression and decompression programs would have to handle bits instead of characters. For example, in the bit-based approach, the first 16 bits (or the first 2 bytes) of the header information for encoding the string "streets are stone stars are not" are 10111010 01011000 (note that the space in the bit stream is introduced for better clarity). The ASCII coding for 't' straddles the two bytes, 7 bits in the first byte and 1 bit in the second byte. The second most significant bit in the second byte is a 1, indicating that the next 8 bits is an ASCII character, of which the the most significant 6 bits of the character 'a' is contained in the least significant 6 bits of the second byte.
  242. In the bit-based representation of the Huffman coding tree, the last byte may not contain 8 bits. In this case, we again pad it with 0 bits. Consider the case where the input file uses nine distinct ASCII characters. The number of bits required to represent the Huffman coding tree is 9×8 + 9×2 = 90 bits, which can represented by 12 bytes. In other words, the last byte should contain only two useful bits. The 12 bytes are followed by "31\n", the number of characters in the string and the newline character.