哈夫曼编码课程设计报告

数据结构课程设计报告

基于哈夫曼树的文件压缩/解压程序

     专业班级:信科(2)班

     姓名:徐爱娟  谢静

学号:20101614310051

      20101614310050

    

                          

 20##-12_31                                                                                                        

一  需求分析

1.课题要求(实现文件的压缩与解压并计算压缩率)

A.描述压缩基本符号的选择方法

B.运行时压缩原文件的规模应不小于5K

2.设计目标

A软件名称:基于哈夫曼编码的文件压缩实用程序系统

B软件组成:huffman.exe

C制作平台及相关调试工具:

Windows XP sp3       Microsoft Visual C++ 6.0

D运行环境:dos/ win2K/win2003/winxp/

E性能特点:

1.       软件由一个可执行文件组成

huffman.exe为dos系统应用程序,体积小,高效快捷,适用范围广。

2. 对单字节(256叶子)进行哈夫曼编码,压缩率良好

3. 使用二级缓冲压缩/解压技术,速度比一般算法高

4. 可压缩最大体积为4G的文件,达到Fat32文件系统极限

5. 文件索引体积比常规算法小50%

二 概要设计 

1.相关函数介绍

1. bool InitFromFile(string fileadd) 从文件中初始化哈夫曼树函数

2. void HTCreat(HTNode ht[],int n) 构造哈夫曼树函数

3. void HCCreat(HTNode ht[],HCode hcd[],int n) 构造哈夫曼编码函数

4. void ConvertFile(HCode hcd[],string fileadd,string fileadd2) 压缩and写入文件函数

5. void DecompressionFile(string fileadd2,string fileadd3) 文件解压函数

6. string Compression(string fileadd) 压缩函数

7. string Decompression(string fileadd2) 解压函数

三  详细设计

1压缩算法部分

A核心算法:

       Huffman编码是一种可变长编码方式,是由美国数学家David Huffman创立的,是二叉树的一种特殊转化形式。编码的原理是:将使用次数多的代码转换成长度较短的代码,而使用次数少的可以使用较长的编码,并且保持编码的唯一可解性。Huffman算法的最根本的原则是:累计的(字符的统计数字*字符的编码长度)为最小,也就是权值(字符的统计数字*字符的编码长度)的和最小。

B哈夫曼树构造算法:

Huffman树是二叉树的一种特殊转化形式。以下是构件Huffman树的例子:比如有以下数据, ABFACGCAHGBBAACECDFGFAAEABBB先进行统计A(8) B(6) C(4) D(1) E(2) F(3) G(3) H(1)  括号里面的是统计次数

生成Huffman树:每次取最小的那两个节点(node)合并成一个节点(node),并且将累计数值相加作为新的接点的累计数值,最顶层的是根节点(root)  注:列表中最小节点的是指包括合并了的节点在内的所有节点,已经合并的节点不在列表中

运算的过程如下:

1:D+H(2)

2:DE+H(4)

3:F+G(6)

4:C+DEH(8)

5:B+FG(12)

6:A+CDEH(16)

7:ACDEH+BFG(28)

那么转化为Huffman树就是

Huffman树           层数

                         Root         

                        ┌┴┐

                      ACDEH  BFG              1

                     ┌┴┐┌┴┐

                    CDEH  A B   FG            2

                   ┌┴┐     ┌┴┐

                   DEH  C     F    G          3

             ┌┴┐

                 DH   E                       4

┌┴┐

D    H                         5

取左面是1 右面是0 则有。

注:层数就是位数或者说是代码长度,权值=代码长度*该字的统计次数。

                      代码          位数     权值

A = 10          2        16

B = 01          2        12

C = 110         3        12

D = 11111       5         5

E = 1110        4         8

F = 001         3         9

G = 000         2         6

H = 11110       5         5

可以看出Huffman代码是唯一可解的(uniquely decodable),如果你读到110就一定是C ,不会有任何一个编码是以110开始的。

如果不使用Huffman算法,而使用普通的编码,结果是什么呢?

Huffman树            层数

                       Root

                      ┌┴┐

      ABCD   EFGH               1

                 ┌┴┐   ┌┴┐

                AB   CD   EF   GH            2

┌┴┐┌┴┐┌┴┐┌┴┐

A   B C   D E  F   G  H         3

取左面是1 右面是0 则有

代码        位数      权值

A = 111        3        24

B = 110        3        18

C = 101        3        12

D = 100        3         3

E = 011        3         6

F = 010        3         9

G = 001        3         9

H = 000        3         3

利用Huffman编码得到的权值累计是 73,如果利用普通定长编码的话,则要用84字符长度。从这个比较,可以看出,Huffman是怎么进行压缩的。

C哈夫曼编码结构及算法

编码:将ABCDEFGH用Huffman树产生的编码对应着写到文件中,并且保留原始的Huffman树,主要是编码段的信息。一般要编码256个元素的话需要511个单位来储存Huffman树,每个Huffman树都必须有以下的结构:code,char,left,right,probability(出现次数),通常情况是利用一个数组结构。因为在解码的时候只需要用到code,所以只需要记录每个元素的编码就可以了。

解码:利用文件中保存的Huffman编码,一一对应,解读编码,把可变长编码转换为定长编码。

2.解压缩算法部分

A.基于字符匹配的解压算法

              读出结点数就能知道哈夫曼树存入部分的总长,方便读出树哈夫曼树(子结点值和权值),就能由次些信息重新构造完整的哈夫曼树,和各结点的哈夫曼编码。解压时,读取一个字节(8 bit)用一个函数转换为8个字符(用一个数组记录,其元素只是一个0或一个1),然后按哈夫曼树从顶向下查找,如果到达叶子结点,就读出该叶子结点的值,放入缓冲区中,如果不是,则继续找,如此重复,直到缓冲区满了,就写入到解压文件中,再循环以上过程,直到处理完所有数据。

B.缓冲输入输出

        和压缩时采用1M二级缓冲相同,如果的压缩时也采用此技术,也会有很大的速度优化,当然程序也变得更加复杂。

四 用户使用说明

1.运行huffman.exe程序,出现下面的界面

2.选择相应的功能,输入正确文件路径,对文件进行压缩、解压。

五 设计心得体会

通过这次课题实验的程序实践,我实在获益匪浅!数据结构是本学期开展的一门学科,学习好这门学科是非常重要的,在以后的程序设计方面这门学科能给我们很大的帮助。同时,学习这门学科也是艰辛的,因为它比较难懂,这不仅需要我们要发挥我们的聪明才志,还需要我们在不断的实践中领悟。

六 附录

程序清单

在此,给出huffman.exe的程序清单。

#include<fstream>

#include<iostream>

#include<String>

#include<cmath>

using namespace std;

struct HTNode

{

    char data;   //节点数据

    int weight;  //权值

    int parent;  //父亲

    int leftchild;  //左孩子

    int rightchild;  //右孩子

};

typedef char* Code;

HTNode *ht;

Code *hcd;

int maplist[256];       //建立字符与哈夫曼编码的映射

int nodenum=0;          //哈夫曼树结点数

int rearnum=0;          //哈夫曼编码尾补码

int textlen=0;          //需压缩的文件长度

int codelen=0;          //压缩后的文件的哈夫曼编码总长度

int const bufferlen=1024;               //设置读取缓冲区长度

int clean();            //清空节点及编码指针内容

void dtobin(int num,int bin[8]);    //十进制变换成二进制

void HTCreate(HTNode ht[],int n);   //建立哈夫曼树

void HCCreat(HTNode ht[],Code hcd[],int n);      //提取哈夫曼编码

void WriteFile(char *tmp);          //写入文件

unsigned char ConvertBinary(char *tmp);                            //变换二进制文件

void ConvertFile(Code hcd[],string fileadd,string fileadd2);                   //压缩并解压文件

bool InitFromFile(string fileadd);                                //初始化文件

void DecompressionFile(string fileadd2,string fileadd3);    //解压文件

string Compression(string fileadd);                             //压缩文件

string Decompression(string fileadd2);          //解压文件子函数

///////////////十进制转二进制函数/////////////////

int clean()

{

    delete[] ht;

         delete[] hcd;

         return 1;

}

void dtobin(int num,int bin[8])

{

    int i=0;

    for(i=0;i<8;i++)

    {

        bin[i]=0;

    }

    i=0;

    while(num>0)

    {

        bin[8-1-i]=num%2;

        num=num/2;

        i++;

    }

}

//////////////////压缩和写入文件//////////////////

void ConvertFile(Code hcd[],string fileadd,string fileadd2)

{

    fstream infile(fileadd.c_str(),ios::in|ios::binary);

         fstream outfile(fileadd2.c_str(),ios::out|ios::binary);

         if(!infile) cout<<"open file fail!"<<endl;

         if(!outfile) cout<<"creat file fail!"<<endl;

//unsigned

    char ch;

    /////////////写入哈夫曼树//////////////

    ch=nodenum;

    outfile.write(&ch,1);        ///写入结点数

    ch=8;

    outfile.write(&ch,1);        ///写入补位数(预写入)

    codelen=0;

    outfile.write((char *)&codelen,4);  //写入压缩后的文件的哈夫曼编码总长度(预写入)

    int h=0;

    for(h=0;h<nodenum;h++)

    {

        outfile.write((char*)&ht[h].data,sizeof(char));

        outfile.write((char*)&ht[h].weight,sizeof(int));

    }

    char tmp[8];   //设置缓冲区

    char outbuffer[bufferlen];    //设置写入缓冲区

    char *tmpcd;

    int i=0,j,k,last=0;

    char inbuffer[bufferlen];

    int readlen=0;

    //infile.seekg(i,ios::beg);

         h=0;

    do

    {

 infile.read(inbuffer,bufferlen);

    readlen=infile.gcount();

    tmpcd=hcd[maplist[(unsigned char)inbuffer[i]]];

    for(i=0;i<readlen;)

      {

        for(j=last;j<8 && *tmpcd!='\0';j++)

        {

            tmp[j]=*tmpcd;

            tmpcd++;

        }

        if(j==8 && *tmpcd=='\0')

        {

            last=0;

            i++;

            ch=ConvertBinary(tmp);

            //cout<<':'<<(unsigned int)ch<<' ';

            outbuffer[h]=ch;

            h++;

            codelen++;   //压缩文件长度加一

                            if(h==bufferlen)

            {

            outfile.write(outbuffer,bufferlen);

            h=0;

            }

            if(i<readlen) tmpcd=hcd[maplist[(unsigned char)inbuffer[i]]];

            else

                            {

                                     i=0;

                                      break;

                            }

        }

        else if(j<8 && *tmpcd=='\0')

        {

            last=j;

            i++;

            if(i<readlen) tmpcd=hcd[maplist[(unsigned char)inbuffer[i]]];

            else

                            {    i=0;

                                 break;

                            }

            /////继续循换////

        }

        else if(j==8 && *tmpcd!='\0')

        {

            last=0;

                            //WriteFile(tmp);

            ch=ConvertBinary(tmp);

            outbuffer[h]=ch;

            h++;

            codelen++;   //压缩文件长度加一

                            if(h==bufferlen)

            {

            outfile.write(outbuffer,bufferlen);

            h=0;

            }

        }

      }

    }

    while(readlen==bufferlen);

         if(j==8 && readlen<bufferlen)

    {

        outfile.write(outbuffer,h);

    }

    else if(j<8 && readlen<bufferlen)

    {

        for(k=j;k<8;k++)

        {

            tmp[k]='0';

        }

        ch=ConvertBinary(tmp);

        outbuffer[h]=ch;

        h++;

        outfile.write(outbuffer,h);

        codelen++;   //压缩文件长度加一

    }

    cout<<endl;

    ch=8-j;

    rearnum=8-j;

    outfile.seekp(1,ios::beg);

    outfile.write(&ch,1);       //写入真正的补位数

    outfile.seekp(2,ios::beg);

outfile.write((char*)&codelen,4);  //写入真正的压缩后的文件的哈夫曼编码总长度长度

 outfile.close();

    infile.close();

}

//////////////构造哈夫曼树////////////

void HTCreate(HTNode ht[],int n)

{

    int i,k,lnode,rnode;

    int min1,min2;

    for(i=0;i<2*n-1;i++)

    {

        ht[i].parent=ht[i].rightchild=ht[i].leftchild=-1;

                  

    }

    for(i=n;i<2*n-1;i++)

    {

        min1=min2=2147483647;

        lnode=rnode=-1;

        for(k=0;k<=i-1;k++)

        {

            if(ht[k].parent==-1)

            {

                if(ht[k].weight<min1)

                {

                    min2=min1;

                    min1=ht[k].weight;

                    rnode=lnode;

                    lnode=k;

                }

                else if(ht[k].weight<min2)

                {

                    min2=ht[k].weight;

                    rnode=k;

                }

            }

        }

                   ht[lnode].parent=i;

        ht[rnode].parent=i;

        ht[i].weight=ht[lnode].weight+ht[rnode].weight;

        ht[i].leftchild=lnode;

        ht[i].rightchild=rnode;

    }

}

///////////构造哈夫曼编码/////////////

void HCCreat(HTNode ht[],Code hcd[],int n)

{

    int i,p,c;

    Code hc;

    hc=new char[n];

    int start,tmplen;

    for(i=0;i<n;i++)

    {

        tmplen=0;

        start=n-1;

        hc[start]='\0';

        c=i;

        p=ht[i].parent;

        while(p!=-1)

        {

            if(ht[p].leftchild==c)  //是左孩子结点

            {

                hc[--start]='0';

                tmplen++;

            }

            else

            {

                hc[--start]='1';

                tmplen++;

            }

            c=p;

            p=ht[p].parent;

        }

        hcd[i]=new char[n-start];

        strcpy(hcd[i],&hc[start]);

    }

         delete[] hc;

}

void WriteFile(char *tmp)

{

     int i;

     for(i=0;i<8;i++)

     cout<<tmp[i];

          cout<<' ';

          tmp="";

}

unsigned char ConvertBinary(char *tmp)

{

    char ch=0;

    int i;

    for(i=0;i<8;i++)

    {

        ch=(unsigned char)pow(2.0,8-i-1)*(tmp[i]-48)+ch;

    }

    return ch;

}

//////////////打开文件//////////////

bool InitFromFile(string fileadd)

{

          fstream infile(fileadd.c_str(),ios::binary|ios::in);

     if(!infile){cout<<"error!"<<endl;return 0;}

     int table[256];

     int i,j;

     int len=0,num=0;

     unsigned char ch;

     for(i=0;i<256;i++) {table[i]=0;maplist[i]=-1;}

  int readlen=0;

     char buffer[bufferlen];          //设置读取缓冲区,加快读取速度

     do

     {

         infile.read(buffer,bufferlen);

         i=0;

         readlen=infile.gcount();

     while(i<readlen)

                               {

               ch=(unsigned char)buffer[i];

               table[ch]++;

               len++;

               i++;

                               }

    }

    while(readlen==bufferlen);

         for(i=0;i<256;i++)

    {

        if(table[i]!=0) num++;

    }

    ht=new HTNode[2*num-1];

    hcd=new Code[num];

    for(i=0,j=0;i<256;i++)

    {

        if(table[i]!=0)

        {

             ht[j].data=i;

             ht[j].weight=table[i];

             maplist[i]=j;  //建立字符与哈夫曼编码的映射

             j++;

        }

    }

    nodenum=num;

    textlen=len;

    infile.clear();

         infile.close();

    return 1;

}

/////////////从文件解压////////////////////

void DecompressionFile(string fileadd2,string fileadd3)

{

     //cout<<"............解压并输出新文件过程:"<<endl;

          fstream infile(fileadd2.c_str(),ios::in|ios::binary);

          fstream outfile(fileadd3.c_str(),ios::out|ios::binary);

     cout<<endl;

     /////////////////读出哈夫曼树的数据/////////////

     int h=0;

     char buffer[bufferlen];      //读入文件的缓冲区

     char outbuffer[bufferlen];   //写入文件的缓冲区

     infile.read(buffer,1);

     nodenum=(unsigned char)*buffer;//哈夫曼树结点数

     if(nodenum==0) nodenum=256;

     infile.read(buffer,1);

     rearnum=(unsigned char)*buffer;

     infile.read((char*)&codelen,4);

     //cout<<"  读出哈夫曼树数据...."<<endl;

     ht=new HTNode[2*nodenum-1];

          hcd=new Code[nodenum];

          //hcdlen=new int[nodenum];

     for(h=0;h<nodenum;h++)

     {

        infile.read(&ht[h].data,1);

        infile.read((char*)&ht[h].weight,4);

     }

     //////构走哈夫曼树///////

     HTCreate(ht,nodenum);

     //////构造哈夫曼编码/////

     HCCreat(ht,hcd,nodenum);

     ///////////////////////解压并输出解压文件////////////////////////

          char *buffertmp=new char;

     int bin[8],j=0,i=0;

     int coderead=0;       //记录以度的长度,用于判断何时达到文件最后一字节(用codelen比较)

          int readlen=0;

          int child=0;

     int last=2*nodenum-2; //解压时记录上次指示器的位置

     child=last;

          unsigned char outp;

     h=0;

     do

     {

         infile.read(buffer,bufferlen);

                    readlen=infile.gcount();

                    

       

                    for(j=0;j<readlen;j++)

                    {

           coderead++;

                      outp=buffer[j];

           dtobin(outp,bin);

                      if(coderead==codelen)   //达到文件尾

           {

             for(i=0;i<=8-rearnum;i++)

             {

                if(ht[child].leftchild==-1 && ht[child].rightchild==-1)

                {

                    //cout<<ht[child].data;

                    outbuffer[h]=ht[child].data;

                    h++;

                                               if(h==bufferlen) {outfile.write(outbuffer,bufferlen);h=0;}

                                              

                    last=2*nodenum-2;

                    if(i==8-rearnum)

                                               {

                                                        if(h!=0) outfile.write(outbuffer,h);

                                                        child=last;

                                                        break;

                                               }

                                               else i--;

                }

                                     else if(i!=8)

                {    if(bin[i]==0) last=ht[child].leftchild;

                     else if(bin[i]==1) last=ht[child].rightchild;

                }

                                     child=last;

            }

         }

         else             //没达到文件尾

         {

            for(i=0;i<=8;i++)

            {

                if(ht[child].leftchild==-1 && ht[child].rightchild==-1)

                {

                    //cout<<ht[child].data;

                    outbuffer[h]=ht[child].data;

                    h++;

                                               if(h==bufferlen)

                    {

                       outfile.write(outbuffer,bufferlen);

                        h=0;

                    }

                    last=2*nodenum-2;

                                               if(i==8)

                                               {

                                                        child=last;

                                                        break;

                                               }

                                               else i--;

                }

                else if(i!=8)

                {    if(bin[i]==0) last=ht[child].leftchild;

                     else if(bin[i]==1) last=ht[child].rightchild;

                }

                                     child=last;

            }

           }

                    }

      }

          while(readlen==bufferlen);

     //cout<<j<<endl;

     infile.close();

          outfile.close();

}

string Compression(string fileadd)

{

         int i;

         for(i=0;i<fileadd.length();i++)

                   if(fileadd[i]=='\\') fileadd[i]='/';

         string fileadd2;

         fileadd2=fileadd+".rax";

         InitFromFile(fileadd);     //从文件中初始化哈夫曼树

         HTCreate(ht,nodenum);       //构造哈夫曼树 

    HCCreat(ht,hcd,nodenum);   //构造哈夫曼编码

    ConvertFile(hcd,fileadd,fileadd2);   //压缩并写入文件

    clean();

         return fileadd2;

}

string Decompression(string fileadd2)

{

    int i;

         for(i=0;i<fileadd2.length();i++)

                   if(fileadd2[i]=='\\') fileadd2[i]='/';

         string fileclass;

         string fileadd3;

         for(i=fileadd2.length()-5;fileadd2[i]!='.' && i>0;i--)

                   fileclass.insert(0,fileadd2.substr(i,1));

         if(i!=0)

                   fileadd3=fileadd2.substr(0,i)+"_"+'.'+fileclass;

         else

                   fileadd3=fileadd2.substr(0,fileadd2.length()-4)+"_";

         DecompressionFile(fileadd2,fileadd3);

         clean();

         return fileadd3;

}

int main()

{

         cout<<"缓冲区长度:"<<bufferlen<<"Bytes."<<endl<<endl;

         cout<<"\t*******************************************************\n";

    cout<<"\t*                                                     *\n";

    cout<<"\t*                 数据结构课程设计                    *\n";

    cout<<"\t*           基于哈夫曼的文件压缩解压程序              *\n";

    cout<<"\t*                                                     *\n";

    cout<<"\t*******************************************************\n"; 

         string fileadd;

         string fileadd2;

         char usage;

         do

         {

                   cout<<""<<endl;

                   cout<<"(1)压缩C"<<endl;

                   cout<<"(2)解压D"<<endl;

                   cout<<"(3)退出Q "<<endl;

                   cout<<""<<endl;

                   cout<<"*请输入选项:";

             cin>>usage;

                   if(usage=='C' || usage=='c')

                   {

                            cout<<"请输入压缩文件的路径:";

                            cin>>fileadd;

                           

                            cout<<"压缩文件开始...";

                            fileadd2=Compression(fileadd);

                            cout<<"压缩文件完毕,压缩后文件的路径是:"<<fileadd2<<endl<<endl;

                   }

                   else if(usage=='D' || usage=='d')

                   {

                            cout<<"请输入解压文件的路径:";

                            cin>>fileadd;

                            cout<<"解压文件开始...";

                            fileadd2=Decompression(fileadd);

                            cout<<"解压文件完毕,解压后文件的路径是:"<<fileadd2<<endl<<endl;

                   }

                   else if(usage=='Q' || usage=='q')  return 0;

         }

         while(1);

    return 0;

}                

#include<fstream>

#include<iostream>

#include<String>

#include<cmath>

using namespace std;

struct HTNode

{

    char data;   //节点数据

    int weight;  //权值

    int parent;  //父亲

    int leftchild;  //左孩子

    int rightchild;  //右孩子

};

typedef char* Code;

HTNode *ht;

Code *hcd;

int maplist[256];       //建立字符与哈夫曼编码的映射

int nodenum=0;          //哈夫曼树结点数

int rearnum=0;          //哈夫曼编码尾补码

int textlen=0;          //需压缩的文件长度

int codelen=0;          //压缩后的文件的哈夫曼编码总长度

int const bufferlen=1024;               //设置读取缓冲区长度

int clean();            //清空节点及编码指针内容

void dtobin(int num,int bin[8]);    //十进制变换成二进制

void HTCreate(HTNode ht[],int n);   //建立哈夫曼树

void HCCreat(HTNode ht[],Code hcd[],int n);      //提取哈夫曼编码

void WriteFile(char *tmp);          //写入文件

unsigned char ConvertBinary(char *tmp);                            //变换二进制文件

void ConvertFile(Code hcd[],string fileadd,string fileadd2);                   //压缩并解压文件

bool InitFromFile(string fileadd);                                //初始化文件

void DecompressionFile(string fileadd2,string fileadd3);    //解压文件

string Compression(string fileadd);                             //压缩文件

string Decompression(string fileadd2);          //解压文件子函数

///////////////十进制转二进制函数/////////////////

int clean()

{

    delete[] ht;

         delete[] hcd;

         return 1;

}

void dtobin(int num,int bin[8])

{

    int i=0;

    for(i=0;i<8;i++)

    {

        bin[i]=0;

    }

    i=0;

    while(num>0)

    {

        bin[8-1-i]=num%2;

        num=num/2;

        i++;

    }

}

//////////////////压缩和写入文件//////////////////

void ConvertFile(Code hcd[],string fileadd,string fileadd2)

{

    ////////////////////////////////////////

    fstream infile(fileadd.c_str(),ios::in|ios::binary);

         fstream outfile(fileadd2.c_str(),ios::out|ios::binary);

         if(!infile) cout<<"open file fail!"<<endl;

         if(!outfile) cout<<"creat file fail!"<<endl;

    //unsigned

    char ch;

    /////////////写入哈夫曼树//////////////

    ch=nodenum;

    outfile.write(&ch,1);        ///写入结点数

    ch=8;

    outfile.write(&ch,1);        ///写入补位数(预写入)

    codelen=0;

    outfile.write((char *)&codelen,4);  //写入压缩后的文件的哈夫曼编码总长度(预写入)

    int h=0;

    for(h=0;h<nodenum;h++)

    {

        outfile.write((char*)&ht[h].data,sizeof(char));

        outfile.write((char*)&ht[h].weight,sizeof(int));

    }

    char tmp[8];   //设置缓冲区

    char outbuffer[bufferlen];    //设置写入缓冲区

    char *tmpcd;

    int i=0,j,k,last=0;

    char inbuffer[bufferlen];

    int readlen=0;

    //infile.seekg(i,ios::beg);

         h=0;

    do

    {

    infile.read(inbuffer,bufferlen);

    readlen=infile.gcount();

    tmpcd=hcd[maplist[(unsigned char)inbuffer[i]]];

    for(i=0;i<readlen;)

      {

        for(j=last;j<8 && *tmpcd!='\0';j++)

        {

            tmp[j]=*tmpcd;

            tmpcd++;

        }

        if(j==8 && *tmpcd=='\0')

        {

            last=0;

            i++;

            ch=ConvertBinary(tmp);

            //cout<<':'<<(unsigned int)ch<<' ';

            outbuffer[h]=ch;

            h++;

            codelen++;   //压缩文件长度加一

                            if(h==bufferlen)

            {

            outfile.write(outbuffer,bufferlen);

            h=0;

            }

            if(i<readlen) tmpcd=hcd[maplist[(unsigned char)inbuffer[i]]];

            else

                            {

                                     i=0;

                                     break;

                            }

        }

        else if(j<8 && *tmpcd=='\0')

        {

            last=j;

            i++;

            if(i<readlen) tmpcd=hcd[maplist[(unsigned char)inbuffer[i]]];

            else

                            {    i=0;

                                 break;

                            }

            /////继续循换////

        }

        else if(j==8 && *tmpcd!='\0')

        {

            last=0;

                            //WriteFile(tmp);

            ch=ConvertBinary(tmp);

            outbuffer[h]=ch;

            h++;

            codelen++;   //压缩文件长度加一

                            if(h==bufferlen)

            {

            outfile.write(outbuffer,bufferlen);

            h=0;

            }

        }

      }

    }

    while(readlen==bufferlen);

         if(j==8 && readlen<bufferlen)

    {

        outfile.write(outbuffer,h);

    }

    else if(j<8 && readlen<bufferlen)

    {

        for(k=j;k<8;k++)

        {

            tmp[k]='0';

        }

        ch=ConvertBinary(tmp);

        outbuffer[h]=ch;

        h++;

        outfile.write(outbuffer,h);

        codelen++;   //压缩文件长度加一

    }

    cout<<endl;

    ch=8-j;

    rearnum=8-j;

    outfile.seekp(1,ios::beg);

    outfile.write(&ch,1);       //写入真正的补位数

    outfile.seekp(2,ios::beg);

    outfile.write((char*)&codelen,4);  //写入真正的压缩后的文件的哈夫曼编码总长度长度

    outfile.close();

    infile.close();

}

//////////////构造哈夫曼树////////////

void HTCreate(HTNode ht[],int n)

{

    int i,k,lnode,rnode;

    int min1,min2;

    for(i=0;i<2*n-1;i++)

    {

        ht[i].parent=ht[i].rightchild=ht[i].leftchild=-1;

    }

    for(i=n;i<2*n-1;i++)

    {

        min1=min2=2147483647;

        lnode=rnode=-1;

        for(k=0;k<=i-1;k++)

        {

            if(ht[k].parent==-1)

            {

                if(ht[k].weight<min1)

                {

                    min2=min1;

                    min1=ht[k].weight;

                    rnode=lnode;

                    lnode=k;

                }

                else if(ht[k].weight<min2)

                {

                    min2=ht[k].weight;

                    rnode=k;

                }

            }

        }

                   ht[lnode].parent=i;

        ht[rnode].parent=i;

        ht[i].weight=ht[lnode].weight+ht[rnode].weight;

        ht[i].leftchild=lnode;

        ht[i].rightchild=rnode;

    }

}

///////////构造哈夫曼编码/////////////

void HCCreat(HTNode ht[],Code hcd[],int n)

{

    int i,p,c;

    Code hc;

    hc=new char[n];

    int start,tmplen;

    for(i=0;i<n;i++)

    {

        tmplen=0;

        start=n-1;

        hc[start]='\0';

        c=i;

        p=ht[i].parent;

        while(p!=-1)

        {

            if(ht[p].leftchild==c)  //是左孩子结点

            {

                hc[--start]='0';

                tmplen++;

            }

            else

            {

                hc[--start]='1';

                tmplen++;

            }

            c=p;

            p=ht[p].parent;

        }

        hcd[i]=new char[n-start];

        strcpy(hcd[i],&hc[start]);

    }

         delete[] hc;

}

void WriteFile(char *tmp)

{

     int i;

     for(i=0;i<8;i++)

     cout<<tmp[i];

          cout<<' ';

          tmp="";

}

unsigned char ConvertBinary(char *tmp)

{

    char ch=0;

    int i;

    for(i=0;i<8;i++)

    {

        ch=(unsigned char)pow(2.0,8-i-1)*(tmp[i]-48)+ch;

    }

    return ch;

}

//////////////打开文件//////////////

bool InitFromFile(string fileadd)

{

          fstream infile(fileadd.c_str(),ios::binary|ios::in);

     if(!infile){cout<<"error!"<<endl;return 0;}

     int table[256];

     int i,j;

     int len=0,num=0;

     unsigned char ch;

     for(i=0;i<256;i++) {table[i]=0;maplist[i]=-1;}

     int readlen=0;

     char buffer[bufferlen];          //设置读取缓冲区,加快读取速度

     do

     {

         infile.read(buffer,bufferlen);

         i=0;

         readlen=infile.gcount();

     while(i<readlen)

                               {

               ch=(unsigned char)buffer[i];

               table[ch]++;

               len++;

               i++;

                               }

    }

    while(readlen==bufferlen);

         for(i=0;i<256;i++)

    {

        if(table[i]!=0) num++;

    }

    ht=new HTNode[2*num-1];

    hcd=new Code[num];

    for(i=0,j=0;i<256;i++)

    {

        if(table[i]!=0)

        {

             ht[j].data=i;

             ht[j].weight=table[i];

             maplist[i]=j;  //建立字符与哈夫曼编码的映射

             j++;

        }

    }

    nodenum=num;

    textlen=len;

    infile.clear();

         infile.close();

    return 1;

}

/////////////从文件解压////////////////////

void DecompressionFile(string fileadd2,string fileadd3)

{

     //cout<<"............解压并输出新文件过程:"<<endl;

          fstream infile(fileadd2.c_str(),ios::in|ios::binary);

          fstream outfile(fileadd3.c_str(),ios::out|ios::binary);

     cout<<endl;

     /////////////////读出哈夫曼树的数据/////////////

     int h=0;

     char buffer[bufferlen];      //读入文件的缓冲区

     char outbuffer[bufferlen];   //写入文件的缓冲区

     infile.read(buffer,1);

     nodenum=(unsigned char)*buffer;//哈夫曼树结点数

     if(nodenum==0) nodenum=256;

     infile.read(buffer,1);

     rearnum=(unsigned char)*buffer;

     infile.read((char*)&codelen,4);

     //cout<<"  读出哈夫曼树数据...."<<endl;

     ht=new HTNode[2*nodenum-1];

          hcd=new Code[nodenum];

          //hcdlen=new int[nodenum];

     for(h=0;h<nodenum;h++)

     {

        infile.read(&ht[h].data,1);

        infile.read((char*)&ht[h].weight,4);

     }

     //////构走哈夫曼树///////

     HTCreate(ht,nodenum);

     //////构造哈夫曼编码/////

     HCCreat(ht,hcd,nodenum);

     ///////////////////////解压并输出解压文件////////////////////////

          char *buffertmp=new char;

     int bin[8],j=0,i=0;

     int coderead=0;       //记录以度的长度,用于判断何时达到文件最后一字节(用codelen比较)

          int readlen=0;

          int child=0;

     int last=2*nodenum-2; //解压时记录上次指示器的位置

     child=last;

          unsigned char outp;

     h=0;

     do

     {

         infile.read(buffer,bufferlen);

                    readlen=infile.gcount();

                    for(j=0;j<readlen;j++)

                    {

           coderead++;

                      outp=buffer[j];

           dtobin(outp,bin);

                      if(coderead==codelen)   //达到文件尾

           {

             for(i=0;i<=8-rearnum;i++)

             {

                if(ht[child].leftchild==-1 && ht[child].rightchild==-1)

                {

                    //cout<<ht[child].data;

                    outbuffer[h]=ht[child].data;

                    h++;

                                               if(h==bufferlen) {outfile.write(outbuffer,bufferlen);h=0;}

                                              

                    last=2*nodenum-2;

                    if(i==8-rearnum)

                                               {

                                                        if(h!=0) outfile.write(outbuffer,h);

                                                        child=last;

                                                        break;

                                               }

                                               else i--;

                }

                                     else if(i!=8)

                {    if(bin[i]==0) last=ht[child].leftchild;

                     else if(bin[i]==1) last=ht[child].rightchild;

                }

                                     child=last;

            }

         }

         else             //没达到文件尾

         {

            for(i=0;i<=8;i++)

            {

                if(ht[child].leftchild==-1 && ht[child].rightchild==-1)

                {

                    //cout<<ht[child].data;

                    outbuffer[h]=ht[child].data;

                    h++;

                                               if(h==bufferlen)

                    {

                       outfile.write(outbuffer,bufferlen);

                        h=0;

                    }

                    last=2*nodenum-2;

                                               if(i==8)

                                               {

                                                        child=last;

                                                        break;

                                               }

                                              else i--;

                                              

                }

                else if(i!=8)

                {    if(bin[i]==0) last=ht[child].leftchild;

                     else if(bin[i]==1) last=ht[child].rightchild;

                }

                                     child=last;

            }

           

           }

                    }

      }

          while(readlen==bufferlen);

     //cout<<j<<endl;

     infile.close();

          outfile.close();

}

string Compression(string fileadd)

{

         int i;

         for(i=0;i<fileadd.length();i++)

                   if(fileadd[i]=='\\') fileadd[i]='/';

        

         string fileadd2;

         fileadd2=fileadd+".rax";

        

         InitFromFile(fileadd);     //从文件中初始化哈夫曼树

         HTCreate(ht,nodenum);       //构造哈夫曼树 

    HCCreat(ht,hcd,nodenum);   //构造哈夫曼编码

    ConvertFile(hcd,fileadd,fileadd2);   //压缩并写入文件

    clean();

         return fileadd2;

}

string Decompression(string fileadd2)

{

    int i;

         for(i=0;i<fileadd2.length();i++)

                   if(fileadd2[i]=='\\') fileadd2[i]='/';

         string fileclass;

         string fileadd3;

         for(i=fileadd2.length()-5;fileadd2[i]!='.' && i>0;i--)

                   fileclass.insert(0,fileadd2.substr(i,1));

         if(i!=0)

                  fileadd3=fileadd2.substr(0,i)+"_"+'.'+fileclass;

         else

                   fileadd3=fileadd2.substr(0,fileadd2.length()-4)+"_";

         DecompressionFile(fileadd2,fileadd3);

         clean();

         return fileadd3;

}

int main()

{

         cout<<"缓冲区长度:"<<bufferlen<<"Bytes."<<endl<<endl;

         cout<<"\t*******************************************************\n";

    cout<<"\t*                                                     *\n";

    cout<<"\t*                 数据结构课程设计                    *\n";

    cout<<"\t*           基于哈夫曼的文件压缩解压程序              *\n";

    cout<<"\t*                                                     *\n";

    cout<<"\t*******************************************************\n"; 

         string fileadd;

         string fileadd2;

         char usage;

         do

         {

                   cout<<""<<endl;

                   cout<<"(1)压缩C"<<endl;

                   cout<<"(2)解压D"<<endl;

                   cout<<"(3)退出Q "<<endl;

                   cout<<""<<endl;

                   cout<<"*请输入选项:";

             cin>>usage;

                   if(usage=='C' || usage=='c')

                   {

                            cout<<"请输入压缩文件的路径:";

                            cin>>fileadd;

                           

                            cout<<"压缩文件开始...";

                            fileadd2=Compression(fileadd);

                            cout<<"压缩文件完毕,压缩后文件的路径是:"<<fileadd2<<endl<<endl;

                   }

                   else if(usage=='D' || usage=='d')

                   {

                            cout<<"请输入解压文件的路径:";

                            cin>>fileadd;

                            cout<<"解压文件开始...";

                            fileadd2=Decompression(fileadd);

                            cout<<"解压文件完毕,解压后文件的路径是:"<<fileadd2<<endl<<endl;

                   }

                   else if(usage=='Q' || usage=='q')  return 0;

         }

         while(1);

    return 0;

}

                  

相关推荐