数据结构（17）排序（下）-EW帮帮网

一、计数排序

计数排序又称为鸽巢原理，是对哈希直接定址法的变形应用。操作步骤如下：①统计相同元素出现的次数 ②根据统计的结果将序列回收到原来的序列中

比如，现在有一个数组{6,1,2,9,4,2,4,1,4}。该数组中，元素1出现两次，元素2出现两次，元素4出现三次，元素6出现一次，元素9出现一次。我们这时就可以创建一个新数组，把原数组里的数据作为新数组的下标，原数组数据出现的次数作为新数组里下标对应的值，得到的新数组如下图所示：

这时我们定义一个变量 i 遍历 count 数组，如果count[i] 里的数据不为0，我们就循环向arr数组中插入count[i] 次，就可以得到排好序的数组，如下图所示：

思路很简单，但是新创建的count数组的大小如何确定呢？

在上面的示例中，我们似乎是找到了原数组中的最大值9，然后给count数组开辟了0~9共10个空间，那么是不是这样开辟空间大小就行呢？假如现在有一个数组{100,101,109,105,101,105}，如果我们还像刚才那样开辟0~109共110个空间，那么count数组中下标为0~99这100个空间就严重浪费了。所以原数组中找最大值，再让最大值+1这个方法是错误的，可能会造成空间浪费。那该怎么做呢？我们还拿{100,101,109,105,101,105}这个数组为例，最大值max = 109，最小值min =100，100~109之间最多有10个数据，也就是说count数组所需的最大空间就是10，所以我们用max - min + 1就可以得到count数组的空间大小了。

注：计数排序在数据范围集中时效率很高，但是适用范围及场景有限！

代码如下：

//计数排序
void CountSort(int* arr, int n)
{
	//找最大和最小值
	int min = arr[0], max = arr[0];
	for (int i = 1;i < n;i++)
	{
		if (arr[i] < min)
		{
			min = arr[i];
		}
		if (arr[i] > max)
		{
			max = arr[i];
		}
	}
	// max - min + 1 确定count数组大小
	int range = max - min + 1;
	int* count = (int*)malloc(range * sizeof(int));
	if (count == NULL)
	{
		perror("malloc fail");
		exit(1);
	}
	//初始化
	memset(count, 0, range * sizeof(int));
	for (int i = 0;i < n;i++)
	{
		count[arr[i] - min]++;
	}
	//将count数组还原到原数组中,使其有序
	int index = 0;
	for (int i = 0;i < range;i++)
	{
		while (count[i]--)
		{
			arr[index++] = min + i;
		}
	}
}

计数排序时间复杂度为O(N + range)

稳定性：稳定

二、排序算法的稳定性分析

稳定性：假设在待排序的序列中，存在多个具有相同关键字的记录，经过排序，这些记录的相对次序保持不变。即在原序列中，r[i] == r[j]，且r[i] 在 r[j] 之前，而排序后的序列，r[i] 仍在 r[j] 之前，则称这种排序算法是稳定的。

下面给出七种排序算法的时间复杂度和稳定性：

排序方法	时间复杂度	稳定性
冒泡排序	O(n^2)	稳定
直接选择排序	O(n^2)	不稳定
直接插入排序	O(n^2)	稳定
希尔排序	O(nlogn)~O(n^2)	不稳定
堆排序	O(nlogn)	不稳定
归并排序	O(nlogn)	稳定
快速排序	O(nlogn)	不稳定

三、快速排序拓展

快速排序最核心的一点在于基准值的确定。关于如何找基准值，我们在上一章讲了三种方法。但是，当待排序数组中含有大量重复数据时，再使用之前的方法会让快速排序的时间复杂度接近于O(n^2)。因此，这里分享两种优化方法。

1、三路划分

三路划分，顾名思义，就是把数组划分成三个区间。把重复的相等的数据集中在中间，左边就是较小值，右边就是较大值。方法如下图所示：

根据上述的方法思路，我们来模拟实现一下该流程：

单次循环划分完毕之后，只需对左右两块区间递归即可。

void QuickSort(int* arr, int left, int right)
{
	if(left >= right)
	{
		return;
	}
	
	int begin = left; 
	int end = right;
	
	int key = arr[left];
	int cur = left + 1;
	while(cur <= right)
	{
		if(arr[cur] < key)
		{
			Swap(&arr[cur], &arr[left]);
			cur++;
			left++;
		}
		else if(arr[cur] > key)
		{
			Swap(&arr[cur], &arr[right]);
			right--;
		}
		else
		{
			cur++;
		}
	}
	//[begin, left-1]  [left, right]  [right+1, end]
	QuickSort(arr, begin, left - 1);
	QuickSort(arr, right + 1, end);
}

2、自省排序（introsort）

introsort是introspective sort的缩写，顾名思义，该方法的思路就是进行自我的侦测与反省，快排递归深度太深（sgi STL中使用的是深度为2倍排序元素数量的对数值）那就说明在这种数据序列下，选key出现了问题，性能在快速退化，那么就不要再进行快排分割递归了，改换为堆排序进行排序。

了解即可，不做代码要求。

四、归并排序拓展

1、非递归版本归并排序

//非递归版本归并排序
void MergeSortNonR(int* arr, int n)
{
	int* tmp = (int*)malloc(sizeof(int) * n);
	if (tmp == NULL)
	{
		perror("malloc fail");
		exit(1);
	}
	int gap = 1;
	while (gap < n)
	{
		//根据gap划分组，两两一合并
		for (int i = 0;i < n;i += 2 * gap)
		{
			int begin1 = i, end1 = i + gap - 1;
			int begin2 = i + gap, end2 = i + gap + gap - 1;
			if (begin2 >= n)
			{
				break;
			}
			if (end2 >= n)
			{
				end2 = n - 1;
			}
			int index = begin1;
			//两个有序序列进行合并
			while (begin1 <= end1 && begin2 <= end2)
			{
				if (arr[begin1] < arr[begin2])
				{
					tmp[index++] = arr[begin1++];
				}
				else
				{
					tmp[index++] = arr[begin2++];
				}
			}
			while (begin1 <= end1)
			{
				tmp[index++] = arr[begin1++];
			}
			while (begin2 <= end2)
			{
				tmp[index++] = arr[begin2++];
			}
			//导入到原数组中
			memcpy(arr + i, tmp + i, sizeof(int) * (end2 - i + 1));
		}
		gap *= 2;
	}
}

2、文件归并排序

2.1 外排序

外排序（External sorting）是指能够处理极大量数据的排序算法。通常来说，外排序处理的数据不能能一次装入内存，只能放在读写较慢的外存储器（通常是硬盘）上。外排序通常采用的是一种“排序---归并”的策略。在排序阶段，先读入能放在内存中的数据量，将其排序输出到一个临时文件，依次进行，将待排序数据组织为多个有序的临时文件。然后在归并阶段将这些临时文件组合为一个大的有序文件，即排序结果。

跟外排序对应的就是内排序，我们之间讲的常见的排序都是内排序。它们的排序思想适应的是数据在内存中，支持随机访问。归并排序的思想是不需要随机访问数据，只需要依次按序列读取数据，所以归并排序既是一个内排序，也是一个外排序。

2.2 思路分析

2.3 参考代码

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

//创建N个随机数，写到文件中
void CreateNData()
{
	//造数据
	int n = 100000;
	srand(time(0));
	const char* file = "data.txt";
	FILE* fin = fopen(file, "w");
	if(fin == NULL)
	{
		perror("fopen fail");
		return;
	}
	for(int i = 0;i < n;i++)
	{
		int x = rand() + i;
		fprintf(fin, "%d\n", x);
	}
	fclose(fin);
}

int compare(const void* a, const void* b)
{
	return (*(int*)a - *(int*)b);
}

//返回实际读到的数据个数，没有数据就返回0
int ReadNDataSortToFile(FILE* fout, int n, const char* file1)
{
	FILE* fin = fopen(file1, "w");
	if(fin == NULL)
	{
		perror("fopen fail");
		return -1;
	}
	
	int x = 0;
	int* arr = (int*)malloc(sizeof(int) * n);
	if(arr == NULL)
	{
		perror("malloc fail");
		return -1;
	}
	//读取n个数据,如果遇到文件结束,应该读j个
	int j = 0;
	for(int i = 0;i < n;i++)
	{
		if(fscanf(fout, "%d", &x) == EOF)
		{
			break;
		}
		arr[j++] = x;
	}
	
	if(j == 0)
	{
		return 0;
	}
	
	//排序
	qsort(arr, j, sizeof(int), compare);
	
	//写回file1文件
	for(int i = 0;i < j;i++)
	{
		fprintf(fin, "%d\n", arr[i]);
	}
	
	fclose(fin);
	return j;
}

void MergeFile(const char* file1, const char* file2, const char* mfile)
{
	FILE* fout1 = fopen(file1, "r");
	if(fout1 == NULL)
	{
		perror("fopen fail");
		return;
	}
	FILE* fout2 = fopen(file2, "r");
	if(fout2 == NULL)
	{
		perror("fopen fail");
		return;
	}
	FILE* mfin = fopen(mfile, "r");
	if(mfin == NULL)
	{
		perror("fopen fail");
		return;
	}
	
	int x1 = 0;
	int x2 = 0;
	int ret1 = fscanf(fout1, "%d", &x1);
	int ret2 = fscanf(fout2, "%d", &x2);
	while(ret1 != EOF && ret2 != EOF)
	{
		if(x1 < x2)
		{
			fprintf(mfin, "%d\n", x1);
			ret1 = fscanf(fout1, "%d", &x1);
		}
		else
		{
			fprintf(mfin, "%d\n", x2);
			ret2 = fscanf(fout2, "%d", &x2);
		}
	}
	while(ret1 != EOF)
	{
		fprintf(mfin, "%d\n", x1);
		ret1 = fscanf(fout1, "%d", &x1);
	}
	while(ret2 != EOF)
	{
		fprintf(mfin, "%d\n", x2);
		ret2 = fscanf(fout2, "%d", &x2);
	}
}

int main()
{
	CreateNData();
	const char* file1 = "file1.txt";
	const char* file2 = "file2.txt";
	const char* mfile = "mfile.txt";
	
	
	FILE* fout = fopen("data.txt", "r");
	if(fout == NULL)
	{
		perror("fopen fail");
		return;
	}
	
	ReadNDataSortToFile(fout, 100, file1);
	ReadNDataSortToFile(fout, 100, file2);
	
	while(1)
	{
		MergeFile(file1, file2, mfile);
		//删除file1和file2
		remove(file1);
		remove(file2);
		//重命名mfile为file1
		rename(mfile, file1);
		//当再次读取数据，一个都读不到，说明已经没有数据了
		//已经归并完成，归并结果在file1
		if(ReadNDataSortToFile(fout, 100, file2) == 0)
		{
			break;
		}
	}
	return 0;
}

数据结构（17）排序（下）

一、计数排序

二、排序算法的稳定性分析

三、快速排序拓展

1、三路划分

2、自省排序（introsort）

四、归并排序拓展

1、非递归版本归并排序

2、文件归并排序

2.1 外排序

2.2 思路分析

2.3 参考代码

网站公告

今日签到

热门文章

最新发布

数据结构（17）排序（下）

一、计数排序

二、排序算法的稳定性分析

三、快速排序 拓展

1、三路划分

2、自省排序（introsort）

四、归并排序 拓展

1、非递归版本归并排序

2、文件归并排序

2.1 外排序

2.2 思路分析

2.3 参考代码

网站公告

今日签到

热门文章

最新发布

三、快速排序拓展

四、归并排序拓展