算法与数据结构的问题,Consider a problem in which you have to store a catalogue of (unique) planet names for use at alater time.Compare and contrast arrays,binary search trees,AVL- trees and hash tables with linearhashing for the following t

来源:学生作业帮助网 编辑:作业帮 时间:2024/05/07 23:14:51

算法与数据结构的问题,Consider a problem in which you have to store a catalogue of (unique) planet names for use at alater time.Compare and contrast arrays,binary search trees,AVL- trees and hash tables with linearhashing for the following t
算法与数据结构的问题,
Consider a problem in which you have to store a catalogue of (unique) planet names for use at a
later time.Compare and contrast arrays,binary search trees,AVL- trees and hash tables with linear
hashing for the following two situations,and specify which data structure you would choose to
make the insertion process the fastest it could be in each of the two cases.
i) Inserting the words from a novel of 100,000 words,in which 10,000 words are unique
(distinct).
ii) Inserting the words from a dictionary of 50,000 unique words,which are stored in a mostly
sorted file.
You may assume that speed efficiency is of a higher priority than memory efficiency.For your
information log2 10,000 is approximately 13,log2 50,000 is approximately 16 and log2100,000 is
approximately 17.
Note:
“To compare” means to discuss the things that are the same or similar about the use of the
various data structures to solve the problem.
“To contrast” means to discuss the things that are different about the use of the various data
structures to solve the problem.
帮我解答下i 和 ii 选哪个方法,并且这4个之间的简单对比就好了,

算法与数据结构的问题,Consider a problem in which you have to store a catalogue of (unique) planet names for use at alater time.Compare and contrast arrays,binary search trees,AVL- trees and hash tables with linearhashing for the following t
现考虑一将随后可能用到的多个行星名称(名称皆唯一)存储在一目录中的问题.针对后续的两个使用场景,请比较并对比数组、二叉查找树、avl-树和使用线性hash函数的hash表,请指出你为达成令下列两种情况下插入过程最快的目的所选择的数据结构.
i)从小说中选择100000个词插入,其中10000个词是不重复的(译者按:即不按字母序随机插入)
ii)从字典中选择50000个不同的词插入,这些词大部分都是排序的
你只需关心速度性能即可,可暂不考虑内存性能.供参考,log2(10000)约等于13,log2(50000)约等于16,log2(100000)约等于17.
注意:
“比较”意即讨论不同数据结构的相似特点
“对比”则是请讨论不同数据结构的不同之意.
这题不是选1和2那个方法,1和2是不同场景,第一个是随机信息的插入,且90%数据是可不用插入的,第二个是有序信息的插入.
这题蛮麻烦,不能保证答对,简单分析下好了,对于插入操作,基本上都是两个过程“查找”(顺便也决定了插入的位置)和“插入”:
1,数组:得分两种情况
a,非排序数组,则查找时间复杂度为O(N),插入复杂度为O(1),总是插入至最末尾
b,排序数组,使用二分查找,查找时间复杂度为O(log2(N)),插入的话,大部分情况需要移动内存,题中不考虑内存性能,则时间复杂度为O(1)
2,二叉查找树,也得分两种情况
a,平衡的,则查找复杂度O(log2(N)),插入基本上是O(1)
b,非平衡,虽然平均查找复杂度还是O(log2(N)),但实际大多数情况都比这个值要大,插入基本O(1)
3,avl-树,这种树实质上是也是一种二叉查找树,它的名字是“自平衡二叉查找树”,因此它的任何操作都保证他是平衡二叉查找树,插入时间复杂度见上
4,线性hash函数的hash表,hash表的特点是预分配bucket count个单元,常常采用每个单元存储一个链表的冲突解决方法,他的查找性能取决于样本key数据在hash函数的图像上是否是均匀分布的,越均匀效果越高,查找复杂度和插入复杂度都决定于bucket长度、hash函数的选取、冲突的解决方法
结合case1和2分析:
case1:由于插入的是无序数据,且其中90%是重复的(意味着查找性能比插入性能重要),显然,适当选取bucket值和hash函数(即便是线性函数也有选择的余地)和解决冲突方法的hash表性能最好(趋于O(1)),avl-tree次之,排序数组和avl-tree性能相当(因这里插入操作比例低),非平衡二叉查找树也有类似的性能,非排序数组性能最差.
case2:输入数据是有序的,仍旧是参数合适的hash表性能最佳,趋于O(1)的时间复杂度,avl-tree次之.排序数组查找复杂度也是log2(N),由于是有序数据,在大小顺序与字典顺序一致时,插入复杂度很低,而相反时,插入复杂度很高,每次都要移动几乎整体的数据.有序数据还导致非平衡的二叉查找树的左右子树严重失衡,查找复杂度趋于O(N),性能相当低.非排序数组仍旧是低效的