Sampling Statistics Basics
Published: 2009-5-12

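According to the central limit theorem, for any continuous random variable, whatever the shape of its own distribution, once the sample size exceeds 30 the sample mean can be treated as approximately normally distributed.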

Overview of Sampling Statistics Principles

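We draw n samples from a population of size N and estimate the parameters μ and σ², the population mean and variance.

These two parameters describe the state of a distribution, the normal distribution in particular.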

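Randomness ensures that every unit in the population has an equal chance of being selected, which rules out selection bias. The estimates ā and s², the sample mean and sample variance, each have their own sampling distribution, and we commonly assume that the normal distribution describes them best.

This distribution can be used to estimate probabilities for z, the normal deviate (or, with the t distribution, probabilities for t), or to build the z and t tables used to determine sample size.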
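The estimates described above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize`, the example data, and the choice of a z-based (normal) interval are assumptions, and for small samples a t-based interval would be wider.

```python
import math
import statistics
from statistics import NormalDist


def summarize(sample, confidence=0.95):
    """Return the sample mean, sample variance and a z-based interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)            # estimate of the population mean
    variance = statistics.variance(sample)     # s^2, uses the n - 1 denominator
    std_error = math.sqrt(variance / n)        # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal quantile
    return mean, variance, (mean - z * std_error, mean + z * std_error)


# Hypothetical measurements, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, variance, interval = summarize(data)
print(mean, variance, interval)
```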

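There are many random sampling methods. The simplest is simple random sampling, which places no restriction on the randomization.

For example, if one part of a sampling area is a slope and another part is flat ground, the two parts should be sampled, analyzed, and interpreted separately.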

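We can also attach restrictions to the randomization. In stratified random sampling, where we want to remove the variation between strata, the restriction is that randomization is carried out separately within each stratum. In simple random sampling, the sample mean is always an unbiased estimate of the population mean.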
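A minimal sketch of stratified random sampling, using the slope and flat ground case as the two strata. The function name `stratified_sample`, the unit numbering, and the 10% sampling fraction are all assumptions made for illustration.

```python
import random


def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample separately inside each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    picked = {}
    for name, units in strata.items():
        k = max(1, round(len(units) * fraction))  # at least one unit per stratum
        picked[name] = rng.sample(units, k)       # sampling without replacement
    return picked


# Hypothetical plots: 40 on the slope, 60 on flat ground.
strata = {"slope": list(range(40)), "flat": list(range(40, 100))}
sample = stratified_sample(strata, fraction=0.10)
print({name: sorted(units) for name, units in sample.items()})
```

Each stratum is then analyzed on its own, so the difference between slope and flat ground does not inflate the within-stratum variance.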

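The "best" estimate we refer to is the one with the smallest sampling variance.

As a result, both the sample mean and the sample variance can be best in this sense.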

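People often ask about the size of the sample. If the sample is collected properly and the sampling fraction n/N is small, the fraction itself says little about the precision of the estimate; the effective precision depends on the absolute sample size. This means that when estimating the best sample size, we need to consider the absolute number of samples rather than the percentage sampled. Formulas for determining sample size therefore usually use n rather than n/N.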
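The point about n versus n/N can be checked numerically. The sketch below assumes simple random sampling without replacement and uses the standard error of the mean with the finite population correction, sqrt((N - n) / (N - 1)); the function name and the numbers are hypothetical.

```python
import math


def standard_error(sigma, n, population_size):
    """Standard error of the sample mean with the finite population correction."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return sigma / math.sqrt(n) * correction


sigma = 10.0  # assumed population standard deviation

# Same absolute sample size, very different sampling fractions.
print(standard_error(sigma, 100, 10_000))      # n/N = 1%
print(standard_error(sigma, 100, 1_000_000))   # n/N = 0.01%

# Same sampling fraction (1%), very different absolute sample sizes.
print(standard_error(sigma, 10, 1_000))
print(standard_error(sigma, 1_000, 100_000))
```

With the fraction held small, the first pair is nearly identical while the second pair differs by roughly a factor of ten, which is why sample size formulas work with n.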

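In terms of sample size and precision, the precision of the sample mean ā improves as the sample size increases.

Regardless of the shape of the sampled population, the distribution of the sample mean ā comes closer to a normal distribution as the sample size grows; this follows from the central limit theorem. Thirty samples are enough for standard estimation (although we can also draw more than 30 to reach the required precision).

The assumptions behind this relationship are that the variance is finite and that the samples are drawn from the population at random.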
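A small simulation can illustrate the central limit theorem statement. It is a sketch under assumptions: the population is taken to be exponential with mean 1 (a strongly skewed distribution with finite variance), and the helper names, seed, and replicate count are arbitrary.

```python
import random
import statistics


def sample_means(n, replicates=2000, seed=1):
    """Means of `replicates` random samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]


def skewness(values):
    """Moment-based skewness; it is 0 for a perfectly symmetric distribution."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / spread) ** 3 for v in values])


raw = sample_means(1)      # individual observations: heavily right-skewed
means = sample_means(30)   # means of 30 observations: much closer to symmetric
print(skewness(raw), skewness(means))
print(statistics.fmean(means), statistics.stdev(means))
```

The spread of the 30-sample means should land near 1/sqrt(30), the population standard deviation divided by the square root of the sample size.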

 

 

 

--------------------------------------------------------------------------

First, you have to make sure whether these data are subgroup means or individual samples. If they are individual samples (I guess this is the case you are talking about), the standard deviation of the data is estimated from the moving range, which depends on the sequence of the data. So if you change the sequence of the data, you'll get a different standard deviation and thus a different Cpk, given that the process specifications are fixed.
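The order dependence is easy to see in a short sketch. It assumes the usual individuals-chart estimate, sigma = average moving range / d2 with d2 = 1.128 for ranges of two consecutive points; the function name and the data are hypothetical.

```python
import statistics

D2_FOR_PAIRS = 1.128  # control chart constant d2 for moving ranges of size 2


def sigma_from_moving_range(values):
    """Estimate sigma from the average moving range of consecutive points."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(moving_ranges) / D2_FOR_PAIRS


# Hypothetical individual measurements in the order they were taken.
as_measured = [10, 12, 11, 13, 12, 14]
# Exactly the same values, only reordered.
reordered = sorted(as_measured)

print(sigma_from_moving_range(as_measured))
print(sigma_from_moving_range(reordered))
```

The two estimates differ although the values are identical, so any Cpk built on this sigma changes with the sequence as well.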

 

Second, you have to understand thoroughly what Cpk is all about. It's a process capability ratio: Cpk=min{Cpl,Cpu}. Compared with Cp, it shows how well the process is centered on the target. So people generally use Cp and Cpk together to figure out the process capability. Furthermore, there are cases where the process capability is low but the process is in control, and there are cases where the process is out of control but the process capability is comparatively high. These are all related to the variance of the process and how well the process is targeted. The process capability ratios are misused in many ways in industry.
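As a rough sketch of how the two ratios relate, the following uses the common definitions Cp = (USL - LSL) / 6σ, Cpl = (mean - LSL) / 3σ and Cpu = (USL - mean) / 3σ. The function name, specification limits, and process values are made up for illustration.

```python
def capability(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a process with two-sided specification limits."""
    cp = (usl - lsl) / (6 * sigma)    # potential capability, ignores centering
    cpl = (mean - lsl) / (3 * sigma)  # capability against the lower limit
    cpu = (usl - mean) / (3 * sigma)  # capability against the upper limit
    return cp, min(cpl, cpu)


# Hypothetical limits 6 and 14, sigma 1.
print(capability(mean=10, sigma=1, lsl=6, usl=14))  # centered process
print(capability(mean=11, sigma=1, lsl=6, usl=14))  # same spread, off target
```

Cp is the same in both calls because the spread is unchanged; only Cpk drops when the mean moves off center.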

 

In some industries, such as the auto industry, people refer to the calculation of Cpk as Ppk.

As to why people use 32 or more data points to calculate Cpk, I did a little research on it. In industry, people accept Cp 1.33 as a common rule for an existing process, which corresponds to a 4 sigma variance level. If you use this value to do a little calculation and check the table published by Quality Society of America (I tried to post that table before, but it didn't work. It was all messy. I guess the admin deleted that post), you will get a number of approximately 32. But even 32 is sometimes not enough to get an unbiased estimate of the process capability ratio.

What I want to stress again is that the capability ratio is not everything. There are too many misuses in industry, so don't rely on it alone.

 

Here is my answer to the question about the sample size of 32:

 

A practice that is increasingly common in industry is to require a supplier to demonstrate process capability as part of the contractual agreement. Thus, it is frequently necessary to prove that the process capability ratio Cp meets or exceeds some particular target value, say Cp0. This problem may be formulated as a hypothesis testing problem:

 

H0: Cp= Cp0 (or the process is not capable)

 

H1: Cp≥ Cp0 (or the process is capable)

 

We would like to reject H0 (recall that in statistical hypothesis testing, rejection of the null hypothesis is always a strong conclusion), thereby demonstrating that the process is capable. We can formulate the statistical test in terms of Cp’, the sample estimate of Cp, so that we reject H0 if Cp’ exceeds a critical value C.

 

Kane (1986) investigated this test and provides a table of sample sizes and critical values for C to assist in testing process capability. We may define Cp(high) as a process capability that we would like to accept with probability (1-α) and Cp(low) as a process capability that we would like to reject with probability (1-β). Please refer to the table created by Kane and used by the American Society for Quality Control.

Now we take the minimum required Cp value from the first table for two-sided specifications, which is 1.33. The hypothesis testing problem then becomes:

 

H0: Cp= 1.33

 

H1: Cp≥ 1.33

 

Now we want to be sure, at the 95% confidence level, whether the process capability is above or below 1.33 before we accept or reject it. We set the high value to 2, which is actually the 6-sigma quality level. Namely, Cp(high)=2, Cp(low)=1.33, α=β=1-0.95=0.05.

 

Cp(high)/Cp(low)=2/1.33=1.504

 

Then check the table: the corresponding sample size is about n=32, and

 

C/Cp(low)= 1.2

 

So, C= 1.2*Cp(low)=1.2*1.33=1.596, or about 1.6

 

Thus, to demonstrate the capability, the supplier must take a sample of n=32, and the sample process capability ratio must exceed C=1.6.
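The arithmetic behind these numbers can be replayed in a few lines. The variable names are illustrative, and the factor 1.2 and the sample size of about 32 are the table values quoted in this post, not something the code derives.

```python
cp_high = 2.0   # capability we want to accept with probability 1 - alpha
cp_low = 1.33   # capability we want to reject with probability 1 - beta

# Ratio used to look up the sample size in the table (about n = 32 per the post).
ratio = cp_high / cp_low

# C / Cp(low) = 1.2 is read from the same table row, as quoted in the post.
c_over_cp_low = 1.2
critical_value = c_over_cp_low * cp_low

# With Cp = (USL - LSL) / (6 * sigma), the distance from a centered mean
# to either specification limit is 3 * Cp sigmas.
sigmas_low = 3 * cp_low    # close to the "4 sigma" level
sigmas_high = 3 * cp_high  # the "6 sigma" level

print(round(ratio, 3), round(critical_value, 3), sigmas_low, sigmas_high)
```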

 

This is obtained using the minimum process capability requirement in industry. The higher the requirement, the smaller the Cp(high)/Cp(low) value will be, and from the second table we know that the required sample size then increases. It's fairly common practice to accept the process as capable at the level Cp≥1.33 based on a sample of 30≤n≤50 parts. Clearly, this procedure does not account for sampling variation in the estimate of sigma, and larger sample sizes may be necessary in practice.
 

Editor: admin
SPConline.net