Educational blog. Here the entirety of Unit 5 of the Probability and Statistics course is developed, for consultation purposes and supported by exercises and teaching material.
Wednesday, May 30, 2018
Welcome and comments about the creation of the blog
This blog was created with the intention of offering the information, formulas, methods, and analysis of the topics covered, in support of the solution of the exercises posed here.
During its creation some difficulties arose that held up progress, for example not knowing how to develop a subtopic or the complexity of solving the proposed exercises. Thanks to consulting some of the books offered by the institution, which are listed in the bibliography section, and to consulting the course advisor, the objective was achieved satisfactorily.
We also want to thank the owners of the YouTube channels from which some of the explanatory exercise videos were taken.
As a final conclusion, it is worth highlighting the clear difference in learning and knowledge (above all on the part of the authors) between the start of the project and the presentation of the finished blog, where most showed a certain command of the topic and a good experience was achieved.
Clarifications about the English version
From this post onward, the same content of the blog is offered in English, with the aim of broadening its reach and making the information presented here available to more students and general readers. Even the images have been fully adapted; only the videos are omitted.
Enjoy the content in English, and if you find any error in the translation, do not hesitate to leave a comment so that the error can be corrected as soon as possible and the content improved.
5.1.7 Measurement errors
Measurement error is defined as the difference between the measured value and the "true value". Measurement errors affect any measuring instrument and can be due to different causes. Those that can in some way be predicted, calculated, or eliminated through calibration and compensation are called deterministic or systematic, and they relate to the accuracy of the measurement. Those that cannot be predicted, because they depend on unknown or stochastic causes, are called random, and they relate to the precision of the instrument.
Although it is impossible to know all the causes of error, it is useful to know the important ones and to have an idea that allows us to assess the most frequent errors. The main causes of error can be classified as:
Error due to the measuring instrument.
Error due to the operator.
Error due to environmental factors.
Error due to geometric tolerances of the piece itself.
Random error: the laws or mechanisms that cause it are not known, because of their excessive complexity or their small influence on the final result.
To characterize this type of error we must first take a sample of measurements. From the successive measurements we can calculate the sample mean and the sample standard deviation.
Systematic error: it remains constant in absolute value and sign when a quantity is measured under the same conditions, and the laws that cause it are known.
To determine the systematic error of a measurement, a series of measurements of a quantity X0 is taken, the arithmetic mean of these measurements is calculated, and then the difference between the mean and the reference value X0 is found:
Systematic error = | mean − X0 |
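As a quick illustration of these two formulas, here is a minimal Python sketch; the measurement values and the reference value X0 are invented for the example.

```python
import statistics

# Invented repeated measurements of a part whose reference value is X0 = 50.00 mm
measurements = [50.12, 50.09, 50.11, 50.08, 50.13, 50.10]
X0 = 50.00

mean = statistics.mean(measurements)
systematic_error = abs(mean - X0)               # systematic error = |mean - X0|
random_error = statistics.stdev(measurements)   # sample standard deviation (random error)

print(f"mean = {mean:.3f} mm")
print(f"systematic error = {systematic_error:.3f} mm")
print(f"sample standard deviation = {random_error:.3f} mm")
```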
5.1.6 Confidence intervals and tests for the correlation coefficient
In statistics, a confidence interval is a pair (or several pairs) of numbers between which an unknown value is estimated to lie with a certain probability of success. Formally, these numbers determine an interval, which is calculated from sample data, and the unknown value is a population parameter.
The probability of success in the estimation is represented by 1 − α and is called the confidence level. In these circumstances, α is the so-called random error or significance level, that is, a measure of the chances of failing in the estimation with such an interval.
Use the confidence interval to evaluate the estimate of the population parameter. For example, a manufacturer wants to know whether the average length of the pencils it produces differs from the target length. The manufacturer takes a random sample of pencils and finds that the average length of the sample is 52 millimeters and that the 95% confidence interval is (50, 54). Therefore, the manufacturer can be 95% confident that the average length of all pencils is between 50 and 54 millimeters.
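The heading of this post mentions intervals for the correlation coefficient; a common construction for that case (not described in the text above) uses the Fisher z-transform. The Python sketch below shows that construction, assuming a sample correlation r computed from n pairs; the values r = 0.8 and n = 30 are invented for illustration.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, confidence=0.95):
    """Approximate CI for a population correlation using the Fisher z-transform."""
    z = math.atanh(r)                        # Fisher transform of the sample correlation
    se = 1.0 / math.sqrt(n - 3)              # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)      # transform the limits back to the r scale

# Hypothetical example: r = 0.8 observed on a sample of n = 30 pairs
print(correlation_ci(0.8, 30))               # roughly (0.62, 0.90)
```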
5.1.5 Two-dimensional normal distribution
In statistics, the binomial distribution is a discrete probability distribution that counts the number of successes in a sequence of n independent Bernoulli trials, each with a fixed probability p of success.
A Bernoulli experiment is characterized by being dichotomous, that is, only two outcomes are possible. One of them is called "success" and has probability of occurrence p, and the other, "failure", has probability q = 1 − p. In the binomial distribution the experiment is repeated n times, independently, and the probability of a given number of successes is calculated.
To represent that a random variable X follows a binomial distribution with parameters n and p, it is written X ~ B(n, p).
Its probability function is
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), for k = 0, 1, ..., n,
where C(n, k) = n! / (k! (n − k)!) is the number of combinations of n elements taken k at a time.
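A minimal Python check of the probability function above; the values of n, p, and k are chosen only for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), using C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative values: 10 trials with success probability 0.3
print(binomial_pmf(4, 10, 0.3))   # probability of exactly 4 successes, about 0.200
```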
5.1.4 Linear correlation coefficient
The linear correlation coefficient is the quotient between the covariance and the product of the standard deviations of both variables.
Properties
1. The correlation coefficient does not change when the measurement scale changes. That is, if we express height in meters or in centimeters, the correlation coefficient does not change.
2. The sign of the correlation coefficient is the same as that of the covariance.
3. The linear correlation coefficient is a real number between −1 and 1.
4. If the linear correlation coefficient takes values close to −1, the correlation is strong and inverse, and it is stronger the closer r is to −1.
5. If the linear correlation coefficient takes values close to 1, the correlation is strong and direct, and it is stronger the closer r is to 1.
6. If the linear correlation coefficient takes values close to 0, the correlation is weak.
7. If r = 1 or −1, the points of the cloud lie on an increasing or decreasing line, and there is functional dependence between the two variables.
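As a sketch of the definition r = covariance / (σx · σy), here is a small Python function; the two data lists are invented for illustration.

```python
import statistics

def pearson_r(x, y):
    """Linear correlation coefficient: covariance over the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sx = statistics.pstdev(x)   # population standard deviation of x
    sy = statistics.pstdev(y)   # population standard deviation of y
    return cov / (sx * sy)

# Invented data: r should come out close to +1 (strong, direct correlation)
print(pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9]))
```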
5.1.3 Correlation
By definition, correlation is the correspondence or relationship between two or more things; in statistics, it is the degree of dependence between the random variables involved in a multidimensional distribution. It indicates the strength and the linear direction of the relationship established between two random variables.
Two quantitative variables are considered to be correlated with each other when the values of one of them vary systematically with respect to the corresponding values of the other. For example, if we have two variables called A and B, the aforementioned correlation phenomenon exists if, when the values of A increase, the values of B also increase, and vice versa.
5.1.2 Simple Linear Regression
The objective of a regression model is to try to explain the relationship that exists between a dependent variable (response variable) and a set of independent variables (explanatory variables).
In the simple linear regression model, we try to explain the relationship that exists between the response variable Y and a single explanatory variable X:
Y = α + βX + ε
where α is the ordinate at the origin (the value that Y takes when X is 0),
β is the slope of the line (it indicates how Y changes when X increases by one unit), and
ε is a variable that includes a large set of factors, each of which influences the response only by a small amount; it is called the "error".
ESTIMATION OF THE REGRESSION LINE BY THE LEAST SQUARES METHOD
First, we represent the scatter diagram, or point cloud. Suppose it is the one obtained in the figure. Although the cloud reveals a large dispersion, we can observe a certain linear tendency as X and Y increase (a tendency that is not entirely exact; for example, if we assume that X is age and Y is height, obviously height does not depend only on age, and in addition there may also be measurement errors).
The regression line should have the character of a middle line and should fit most of the data well; that is, it should pass as close as possible to all the points. Requiring it to be close to each and every one of them means that we must adopt a particular criterion, generally known as LEAST SQUARES. This criterion requires that the sum of the squares of the vertical distances from the points to the line be as small as possible.
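As a sketch of the least squares criterion described above, the following Python snippet computes the slope and intercept that minimize the sum of squared vertical distances; the (age, height) pairs are made up for the example.

```python
def least_squares_fit(x, y):
    """Return (alpha, beta) for the line Y = alpha + beta * X fitted by least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # beta = covariance(x, y) / variance(x); alpha makes the line pass through the means
    beta = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
           sum((xi - mean_x) ** 2 for xi in x)
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Made-up (age, height) pairs for illustration
x = [4, 6, 8, 10, 12]
y = [102, 115, 128, 139, 150]
alpha, beta = least_squares_fit(x, y)
print(f"Y = {alpha:.2f} + {beta:.2f} X")
```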
5.1.1 Scatter diagrams
The scatter diagram is a graphical tool that helps identify the possible relationship between two variables. It represents the relationship between two variables graphically, which makes it easier to visualize and interpret the data. It is a type of mathematical diagram that uses Cartesian coordinates to show the values of two variables for a set of data.
The scatter diagram makes it possible to analyze whether there is any kind of relationship between two variables. For example, two variables may be related in such a way that increasing the value of one increases the value of the other; in this case we speak of a positive correlation. It can also happen that when one moves in one direction, the other moves in the opposite direction; for example, increasing the value of variable x reduces the value of variable y. Then there would be a negative correlation. If the values of the two variables turn out to be independent of each other, we would say that there is no correlation.
One of the most powerful aspects of a scatter plot, however, is its ability to show non-linear relationships between the variables. Moreover, if the data are described by a mixture of simple relationships, those relationships show up visually as superimposed patterns.
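For readers who want to draw such a diagram, here is a minimal Python sketch using matplotlib; the paired data are invented.

```python
import matplotlib.pyplot as plt

# Invented paired observations of two variables
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

plt.scatter(x, y)               # each point is one (x, y) observation
plt.xlabel("Variable X")
plt.ylabel("Variable Y")
plt.title("Scatter diagram")
plt.show()
```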
Tuesday, May 29, 2018
5.1.7 Measurement errors
Measurement error is defined as the difference between the measured value and the "true value". Measurement errors affect any measuring instrument and can be due to different causes. Those that can in some way be predicted, calculated, or eliminated through calibration and compensation are called deterministic or systematic, and they relate to the accuracy of the measurement. Those that cannot be predicted, because they depend on unknown or stochastic causes, are called random, and they relate to the precision of the instrument.
Random error: the laws or mechanisms that cause it are not known, because of their excessive complexity or their small influence on the final result.
To characterize this type of error we must first take a sample of measurements. From the successive measurements we can calculate the sample mean and the sample standard deviation.
Systematic error: it remains constant in absolute value and sign when a quantity is measured under the same conditions, and the laws that cause it are known.
To determine the systematic error of a measurement, a series of measurements of a quantity X0 is taken, the arithmetic mean of these measurements is calculated, and then the difference between the mean and the reference value X0 is found:
Systematic error = | mean − X0 |
Although it is impossible to know all the causes of error, it is useful to know the important ones and to have an idea that allows us to assess the most frequent errors. The main causes of error can be classified as:
Error due to the measuring instrument.
Error due to the operator.
Error due to environmental factors.
Error due to geometric tolerances of the piece itself.
5.1.6 Confidence intervals and tests for the correlation coefficient
In statistics, a confidence interval is a pair (or several pairs) of numbers between which an unknown value is estimated to lie with a certain probability of success. Formally, these numbers determine an interval, which is calculated from sample data, and the unknown value is a population parameter.
The probability of success in the estimation is represented by 1 − α and is called the confidence level. In these circumstances, α is the so-called random error or significance level, that is, a measure of the chances of failing in the estimation with such an interval.
Use the confidence interval to evaluate the estimate of the population parameter. For example, a manufacturer wants to know whether the average length of the pencils it produces differs from the target length. The manufacturer takes a random sample of pencils and finds that the average length of the sample is 52 millimeters and that the 95% confidence interval is (50, 54). Therefore, the manufacturer can be 95% confident that the average length of all pencils is between 50 and 54 millimeters.
Two-dimensional distribution exercises
Using the formulas shown above, work through the exercise presented.
Exercise 1.
A coin is tossed 37 times; we want to know the probability that it lands heads 18 times.
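One way to check this exercise is with the binomial probability function from section 5.1.5, assuming a fair coin (p = 0.5); a short Python sketch:

```python
from math import comb

n, k, p = 37, 18, 0.5
probability = comb(n, k) * p**k * (1 - p)**(n - k)   # P(X = 18) for X ~ B(37, 0.5)
print(round(probability, 4))                          # approximately 0.1286
```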
Monday, May 28, 2018
5.1.5 Two-dimensional normal distribution
In statistics, the binomial distribution is a discrete probability distribution that counts the number of successes in a sequence of n independent Bernoulli trials, each with a fixed probability p of success.
A Bernoulli experiment is characterized by being dichotomous, that is, only two outcomes are possible. One of them is called "success" and has probability of occurrence p, and the other, "failure", has probability q = 1 − p. In the binomial distribution the experiment is repeated n times, independently, and the probability of a given number of successes is calculated.
To represent that a random variable X follows a binomial distribution with parameters n and p, it is written X ~ B(n, p).
Its probability function is
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), for k = 0, 1, ..., n,
where C(n, k) = n! / (k! (n − k)!) is the number of combinations of n elements taken k at a time.
Correlation coefficient exercises
The formula for obtaining the correlation coefficient (denoted r) is:
r = σxy / (σx · σy)
That is, "the covariance over the product of the standard deviations of X and Y".
Note: as can be seen, to calculate the correlation coefficient we first need the covariance and the standard deviations, obtained as follows:
For the standard deviation: σx = √( Σx² / n − x̄² ), and analogously for σy.
For the covariance: σxy = Σ(x·y) / n − x̄·ȳ.
Now two exercises are presented.
Exercise No. 1:
An advertising company has the following data distribution, where X = the number of advertisements broadcast and Y = the number of sales achieved. Find the correlation coefficient between X and Y.

X | Y
1 | 10
2 | 17
3 | 30
4 | 28
5 | 39
6 | 47
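One possible way to check this exercise in Python, applying the covariance and standard deviation formulas given above:

```python
import math

x = [1, 2, 3, 4, 5, 6]
y = [10, 17, 30, 28, 39, 47]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum(xi * yi for xi, yi in zip(x, y)) / n - mean_x * mean_y   # σxy = Σxy/n − x̄·ȳ
sx = math.sqrt(sum(xi**2 for xi in x) / n - mean_x**2)             # σx = √(Σx²/n − x̄²)
sy = math.sqrt(sum(yi**2 for yi in y) / n - mean_y**2)

r = cov / (sx * sy)
print(round(r, 3))   # about 0.976: a strong, direct correlation
```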
Exercise No. 2
The grades of 12 students in a secondary-school class in two different subjects are:

X | Y
2 | 1
3 | 3
4 | 2
4 | 4
5 | 4
6 | 4
6 | 6
7 | 4
7 | 6
8 | 7
10 | 9
10 | 10

where X corresponds to the grades in mathematics and Y to the grades in physics.
Calculate the correlation coefficient.
5.1.4 Linear correlation coefficient
The linear correlation coefficient is the quotient between the covariance and the product of the standard deviations of the two variables.
Video on calculating the linear correlation coefficient, with analysis.
Properties
1. The correlation coefficient does not change when the measurement scale changes. That is, if we express height in meters or in centimeters, the correlation coefficient does not change.
2. The sign of the correlation coefficient is the same as that of the covariance.
3. The linear correlation coefficient is a real number between −1 and 1.
4. If the linear correlation coefficient takes values close to −1, the correlation is strong and inverse, and it is stronger the closer r is to −1.
5. If the linear correlation coefficient takes values close to 1, the correlation is strong and direct, and it is stronger the closer r is to 1.
6. If the linear correlation coefficient takes values close to 0, the correlation is weak.
7. If r = 1 or −1, the points of the cloud lie on an increasing or decreasing line, and there is functional dependence between the two variables.
5.1.3 Correlation
By definition, correlation is the correspondence or relationship between two or more things; in statistics, it is the degree of dependence between the random variables involved in a multidimensional distribution. It indicates the strength and the linear direction of the relationship established between two random variables.
Two quantitative variables are considered to be correlated with each other when the values of one of them vary systematically with respect to the corresponding values of the other. For example, if we have two variables called A and B, the aforementioned correlation phenomenon exists if, when the values of A increase, the values of B also increase, and vice versa.
5.1.2 Simple linear regression
The objective of a regression model is to try to explain the relationship that exists between a dependent variable (response variable) and a set of independent variables (explanatory variables).
In the simple linear regression model, we try to explain the relationship that exists between the response variable Y and a single explanatory variable X:
Y = α + βX + ε
where α is the ordinate at the origin (the value that Y takes when X is 0),
β is the slope of the line (it indicates how Y changes when X increases by one unit), and
ε is a variable that includes a large set of factors, each of which influences the response only by a small amount; it is called the "error".
ESTIMATION OF THE REGRESSION LINE BY THE LEAST SQUARES METHOD
First, we represent the scatter diagram, or point cloud. Suppose it is the one obtained in the figure. Although the cloud reveals a large dispersion, we can observe a certain linear tendency as X and Y increase (a tendency that is not entirely exact; for example, if we assume that X is age and Y is height, obviously height does not depend only on age, and in addition there may also be measurement errors).
The regression line should have the character of a middle line and should fit most of the data well; that is, it should pass as close as possible to all the points. Requiring it to be close to each and every one of them means that we must adopt a particular criterion, generally known as LEAST SQUARES. This criterion requires that the sum of the squares of the vertical distances from the points to the line be as small as possible.