ABSTRACT

APPLICATION OF THE BAYESIAN METHOD TO THE GAUSSIAN LINEAR MIXED MODEL
(PENERAPAN METODE BAYES PADA MODEL CAMPURAN LINIER GAUSS)

By

Sandy Vantika

NIM: 30113005

(Doctoral Program in Mathematics)

In general, regression analysis is understood as the analysis of the dependence of one variable on another variable, the independent variable (predictor), with the aim of estimating or predicting the value of the dependent variable (response) when the value of the independent variable (predictor) is known. Initially, this analysis was widely applied in medicine, animal husbandry, and education. Over time, regression analysis came to be used in other fields, namely plantations, genetics, hydrogeology, and forestry. In this dissertation, the analysis is applied to genetics.

The linear mixed model (LMM) is generally written as $\mathbf{y} = X\boldsymbol{\beta} + Z\boldsymbol{\nu} + \boldsymbol{\varepsilon}$, where $\mathbf{y}$ is the $n \times 1$ observation vector, $X$ is a known $n \times p$ design matrix, $\boldsymbol{\beta}$ is the unknown $p \times 1$ vector of regression parameters (often called the fixed-effect parameters), $Z$ is a known $n \times q$ design matrix, $\boldsymbol{\nu}$ is the $q \times 1$ vector of random-effect parameters, and $\boldsymbol{\varepsilon}$ is the $n \times 1$ error vector. Compared with the linear regression model (LRM), the difference lies in the additional term $Z\boldsymbol{\nu}$. Starting from this LMM and taking the distribution of the parameter $\boldsymbol{\beta}$ into account, a Gaussian LMM is constructed with the Bayesian method.

The sum of the random effects $Z\boldsymbol{\nu}$ and the error $\boldsymbol{\varepsilon}$ is denoted by $\boldsymbol{\delta}$, so the LMM can also be written as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\delta}$. The vector $\boldsymbol{\delta}$ is assumed to be distributed as $N_n(\mathbf{0}, V)$ with $V = K + R$. The matrix $K$ is an $n \times n$ covariance function matrix, while $R$ is the covariance matrix of the error $\boldsymbol{\varepsilon}$. The characteristics of the covariance matrix $V$, in particular its singularity, are stated in a lemma.

The Bayesian LMM has three parameters: the fixed-effect parameter vector $\boldsymbol{\beta}$, the random-effect variance $\sigma_\nu^2$, and the error variance $\sigma_\varepsilon^2$. The estimation methods used are maximum likelihood (ML), restricted maximum likelihood (RML), ML-Bayes A, ML-Bayes B, RML-Bayes A, and RML-Bayes B. ML and RML serve to estimate the two variance parameters. ML-Bayes A and RML-Bayes A use those variance estimates to estimate the vector of fixed-effect parameters. ML-Bayes B and RML-Bayes B, in contrast, estimate the values of new observations without first having to estimate the fixed-effect parameters.

As an application, the LMM is applied to corn weight data from the Drought Tolerance Maize for Africa Project of CIMMYT’s Global Maize Program. As a novelty, exponential and quadratic exponential covariance matrices are constructed as alternatives to the genomic relationship matrix, which is a linear covariance matrix. The exponential and quadratic exponential covariance matrices are matrices whose entries are exponential and quadratic exponential covariance functions with given hyperparameters. The linear covariance matrix gives higher estimates of the fixed-effect parameters than the exponential and quadratic exponential covariance matrices, with a fairly significant difference in mean squared error (MSE). The advantage of the exponential and quadratic exponential covariance matrices is that they can show differences in the characteristics of the random effects for each pair of individuals and that they use adjustable hyperparameters.

The lemma is a significant contribution to the development of the algebra of the LMM. A programming algorithm for the Gaussian LMM with the Bayesian method is also produced. For data with similar characteristics, the algorithm can be used by practitioners in fields other than genetics.

Keywords: Bayes, estimation, covariance function, linear mixed model.

ABSTRACT

APPLICATION OF BAYESIAN METHOD ON GAUSSIAN LINEAR MIXED MODEL

By

Sandy Vantika

NIM: 30113005

(Doctoral Program in Mathematics)

In general, regression analysis is the analysis of the dependence of one variable on other variables, the independent variables (predictors), in order to estimate or predict the value of the dependent variable (response) when the values of the independent variables are known. At first, this analysis was widely applied in medicine, animal husbandry, and education. As it developed, regression analysis came to be used in other fields, namely plantations, genetics, hydrogeology, and forestry. In this dissertation, the analysis is applied to genetics.

A linear mixed model (LMM) is generally written as $\mathbf{y} = X\boldsymbol{\beta} + Z\boldsymbol{\nu} + \boldsymbol{\varepsilon}$, where $\mathbf{y}$ is an $n \times 1$ observation vector, $X$ is a known $n \times p$ design matrix, $\boldsymbol{\beta}$ is an unknown $p \times 1$ vector of regression parameters (often called the fixed-effect parameters), $Z$ is a known $n \times q$ design matrix, $\boldsymbol{\nu}$ is a $q \times 1$ vector of random-effect parameters, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ error vector. Compared with the linear regression model (LRM), the difference lies in the additional term $Z\boldsymbol{\nu}$. Based on this LMM, and taking the distribution of the parameter $\boldsymbol{\beta}$ into account, a Gaussian LMM is formed using the Bayesian method.
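To make the dimensions concrete, the following minimal Python sketch simulates one draw from a model of this form. It assumes NumPy, and it assumes independent random effects and errors with $\boldsymbol{\nu} \sim N(\mathbf{0}, \sigma_\nu^2 I)$ and $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma_\varepsilon^2 I)$, which the abstract does not state. The function name `simulate_lmm`, the toy design matrices, and all numerical values are hypothetical.

```python
import numpy as np

def simulate_lmm(X, Z, beta, sigma_nu2, sigma_eps2, rng):
    """Draw y = X beta + Z nu + eps with independent Gaussian nu and eps (assumed)."""
    n, q = Z.shape
    nu = rng.normal(0.0, np.sqrt(sigma_nu2), size=q)    # random effects, q x 1
    eps = rng.normal(0.0, np.sqrt(sigma_eps2), size=n)  # errors, n x 1
    return X @ beta + Z @ nu + eps

rng = np.random.default_rng(0)
n, p, q = 6, 2, 3
X = np.column_stack([np.ones(n), np.arange(n)])  # known n x p design (intercept + trend)
Z = np.kron(np.eye(q), np.ones((n // q, 1)))     # known n x q design (group indicators)
beta = np.array([1.0, 0.5])                      # fixed-effect parameters (toy values)
y = simulate_lmm(X, Z, beta, sigma_nu2=2.0, sigma_eps2=0.1, rng=rng)
```

Setting both variances to zero reduces the draw to the linear predictor $X\boldsymbol{\beta}$, which is a quick way to check the shapes.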

The sum of the random effects $Z\boldsymbol{\nu}$ and the error $\boldsymbol{\varepsilon}$ is denoted by $\boldsymbol{\delta}$, so the LMM can also be written as $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\delta}$. The vector $\boldsymbol{\delta}$ is assumed to follow the distribution $N_n(\mathbf{0}, V)$ with $V = K + R$, where $K$ is an $n \times n$ covariance function matrix and $R$ is the covariance matrix of the error $\boldsymbol{\varepsilon}$. The characteristics of the covariance matrix $V$, in particular its singularity, are stated in a lemma.
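The abstract does not reproduce the lemma, so the sketch below is not that result. It only illustrates a standard fact that bears on the singularity of $V$: if $K$ is positive semidefinite and rank deficient, adding $R = \sigma_\varepsilon^2 I$ with $\sigma_\varepsilon^2 > 0$ gives a nonsingular $V$. The forms $K = \sigma_\nu^2 Z Z^\top$ and $R = \sigma_\varepsilon^2 I$, and all numbers, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 5, 2
Z = rng.normal(size=(n, q))          # toy design matrix with q < n
sigma_nu2, sigma_eps2 = 1.5, 0.3     # toy variance parameters

K = sigma_nu2 * Z @ Z.T              # assumed form of K; rank at most q, hence singular
R = sigma_eps2 * np.eye(n)           # assumed form of R
V = K + R

rank_K = np.linalg.matrix_rank(K)          # rank of the singular part
min_eig_V = np.linalg.eigvalsh(V).min()    # smallest eigenvalue of V
L = np.linalg.cholesky(V)                  # succeeds only if V is positive definite
```

Here the smallest eigenvalue of $V$ equals $\sigma_\varepsilon^2$ up to round-off, because the null space of $K$ is shifted by exactly that amount.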

The Bayesian LMM has three parameters: the fixed-effect parameter vector $\boldsymbol{\beta}$, the random-effect variance $\sigma_\nu^2$, and the error variance $\sigma_\varepsilon^2$. The estimation methods used are maximum likelihood (ML), restricted maximum likelihood (RML), ML-Bayes A, ML-Bayes B, RML-Bayes A, and RML-Bayes B. ML and RML are used to estimate the two variance parameters. ML-Bayes A and RML-Bayes A use the resulting variance estimates to estimate the vector of fixed-effect parameters. ML-Bayes B and RML-Bayes B, by contrast, estimate the values of new observations without first having to estimate the vector of fixed-effect parameters.
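The two-stage idea (variance parameters first, fixed effects second) can be sketched in Python as follows. This is plain ML with a generalised least squares plug-in, not the ML-Bayes or RML-Bayes procedures of the dissertation, whose details the abstract does not give. It assumes $V = \sigma_\nu^2 K_0 + \sigma_\varepsilon^2 I$ for a known matrix $K_0$, and it replaces a proper optimiser with a coarse grid search. All function names and data are hypothetical.

```python
import numpy as np

def gls_beta(y, X, V):
    """Generalised least squares estimate of beta for y ~ N(X beta, V)."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

def neg_profile_loglik(y, X, K0, sigma_nu2, sigma_eps2):
    """Negative Gaussian log-likelihood with beta profiled out by GLS."""
    n = y.size
    V = sigma_nu2 * K0 + sigma_eps2 * np.eye(n)   # assumed structure of V
    r = y - X @ gls_beta(y, X, V)
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (logdet + r @ np.linalg.solve(V, r) + n * np.log(2 * np.pi))

def ml_grid(y, X, K0, grid):
    """Coarse grid search over (sigma_nu2, sigma_eps2); a stand-in for an optimiser."""
    best = min((neg_profile_loglik(y, X, K0, a, b), a, b) for a in grid for b in grid)
    return best[1], best[2]

# Toy data (hypothetical): grouped random effects, K0 = Z Z^T.
rng = np.random.default_rng(1)
n, q = 40, 8
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
Z = np.kron(np.eye(q), np.ones((n // q, 1)))
K0 = Z @ Z.T
y = X @ np.array([1.0, 0.5]) + Z @ rng.normal(0.0, 1.0, q) + rng.normal(0.0, 0.5, n)

grid = np.logspace(-2, 1, 25)
s_nu2, s_eps2 = ml_grid(y, X, K0, grid)            # stage 1: variance parameters
V_hat = s_nu2 * K0 + s_eps2 * np.eye(n)
beta_hat = gls_beta(y, X, V_hat)                   # stage 2: fixed effects
```

The grid search keeps the sketch dependency-free beyond NumPy; in practice a numerical optimiser on the log-variances would be the more usual choice.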

As an application, the LMM is applied to corn weight data from the Drought Tolerance Maize for Africa Project of CIMMYT's Global Maize Program. As a novelty, exponential and quadratic exponential covariance matrices are constructed as alternatives to the genomic relationship matrix, which is a linear covariance matrix. The exponential and quadratic exponential covariance matrices are matrices whose entries are exponential and quadratic exponential covariance functions with given hyperparameters. The linear covariance matrix gives higher estimates of the fixed-effect parameters than the exponential and quadratic exponential covariance matrices, with a significant difference in mean squared error (MSE). The advantage of the exponential and quadratic exponential covariance matrices is that they reveal differences in the characteristics of the random effects for each pair of individuals and use adjustable hyperparameters.
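The three kinds of covariance matrix can be sketched in Python as below. The abstract gives no formulas, so the parametrisations are assumptions: a simplified centred linear kernel stands in for the genomic relationship matrix, and the exponential and quadratic exponential (often called squared exponential) kernels are taken as $\exp(-d_{ij}/\theta)$ and $\exp(-(d_{ij}/\theta)^2)$, where $d_{ij}$ is the Euclidean distance between the marker vectors of individuals $i$ and $j$ and $\theta$ is a hyperparameter. The marker matrix `M` and all function names are hypothetical.

```python
import numpy as np

def pairwise_dist(M):
    """Euclidean distances between the rows of M (individuals x markers)."""
    sq = np.sum(M**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * M @ M.T
    return np.sqrt(np.maximum(d2, 0.0))   # clip tiny negative round-off

def linear_cov(M):
    """Simplified linear kernel: centred cross-product scaled by marker count."""
    W = M - M.mean(axis=0)                # centre each marker column
    return W @ W.T / M.shape[1]

def exponential_cov(M, theta):
    """Exponential kernel exp(-d / theta) (assumed parametrisation)."""
    return np.exp(-pairwise_dist(M) / theta)

def sq_exponential_cov(M, theta):
    """Quadratic (squared) exponential kernel exp(-(d / theta)^2) (assumed)."""
    return np.exp(-(pairwise_dist(M) / theta) ** 2)

rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(4, 10)).astype(float)  # toy marker codes 0, 1, 2
K_lin = linear_cov(M)
K_exp = exponential_cov(M, theta=3.0)
K_sq = sq_exponential_cov(M, theta=3.0)
```

Changing `theta` rescales how quickly the covariance between two individuals decays with their distance, which is the sense in which the hyperparameter is adjustable; the linear kernel has no such control.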

The lemma is a significant contribution to the development of the algebra of the LMM. A programming algorithm for the Gaussian LMM with the Bayesian method is also produced. For data with similar characteristics, the algorithm can be used by practitioners in fields other than genetics.

Keywords: Bayesian, estimation, covariance function, linear mixed model.
