Microsoft Visual Studio Installation
Post on 16-Oct-2021
APPENDIX A
Microsoft Visual Studio Installation
Figure 1 Installing Visual Studio
Figure 2 Setup Preparation
Figure 3 Installation Path
In this step, choose where Visual Studio will be installed. Then click Next and wait for the
installation to finish.
Figure 4 Installing the Visual Studio Components
Figure 5 Restarting the Computer
Figure 6 Installation Continuing After the Restart
Figure 7 Installation Complete
MPICH2 Installation
Figure 8 Installing MPICH2
Figure 9 Installation Progress and Finishing Setup
Click Next in the setup window until the installation path window appears, then choose
where MPICH2 will be installed. Click Next to start the installation, wait for it to finish,
and click Finish.
Figure 10 Installing smpd and Validating MPI
Before MPI can be used from Visual Studio, the MPI service must be running. After
installing MPICH2, install the smpd service with smpd -install, start it with smpd -start,
and then check it with mpiexec -validate. If the result is SUCCESS, the MPI service is
running.
Setting Up MPI in Visual Studio
Figure 11 Additional Include Directories.
Right-click the project in Solution Explorer and choose Properties. Under
Configuration Properties, expand C/C++ and select General, then in the
Additional Include Directories field enter the path to the MPI include folder so that
the compiler can find the MPI headers.
Figure 12 Additional Library Directories.
Figure 13 Additional Dependencies.
Expand the Linker menu and select General. In Additional Library Directories, enter the
path to the lib folder so that mpi.lib, which is listed under Additional Dependencies in the
Linker Input submenu, can be found when the application is built.
Cluster Connection Setup
Firewall Configuration
The firewall on every user's computer must be opened so that MPI connections
sent from one cluster computer are not blocked by the others.
Figure 14 Finding the Firewall with the search box.
Figure 15 Advanced Security Firewall.
Figure 16 Firewall Properties.
Next, set Firewall State to Off so that the inbound and outbound connection rules do not
block MPI traffic when data is sent to or received from the cluster.
IP and User Credential Configuration
Figure 17 Searching for Network and Sharing Center.
Figure 18 Network and Sharing Center.
Select Change adapter settings, then right-click Local Area Connection and choose
Properties.
Figure 19 Local Area Connection Properties.
Figure 20 IPv4 Properties.
Configure each user's PC the same way and assign each PC its own IP address. In this
project the first PC uses 192.168.62.10 and the second PC uses 192.168.62.11.
PC 1 PC 2
Figure 3.19 User accounts on the host and the client.
The user name and password must be identical on PC 1 and PC 2 so that PC 2 is
detected when an MPI program runs and MPI can transfer data between PC 1 and
PC 2.
Component Services Setup
In the Start menu search box, type dcomcnfg.exe and press Enter. Select Component
Services, open the Computers folder, then right-click My Computer and choose
Properties.
Figure 21 Component Services.
Figure 22 COM Security limits in My Computer Properties.
Click COM Security and choose Edit Limits. This is where user connections to the
main computer are configured, so that the PC's security settings allow the users who
connect to the main computer. First add the user who will be given permission to
access the main computer.
Figure 23 Select Users search.
Click Advanced to open the menu for choosing the type of user to add to the
permissions.
Figure 24 Advanced Select Users.
Click Find Now to list the user types, select Everyone, then click OK.
Figure 25 Editing permissions for the selected user.
Under both Access Permissions and Launch and Activation Permissions, select the
user Everyone and tick Allow for every option. Then click OK and close
Component Services.
Testing the Connection and Running the MPI Application
Figure 26 Ping Test
Use the ping command followed by the IP address of a cluster computer to confirm that
the cluster machines can reach each other.
Figure 27 Running MPI from the Command Prompt
An application implemented with MPI is run from the command prompt with the following
commands:
Local : mpirun -np 2 file.exe
The number 2 in this command is the number of processes that run virtually on the
local host. It can be replaced with any power of two, 2^n.
Cluster : mpirun -np 2 -host host1,host2 file.exe
This is the same as the local case, with -host added together with the computer name
of each host. The number of hosts and the process count must match and be a power of two, 2^n.
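The power-of-two requirement on the process count can be checked before a run. The following is a minimal sketch in C, not part of the original project: it uses the same bit trick as the ISPOWER2 macro in the cluster sorting listing of Appendix B, and the helper name valid_run is hypothetical. It also checks that the data size divides evenly among the processes, which that listing requires as well.

```c
#include <stdbool.h>

/* Returns true when x is a positive power of two (1, 2, 4, 8, ...).
   A power of two has exactly one bit set, so x & (x - 1) clears it to zero. */
bool is_power_of_two(unsigned int x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: a run is accepted only when the process count is a
   power of two and the data divides evenly among the processes. */
bool valid_run(unsigned int processes, unsigned long datasize)
{
    return is_power_of_two(processes) && datasize % processes == 0;
}
```

For example, valid_run(4, 1000) accepts the run, while valid_run(3, 999) rejects it because 3 is not a power of two.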
Figure 28 Task Manager on a Cluster Computer
While the MPI program runs, make sure that CPU usage on the cluster computer shows
processing activity. This indicates that data is being processed on that computer.
Setting Up Nvidia Nsight
Before Nvidia Nsight can be used, Visual Studio must already be installed on the user's
PC so that, when the Nvidia Toolkit is installed, the Nsight templates are integrated into
the Visual Studio New Project dialog and are ready to use. After the installation
succeeds, check whether the GPU hardware supports writing and running CUDA
programs.
Figure 29 NVIDIA Installer summary after installing the Toolkit.
Figure 30 Searching the code samples to test the GPU.
To find out whether the GPU installed in the PC supports CUDA, open the NVIDIA
CUDA Samples Browser, search for the keyword particles, and click Run on
Smoke Particles.
Figure 31 Smoke screen code sample.
If the smoke rendering appears, the GPU supports CUDA.
Figure 32 CUDA templates integrated with Visual Studio.
After the installation finishes, the installation summary lists the features and
components of CUDA Nsight that were integrated into Visual Studio and installed on
the user's PC. Visual Studio now also includes the CUDA runtime project
template.
Figure 33 CUDA path in the environment variables.
Open My Computer Properties, choose Advanced system settings, and then open
Environment Variables. Make sure there is a CUDA path entry that points to the CUDA
bin, include, and library folders so that CUDA programs can be compiled and run
by the user.
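The same check can be made from a program instead of the dialog. The sketch below is an illustration only: it assumes the variable is named CUDA_PATH, which is the name the CUDA Toolkit installer normally creates on Windows, and the helper names env_is_set and report_env are hypothetical. It relies only on the standard C function getenv.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 when the named environment variable exists and is non-empty,
   0 otherwise. */
int env_is_set(const char *name)
{
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0';
}

/* Prints the value of the variable, or a note that it is missing. */
void report_env(const char *name)
{
    const char *value = getenv(name);
    if (value != NULL && value[0] != '\0')
        printf("%s = %s\n", name, value);
    else
        printf("%s is not set\n", name);
}
```

Calling report_env("CUDA_PATH") from a small test program prints the toolkit folder when the variable exists, or a message that it is not set.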
Running a CUDA Application
While a CUDA program runs, confirm that the GPU is active with GPU-Z or CUDA-Z.
These applications display the GPU processor's sensor readings to the
user.
Figure 34 Running a CUDA application
Figure 35 GPU sensors when idle and while running a program
APPENDIX B
Source Code CPU Computing
Sorting
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <iostream>
#include <windows.h>
void quicksort(float [10],int,int);
int main()
{
LARGE_INTEGER frequency;
LARGE_INTEGER t1,t2;
double elapsedTime;
QueryPerformanceFrequency(&frequency);
int size,i;
float *x;
float aa = 100.0;
printf("Enter size of the array: ");
scanf("%d",&size);
x = (float *)malloc( (size+1)*sizeof(float) );
for(i=0;i<size;i++)
{
x[i]=((float)rand()/(float)(RAND_MAX)) * aa;
}
QueryPerformanceCounter(&t1);
quicksort(x,0,size-1);
QueryPerformanceCounter(&t2);
elapsedTime = (t2.QuadPart - t1.QuadPart)*1000.0/
frequency.QuadPart;
printf("\n\n%f ms\n",elapsedTime);
system("pause");
return 0;
}
void quicksort(float x[],int first,int last)
{
int pivot,j,i;
float temp;
if(first<last)
{
pivot=first;
i=first;
j=last;
while(i<j)
{
while(x[i]<=x[pivot]&&i<last)
i++;
while(x[j]>x[pivot])
j--;
if(i<j)
{
temp=x[i];
x[i]=x[j];
x[j]=temp;
}
}
temp=x[pivot];
x[pivot]=x[j];
x[j]=temp;
quicksort(x,first,j-1);
quicksort(x,j+1,last);
}
}
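The CPU listings in this appendix all time their work the same way: QueryPerformanceCounter returns a tick count, QueryPerformanceFrequency returns ticks per second, and the elapsed time in milliseconds is (t2 - t1) * 1000 / frequency. A small sketch of that arithmetic in portable C follows; the function name ticks_to_ms is hypothetical and the tick values in the usage note are made up for illustration.

```c
/* Converts a tick interval to milliseconds.
   start and end are counter readings; ticks_per_second is the counter
   frequency, which must be greater than zero. */
double ticks_to_ms(long long start, long long end, long long ticks_per_second)
{
    return (double)(end - start) * 1000.0 / (double)ticks_per_second;
}
```

With a hypothetical 10 MHz counter, ticks_to_ms(0, 5000, 10000000) gives 0.5, that is, 5000 ticks correspond to half a millisecond.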
Binary Search
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <iostream>
#include <windows.h>
int main()
{
LARGE_INTEGER frequency;
LARGE_INTEGER t1,t2;
double elapsedTime;
int c,n;
int first, last, middle;
float search;
double *array;
float c2=1.25;
printf("number of elements\n");
scanf("%d",&n);
array = (double *)malloc((n+1) * sizeof(double));
//printf("Enter %d integers\n", n);
QueryPerformanceFrequency(&frequency);
for ( c = 0 ; c < n ; c++ )
{
array[c]=c2;
c2=c2+1.25;
}
printf("\nvalue to find\n");
scanf("%f",&search);
first = 0;
last = n - 1;
middle = (first+last)/2;
QueryPerformanceCounter(&t1);
while( first <= last )
{
if ( array[middle] < search ){
first = middle + 1;}
else if ( array[middle] == search ){
printf("%f found at location %d.\n", search, middle+1);
break;}
else
{
last = middle - 1;
}
middle = (first + last)/2;
}
if ( first > last )
{ printf("Not found! %f is not present in the list.\n",
search); }
QueryPerformanceCounter(&t2);
elapsedTime = (t2.QuadPart - t1.QuadPart)*1000.0/
frequency.QuadPart;
printf("\n\n\n%f ms\n",elapsedTime);
system("pause");
return 0;
}
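The search loop in the listing above can also be written as a function, which makes it easier to check on a small array. This is a sketch under stated assumptions rather than the original author's code: the name binary_search is hypothetical, the array must be sorted in ascending order, and the function returns a 0-based index (the listing prints middle+1, a 1-based position) or -1 when the value is absent.

```c
/* Returns the 0-based index of target in the ascending array a[0..n-1],
   or -1 when target is not present. */
int binary_search(const double *a, int n, double target)
{
    int first = 0;
    int last = n - 1;
    while (first <= last) {
        /* midpoint computed this way cannot overflow for large indices */
        int middle = first + (last - first) / 2;
        if (a[middle] < target)
            first = middle + 1;      /* target can only be in the upper half */
        else if (a[middle] > target)
            last = middle - 1;       /* target can only be in the lower half */
        else
            return middle;           /* exact match */
    }
    return -1;
}
```

For the array {1.25, 2.5, 3.75, 5.0}, searching for 3.75 returns index 2 and searching for 4.0 returns -1.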
Matrix Multiplication
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <iostream>
#include <windows.h>
int main()
{ //FLOATING
int i, j, k;
double **mat1, **mat2, **res;
int n; // int matches the %d conversion used by scanf below
float aa = 5.0;
LARGE_INTEGER frequency;
LARGE_INTEGER t1,t2;
double elapsedTime;
// get the order of the matrix from the user
printf("Size of matrix:");
scanf("%d", &n);
QueryPerformanceFrequency(&frequency);
// dynamically allocate the arrays of row pointers
mat1 = (double **)malloc(sizeof(double *) * n);
mat2 = (double **)malloc(sizeof(double *) * n);
res = (double **) malloc(sizeof(double *) * n);
for (i = 0; i < n; i++)
{
mat1[i] = (double *)malloc(sizeof(double) * n);
mat2[i] = (double *)malloc(sizeof(double) * n);
res[i] = (double *)malloc(sizeof(double) * n);
}
// get the input matrix
printf("\n");
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
//mat1[i][j] = rand() % 10 +1;
mat1[i][j] =
((float)rand()/(float)(RAND_MAX)) * aa;
}
}
printf("matrix 1:\n");
for(int aa=0; aa<n ; aa++)
{
for(int bb=0; bb<n ;bb++)
{
printf("%.2f ",mat1[aa][bb]);
}
printf("\n");
}
printf("\n");
// get the input for second matrix from the user
printf("matrix 2:\n");
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
//mat2[i][j] = rand() % 10 +1;
mat2[i][j]=((float)rand()/(float)(RAND_MAX)) * aa;
}
}
for(int aa=0; aa<n ; aa++)
{
for(int bb=0; bb<n ;bb++)
{
printf("%.2f ",mat2[aa][bb]);
}
printf("\n");
}
QueryPerformanceCounter(&t1);
// multiply first and second matrix
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
*(*(res + i) + j) = 0;
for (k = 0; k < n; k++) {
*(*(res + i) + j) = *(*(res + i) + j) +
(*(*(mat1 + i) + k) * *(*(mat2 + k) + j));
}
}
}
QueryPerformanceCounter(&t2);
elapsedTime = (t2.QuadPart - t1.QuadPart)*1000.0/
frequency.QuadPart;
printf("\n\n\n%f ms\n",elapsedTime);
// print the result
printf("\nResult :\n");
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
printf("%.2f ", *(*(res + i) + j));
}
printf("\n");
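}
// release every row before releasing the arrays of row pointers
for (i = 0; i < n; i++) {
free(mat1[i]);
free(mat2[i]);
free(res[i]);
}
free(mat1);
free(mat2);
free(res);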
system("pause");
return 0;
}
Gauss Jordan Elimination
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <iostream>
#include <windows.h>
#include <math.h>
#include <malloc.h>
int main()
{
int i, j, n;
double **a, *b, *x;
LARGE_INTEGER frequency;
LARGE_INTEGER t1,t2;
double elapsedTime;
void gauss_jordan(int n, double **a, double *b, double *x);
printf("\nNumber of equations: ");
scanf("%d", &n);
float aa = 10.0;
QueryPerformanceFrequency(&frequency);
x = (double *)malloc( (n+1)*sizeof(double) );
b = (double *)malloc( (n+1)*sizeof(double) );
a = (double **)malloc( (n+1)*sizeof(double *) );
for(i = 1; i <= n; i++)
a[i] = (double *)malloc( (n+1)*sizeof(double) );
for(i = 1; i <= n; i++)
{
for(j = 1; j <= n; j++)
{
//a[i][j]=rand()%10 + 1;
a[i][j]=((float)rand()/(float)(RAND_MAX)) * aa;
}
//b[i]=rand()%10 + 1;
b[i]=((float)rand()/(float)(RAND_MAX)) * aa;
}
for(int aa = 1 ; aa<=n ; aa++)
{
for(int bb = 1 ; bb<=n ; bb++)
{
printf("%.1f ",a[aa][bb]);
}
printf(" %.1f ",b[aa]);
printf("\n");
}
printf("\n\n");
QueryPerformanceCounter(&t1);
gauss_jordan(n, a, b, x);
QueryPerformanceCounter(&t2);
elapsedTime = (t2.QuadPart - t1.QuadPart)*1000.0/
frequency.QuadPart;
printf("\n\n\n%f ms\n",elapsedTime);
printf("\nSolution\n");
printf("------------------------------------------------\n");
printf("x = (");
for(i = 1; i <= n-1; i++) printf("%lf, ", x[i]);
printf("%lf)\n\n", x[n]);
system("pause");
return(0);
}
void gauss_jordan(int n, double **a, double *b, double *x)
{
int i, j, k;
int p;
double factor;
double big, dummy;
for(k = 1; k <= n; k++)
{
// pivoting
if(k < n)
{
p = k;
big = fabs(a[k][k]);
for(i = k+1; i <= n; i++)
{
if(big < fabs(a[i][k]))
{
big = fabs(a[i][k]);
p = i;
}
}
if(p != k)
{
for(j = 1; j <= n; j++)
{
dummy = a[p][j];
a[p][j] = a[k][j];
a[k][j] = dummy;
}
dummy = b[p];
b[p] = b[k];
b[k] = dummy;
}
}
// Gauss-Jordan elimination
factor = a[k][k];
for(j = 1; j <= n; j++) a[k][j] /= factor;
b[k] /= factor;
for(i = 1; i <= n; i++)
{
if(i == k) continue;
factor = a[i][k];
for(j = 1; j <= n; j++) a[i][j] -=
a[k][j]*factor;
b[i] -= b[k]*factor;
}
}
for(i = 1; i <= n; i++) x[i] = b[i];
return;
}
Source Code GPU Computing
Sorting
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <windows.h>
using namespace std;
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <cuda_runtime_api.h>
//#define NUM 8
__device__ inline void swap(float & a, float & b)
{
float tmp = a;
a = b;
b = tmp;
}
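// Bitonic sort of N values inside a single thread block.
// N must be a power of two and must not exceed the maximum block size.
__global__ void bitonicSort(float * values, int N)
{
extern __shared__ float shared[];
const unsigned int tid = threadIdx.x;
shared[tid] = values[tid];
__syncthreads(); // every element must be loaded before any comparison
for (unsigned int k = 2; k <= N; k *= 2)
{
for (unsigned int j = k / 2; j>0; j /= 2)
{
unsigned int ixj = tid ^ j;
if (ixj > tid)
{
if ((tid & k) == 0)
{
if (shared[tid] > shared[ixj])
{
swap(shared[tid], shared[ixj]);
}
}
else
{
if (shared[tid] < shared[ixj])
{
swap(shared[tid], shared[ixj]);
}
}
}
__syncthreads(); // finish this step before the next one reads shared[]
}
}
values[tid] = shared[tid];
}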
int main(void)
{
cudaEvent_t start, stop;
float time;
float * dvalues;
float * values;
int NUM; // number of elements; read with %d, so it must be an int
float aa = 5.0;
scanf("%d",&NUM);
values = (float *)malloc( (NUM+1)*sizeof(float) );
size_t size = NUM * sizeof(float);
for(int i = 0; i < NUM; i++)
{
//values[i]=rand()%10 + 1;
values[i] = ((float)rand()/(float)(RAND_MAX)) * aa;
}
/*printf("\n initial values: ");
for (int i=0; i<NUM; i++) printf(" %f",values[i]); */
cudaMalloc((void **)&dvalues,size);
cudaMemcpy(dvalues, values, size , cudaMemcpyHostToDevice);
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start,0);
bitonicSort<<<1, NUM, size >>>(dvalues,NUM);
cudaEventRecord(stop,0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
cudaMemcpy(values, dvalues, size, cudaMemcpyDeviceToHost);
cudaFree(dvalues);
/*printf("\n sorted result: ");
for (int i=0; i<NUM; i++) printf(" %f",values[i]);*/
printf("%f ms\n",time);
printf("\n");
system("pause");
}
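The kernel in this listing compares element tid with element tid ^ j and doubles k up to N, so it sorts correctly only when the element count is a power of two, and it uses one thread per element in a single block. A host-side helper can round a requested size up to the next power of two before the launch. The sketch below is an assumption about how that might be done, not part of the original program; the name next_power_of_two is hypothetical, and the extra slots would still have to be padded with a value that sorts last.

```c
/* Smallest power of two that is greater than or equal to n.
   Intended for 1 <= n <= 2^31; larger values are not handled. */
unsigned int next_power_of_two(unsigned int n)
{
    unsigned int p = 1;
    while (p < n)
        p <<= 1;   /* double until p reaches or passes n */
    return p;
}
```

A request for 1000 elements would be rounded up to 1024, while a request for 8 stays at 8.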
Binary Search
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <iostream>
#include <windows.h>
#include <assert.h>
__device__ int get_index_to_check(int thread, int num_threads, int
set_size, int offset) {
return (((set_size + num_threads) / num_threads) * thread) +
offset;
}
// arr holds float values, so it is declared float on the device as well
__global__ void p_ary_search(float search, int array_length, const float
*arr, int *ret_val ) {
const int num_threads = blockDim.x * gridDim.x;
const int thread = blockIdx.x * blockDim.x + threadIdx.x;
int set_size = array_length;
while(set_size != 0){
int offset = ret_val[1];
int index_to_check = get_index_to_check(thread,
num_threads, set_size, offset);
if (index_to_check < array_length){
int next_index_to_check =
get_index_to_check(thread + 1, num_threads, set_size, offset);
if (next_index_to_check >= array_length){
next_index_to_check = array_length - 1;
}
if (search > arr[index_to_check] && (search <
arr[next_index_to_check])) {
ret_val[1] = index_to_check;
}
else if (search == arr[index_to_check]) {
ret_val[0] = index_to_check;
}
}
set_size = set_size / num_threads;
}
}
float chop_position(float search, float *search_array, int
array_length)
{
float time;
cudaEvent_t start, stop;
int array_size = array_length * sizeof(float);
if (array_size == 0) return -1;
float *dev_arr; // device copy of the float search array
cudaMalloc((void**)&dev_arr, array_size);
cudaMemcpy(dev_arr, search_array, array_size,
cudaMemcpyHostToDevice);
int *ret_val = (int*)malloc(sizeof(int) * 2);
ret_val[0] = -1; // return value
ret_val[1] = 0; // offset
array_length = array_length % 2 == 0 ? array_length :
array_length - 1; // array size
int *dev_ret_val;
cudaMalloc((void**)&dev_ret_val, sizeof(int) * 2);
cudaMemcpy(dev_ret_val, ret_val, sizeof(int) * 2,
cudaMemcpyHostToDevice);
// Launch kernel
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start,0);
p_ary_search<<<16, 64>>>(search, array_length, dev_arr,
dev_ret_val);
cudaEventRecord(stop,0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
// Get results
cudaMemcpy(ret_val, dev_ret_val, 2 * sizeof(int),
cudaMemcpyDeviceToHost);
int ret = ret_val[0];
printf("\nFound %i\n",ret_val[1]);
printf("\nElapsed Time : %f ms",time);
// Free memory on device
cudaFree(dev_arr);
cudaFree(dev_ret_val);
free(ret_val);
return ret;
}
static float * build_array(int length) {
float *ret_val = (float*)malloc(length * sizeof(float));
for (int i = 0; i < length; i++)
{
ret_val[i] = (i * 2 + 0.5) - 1;
//ret_val[i] = i;
printf("%.2f ",ret_val[i]);
}
return ret_val;
}
static void test_array(int length, float search, float index) {
printf("Length %i Search %.2f\n", length, search);
assert(index == chop_position(search, build_array(length),
length) && "test_small_array()");
}
static void test_arrays() {
int length;
float search;
scanf("%d",&length);
scanf("%f",&search);
test_array(length, search, -1);
}
int main(){
test_arrays();
system("pause");
}
Matrix Multiplication
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <windows.h>
using namespace std;
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <cuda_runtime_api.h>
// 16 x 16 = 256 threads per block; a 100 x 100 block would exceed the
// per-block thread limit and the kernel would not launch
#define BLOCK_SIZE 16
__global__ void gpuMM(float *A, float *B, float *C, int N)
{
int row = blockIdx.y*blockDim.y + threadIdx.y;
int col = blockIdx.x*blockDim.x + threadIdx.x;
float sum = 0.f;
for (int n = 0; n < N; ++n)
sum += A[row*N+n]*B[n*N+col];
C[row*N+col] = sum;
}
int main(int argc, char *argv[])
{
LARGE_INTEGER frequency;
LARGE_INTEGER t1,t2;
double elapsedTime;
int N,K,L;
awal:
scanf("%d",&L);
if(L < 1000)
{
printf("Input must be at least 1000\n");
goto awal;
}
K = L/BLOCK_SIZE; // number of blocks along each grid dimension
N = K*BLOCK_SIZE; // matrix order, rounded down to a multiple of BLOCK_SIZE
float time;
cudaEvent_t start, stop;
float *hA,*hB,*hC;
hA = new float[N*N];
hB = new float[N*N];
hC = new float[N*N];
float aa=5.0;
for (int j=0; j<N; j++){
for (int i=0; i<N; i++){
hA[j*N+i] = ((float)rand()/(float)(RAND_MAX)) * aa;
hB[j*N+i] = ((float)rand()/(float)(RAND_MAX)) *
aa;
}
}
int size = N*N*sizeof(float); // Size of the memory in bytes
float *dA,*dB,*dC;
cudaMalloc(&dA,size);
cudaMalloc(&dB,size);
cudaMalloc(&dC,size);
dim3 threadBlock(BLOCK_SIZE,BLOCK_SIZE);
dim3 grid(K,K);
// Copy matrices from the host to device
cudaMemcpy(dA,hA,size,cudaMemcpyHostToDevice);
cudaMemcpy(dB,hB,size,cudaMemcpyHostToDevice);
//Execute the matrix multiplication kernel
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start,0);
gpuMM<<<grid,threadBlock>>>(dA,dB,dC,N);
cudaEventRecord(stop,0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
float *C;
C = new float[N*N];
cudaMemcpy(C,dC,size,cudaMemcpyDeviceToHost);
cudaFree(dA);
cudaFree(dB);
cudaFree(dC);
printf("%f ms\n",time);
system("pause");
}
Gauss Jordan Elimination
main.cpp
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
#include "Common.h"
int main(int argc , char **argv)
{
float *a_h = NULL ;
float *b_h = NULL ;
float *result , sum ,rvalue ;
int numvar ,j ;
float aa = 5.0;
numvar = 0;
scanf("%d",&numvar);
a_h = (float*)malloc(sizeof(float)*numvar*(numvar+1));
b_h = (float*)malloc(sizeof(float)*numvar*(numvar+1));
int ii=0;
for(int i = 1; i <= numvar; i++)
{
for(int i = 1; i <= numvar+1; i++)
{
//a_h[ii]=rand()%10 + 1;
a_h[ii]=((float)rand()/(float)(RAND_MAX)) * aa;
ii++;
}
}
//Calling device function to copy data to device
DeviceFunc(a_h , numvar , b_h);
//Showing the data
printf("\n\n");
/*for(int i =0 ; i< numvar ;i++)
{
for(int j =0 ; j< numvar+1; j++)
{
printf("%.2f ",b_h[i*(numvar+1) + j]);
}
printf("\n");
} */
//Using Back substitution method
result = (float*)malloc(sizeof(float)*(numvar));
for(int i = 0; i< numvar;i++)
{
result[i] = 1.0;
}
for(int i=numvar-1 ; i>=0 ; i--)
{
sum = 0.0 ;
for( j=numvar-1 ; j>i ;j--)
{
sum = sum + result[j]*b_h[i*(numvar+1) + j];
}
rvalue = b_h[i*(numvar+1) + numvar] - sum ;
result[i] = rvalue / b_h[i *(numvar+1) + j];
}
//Show the result
/*for(int i =0;i<numvar;i++)
{
printf("[X%d] = %+f\n", i ,result[i]);
}*/
_getch();
return 0;
}
DeviceFunc.cu
#include <cuda.h>
#include "Common.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <iostream>
#include <windows.h>
__global__ void Kernel(float *, float * ,int );
void DeviceFunc(float *temp_h , int numvar , float *temp1_h)
{
float time;
float *a_d , *b_d;
LARGE_INTEGER frequency;
LARGE_INTEGER t1,t2;
double elapsedTime;
cudaEvent_t start, stop;
//Memory allocation on the device
cudaMalloc(&a_d,sizeof(float)*(numvar)*(numvar+1));
cudaMalloc(&b_d,sizeof(float)*(numvar)*(numvar+1));
//Copying data to device from host
cudaMemcpy(a_d, temp_h,
sizeof(float)*numvar*(numvar+1),cudaMemcpyHostToDevice);
//Defining size of Thread Block
dim3 dimBlock(numvar+1,numvar,1);
dim3 dimGrid(1,1,1);
//Kernel call
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start,0);
Kernel<<<dimGrid , dimBlock>>>(a_d , b_d , numvar);
cudaEventRecord(stop,0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
//Copying data to host from device
cudaMemcpy(temp1_h,b_d,sizeof(float)*numvar*(numvar+1),cudaMemcpyDeviceToHost);
//Deallocating memory on the device
cudaFree(a_d);
cudaFree(b_d);
printf("%f ms\n",time);
}
Kernel.cu
#include <cuda.h>
#include "Common.h"
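__global__ void Kernel(float *a_d , float *b_d ,int size)
{
int idx = threadIdx.x ;
int idy = threadIdx.y ;
//int width = size ;
//int height = size ;
//Allocating memory in the shared memory of the device
//(the fixed 16 x 16 tile limits size to at most 15 unknowns)
__shared__ float temp[16][16];
//Copying the data to the shared memory
temp[idy][idx] = a_d[(idy * (size+1)) + idx] ;
__syncthreads(); //the whole tile must be loaded before elimination starts
for(int i =1 ; i<size ;i++)
{
bool active = (idy + i) < size;
float var1 = 0.0f;
if(active)
{
var1 =(-1)*( temp[i-1][i-1]/temp[i+idy][i-1]);
}
__syncthreads(); //all factors are read before any row element is overwritten
if(active)
{
temp[i+idy][idx] = temp[i-1][idx] +((var1) *
(temp[i+idy ][idx]));
}
__syncthreads(); //finish this step before the next pivot row is used
}
b_d[idy*(size+1) + idx] = temp[idy][idx];
}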
Common.h
#ifndef __Common_H
#define __Common_H
void getvalue(float ** ,int *);
void DeviceFunc(float * , int , float *);
#endif
Source Code Cluster Computing
Sorting
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define DEBUG
#define ROOT 0
#define ISPOWER2(x) (!((x)&((x)-1)))
float *merge(float array1[], float array2[], float size) {
float *result = (float *)malloc(2*size*sizeof(float));
int i=0, j=0, k=0;
while ((i < size) && (j < size))
result[k++] = (array1[i] <= array2[j])? array1[i++] : array2[j++];
while (i < size)
result[k++] = array1[i++];
while (j < size)
result[k++] = array2[j++];
return result;
}
float sorted(float array[], float size) {
int i;
for (i=1; i<size; i++)
if (array[i-1] > array[i])
return 0;
return 1;
}
int compare(const void *p1, const void *p2) {
float a = *(const float *)p1, b = *(const float *)p2;
return (a > b) - (a < b); /* -1, 0 or 1, without truncating the difference */
}
int main(int argc, char** argv) {
int i, b=1, npes, myrank;
long datasize;
int localsize; /* element count per process */
float *localdata, *otherdata, *data = NULL;
int active = 1;
MPI_Status status;
double start, finish, p, s;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &npes);
datasize = strtol(argv[1], NULL, 10);
if (!ISPOWER2(npes)) {
if (myrank == ROOT) printf("Processor number must be power of two.\n");
return MPI_Finalize();
}
if (datasize%npes != 0) {
if (myrank == ROOT) printf("Datasize must be divisible by processor number.\n");
return MPI_Finalize();
}
if (myrank == ROOT) {
data = (float *)malloc(datasize * sizeof(float));
for (i = 0; i < datasize; i++)
data[i] = rand()%99 + 1;
}
start = MPI_Wtime();
localsize = datasize / npes;
localdata = (float *) malloc(localsize * sizeof(float));
/* the data are floats, so every transfer uses MPI_FLOAT */
MPI_Scatter(data, localsize, MPI_FLOAT, localdata, localsize,
MPI_FLOAT,
ROOT, MPI_COMM_WORLD);
qsort(localdata, localsize, sizeof(float), compare);
while (b < npes) {
if (active) {
if ((myrank/b)%2 == 1) {
MPI_Send(localdata, b * localsize, MPI_FLOAT, myrank - b, 1,
MPI_COMM_WORLD);
free(localdata);
active = 0;
} else {
otherdata = (float *) malloc(b * localsize * sizeof(float));
MPI_Recv(otherdata, b * localsize, MPI_FLOAT, myrank + b, 1,
MPI_COMM_WORLD, &status);
float *merged = merge(localdata, otherdata, b * localsize);
free(localdata); /* the old half is no longer needed once merged */
free(otherdata);
localdata = merged;
}
}
b <<= 1;
}
finish = MPI_Wtime();
if (myrank == ROOT) {
#ifdef DEBUG
if (sorted(localdata, npes*localsize)) {
printf("\nParallel sorting succeed.\n\n");
} else {
printf("\nParallel sorting failed.\n\n");
}
#endif
free(localdata);
p = finish - start;
printf(" Parallel : %.8f\n", p);
/*start = MPI_Wtime();
qsort(data, datasize, sizeof(float), compare);
finish = MPI_Wtime();*/
free(data);
}
return MPI_Finalize();
}
Binary Search
#include "mpi.h"
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <math.h>
using namespace std;
int main(int argc,char **argv)
{
const int Master = 0;
const int Tag_Size = 1;
const int Tag_Data= 2;
const int Tag_Max=3;
double max; // largest value seen by this worker; the data are doubles
double MaxInAll;
int MyId, P;
double* A;
int ArrSize, Target;
int n, Start;
int i, x;
int Source, dest, Tag;
int WorkersDone = 0 ;
double start, finish, p;
MPI_Status RecvStatus;
MPI_Init(&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &MyId);
MPI_Comm_size (MPI_COMM_WORLD, &P);
start = MPI_Wtime();
//start working..
if (MyId == Master)
{
cout<<"This is the master process on "<<P<<" Processes\n";
MaxInAll=0;
int GlobIndx;
cout<<"Enter the number of elements you want to generate..";
cin>> ArrSize;
A = new double[ArrSize];
srand ( P ); /* initialize random seed: */
for ( i= 0; i<ArrSize; i++)
{
A[i] = i+1.25;
}
n = ArrSize/(P-1);
for( i = 1; i < P; i++)
{
dest = i;
if (i == P-1)
n = ArrSize - (n*(P-2));
Tag = Tag_Size;
// n is an int, so it is sent as MPI_INT
MPI_Send(&n, 1, MPI_INT, dest, Tag,
MPI_COMM_WORLD);
Tag = Tag_Data;
Start = (i - 1) * ( ArrSize/(P-1) );
MPI_Send(A+Start, n, MPI_DOUBLE, dest, Tag,
MPI_COMM_WORLD);
}
WorkersDone = 0;
int MaxIndex = 0;
while (WorkersDone < P-1 )
{
// x receives the worker's local index of its maximum, an int
MPI_Recv(&x, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
MPI_COMM_WORLD, &RecvStatus);
Source = RecvStatus.MPI_SOURCE;
Tag = RecvStatus.MPI_TAG;
if (Tag == Tag_Max)
{
GlobIndx = (Source - 1)*(ArrSize/(P-1) ) + x;
if ( A[GlobIndx] > MaxInAll)
{
MaxInAll = A[GlobIndx];
MaxIndex = GlobIndx;
}
WorkersDone++;
}
}
if(WorkersDone==P-1)
cout << "Process "<<Source<<" found the max of the array "<< MaxInAll<<" at index "<<MaxIndex;
delete [] A;
}
else
{
max=0;
cout<<"Process "<<MyId<<" is alive...\n";
Source = Master;
Tag = Tag_Size;
MPI_Recv(&n, 1, MPI_INT, Source, Tag, MPI_COMM_WORLD,
&RecvStatus);
A = new double[n];
Tag = Tag_Data;
MPI_Recv(A, n, MPI_DOUBLE, Source, Tag, MPI_COMM_WORLD,
&RecvStatus);
cout<<"Process "<<MyId<< " received "<<n<<" data elements\n";
int max_i = 0;
i = 0;
while (i<n )
{
if ( A[i] > max )
{
max=A[i];
max_i=i;
}
i++;
}
dest = Master;
Tag = Tag_Max;
cout<<"Process "<<MyId<< " has max equals "<<max<<endl;
MPI_Send(&max_i, 1, MPI_INT, dest, Tag,
MPI_COMM_WORLD);
delete [] A;
}
finish = MPI_Wtime();
if (MyId == 0)
{
p = finish - start;
printf(" Parallel : %.8f\n", p);
}
MPI_Finalize();
return 0;
}
Matrix Multiplication
#include <stdio.h>
#include "mpi.h"
#define N 5000 /* number of rows and columns in matrix */
MPI_Status status;
double a[N][N],b[N][N],c[N][N];
int main(int argc, char **argv)
{
double start, finish, p;
int numtasks,taskid,numworkers,source,dest,rows,offset,i,j,k,remainPart,
originalRows;
//struct timeval start, stop;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
numworkers = numtasks-1;
start = MPI_Wtime();
if (taskid == 0) {
for (i=0; i<N; i++) {
for (j=0; j<N; j++) {
a[i][j]= 1.25;
b[i][j]= 2.25;
}
}
//gettimeofday(&start, 0);
/* send matrix data to the worker tasks */
rows = N/numworkers;
offset = 0;
remainPart = N%numworkers;
for (dest=1; dest<=numworkers; dest++)
{
if (remainPart > 0)
{
originalRows = rows;
++rows;
remainPart--;
MPI_Send(&offset, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
MPI_Send(&rows, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
MPI_Send(&a[offset][0], rows*N, MPI_DOUBLE,dest,1,
MPI_COMM_WORLD);
MPI_Send(&b, N*N, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);
offset = offset + rows;
rows = originalRows;
}
else
{
MPI_Send(&offset, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
MPI_Send(&rows, 1, MPI_INT, dest, 1, MPI_COMM_WORLD);
MPI_Send(&a[offset][0], rows*N, MPI_DOUBLE,dest,1,
MPI_COMM_WORLD);
MPI_Send(&b, N*N, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);
offset = offset + rows;
}
}
/* wait for results from all worker tasks */
for (i=1; i<=numworkers; i++)
{
source = i;
MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD,
&status);
MPI_Recv(&rows, 1, MPI_INT, source, 2, MPI_COMM_WORLD,
&status);
MPI_Recv(&c[offset][0], rows*N, MPI_DOUBLE, source, 2,
MPI_COMM_WORLD, &status);
}
}
if (taskid > 0) {
source = 0;
MPI_Recv(&offset, 1, MPI_INT, source, 1, MPI_COMM_WORLD,
&status);
MPI_Recv(&rows, 1, MPI_INT, source, 1, MPI_COMM_WORLD,
&status);
MPI_Recv(&a, rows*N, MPI_DOUBLE, source, 1, MPI_COMM_WORLD,
&status);
MPI_Recv(&b, N*N, MPI_DOUBLE, source, 1, MPI_COMM_WORLD,
&status);
/* Matrix multiplication */
for (k=0; k<N; k++)
for (i=0; i<rows; i++) {
c[i][k] = 0.0;
for (j=0; j<N; j++)
c[i][k] = c[i][k] + a[i][j] * b[j][k];
}
MPI_Send(&offset, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
MPI_Send(&rows, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
MPI_Send(&c, rows*N, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
}
finish = MPI_Wtime();
if (taskid == 0)
{
p = finish - start;
printf(" Parallel : %.8f\n", p);
}
MPI_Finalize();
}
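In the listing above, the master gives each worker N / numworkers rows and hands one extra row to each of the first N % numworkers workers, advancing offset by the number of rows just sent. The sketch below reproduces only that bookkeeping in plain C so it can be checked without MPI. The function names rows_for_worker and offset_for_worker are hypothetical, and workers are numbered from 1 as in the listing.

```c
/* Rows given to worker w (1-based) when n rows are split among `workers`
   workers: the first n % workers workers receive one extra row. */
int rows_for_worker(int n, int workers, int w)
{
    int base = n / workers;
    int extra = n % workers;
    return base + (w <= extra ? 1 : 0);
}

/* Index of the first row handled by worker w (1-based). */
int offset_for_worker(int n, int workers, int w)
{
    int base = n / workers;
    int extra = n % workers;
    int before = w - 1;   /* number of workers served ahead of w */
    /* each earlier worker took base rows, and the first `extra` of them
       took one more */
    return before * base + (before < extra ? before : extra);
}
```

With 10 rows and 3 workers, the workers receive 4, 3 and 3 rows starting at rows 0, 4 and 7.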
Gauss Jordan Elimination
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include "mpi.h"
double serial_gaussian( double *A, double *b, double *y, int n )
{
int i, j, k;
double tstart = MPI_Wtime();
for( k=0; k<n; k++ ) {
for( j=k+1; j<n; j++ ) {
if( A[k*n+k] != 0)
A[k*n+j] = A[k*n+j] / A[k*n+k];
else
A[k*n+j] = 0;
}
if( A[k*n+k] != 0 )
y[k] = b[k] / A[k*n+k];
else
y[k] = 0.0;
A[k*n+k] = 1.0;
for( i=k+1; i<n; i++ ) {
for( j=k+1; j<n; j++ )
A[i*n+j] -= A[i*n+k] * A[k*n+j];
b[i] -= A[i*n+k] * y[k];
A[i*n+k] = 0.0;
}
}
return tstart;
}
void print_equations( double *A, double *y, int n )
{
int i, j;
for( i=0; i<n; i++ ) {
for( j=0; j<n; j++ ) {
if( A[i*n+j] != 0 ) {
std::cout << A[i*n+j] << "x" << j;
if( j<n-1 ) std::cout << " + ";
}
else
std::cout << " ";
}
std::cout << " = " << y[i] << std::endl;
}
}
int main( int argc, char *argv[] )
{
double *A, *b, *y, *a, *tmp, *final_y; // var decls
int i, j, n, row, r;
double tstart, tfinish, TotalTime; // timing decls
float aa = 5.0;
if( argc < 2 ) {
std::cout << "Usage\n";
std::cout << " Arg1 = number of equations / unknowns\n";
return -1;
}
n = atoi(argv[1]);
A = new double[n*n]; // space for matricies
b = new double[n];
y = new double[n];
for( i=0; i<n; i++ ) { // creates a matrix of random
b[i] = 0.0;
for( j=0; j<n; j++ ) {
r = ((float)rand()/(float)(RAND_MAX)) * aa;
A[i*n+j] = r;
b[i] += j*r;
}
}
MPI_Init (&argc,&argv); // Initialize MPI
MPI_Comm com = MPI_COMM_WORLD;
int size,rank; // Get rank/size info
MPI_Comm_size(com,&size);
MPI_Comm_rank(com,&rank);
int manager = (rank == 0);
if (size == 1)
tstart = serial_gaussian ( A, b, y, n);
else
{
if ( ( n % size ) != 0 )
{
std::cout << "Unknowns must be multiple of processors." <<
std::endl;
return -1;
}
int np = (int) n/size;
a = new double[n*np];
tmp = new double[n*np];
if ( manager )
{
tstart = MPI_Wtime();
final_y = new double[n];
}
// A, a, tmp and y all hold doubles, so every transfer uses MPI_DOUBLE
MPI_Scatter(A,n*np,MPI_DOUBLE,a,n*np,MPI_DOUBLE,0,com);
for ( i=0; i < (rank*np); i++ )
{
MPI_Bcast(tmp,n,MPI_DOUBLE,i/np,com);
MPI_Bcast(&(y[i]),1,MPI_DOUBLE,i/np,com);
for (row=0; row<np; row++)
{
for ( j=i+1; j<n; j++ )
a[row*n+j] = a[row*n+j] - a[row*n+i]*tmp[j];
b[rank*np+row] = b[rank*np+row] - a[row*n+i]*y[i];
a[row*n+i] = 0;
}
}
for (row=0; row<np; row++)
{
for ( j=rank*np+row+1; j < n ; j++ )
{
a[row*n+j] = a[row*n+j] / a[row*n+np*rank+row];
}
y[rank*np+row] = b[rank*np+row] / a[row*n+rank*np+row];
a[row*n+rank*np+row] = 1;
for ( i=0; i<n ; i++ )
tmp[i] = a[row*n+i];
MPI_Bcast (tmp,n,MPI_DOUBLE,rank,com);
MPI_Bcast (&(y[rank*np+row]),1,MPI_DOUBLE,rank,com);
for ( i=row+1; i<np; i++)
{
for ( j=rank*np+row+1; j<n; j++ )
a[i*n+j] = a[i*n+j] - a[i*n+row+rank*np]*tmp[j];
b[rank*np+i] = b[rank*np+i] -
a[i*n+row+rank*np]*y[rank*np+row];
a[i*n+row+rank*np] = 0;
}
}
for (i=(rank+1)*np ; i<n ; i++)
{
MPI_Bcast (tmp,n,MPI_DOUBLE,i/np,com);
MPI_Bcast (&(y[i]),1,MPI_DOUBLE,i/np,com);
}
MPI_Barrier(com);
MPI_Gather(a,n*np,MPI_DOUBLE,A,n*np,MPI_DOUBLE,0,com);
MPI_Gather(&(y[rank*np]),np,MPI_DOUBLE,final_y,np,MPI_DOUBLE,0,com);
y = final_y;
}
if (manager || (size==1) )
{
tfinish = MPI_Wtime();
TotalTime = tfinish - tstart;
printf("%f",TotalTime);
std::cout << std::endl;
}
MPI_Finalize();
}