Source: web.njit.edu
Chi-square test
• The chi-square test is a popular feature selection
method when we have categorical data and
class labels (classification rather than regression)
• In a feature selection context we would apply
the chi-square test to each feature and rank
the features by their chi-square values (or p-values)
• A parallel solution calculates the chi-square value
for all features at the same time, instead of
one feature at a time as in a serial solution
Chi-square test
Contingency table
             Feature=A      Feature=B
Label=0      Observed=c1    Observed=c2
             Expected=X1    Expected=X2
Label=1      Observed=c3    Observed=c4
             Expected=X3    Expected=X4
• We have two random variables:
– Label (L): 0 or 1
– Feature (F): Categorical (A or B in the table)
• Null hypothesis: the two variables are
independent of each other (unrelated)
• Under independence
– P(L,F) = P(L)P(F)
– P(L=0) = (c1+c2)/n, where n = c1+c2+c3+c4
– P(F=A) = (c1+c3)/n
• Expected values
– E(X1) = P(L=0)P(F=A)n
• We can calculate the chi-square statistic for a
given feature and its p-value: the probability of
seeing a statistic at least this large if the feature
were independent of the label.
$$\chi^2 = \sum_{i=0}^{d-1} \frac{(c_i - x_i)^2}{x_i}$$
Here c_i is the observed count and x_i the expected
count in cell i, and d is the number of cells.
• Features with very small p-values deviate
significantly from the independence assumption
and are therefore considered important.
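As a worked illustration of the expected values and the statistic, take a contingency table with hypothetical counts c1=30, c2=10, c3=10, c4=30 (so n=80). These numbers are made up for the example and do not come from the slides.

```latex
\begin{align*}
P(L=0) &= \frac{c_1 + c_2}{n} = \frac{40}{80} = 0.5, \qquad
P(F=A) = \frac{c_1 + c_3}{n} = \frac{40}{80} = 0.5 \\
X_1 &= P(L=0)\,P(F=A)\,n = 0.5 \cdot 0.5 \cdot 80 = 20
  \qquad (\text{likewise } X_2 = X_3 = X_4 = 20) \\
\chi^2 &= \frac{(30-20)^2}{20} + \frac{(10-20)^2}{20}
        + \frac{(10-20)^2}{20} + \frac{(30-20)^2}{20} = 20
\end{align*}
```

Every cell is 10 counts away from its expected value of 20, so each contributes 5 to the statistic. A value this large for a 2x2 table corresponds to a very small p-value, which would mark the feature as important.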
Parallel GPU implementation of chi-square
test in CUDA
• The key here is to organize the data to enable
coalesced memory access
• We define a kernel function that computes the chi-
square value for a given feature
• The CUDA runtime automatically distributes the
kernel's threads across the GPU cores, so that many
features are processed simultaneously.
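The points above can be sketched as a small kernel. This is a minimal illustration, not the slides' actual implementation. It assumes binary features (0 for A, 1 for B), binary labels, one thread per feature, and a sample-major layout `data[s * n_features + f]`. The names `chi_square_kernel` and `chi_square_all_features` are hypothetical, and CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// One thread per feature (an assumed mapping; the slides do not specify one).
// data is sample-major: data[s * n_features + f] holds 0 (A) or 1 (B).
// At each loop step, neighbouring threads read neighbouring bytes, so the
// loads from data are coalesced.
__global__ void chi_square_kernel(const unsigned char* data,
                                  const unsigned char* labels,
                                  int n_samples, int n_features,
                                  float* chi2)
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f >= n_features) return;

    // Observed counts indexed as 2 * label + value: c1, c2, c3, c4.
    int c[4] = {0, 0, 0, 0};
    for (int s = 0; s < n_samples; ++s) {
        int label = labels[s] != 0;
        int value = data[(size_t)s * n_features + f] != 0;
        c[2 * label + value] += 1;
    }

    float n = (float)n_samples;
    float stat = 0.0f;
    for (int label = 0; label < 2; ++label) {
        for (int value = 0; value < 2; ++value) {
            float row = (float)(c[2 * label] + c[2 * label + 1]); // samples with this label
            float col = (float)(c[value] + c[2 + value]);         // samples with this value
            float expected = row * col / n;                        // P(L)P(F)n
            float diff = (float)c[2 * label + value] - expected;
            if (expected > 0.0f) stat += diff * diff / expected;   // skip empty rows/columns
        }
    }
    chi2[f] = stat;
}

// Host helper (hypothetical): copies the inputs to the GPU, launches one
// thread per feature, and returns one chi-square value per feature.
std::vector<float> chi_square_all_features(const std::vector<unsigned char>& data,
                                           const std::vector<unsigned char>& labels,
                                           int n_samples, int n_features)
{
    assert(data.size() == (size_t)n_samples * n_features);
    assert(labels.size() == (size_t)n_samples);

    unsigned char* d_data = nullptr;
    unsigned char* d_labels = nullptr;
    float* d_chi2 = nullptr;
    cudaMalloc(&d_data, data.size());
    cudaMalloc(&d_labels, labels.size());
    cudaMalloc(&d_chi2, n_features * sizeof(float));
    cudaMemcpy(d_data, data.data(), data.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_labels, labels.data(), labels.size(), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n_features + threads - 1) / threads;
    chi_square_kernel<<<blocks, threads>>>(d_data, d_labels,
                                           n_samples, n_features, d_chi2);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> chi2(n_features);
    cudaMemcpy(chi2.data(), d_chi2, n_features * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_labels);
    cudaFree(d_chi2);
    return chi2;
}
```

For example, with labels {0,0,1,1}, a feature whose values equal the labels gets a chi-square value of 4, while a feature with values {0,1,0,1} gets 0. The sample-major layout is the design choice that matters here. A feature-major layout would make each thread stride through memory by n_samples bytes, so neighbouring threads would no longer read neighbouring addresses.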