7.04.008
Data handling and presentation
Master seminar, 8.04.08
Frode Volden

Themes
  Ethics
  Data loss and security
  Software
  Data presentation and analyses

Respect and integrity
  The researcher has a responsibility to respect the research results of others and to practice good citation ethics. This means that researchers:
    Do not tolerate plagiarism of research.
    Give balanced and truthful accounts of the research of others.
    Do not accept scientific misconduct, whether as falsification, manipulation, or selective reporting of data from their own or others' research.
    Make data available to others for verification for a period of time.
  Quote: Consultation draft prepared by the National Committee for Research Ethics in Science and Technology (NENT), 9 November 005

Conflicts of interest
  Researchers who are affiliated with, for example, political or religious interests, and researchers who take on commissions from industry or government, can contribute to uncertainty about factors that may have influenced the results of the research.
  If in the slightest doubt: describe the possible conflict of interest in the work.

(Re)traceable data
  Data are collected and analysed, but...
    What did the different data points/variables mean?
    Already manipulated and difficult to analyse again or in new ways
  Keep original data
  Keep track of the analyses you have performed
  Everything should be possible to reconstruct

CONSTRUCTING DATA COLLECTION FORMS
  One column for each variable
  One row for each subject
  Columns: ID, Gender, Grade, Building, Reading Score, Mathematics Score
  [Example form; cell values not recoverable]
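
The advice above (one column per variable, one row per subject, keep the original data, keep track of analyses) can be sketched in a few lines of Python. This is only an illustration: the rows in RAW_CSV are invented, and the names load_subjects, add_total_score and analysis_log are hypothetical, not part of the seminar material.

```python
import csv
import io

# Hypothetical example rows; the values are invented for illustration only.
RAW_CSV = """id,gender,grade,building,reading_score,math_score
1,2,8,1,55,60
2,1,8,6,41,44
3,2,4,6,46,37
"""

def load_subjects(text):
    """Read one dict per subject (row), keyed by variable name (column)."""
    return list(csv.DictReader(io.StringIO(text)))

# Record of every analysis step, so that results can be reconstructed later.
analysis_log = []

def add_total_score(rows):
    """Return NEW rows with a derived variable; the original rows stay unchanged."""
    analysis_log.append("total_score = reading_score + math_score")
    return [
        {**row, "total_score": int(row["reading_score"]) + int(row["math_score"])}
        for row in rows
    ]

original = load_subjects(RAW_CSV)
derived = add_total_score(original)
```

Because add_total_score builds new rows instead of editing the old ones, the original data remain available for re-analysis, and the log states what was done to obtain the derived variable.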

Data
  If you have survey data:
    Make sure that individual sheets/data can be identified and retraced
    But remember privacy issues
  Keep:
    Variable names and labels
    Value labels

Data analyses and data reduction
  Data analysis often involves combining or recalculating variables and values.
    Cleaning
    Indexes
    Cultivation of effects
  But we tend to forget what we have done

Software
  Spreadsheet programs often include many statistical procedures
  MATLAB
  Dedicated statistical programs:
    SPSS
    SS+ +
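
As a rough sketch of what keeping labels and building an index can look like in code, the example below stores variable and value labels next to the data and combines three items into one index score. The item names (q1 to q3), the labels, and the 1 to 5 response scale are assumptions made for this illustration.

```python
# Hypothetical labels, kept together with the data so their meaning is not lost.
VARIABLE_LABELS = {
    "q1": "I enjoy reading",
    "q2": "Reading is boring",  # negatively worded item
    "q3": "I read in my spare time",
}
VALUE_LABELS = {1: "Strongly disagree", 5: "Strongly agree"}

def reverse_code(value, low=1, high=5):
    """Recode a negatively worded item so that a high value means the same on all items."""
    return low + high - value

def index_score(responses, reversed_items=()):
    """Combine several items into one index: the mean after reverse coding."""
    values = [
        reverse_code(v) if name in reversed_items else v
        for name, v in responses.items()
    ]
    return sum(values) / len(values)

subject = {"q1": 4, "q2": 2, "q3": 5}
reading_attitude = index_score(subject, reversed_items={"q2"})
```

Writing the recoding as a function, rather than overwriting values by hand in a spreadsheet, documents the step and makes it repeatable.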

Frequency Distributions
  First step in organization of data
  Can see how the scores are distributed
  Illustrate relationships between variables in a cross tabulation
  Simplify distributions by using a grouped frequency distribution

Descriptive Statistics
  Are used to describe the data
  Many types of descriptive statistics
    Frequency distributions
    Summary measures
    Graphical representations of the data
  A way to visualize the data
  The first step in any statistical analysis

Cross Tabulation Example
  [Table: responses (Yes, No, Don't know) by gender (Males, Females), with row and column totals; cell counts not recoverable]
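
A frequency distribution, a grouped frequency distribution, and a cross tabulation can all be computed by simple counting. The sketch below uses Python's collections.Counter; the scores and answers are invented, and the function name grouped_frequency is hypothetical.

```python
from collections import Counter

# Invented example scores (whole numbers).
scores = [3, 5, 5, 7, 8, 8, 8, 12, 14, 15]

# Simple frequency distribution: how often each score occurs.
freq = Counter(scores)

def grouped_frequency(values, width):
    """Count whole-number values in intervals of the given width (0-4, 5-9, ...)."""
    groups = Counter((v // width) * width for v in values)
    return {(low, low + width - 1): n for low, n in sorted(groups.items())}

grouped = grouped_frequency(scores, 5)

# Cross tabulation: count each (gender, answer) combination.
answers = [
    ("Male", "Yes"), ("Female", "Yes"), ("Male", "No"),
    ("Female", "Don't know"), ("Female", "Yes"),
]
cross_tab = Counter(answers)
answer_totals = Counter(answer for _, answer in answers)
gender_totals = Counter(gender for gender, _ in answers)
```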

Graphing Data
  Visual displays are often easier to comprehend
  Differences and trends may become obvious
  Different graphs
    Histograms
    Frequency bar charts

Histograms
  A bar graph, as shown at the right
  Can be used to graph either
    Data representing discrete categories
    Data representing scores from a continuous variable
  [Figure: Sample Histogram; x-axis: Scores, y-axis: Freq]

Sample Figure
  [Figure: Percent reporting improvement during the week, Week 1 to Week 4, for the Cognitive, Behavioral and Analytic groups; y-axis in %]
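
For a quick look at discrete scores, a histogram does not need plotting software. The following minimal sketch prints one bar of '#' characters per score; text_histogram is a hypothetical helper and the scores are invented.

```python
from collections import Counter

def text_histogram(values):
    """Return one text line per distinct value, with a bar as long as its frequency."""
    counts = Counter(values)
    return [f"{value:>3} | {'#' * counts[value]}" for value in sorted(counts)]

for line in text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5]):
    print(line)
```

For a continuous variable the scores would first have to be grouped into intervals before counting.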

Measuring Variability
  Range: lowest to highest score
  Average Deviation: average distance from the mean
  Variance: average squared distance from the mean
    Used in later inferential statistics
  Standard Deviation: square root of variance
  Illustrating uncertainties

Measures of Relationship
  Pearson product moment correlation
    Used with interval or ratio data
  Spearman rank order correlation
    Used when one variable is ordinal and the second is at least ordinal
  Scatter plots
    Visual representation of a correlation
    Helps to identify nonlinear relationships
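
The measures listed above translate directly into code. The sketch below follows the definitions as worded on the slide, so the variance divides by N (the "average" squared distance); many statistical programs divide by N - 1 when estimating from a sample, so results may differ slightly. Spearman's correlation is computed here as Pearson's correlation on ranks, with tied scores given their average rank. All function names are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def value_range(xs):
    """Distance from the lowest to the highest score."""
    return max(xs) - min(xs)

def average_deviation(xs):
    """Average absolute distance from the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def variance(xs):
    """Average squared distance from the mean (divides by N, as on the slide)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))

def pearson_r(xs, ys):
    """Pearson product moment correlation for two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(xs):
    """Rank from 1 (lowest); tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman rank order correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))
```

A relationship such as y = x squared for positive x is perfectly monotonic but not linear, so spearman_rho returns 1 while pearson_r stays below 1, which is one reason to inspect a scatter plot.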

INTERPRETING THE PEARSON CORRELATION COEFFICIENT
  Eyeball method
    Correlations between    Are said to be
    ±.8 and 1.0             Very strong
    ±.6 and .8              Strong
    ±.4 and .6              Moderate
    ±.2 and .4              Weak
    ±0 and .2               Very weak

Regression
  Using a correlation to predict one variable from knowing the score on the other variable
  Usually a linear regression (finding the best fitting straight line for the data)
  But also methods to check for nonlinear relationships
  Best illustrated in a scatter plot with the regression line also plotted
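
Both ideas on this page can be sketched briefly in Python. The first function applies the eyeball method with cut-offs at .2, .4, .6 and .8; since neighbouring bands share their boundary value, the sketch assumes that a boundary value belongs to the stronger band. The second function fits the least squares straight line. The names describe_correlation, fit_line and predict are hypothetical.

```python
def describe_correlation(r):
    """Verbal label for a correlation, using the absolute size of r."""
    size = abs(r)
    if size >= 0.8:
        return "Very strong"
    if size >= 0.6:
        return "Strong"
    if size >= 0.4:
        return "Moderate"
    if size >= 0.2:
        return "Weak"
    return "Very weak"

def fit_line(xs, ys):
    """Least squares line y = intercept + slope * x; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def predict(x, slope, intercept):
    """Predicted score on the second variable for a known score x on the first."""
    return intercept + slope * x
```

The sign of r is ignored for the label because it only gives the direction of the relationship, not its strength.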

Reporting Statistics
  Reporting t tests
    Group A was significantly slower than Group B, t (38) = 3., p <.0.
  Reporting ANOVAs
    There was a significant main effect for training, F (, 3) = 4.0, p <.05.
  Reporting chi squares
    Boys were significantly more likely to drop out of the program than girls, χ² (, N=50) = 7.9, p <.0.

Other issues
  Statistics objectify evaluations, but do not guarantee correct decisions
  Avoid reporting data at too high a level of precision
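
A small helper can keep the reporting format consistent and the precision modest. The sketch below rounds the test statistic to two decimals and reports p against the conventional thresholds .001, .01 and .05. The helper names and the numbers in the usage line are hypothetical and are not taken from the examples above.

```python
def format_p(p):
    """Report p against a conventional threshold, without a leading zero."""
    for alpha in (0.001, 0.01, 0.05):
        if p < alpha:
            return "p < " + f"{alpha:g}".lstrip("0")
    return "p = " + f"{p:.2f}".lstrip("0")

def report_t(df, t, p):
    """Format a t test result, for example 't(38) = 3.46, p < .01'."""
    return f"t({df}) = {t:.2f}, {format_p(p)}"

def report_f(df_between, df_within, f, p):
    """Format an ANOVA F test result."""
    return f"F({df_between}, {df_within}) = {f:.2f}, {format_p(p)}"

print(report_t(38, 3.456, 0.004))
```

Rounding inside the helper means that software output with many decimals is never copied into the text unchanged.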