Monday 28 April 2014

Text Mining

Peter Bruce, President and Founder of Statistics.com, says, "As I read about the frenzy of interest in mining Twitter data, I think of the 1849 Gold Rush, which boosted the settler population of California from under 1,000 to over 100,000 in two years. Most miners made little money, but suppliers of tools and owners of technology did quite well."

"Text Mining" will introduce the essential techniques of text mining, understood here as the extension of data mining's standard predictive methods to unstructured text. This course will discuss these standard techniques, and will devote considerable attention to the data preparation and handling methods that are required to transform unstructured text into a form in which it can be mined.
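Python is the software used in this course. As a minimal sketch of what "transforming unstructured text into a form in which it can be mined" can look like, the code below tokenizes raw sentences, builds a dictionary of terms, and generates one term-count vector per document. It assumes naive lower-case word tokenization with no stemming or stop-word removal. The function names and sample sentences are hypothetical and are not taken from the course.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(docs):
    """Map each distinct token in the corpus to a column index (sorted order)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(doc, dictionary):
    """Turn one document into a term-count vector aligned with the dictionary."""
    counts = Counter(tokenize(doc))
    vec = [0] * len(dictionary)
    for tok, n in counts.items():
        if tok in dictionary:  # tokens never seen in the corpus are dropped
            vec[dictionary[tok]] = n
    return vec


docs = ["The product arrived late.", "Great product, great price!"]
dictionary = build_dictionary(docs)
rows = [vectorize(d, dictionary) for d in docs]
print(dictionary)  # {'arrived': 0, 'great': 1, 'late': 2, 'price': 3, 'product': 4, 'the': 5}
print(rows)        # [[1, 0, 1, 0, 1, 1], [0, 2, 0, 1, 1, 0]]
```

Each row can then be fed to a standard predictive method (nearest-neighbor, decision rules, linear models) exactly like any other numeric record.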

Learn about text mining techniques in Nitin Indurkhya's online course, "Text Mining," at statistics.com. For more details please visit http://www.statistics.com/textmining.

Who can take this course?
IT professionals, web marketing analysts, data mining and statistical consultants. In general: analysts and researchers who need to pilot, implement or analyze data mining methods aimed at data containing unstructured text (forms, surveys, etc.).

Course Program:
Course outline: The course is structured as follows:

 

SESSION 1: Introduction and Data preparation
  • Overview of text mining
  • Tokenization
  • Dictionary creation
  • Vector generation for prediction
  • Feature generation and selection
  • Parsing

SESSION 2: Predictive Models for Text
  • Document classification
  • Document similarity and nearest-neighbor
  • Decision rules
  • Probabilistic models
  • Linear models
  • Performance evaluation
  • Applications

SESSION 3: Retrieval and Clustering of Documents
  • Measuring similarity for retrieval
  • Web-based document search and link analysis
  • Document matching
  • Clustering by similarity
  • k-means clustering
  • Hierarchical clustering
  • The EM algorithm for clustering
  • Evaluation of clustering

SESSION 4: Information Extraction
  • Goals of information extraction
  • Finding patterns and entities
  • Entity Extraction: The Maximum Entropy method
  • Template filling
  • Applications

HOMEWORK:
Homework in this course consists of short-answer questions to test concepts and guided data analysis problems using software. In addition to assigned readings, this course also has a getting-started guide and supplemental readings available online.

Software:
Python is used in the course.

The instructor, Dr. Nitin Indurkhya, is co-author of "Text Mining" (Springer) and co-editor of the "Handbook of Natural Language Processing" (CRC). He was also Principal Research Scientist at eBay and Professor at the School of Computer Science and Engineering, University of New South Wales (Australia), as well as the founder and president of Data-Miner Pty Ltd, an Australian company engaged in data-mining consulting and education.

You will be able to ask questions and exchange comments with the instructors via a private discussion board throughout the course.   The courses take place online at statistics.com in a series of 4 weekly lessons and assignments, and require about 15 hours/week.  Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.

We, the Center for eLearning and Training (C-eLT), Pune, partner with Statistics.com and offer these courses to Indian participants at special prices payable in INR.

For India Registration and pricing, please visit us at www.india.statistics.com.

Call: 020 6680 0300 / 322


Friday 18 April 2014

Sample Size and Power Determination

"How many subjects do I need?" is a common question from researchers.  "It depends" is the common, unsatisfactory, response.  Learn what the answer depends on in Thomas Ryan's online course, “Sample Size and Power Determination,” at Statistics.com.
For more details, please visit http://www.statistics.com/samplesize/.

"Sample Size and Power Determination" covers how to plan the appropriate sample size for a study, striking the optimal balance of feasible sample size, reasonable assumptions, and acceptable power. The power of a study (the probability that the study will detect a treatment effect if one truly exists) is determined by such factors as the magnitude of the treatment effect, the sample size, alpha (the level of statistical significance required), and (for survival studies) the study duration.

Since some of these factors are under the researcher's control while others are not, the goal of power analysis is to balance them through a series of "what ifs." For example, "What sample size would we need if the treatment reduces the risk of death by 10%, and what sample size would we need if it reduces the risk of death by 20%?" Or, "How would power be affected if the study followed patients for two years rather than three?" This balancing is done most effectively with graphs that let the researcher grasp (and communicate) a range of options in a single picture, and find the one that strikes the optimal balance of feasible sample size, reasonable assumptions, and acceptable power. Illustrations include examples from means, proportions, correlations, and survival analysis, and possibly from other procedures as well.
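To make the "what if" idea concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the course's: a two-sided comparison of two means with a known common standard deviation sigma, equal group sizes, and the normal approximation n = 2(z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2 per group. The function names and numbers are hypothetical. Risk-of-death and survival questions like those above need different formulas (proportions, log-rank), so treat this only as an illustration of how the factors interact.

```python
import math
from statistics import NormalDist

# Standard normal distribution used for the z quantiles and probabilities.
Z = NormalDist()


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample z-test of means.

    Assumes equal group sizes and a known common standard deviation sigma:
    n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2, rounded up.
    """
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def power_for_n(n, delta, sigma, alpha=0.05):
    """Approximate power of the same test with n subjects per group.

    Ignores the (negligible) chance of rejecting in the wrong direction.
    """
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)  # standard error of the difference in means
    return Z.cdf(delta / se - z_alpha)


# A small "what if" table: required n per group for several assumed effects.
for delta in (5, 10, 20):
    print(delta, n_per_group(delta, sigma=20))  # prints 5 252, then 10 63, then 20 16
```

With sigma = 20, halving the assumed difference from 10 to 5 quadruples the required sample size (63 to 252 per group), which is exactly the kind of trade-off a power curve displays at a glance.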

Course Program:
Course outline: The course is structured as follows:

SESSION 1: Introduction to Sample Size Determination and Power, Including Useful Software
  • Hypothesis tests and confidence intervals
  • Factors that determine sample size
  • Sample size for estimating a population mean
  • Examples, including a study from the literature
  • External and internal pilot studies
  • Ways to estimate sigma
  • What should be avoided:  Retrospective power and standardized effect sizes
  • Ethical issues in power analysis
  • Recommended references
  • Software

SESSION 2: Tests on Population Means (continued)
  • T-Test or Z-Test for population mean?
  • Testing the normality assumption
  • Confidence Intervals on Power and/or Sample Size?
  • Two-sample study from the literature with unequal sample sizes
    • Sample sizes determined by scientist in two stages without software
    • Illustration of more efficient sample size determination using software
  • Using coefficient of variation
  • Paired data
  • Additional examples

SESSION 3: Tests on Proportions and Variances
  • One proportion
    • Software disagreement and rectification
  • Two proportions
  • Options, including transformations built into software, for tests of proportions
  • One variance and two variances
  • Examples

SESSION 4: Regression and Design
  • Simple linear regression
    • Complexity caused by what must be inputted
  • Multiple linear regression
  • Optional material:  Repeated measures designs, Logrank test for survival analysis
  • Literature references for sample size determination with more advanced statistical methods and some information on corresponding software capability
     
HOMEWORK:
Homework in this course consists of short-answer questions to test concepts and guided data analysis problems using software.
In addition to assigned readings, this course also has supplemental readings available online.

Who Should Take This Course?
Anyone responsible for the planning of a study, or its subsequent analysis. Investigators writing grant applications or other proposals in which sample size must be specified.

The instructor, Dr. Thomas P. Ryan, is the author of "Sample Size and Power Determination" (Wiley, 2013), a number of other books, and numerous papers in peer-reviewed journals. He is an elected Fellow of the American Statistical Association, the American Society for Quality, and the Royal Statistical Society. Participants can ask questions and exchange comments with Dr. Ryan via a private discussion board throughout the course.

The courses take place online at statistics.com in a series of 4 weekly lessons and assignments, and require about 15 hours/week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.


Wednesday 16 April 2014

Practical Rasch Measurement - Core Topics

Surveys and tests produce prodigious quantities of data for diagnostic and assessment purposes, but how useful are those data for decision-making? Rasch models help you optimize the utility of quantitative information from surveys and exams. Learn what the Rasch model involves by taking Dr. Everett V. Smith’s online course "Practical Rasch Measurement - Core Topics". For more details, please visit http://www.statistics.com/rasch1.

Rasch analysis constructs linear measures from scored observations, such as responses to multiple-choice questions, Likert scales and quality-of-life assessments. This course covers the practical aspects of data setup, analysis, output interpretation, fit analysis, differential item functioning, dimensionality and reporting. Simple test linking and equating designs are addressed. Supporting theory is presented conceptually.
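The "linear measures" mentioned above come from the Rasch model placing persons and items on a common logit scale. As a minimal sketch, assuming the dichotomous (right/wrong) form of the model, the probability of a correct or endorsed response depends only on the difference between person ability theta and item difficulty b: P = exp(theta - b) / (1 + exp(theta - b)). The code below only evaluates that formula for hypothetical values; it does not estimate abilities or difficulties the way Winsteps does.

```python
import math


def rasch_probability(theta, b):
    """P(response = 1) under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    Equivalent to exp(theta - b) / (1 + exp(theta - b)).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p):
    """Convert a probability back to the logit (log-odds) scale."""
    return math.log(p / (1.0 - p))


def expected_score(theta, difficulties):
    """Expected raw score for one person across a set of dichotomous items."""
    return sum(rasch_probability(theta, b) for b in difficulties)


# Hypothetical item difficulties (logits), for illustration only.
items = [-1.0, 0.0, 1.0]

# On the logit scale the model is linear: log-odds = theta - b.
print(log_odds(rasch_probability(1.5, 0.5)))  # approximately 1.0
print(expected_score(0.0, items))             # approximately 1.5
```

Because log-odds equal theta - b, a one-logit gap between a person and an item means the same thing anywhere on the scale, which is what makes the resulting measures linear, unlike raw scores.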
 
Who can take this course:
Survey researchers, social scientists who use surveys or questionnaires in their work, education assessment analysts and managers.

Course Program:
Course outline: The course is structured as follows:

SESSION 1: Basic Concepts and Operations
  • Winsteps software installation and operation
  • Basic measurement and Rasch concepts
  • Simple dichotomous analysis
  • Constructing data files

SESSION 2: Fit Analysis and Measurement Models
  • Rasch-Andrich Rating Scale Model
  • Quality-control fit statistics
  • Scalograms

SESSION 3: Rating Scales, Reliability and Anchoring
  • Partial Credit Model
  • Category Description
  • Standard errors and Reliability
  • Anchoring

SESSION 4: DTF, DIF, and dimensionality
  • Differential Test Functioning
  • Differential Item Functioning
  • Investigating Dimensionality

HOMEWORK:
Homework in this course consists of short-answer questions to test concepts and an end-of-course data modeling project.
In addition to assigned readings, this course also has supplemental readings available online.

The instructor, Dr. Everett V. Smith Jr., Professor of Educational Psychology at the University of Illinois at Chicago, is co-editor of “Introduction to Rasch Measurement: Theory, Models, and Applications” (2004), “Rasch Measurement: Advanced and Specialized Applications” (2007), and “Criterion-Reference Testing: Practice Analysis to Score Reporting using Rasch Measurement Models” (2009). He also serves as Associate Editor of the “Journal of Applied Measurement” and is on the editorial boards of “Educational and Psychological Measurement” and the “Journal of Nursing Measurement.” Participants can ask questions and exchange comments directly with Dr. Smith via a private discussion board throughout the course.

The courses take place online at statistics.com in a series of 4 weekly lessons and assignments, and require about 15 hours/week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.


Wednesday 9 April 2014

Biostatistics in R: Clinical Trial Applications

Would you like to learn how to use R to compare treatments, incorporate covariates into the analysis, analyse survival (time-to-event) trials, model longitudinal data, and analyse bioequivalence trials?

Would you like to learn the implementation in R of statistical procedures important for the clinical trial statistician?

Learn all this and more in Prof. Din Chen and Prof. Karl Peace’s online course, “Biostatistics in R: Clinical Trial Applications,” at Statistics.com.
For more details, please visit http://www.statistics.com/Clinical-Trials-R.

Course Program:

Course outline: The course is structured as follows:
WEEK 1: Treatment Comparisons
  • R fundamentals associated with clinical trials
  • A simple simulated clinical trial
  • Statistical models for treatment comparisons
  • Incorporating covariates

WEEK 2: Survival Analysis
  • Time-to-event data structure
  • Statistical models for survival data
  • Right-censored data analysis
  • Interval-censored data analysis

WEEK 3: Analysis of Data from Longitudinal Clinical Trials
  • Trial designs and data structure
  • Statistical models and analysis

WEEK 4: Analysis of Bioequivalence Clinical Trials
  • Data from bioequivalence clinical trials
  • Bioequivalence clinical trial endpoints
  • Statistical methods to analyze bioequivalence

HOMEWORK:
Homework in this course consists of short answer questions to test concepts, guided data analysis problems using software, and guided data modeling problems using software.
In addition to assigned readings, this course also has an end of course data modeling project, example software files, and supplemental readings available online.

Instructors: Prof. Din Chen, University of Rochester Medical Center, is co-author of "Clinical Trial Methodology" and "Clinical Trial Data Analysis Using R," and the author or co-author of 80 refereed articles in scholarly journals.
Prof. Karl E. Peace, of the Jiann-Ping Hsu College of Public Health at Georgia Southern University, is a Georgia Cancer Coalition Distinguished Cancer Scholar, founding director of the Center for Biostatistics, founder of Biopharmaceutical Research Consultants, Inc. (BRCI), and founder and chair of the Biopharmaceutical Applied Statistics Symposium (BASS). He has contributed heavily to the medical, scientific and statistical literature, authoring or co-authoring over 150 articles and six books.

Who Should Take This Course:
Analysts and statisticians at pharmaceutical companies and other health research organizations who need or want to become involved in the design, monitoring or analysis of clinical trials and who are familiar with R software and considering its use in clinical trials.

You will be able to ask questions and exchange comments with the instructors via a private discussion board throughout the course.   The courses take place online at statistics.com in a series of 4 weekly lessons and assignments, and require about 15 hours/week.  Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.


Spatial Statistics with Geographic Information Systems (GIS)

Peter Bruce (President and Founder, Statistics.com) says, “Wandering around Memphis, TN, recently, I was able to use my phone to find the value of a mansion I was passing by, and to locate the hotel famous for letting ducks use the elevator. I could also have used it to find Graceland, but some tasks still lie well within the realm of the human brain. Location data is now the fastest-growing type of data, and its effective use is the province of spatial statistics.”

Learn more about the statistical foundations of geospatial analysis in David Unwin's online course "Spatial Statistics with Geographic Information Systems (GIS)," at Statistics.com. For more details, please visit http://www.statistics.com/spatial-statistics-GIS/.

Course Program:

Course outline: The course is structured as follows:

SESSION 1: Some Basics:
  • Geographical data
  • Statistics
  • Describing spatial data using maps

SESSION 2: The Analysis of Patterns in Point Data:
  • Introductory methods for detecting non-randomness in dot/pin map distributions

SESSION 3: The Analysis of Patterns in Area Data:
  • Detecting and measuring spatial autocorrelation in lattice data

SESSION 4: The Analysis of Continuous Field Data:
  • Creating contour-type maps using inverse distance weighting and geostatistical methods

Note that the course does not concentrate on the analysis of spatially continuous data using the methods collectively referred to as geostatistics; Lesson 4 offers only a brief introduction to the basic concepts as used in interpolation.
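Inverse distance weighting, named in Session 4, estimates a value at an unsampled location as a weighted average of nearby observations, with weights that shrink as distance grows. The Python sketch below is a minimal illustration assuming planar coordinates and a distance-decay exponent of 2 (a common default, chosen here as an assumption). The station coordinates and readings are hypothetical, and this is plain IDW, not the geostatistical (kriging) approach.

```python
import math


def idw(x, y, points, power=2.0):
    """Inverse distance weighted estimate of a surface value at (x, y).

    points: iterable of (xi, yi, value) sample locations.
    power:  distance-decay exponent; 2 is a common choice, assumed here.
    """
    num = 0.0
    den = 0.0
    for xi, yi, value in points:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            # IDW is an exact interpolator: at a sample site, return its value.
            return value
        w = 1.0 / d ** power  # nearer samples get larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical temperature readings at two stations on a line.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(1.0, 0.0, samples))  # midway between the stations: 15.0
print(idw(0.5, 0.0, samples))  # closer to the first station: approximately 11.0
```

Evaluating the function over a grid of (x, y) points and contouring the results gives the kind of contour-type map the outline describes; raising the exponent makes the surface hug the nearest stations more tightly.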

HOMEWORK:
Homework in this course is a mixture of simple exercises and guided data analysis problems using public-domain software.
In addition to assigned readings, this course also has an end of course data modeling project, and supplemental video lectures.

The instructor, Dr. David Unwin, is Emeritus Chair of Geography at Birkbeck College and Visiting Professor in the Department of Geomatic Engineering at University College, both in the University of London. His work using and developing spatial statistics in research stretches back some 40 years, and he has authored over a hundred academic papers in the field, together with a series of texts, the most recent of which is “Geographic Information Analysis, 2nd edition” (with D. O'Sullivan, 2010), and a series of edited collections at the interface between geography and computer science: “Visualization in GIS” (Hearnshaw and Unwin, 1994), “Spatial Analytical Perspectives on GIS” (Fischer, Scholten and Unwin, 1996), “Virtual Reality in Geography” (Fisher and Unwin, 2002) and, on representation issues, “Re-presenting GIS” (Fisher and Unwin, 2005). Having developed the world's first wholly internet-delivered Master's program in GIS in 1998, David Unwin has considerable experience of teaching and tutoring online. Participants can ask questions and exchange comments directly with Dr. Unwin via a private discussion board during the course.

Aim of the course:
Spatial analysis often uses methods adapted from conventional analysis to address problems in which spatial location is the most important explanatory variable. This course, which is directed particularly at students with a background in computing or statistics who lack the necessary geospatial concepts, will explain and give examples of the analysis that can be conducted in a geographic information system such as ArcGIS or MapInfo. The motivation is simple: it is one thing to run a GIS, but quite another to use it analytically to help answer questions such as:

- Is there an unusual cluster of crimes/cases of a disease here that we need to worry about?
- Do these data show variation across the country that I need to know about?
- What is the most probable air temperature here?

In the course we will explore methods that enable answers to be given to these, and similar, questions involving spatial variation.
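Questions such as "Is there an unusual cluster here?" are often approached with a measure of spatial autocorrelation, the topic of Session 3. One widely used statistic for lattice (area) data is global Moran's I; the course outline does not say which statistics it covers, so treat this as an illustrative choice. The Python sketch below assumes a user-supplied weights matrix with zeros on the diagonal (here, 1 for areas that share a border) and computes only the statistic, without the permutation or normal-approximation significance test a real analysis would add. The four-area example is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for n areal units (lattice data).

    values:  list of n observations, one per area.
    weights: n x n spatial weights (list of lists) with zeros on the diagonal,
             e.g. 1 where two areas share a border and 0 otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total of all weights
    den = sum(d * d for d in dev)
    if s0 == 0 or den == 0:
        raise ValueError("need at least one neighbour pair and non-constant values")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (num / den)


# Four hypothetical areas in a row (1-2-3-4); neighbours share a border.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], w))  # smooth trend: positive I
print(morans_i([1, 4, 1, 4], w))  # alternating values: negative I
```

Under no spatial autocorrelation the expected value of I is -1/(n - 1); values well above it suggest that similar values cluster together, while values below it suggest a checkerboard-like pattern.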


Who Should Take This Course?
Analysts and researchers who need to know how to use and interpret the data from Geographic Information Systems (GIS), including those in environmental analysis and management, banking, insurance, logistics, law enforcement services, defence, media, real estate, retail and more.

The courses take place online at statistics.com in a series of 4 weekly lessons and assignments, and require about 15 hours/week. Participate at your own convenience; there are no set times when you must be online. You have the flexibility to work a bit every day, if that is your preference, or concentrate your work in just a couple of days.
