

Today, researchers across a wide variety of fields have to analyse an ever-increasing amount of qualitative information. The objective of this summer school is therefore to provide participants with the toolkit needed to plan, conduct and statistically analyse qualitative text. To this end, the course provides an overview of three approaches to text analysis: qualitative analysis, quantitative content analysis and text mining. The opening sessions focus on the fundamental role of data preparation, before moving on to identifying themes and correlations using both the text mining and content analysis approaches. The final sessions address the more advanced topics of importing and exporting data, together with document classification.
In common with TStat’s training philosophy, the summer school takes a very hands-on approach to qualitative and quantitative text analysis. Each session is composed of both a theoretical component (in which the techniques and their underlying principles are explained) and an applied, hands-on segment, during which participants have the opportunity to implement the techniques using real data under the watchful eye of the course tutor. Theoretical sessions are reinforced by case study examples, in which the course tutor highlights the advantages and potential pitfalls of individual techniques. The intuition behind the choice and implementation of a specific technique is of the utmost importance. In this manner, the course leader is able to bridge the often difficult gap between abstract theoretical methodologies and the practical issues one encounters when conducting text analysis on real data.
Throughout the course, the applied sessions are carried out using Provalis Research’s QDA Miner, WordStat and SimStat text analysis software. WordStat is a flexible text analysis package offering both text mining tools for the fast extraction of themes and trends and state-of-the-art quantitative content analysis tools. In conjunction with SimStat (Provalis Research’s statistical data analysis tool) and QDA Miner (for qualitative data analysis), it offers users an extremely powerful and flexible integrated toolkit for qualitative and quantitative text analysis.
At the end of the summer school, participants are expected to be able to implement autonomously, with the aid of the routines used during the sessions, the theories and methodologies discussed during the week. In particular, participants should be able to identify the type of data required for their specific research topic; evaluate which methodology is most appropriate for the analysis at hand; and finally test the appropriateness and sensitivity of their estimated model and the robustness of the results obtained.
The summer school is aimed at:
- academic researchers, evaluators, policy advisers, social workers, educators and students working in economics, public health, sociology, psychology and political science;
- data mining and market research analysts based in the automotive, market research, logistics, transportation or telecommunications sectors who need to analyse comments from surveys, blogs, websites, social media platforms and other textual sources;
- insurance analysts looking to analyse and categorize claims from customers;
- researchers based in pharmaceutical companies and medical research laboratories who are required to analyse healthcare reports, notes from medical doctors, and interviews and/or focus groups with patients.
SETTING THE SCENE
SESSION I: THREE APPROACHES TO TEXT ANALYSIS
- Qualitative Analysis
- Quantitative Content Analysis
- Text Mining
SESSION II: QDA MINER AND WORDSTAT – A BRIEF OVERVIEW
QDA MINER
- Introduction and project management
- Codebook management and manual coding
- Security features and text retrieval tools
- Coding Frequency and Retrieval
- Code co-occurrence and case similarity analysis
- Assessing relationship between coding and variables
- Using the Report Manager and the Command Log
- Performing teamwork
- Miscellaneous Functions
WORDSTAT
- Content Analysis or Text Mining
- Analyzing words without dictionaries – a text mining approach
- Content Analysis – Principles of dictionary construction
- Importing and exporting data
- Introduction to automatic document classification
QDA MINER
SESSION I: INTRODUCTION AND PROJECT MANAGEMENT
- Introduction to CAQDAS using QDA Miner
- The CASE x VARIABLE file structure
- The Mixed-Method approach
- Quick overview of the work environment
- The four windows – CASE, VARIABLES, CODES, and DOCUMENT
- The menu system
- Creating a new project
- Creating a new project from a list of documents
- Creating a new project from an existing data file
- Creating an empty project / defining structure
- Using the document conversion wizard
- Customizing and personalizing the project
- The PROJECT | PROPERTIES dialog
- The PROJECT | NOTES command
- Manipulating variables
- Adding a variable
- Deleting a variable
- Changing the variable data type
- Recoding the values of a variable
- Reordering variables
- Changing variable properties
- Manipulating cases
- Adding a new case
- Deleting cases
- Importing new documents in new cases
- Changing the case grouping and description
SESSION II: CODEBOOK MANAGEMENT AND MANUAL CODING
- Creating codes and managing the codebook
- Creating codes and categories
- Modifying an existing code
- Deleting existing codes
- Moving codes in the codebook
- Merging codes in the codebook
- Splitting codes in the codebook
- Importing an existing codebook
- Manual coding of documents (versus autocoding)
- The four basic methods for assigning codes to text segments:
- Highlight text segment then drag a code
- Highlight text segment then double-click a code
- Highlight text segment then select code and button (toolbar)
- Drag and drop a code over a paragraph (or a sentence – press ALT)
- Assignment of multiple codes to the same segment (press CTRL)
- Modifying existing coding
- Working with code marks
- Viewing coding information
- Adding a comment to a coding
- Removing a coding
- Changing the code assigned to a text segment
- Resizing a segment
- Consolidating codes
- Searching and replacing codes
- Hiding code marks
- Highlighting coded segments
SESSION III: SECURITY FEATURES AND TEXT RETRIEVAL TOOLS
- Using backup features
- Creating a permanent backup
- Restoring a backup
- Using the temporary session backup
- Text retrieval tools (4)
- Searching for text
- Performing a simple text search
- Performing a complex text search (using Boolean and wildcard operators)
- Performing a thesaurus search
- Using the “search hits” table
- Performing manual coding and autocoding
- Saving to disk or printing the table
- Retrieving sections in structured documents
- Performing a query by example
- Finding text similar to a sample text segment
- Providing relevance feedback to improve search results
- Finding text similar to specific coded segments
- Performing “fuzzy string matching”
- Performing a keyword search
- Assigning keywords to codes
- Performing a keyword retrieval on internal codes
- Performing a keyword retrieval on WordStat dictionary files
SESSION IV: CODING FREQUENCY AND RETRIEVAL
- Coding frequency
- Creating a frequency list of all codes
- Creating a bar chart or a pie chart on selected codes
- Customizing the chart
- Coding Retrieval
- Performing a simple coding retrieval
- Performing a complex search
- Creating a text report
- Creating a new project from
- A shortcut for simple coding retrieval
- Saving and Retrieving Queries
- Retrieving a list of comments
SESSION V: CODE CO-OCCURRENCE AND CASE SIMILARITY ANALYSIS
- Analyzing code co-occurrences
- Hierarchical clustering of codes
- 2D and 3D multidimensional scaling plots
- Using proximity plots
- Assessing similarity of cases
- Analyzing code sequences
- Choosing codes and setting minimum / maximum distances
- Using the Sequence matrix
- Searching and coding specific sequences
SESSION VI: ASSESSING RELATIONSHIP BETWEEN CODING AND VARIABLES
- Analyzing coding by variables
- Crosstabulating coding frequency by variables
- Setting the content and format of the table
- Computing correlation or comparison statistics
- Comparing frequencies using bar charts or line charts
- Creating and interpreting 2D and 3D correspondence plots
- Creating and interpreting heatmaps
- A quick overview of graphic coding features
SESSION VII: USING THE REPORT MANAGER AND THE COMMAND LOG
- Using the Report Manager
- Accessing the Report Manager
- The Report Manager interface
- Appending tables, graphics and quotes
- Moving and organizing items using the table of contents
- Editing existing items / adding comments
- Adding empty documents or folders and deleting existing items
- Importing documents, images or tables
- Searching and replacing text
- Exporting results to HTML, Word or RTF files
- Using the Command Log
- Introduction to the command log – Filtering log entries
- Adding comments to log entries
- Undoing previously performed operations
- Repeating previously performed operations
- Exporting the log table to disk
SESSION VIII: PERFORMING TEAMWORK
- Preparing projects for teamwork
- Creating user accounts and setting privileges
- Creating new accounts
- Defining user access rights
- Forcing users to log in
- Creating duplicate copies of a project
- Sending a project by email
- Merging projects and assessing coding reliability
- Merging two or more projects
- Planning teamwork for assessing coding agreement
- Adjusting colors of code marks
- Computing coding agreement
- The codebook and segmentation problems
- Four levels of agreement
- Presence or absence (0 or 1)
- Frequency (0, 1, 2, etc.)
- Coding importance (% of words)
- Coding overlap
- Correcting (or not) for chance agreement
- Identifying disagreements
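As a rough illustration of what correcting for chance agreement means at the presence-or-absence level, the sketch below compares raw percent agreement with Cohen's kappa for two coders. It is plain Python, not QDA Miner output, and Cohen's kappa is used here only as one common chance-corrected statistic; the function names and the example data are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of cases on which both coders made the same decision."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohen_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: probability that both coders pick the same value
    # if each coded independently according to their own marginal rates.
    expected = sum(freq_a[v] * freq_b[v] for v in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical value throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: 1 = code present in the case, 0 = code absent.
coder_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
coder_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]

raw = percent_agreement(coder_a, coder_b)
kappa = cohen_kappa(coder_a, coder_b)
print(f"Percent agreement: {raw:.2f}, Cohen's kappa: {kappa:.2f}")
```

In this made-up example the coders agree on 8 of 10 cases, yet the chance-corrected figure is noticeably lower, because both coders apply the code to most cases and would therefore agree fairly often even when coding at random.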
WORDSTAT
SESSION IX: BASIC WORD STATISTICS AND TEXT MINING
- Content Analysis or Text Mining
- Running WordStat from QDA Miner or SimStat
- Analyzing words without dictionaries – a text mining approach
- Data preparation – misspelling and control characters
- Basic word frequency analysis
- Application of text pre-processing methods
- Exclusion list – use with care
- Lemmatization and stemming – limits and benefits
- Setting upper and lower frequency criteria
- A few additional options
- Numeric and other non-alphabetic characters
- Braces and square brackets
- Random sampling
- Using disk or memory as the working space
- Identifying themes using word co-occurrence analysis
- Clustering words and measuring their proximity
- Clustering documents based on the words they contain
- Correlation and comparison analysis based on word usage
- Performing crosstabs and computing statistics
- Comparing words among the sources (document or text variables)
- Correspondence analysis and heatmaps
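To make the pre-processing topics in this session more concrete, the following minimal sketch shows how an exclusion list and upper and lower frequency criteria change a basic word frequency table. It is plain Python and not how WordStat is implemented; the function `word_frequencies`, its parameters and the sample comments are assumptions made purely for illustration.

```python
import re
from collections import Counter


def word_frequencies(documents, exclusion_list=(), min_count=1, max_doc_share=1.0):
    """Count words across documents after applying simple filters.

    exclusion_list: words dropped before counting (use with care).
    min_count:      lower criterion, minimum total occurrences to keep a word.
    max_doc_share:  upper criterion, maximum share of documents a word may
                    appear in before it is treated as too common to be useful.
    """
    excluded = {word.lower() for word in exclusion_list}
    term_counts = Counter()  # total occurrences of each word
    doc_counts = Counter()   # number of documents containing each word
    for text in documents:
        tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in excluded]
        term_counts.update(tokens)
        doc_counts.update(set(tokens))
    n_docs = len(documents)
    return {
        word: count
        for word, count in term_counts.items()
        if count >= min_count and doc_counts[word] / n_docs <= max_doc_share
    }


# Hypothetical survey comments.
comments = [
    "The service was slow and the staff were rude",
    "Slow delivery but friendly staff",
    "Friendly staff and fast service",
]
freqs = word_frequencies(
    comments,
    exclusion_list=["the", "and", "was", "were", "but"],
    min_count=2,
    max_doc_share=0.9,
)
print(freqs)
```

Here "staff" is dropped because it appears in every comment, while words that occur only once fall below the lower criterion. This is the kind of trade-off the exclusion list and frequency settings are meant to control.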
SESSION X: CONTENT ANALYSIS – PRINCIPLES OF DICTIONARY CONSTRUCTION
- Introduction to WordStat categorization dictionary
- Dictionary structure and functions
- Opening, saving, and creating categorization dictionaries
- Manually creating categories of words and phrases
- Principles of dictionary construction – Extracting features
- Identification of technical terms and proper names (persons, places, products)
- Identification of common misspellings
- Extracting phrases
- Creating an initial dictionary – phrases, technical terms, proper nouns and words
- Adding words manually
- Adding words from tables
- Using the drag and drop editor
- Organizing the dictionary (drag and drop)
- Applying the dictionary
- Setting different levels
- Mixing dictionaries with words
- Validating the dictionary
- Finding words or phrases with improper meanings using the KWIC list
- WordStat evaluation order – how to use this to your advantage
- Disambiguation methods
- Manual disambiguation
- Disambiguation using phrases
- Disambiguation using rules
- Improving categorization dictionaries
- Creating comprehensive dictionaries using the Suggest button
- Assessing coverage using the keyword retrieval feature
SESSION XI: ADVANCED FEATURES
- Importing and exporting data
- Exportation of frequency data
We are currently adding the finishing touches to our 2023 training calendar. We therefore ask you to check our website regularly or contact us at training@tstat.eu should the dates for the course you are interested in not be published yet. You will then be contacted via email as soon as the dates are available.