COURSE ID: I-SS14 LANGUAGE:

Summer School | Text Analysis: A Qualitative and Quantitative Approach

Today, researchers across a wide variety of fields find themselves having to analyse an increasing amount of qualitative information. The objective of this summer school is therefore to provide participants with the toolkit required to successfully plan, conduct and statistically analyse qualitative text. To this end, the course provides an overview of three approaches to text analysis: qualitative analysis, quantitative content analysis and text mining. The opening sessions focus on the fundamental role of data preparation in the analysis, before moving on to identifying themes and correlations using both the text mining and content analysis approaches. The final sessions address the more advanced topics of importing and exporting data, together with document classification.

 

In common with TStat’s training philosophy, the summer school takes a very hands-on approach to qualitative and quantitative text analysis. Each session is composed of both a theoretical component (in which the techniques and the principles behind them are explained) and an applied (hands-on) segment, during which participants have the opportunity to implement the techniques using real data under the watchful eye of the course tutor. Theoretical sessions are reinforced by case study examples, in which the course tutor discusses the advantages and potential pitfalls of individual techniques. The intuition behind the choice and implementation of a specific technique is of the utmost importance. In this manner, the course leader is able to bridge the often difficult gap between abstract theoretical methodologies and the practical issues one encounters when conducting text analysis on real data. Throughout the course, the applied sessions are carried out using Provalis Research’s QDA Miner, WordStat and SimStat text analysis software. WordStat is a flexible text analysis package offering both text mining tools for the fast extraction of themes and trends and state-of-the-art quantitative content analysis tools. In conjunction with SimStat (Provalis Research’s statistical data analysis tool) and QDA Miner (for qualitative data analysis), it offers users an extremely powerful and flexible integrated toolkit for qualitative and quantitative text analysis.

 

At the end of the summer school, participants are expected to be in a position to implement autonomously, with the aid of the routines used during the sessions, the theories and methodologies discussed during the week. In particular, participants should be able to identify the type of data required for their specific research topic; evaluate which methodology is most appropriate for the analysis at hand; and finally test the appropriateness and sensitivity of their estimated model and the robustness of the results obtained.

The summer school is aimed at:

  • academic researchers, evaluators, policy advisers, social workers, educators and students working in economics, public health, sociology, psychology and political science;
  • data mining and market research analysts based in the automotive, market research, logistics, transportation or telecommunications sectors who need to analyse comments from surveys, blogs, websites, social media platforms and other textual sources;
  • insurance analysts looking to analyse and categorize claims from customers;
  • researchers based in pharmaceutical companies and medical research laboratories who are required to analyse healthcare reports, notes from medical doctors, and interviews and/or focus groups with patients.

SETTING THE SCENE 

 

SESSION I: THREE APPROACHES TO TEXT ANALYSIS

  1. Qualitative Analysis
  2. Quantitative Content Analysis
  3. Text Mining

 

SESSION II: QDA MINER AND WORDSTAT – A BRIEF OVERVIEW

 

   QDA MINER

  1. Introduction and project management
  2. Codebook management and manual coding
  3. Security features and text retrieval tools
  4. Coding Frequency and Retrieval
  5. Code co-occurrence and case similarity analysis
  6. Assessing the relationship between coding and variables
  7. Using the Report Manager and the Command Log
  8. Performing teamwork
  9. Miscellaneous Functions

 

   WORDSTAT

  1. Content Analysis or Text Mining
  2. Analyzing words without dictionaries – a text mining approach
  3. Content Analysis – Principles of dictionary construction
  4. Importing and exporting data
  5. Introduction to automatic document classification

 

QDA MINER

 

SESSION I: INTRODUCTION AND PROJECT MANAGEMENT

  1. Introduction to CAQDAS using QDA Miner
  2. The CASE x VARIABLE file structure
  3. The Mixed-Method approach
  4. Quick overview of the work environment
  5. The four windows – CASE, VARIABLES, CODES, and DOCUMENT
  6. The menu system
  7. Creating a new project
  8. Creating a new project from a list of documents
  9. Creating a new project from an existing data file
  10. Creating an empty project / defining structure
  11. Using the document conversion wizard
  12. Customizing and personalizing the project
  13. The PROJECT | PROPERTIES dialog
  14. The PROJECT | NOTES command
  15. Manipulating variables
  16. Adding a variable
  17. Deleting a variable
  18. Changing the variable data type
  19. Recoding the values of a variable
  20. Reordering variables
  21. Changing variable properties
  22. Manipulating cases
  23. Adding a new case
  24. Deleting cases
  25. Importing new documents in new cases
  26. Changing the case grouping and description

 

SESSION II: CODEBOOK MANAGEMENT AND MANUAL CODING

  1. Creating codes and managing the codebook
  2. Creating codes and categories
  3. Modifying an existing code
  4. Deleting existing codes
  5. Moving codes in the codebook
  6. Merging codes in the codebook
  7. Splitting codes in the codebook
  8. Importing an existing codebook
  9. Manual coding of documents (versus autocoding)
  10. The four basic methods for assigning codes to text segments:
  11. Highlight text segment then drag a code
  12. Highlight text segment then double-click a code
  13. Highlight text segment then select code and button (toolbar)
  14. Drag and drop a code over a paragraph (or a sentence – press ALT)
  15. Assignment of multiple codes to the same segment (press CTRL)
  16. Modifying existing coding
  17. Working with code marks
  18. Viewing coding information
  19. Adding a comment to a coding
  20. Removing a coding
  21. Changing the code assigned to a text segment
  22. Resizing a segment
  23. Consolidating codes
  24. Searching and replacing codes
  25. Hiding code marks
  26. Highlighting coded segments

 

SESSION III: SECURITY FEATURES AND TEXT RETRIEVAL TOOLS

  1. Using backup features
  2. Creating a permanent backup
  3. Restoring a backup
  4. Using the temporary session backup
  5. Text retrieval tools (4)
  6. Searching for text
  7. Performing a simple text search
  8. Performing a complex text search (using Boolean and wildcard operators)
  9. Performing a thesaurus search
  10. Using the “search hits” table
  11. Performing manual coding and autocoding
  12. Saving to disk or printing the table
  13. Retrieving sections in structured documents
  14. Performing a query by example
  15. Finding text similar to a sample text segment
  16. Providing relevance feedback to improve search results
  17. Finding text similar to specific coded segments
  18. Performing a “fuzzy string matching”
  19. Performing a keyword search
  20. Assigning keywords to codes
  21. Performing a keyword retrieval on internal codes
  22. Performing a keyword retrieval on WordStat dictionary files

 

SESSION IV: CODING FREQUENCY AND RETRIEVAL

  1. Coding frequency
  2. Creating a frequency list of all codes
  3. Creating a barchart or a pie chart on selected codes
  4. Customizing the chart
  5. Coding Retrieval
  6. Performing a simple coding retrieval
  7. Performing a complex search
  8. Creating a text report
  9. Creating a new project from
  10. A shortcut for simple coding retrieval
  11. Saving and Retrieving Queries
  12. Retrieving a list of comments

 

SESSION V: CODE CO-OCCURRENCE AND CASE SIMILARITY ANALYSIS

  1. Analyzing code co-occurrences
  2. Hierarchical clustering of codes
  3. 2D and 3D multidimensional scaling plots
  4. Using the Proximity plots
  5. Assessing similarity of cases
  6. Analyzing code sequences
  7. Choosing codes and setting minimum / maximum distances
  8. Using the Sequence matrix
  9. Searching and coding specific sequences

 

SESSION VI: ASSESSING THE RELATIONSHIP BETWEEN CODING AND VARIABLES

  1. Analyzing coding by variables
  2. Crosstabulating coding frequency by variables
  3. Setting the content and format of the table
  4. Computing correlation or comparison statistics
  5. Comparing frequencies using barcharts or line charts
  6. Creating and interpreting 2D and 3D correspondence plots
  7. Creating and interpreting heatmaps
  8. A quick overview of graphic coding features

 

SESSION VII: USING THE REPORT MANAGER AND THE COMMAND LOG

  1. Using the Report Manager
  2. Accessing the Report Manager
  3. The Report Manager interface
  4. Appending tables, graphics and quotes
  5. Moving and organizing items using the table of contents
  6. Editing existing items / adding comments
  7. Adding empty documents or folders and deleting existing items
  8. Importing documents, images or tables
  9. Searching and replacing text
  10. Exporting results to HTML, Word or RTF files
  11. Using the Command Log
  12. Introduction to the command log – Filtering log entries
  13. Adding comments to log entries
  14. Undoing previously performed operations
  15. Repeating previously performed operations
  16. Exporting the log table to disk

 

SESSION VIII: PERFORMING TEAMWORK

  1. Preparing projects for teamwork
  2. Creating user accounts and setting privileges
  3. Creating new accounts
  4. Defining user access rights
  5. Forcing users to log in
  6. Creating duplicate copies of a project
  7. Sending a project by email
  8. Merging projects and assessing coding reliability
  9. Merging two or more projects
  10. Planning teamwork for assessing coding agreement
  11. Adjusting colors of code marks
  12. Computing coding agreement
  13. The codebook and segmentation problems
  14. Four levels of agreement
  15. Presence or absence (0 or 1)
  16. Frequency (0, 1, 2, etc.)
  17. Coding importance (% of words)
  18. Coding overlap
  19. Correcting (or not) for chance agreement
  20. Identifying disagreements
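
To make the idea of correcting for chance agreement concrete, the Python sketch below compares raw percentage agreement with Cohen's kappa for two coders at the presence or absence (0 or 1) level. The outline does not say which agreement statistic is used in the course; Cohen's kappa is chosen here only as a familiar example, and the function name and sample codings are assumptions.

```python
def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' presence/absence (0 or 1) decisions."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of cases")
    n = len(coder_a)
    # Proportion of cases on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal proportions.
    p_a = sum(coder_a) / n
    p_b = sum(coder_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both coders used a single identical value throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions by two coders on eight cases (1 = code present).
coder_a = [1, 1, 0, 0, 1, 0, 1, 0]
coder_b = [1, 0, 0, 0, 1, 0, 1, 1]
raw_agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
kappa = cohen_kappa(coder_a, coder_b)
print(raw_agreement, kappa)
```

In this example the coders agree on 75% of the cases, yet half of that agreement would be expected by chance, so the corrected figure is noticeably lower.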

 

WORDSTAT

 

SESSION IX: BASIC WORD STATISTICS AND TEXT MINING

  1. Content Analysis or Text Mining
  2. Running WordStat from QDA Miner or SimStat
  3. Analyzing words without dictionaries – a text mining approach
  4. Data preparation – misspelling and control characters
  5. Basic word frequency analysis
  6. Application of text pre-processing methods
  7. Exclusion list – use with care
  8. Lemmatization and stemming – limits and benefits
  9. Setting upper and lower frequency criteria
  10. A few additional options
  11. Numeric and other non-alphabetic characters
  12. Braces and square brackets
  13. Random sampling
  14. Using disk or memory as the working space
  15. Identifying themes using word co-occurrence analysis
  16. Clustering words and measuring their proximity
  17. Clustering documents based on the words they contain
  18. Correlation and comparison analysis based on word usage
  19. Performing crosstabs and computing statistics
  20. Comparing words among the sources (document or text variables)
  21. Correspondence analysis and heatmaps
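
As a rough illustration of the basic word frequency step outlined above (not WordStat's own procedure), the following Python sketch counts words after applying an exclusion list and lower and upper frequency criteria. The function name, sample documents, exclusion list and thresholds are all assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(documents, exclusion_list=(), min_count=1, max_count=None):
    """Count words across documents, dropping excluded words and words
    whose total count falls outside [min_count, max_count]."""
    excluded = {w.lower() for w in exclusion_list}
    counts = Counter()
    for text in documents:
        # Lowercase and keep alphabetic tokens only (numbers are dropped).
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in excluded)
    return {
        word: n
        for word, n in counts.items()
        if n >= min_count and (max_count is None or n <= max_count)
    }


# Hypothetical sample data, for illustration only.
docs = [
    "The claim was delayed and the customer complained.",
    "Customer service resolved the claim quickly.",
]
freqs = word_frequencies(docs, exclusion_list=["the", "and", "was"], min_count=2)
print(freqs)
```

Raising `min_count` removes rare words, while setting `max_count` removes very common ones; both choices change which themes remain visible, which is one reason an exclusion list needs to be used with care.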

 

SESSION X: CONTENT ANALYSIS – PRINCIPLES OF DICTIONARY CONSTRUCTION

  1. Introduction to WordStat categorization dictionary
  2. Dictionary structure and functions
  3. Opening, saving, and creating categorization dictionaries
  4. Manually creating categories of words and phrases
  5. Principles of dictionary construction – Extracting features
  6. Identification of technical terms and proper names (persons, places, products)
  7. Identification of common misspellings
  8. Extracting phrases
  9. Creating an initial dictionary – phrases, technical terms, proper nouns and words
  10. Adding words manually
  11. Adding words from tables
  12. Using the drag and drop editor
  13. Organizing the dictionary (drag and drop)
  14. Applying the dictionary
  15. Setting different levels
  16. Mixing dictionaries with words
  17. Validating the dictionary
  18. Finding words or phrases with improper meanings using the KWIC list
  19. WordStat evaluation order – how to use this to your advantage
  20. Disambiguation methods
  21. Manual disambiguation
  22. Disambiguation using phrases
  23. Disambiguation using rules
  24. Improving categorization dictionaries
  25. Creating comprehensive dictionaries using the Suggest button
  26. Assessing coverage using the keyword retrieval feature

 

SESSION XI: ADVANCED FEATURES

  1. Importing and exporting data
  2. Exportation of frequency data

We are currently adding the finishing touches to our 2023 training calendar. We therefore ask you to check our website regularly or contact us at training@tstat.eu should the dates for the course you are interested in not be published yet. You will then be contacted via email as soon as the dates are available.
