Upcoming workshop presentation – ‘XVivo: The case for an open source QDAS’

I will be giving a presentation on the need for qualitative researchers to embrace open source software, and on my work on Pythia, as part of the Urban Studies’ Monday workshops at the University of Glasgow on 26th November.

Abstract:

Qualitative data analysis software (QDAS) has the potential to revolutionise both the scale of qualitative research and the array of possible analysis techniques. Yet currently available software still imposes unnecessary limits that prevent this full potential from being realised. Additionally, it locks data, and the analysis performed on it, within proprietary file formats that make the archiving and sharing of research difficult. Due to similar issues, open source solutions have seen increasing popularity in quantitative research, and it is perhaps time that qualitative researchers joined them. This presentation will therefore discuss both the problems with current proprietary QDAS and the potential of open source software for qualitative researchers. To do this, the myriad issues experienced with NVivo by the Welfare Conditionality project will be used to exemplify the problems created by a reliance on expensive, slow, and poorly designed proprietary software. The second half of the presentation will focus on Pythia, an open source QDAS library written in Python that I have been working on. By covering its design philosophy, current progress, and long-term plans, I will highlight the potential of open source software to solve the problems with current qualitative software, enable new creative analysis techniques, and allow researchers to reclaim control of their data.

The workshops, as far as I am aware, are open to Urban Studies’ staff and PhD students only. However, as usual I will upload a copy of my presentation slides after the event. Additionally, as part of the preparation for the presentation I will be aiming to write a few short blog posts covering the design philosophy of Pythia, elaborating further on why there is a need for an open source QDAS, and providing write-ups and screenshots of progress. Unfortunately, development ground to an absolute halt during the eight months in which all my spare time, energy, annual leave, mental health, hopes, dreams, and general will to live were sacrificed at the job hunting altar. I now have around 12 months before that hell begins again, so once I have taken care of the journal article writing backlog that also built up during that time, the plan is to filter work on Pythia back into my weekly schedule.

How to use a Word macro to fix interview transcripts for auto-coding in NVivo

Within NVivo, and likely other QDAS packages as well, it is possible to use the structure of interview transcripts for auto-coding. Basically, auto-coding goes through the transcript and, using criteria specified by the user, assigns text to chosen nodes (further explanation of auto-coding and how to do it in NVivo is available on the NVivo help website). This can be used to separate out the different speakers within a transcript, so that everything each person says is coded to a node named with their participant code number. Even in one-to-one interviews this can be worth doing so that any word frequency queries, word clouds, etc. can be limited to only include the sections of the transcripts where a participant is speaking. However, any mistakes in the structure of the interview transcripts can result in them being incorrectly auto-coded. Depending on the extent and nature of the errors, this can be a headache to fix manually. This post briefly covers what types of errors can arise and provides a step-by-step guide to creating a Visual Basic macro within Microsoft Word that automates the process of fixing the paragraph styles in transcripts so they can be auto-coded without error.

Auto-coding requires transcripts to be structured in a particular way, and for that structure to remain consistent throughout the whole document. One of the easiest ways to structure transcripts for auto-coding is to use headings in the text to signal who is speaking, for example applying “Heading 1” to the “Interviewer” and “Participant” labels before what was said. Such paragraph styles (e.g. Normal, Title, Quote, Heading 1, Heading 2, etc.) can then be used by NVivo as signals for which node the text that follows them should be coded to. The majority of transcribers are happy to produce transcripts based on a specified style. However, within Word it is not always clear which paragraph style is applied to each section of text. Sections that look like plain text can in fact have had a heading style applied to them and then been reformatted to look like body text without the underlying paragraph style being changed. Similarly, empty lines can still have paragraph styles applied to them, meaning that any which accidentally have a heading style can lead to the text following them being coded to the wrong node. Within NVivo this results in a node named ‘—’ being created, with any text following an empty heading-styled line being coded to it.
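
The full post walks through building this clean-up as a Visual Basic macro inside Word. Purely as a rough sketch of the same logic, here is what it might look like in Python using the python-docx library; the file names, the “Interviewer”/“Participant” labels, and the assumption that “Heading 1” marks speakers are taken from the example above rather than from any particular project.

from docx import Document

SPEAKER_LABELS = {"Interviewer", "Participant"}  # assumed labels, each on its own line

doc = Document("transcript.docx")  # hypothetical input file

for para in doc.paragraphs:
    text = para.text.strip().rstrip(":")
    is_heading = para.style.name.startswith("Heading")

    if not text and is_heading:
        # Empty lines carrying a heading style create the stray '—' node in NVivo.
        para.style = doc.styles["Normal"]
    elif text in SPEAKER_LABELS:
        # Speaker labels get Heading 1 so auto-coding can pick them up,
        # even if they were manually reformatted to look like plain text.
        para.style = doc.styles["Heading 1"]
    elif is_heading:
        # Anything else marked as a heading is ordinary speech; demote it.
        para.style = doc.styles["Normal"]

doc.save("transcript_fixed.docx")

The macro described in the post does the equivalent clean-up directly inside Word, which avoids a round trip through Python for colleagues who only have Office available.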

A Qualitative Computing Revolution?

The challenges of data management and analysis on a large longitudinal qualitative research project

Computer-aided qualitative data analysis has the potential to revolutionise both the scale of research and the range of possible analysis techniques. Yet the software itself still imposes limits that prevent this full potential from being realised. This post looks at the large and complex dataset created as part of the Welfare Conditionality research project, the analytical approach adopted, and the challenges QDAS faces.

The Welfare Conditionality project sets out to consider the issues surrounding sanctions, support, and behaviour change through two broad research questions. Firstly, is conditionality ‘effective’, and if so for whom, under what conditions, and by what definition of effective? Secondly, is welfare conditionality ‘ethical’: how do people justify or criticise its use, and for what reasons? To answer these questions, we have undertaken the ambitious task of collecting a trove of qualitative data on conditional forms of welfare. Our work spans nine policy areas, each of which has a dedicated ‘policy team’ responsible for the research. The policy areas are: unemployed people, Universal Credit claimants, lone parents, disabled people, social tenants, homeless people, individuals/families subject to antisocial behaviour orders or family intervention projects, (ex-)offenders, and migrants. The research has consisted of 45 interviews with policy stakeholders (MPs, civil servants, heads of charities), 27 focus groups with service providers, and three waves of repeat qualitative interviews with 481 welfare service users across 10 interview locations in England and Scotland.

Our first task relating to data management and analysis was how to deal with the logistics of storing and organising data on this scale. One of our key protocols has been the creation of a centralised Excel sheet used to collate participant information, contact details, and the stage each interview is at. It tells us, for example, whether the interview recording has been uploaded to a shared network drive, transcribed, and anonymised, and whether the transcript has been added to our NVivo project file, had its case node created and attributes assigned, been auto-coded, and been coded and summarised in a framework matrix. On the analysis side, we have been using the server edition of NVivo. It became clear early into the fieldwork that working with multiple stand-alone project files that would be regularly merged and then redistributed would be impractical, with a high risk of merge conflicts arising due to the complexity of our data. The server project means multiple team members can access and work in the project file at the same time.
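
For illustration only, the participant codes, stages, and column names below are made up; the point is simply that keeping the tracker as a structured table makes it easy to pull out interviews stuck at a particular stage, for example with pandas:

import pandas as pd

# Hypothetical tracker rows; the real sheet is a shared Excel file maintained by the team.
tracker = pd.DataFrame([
    {"participant": "WSU-001", "wave": 2, "transcribed": True, "anonymised": True, "auto_coded": False},
    {"participant": "WSU-002", "wave": 2, "transcribed": True, "anonymised": False, "auto_coded": False},
])

# Interviews that have been transcribed but are still waiting to be anonymised.
waiting = tracker[tracker["transcribed"] & ~tracker["anonymised"]]
print(waiting[["participant", "wave"]])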

Another emerging challenge was the difficulty for team members of being involved in time-intensive fieldwork while dedicating sufficient time to analysis. We also needed to find an analytical approach which could offer information at a range of levels: by individual over time, as well as across and within the policy areas and welfare domains under investigation. There was debate amongst team members between having each policy team independently do its own analysis and adopting a shared approach. Some felt a shared approach would be too time consuming compared to coding for specific outputs, and that there were not enough commonalities between all the policy areas for a shared approach to be workable. Others felt that coding for specific outputs would result in unnecessary repetition of analysis and make it difficult to reach general conclusions across the whole sample.

Improving NVivo with AutoHotKey: Faster Attribute Values Input Script

The core component of the fieldwork for the Welfare Conditionality research project is an ongoing series of three waves of qualitative interviews with 481 welfare service users sampled across nine different policy areas. In order to assist with descriptive statistics and with finding subgroups amongst our sample, we have a set of key attributes such as the participant’s age, household, and benefits received. Furthermore, we have additional attributes specific to each policy area. Due to this, we have around fifty attributes in total that need values entered for them after each interview. By default NVivo offers three main ways to add attribute values, none of which is ideal for this amount of data entry.

The primary means of adding attribute data in NVivo is through the Attribute Values tab of the Node Properties dialogue window. This presents a list of drop-down menus, one for each attribute, and can be laborious to work through. Similar to this is opening the Classification sheet and working along the row for the participant. In addition to carrying the same risk of developing RSI as the first method, this approach has become nearly impossible to use as our project file has grown larger: any change to an attribute value with the Welfare Service User classification sheet open now results in a 1-2 minute wait while NVivo processes the change. The third option is to save attribute data to an Excel sheet and import it into NVivo. This introduces its own problems with ensuring values are typed correctly, or with setting up the Excel sheet so that acceptable values are defined for each column, and it still does not save any real time in the data entry process.
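
As a rough sketch of one way to take the sting out of the Excel route, the values in a sheet can be checked against a set of acceptable values before import. The attribute names, allowed values, and file name below are illustrative assumptions, not our actual set-up:

import pandas as pd

# Hypothetical allowed values for a few attributes; the real project has around fifty.
ALLOWED = {
    "Gender": {"Male", "Female", "Other"},
    "Age group": {"16-24", "25-34", "35-49", "50-64", "65+"},
    "Housing tenure": {"Social tenant", "Private tenant", "Owner occupier", "Homeless", "Other"},
}

# Assumed sheet layout: one row per participant, participant code as the index.
sheet = pd.read_excel("attribute_values.xlsx", index_col=0)

# Flag any cell whose value is not in the allowed set for its column,
# so typos are caught before the sheet is imported into NVivo.
for column, allowed in ALLOWED.items():
    invalid = sheet[~sheet[column].isin(allowed)]
    for participant, value in invalid[column].items():
        print(f"{participant}: '{value}' is not an accepted value for {column}")

The AutoHotKey script covered in the full post instead targets the data entry itself, but a check like this is one way to make the Excel import option less error prone.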
