I will be doing a presentation on the need for qualitative researchers to embrace open source software and my work on Pythia as part of the Urban Studies’ Monday workshops at the University of Glasgow on 26th November.
Qualitative data analysis software (QDAS) has the potential to revolutionise both the scale of qualitative research and the array of possible analysis techniques. Yet currently available software still imposes unnecessary limits that prevent this full potential from being realised. Additionally, it locks data and the analysis performed on it within proprietary file formats that make the archiving and sharing of research difficult. Due to similar issues, open source solutions have seen increasing popularity in quantitative research, and it is perhaps time that qualitative researchers joined them. This presentation will therefore discuss both the issues of current proprietary QDAS and the potential of open source software for qualitative researchers. To do this, the myriad of issues experienced with NVivo by the Welfare Conditionality project will be used to exemplify the problems created by a reliance on expensive, slow, and poorly designed proprietary software. The second half of the presentation will focus on Pythia, an open source QDAS library written in Python that I have been working on. By covering its design philosophy, current progress, and long-term plans, the presentation will highlight the potential of open source to solve problems with current qualitative software, enable new creative analysis techniques, and allow researchers to reclaim control of their data.
The workshops, as far as I am aware, are open to Urban Studies’ staff and PhD students only. However, as usual I will upload a copy of my presentation slides after the event. Additionally, as part of the preparation for the presentation I am aiming to write a few short blog posts covering the design philosophy of Pythia, why there is a need for an open source QDAS, and write-ups and screenshots of progress. Unfortunately, development ground to an absolute halt during the eight months in which all my spare time, energy, annual leave, mental health, hopes, dreams, and general will to live were sacrificed at the job hunting altar. I now have around 12 months before that hell begins again, so once I have taken care of the journal article writing backlog that also built up during that time, the plan is to filter work on Pythia back into my weekly schedule.
Within NVivo, and likely other QDAS packages as well, it is possible to use the structure of interview transcripts for auto-coding. Basically, what auto-coding does is go through the transcript and, using criteria specified by the user, assign text to chosen nodes (further explanation of auto-coding and how to do it in NVivo is available on the NVivo help website). This can be useful for separating out the different speakers within a transcript, so that everything each speaker says is coded to a node with their participant code number. Even in one-to-one interviews this can be worth doing so that any word frequency queries, word clouds, etc. can be limited to only include sections from the transcripts where a participant is speaking. However, any mistakes in the structure of the interview transcripts can result in them being incorrectly auto-coded. Depending on the extent and nature of the errors, this can be a headache to fix manually. This post briefly covers what types of errors can arise and provides a step-by-step guide to creating a Visual Basic macro within Microsoft Word that can automate the process of fixing the paragraph styles in transcripts so they can be auto-coded without error.
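To illustrate why a single wrong paragraph style matters, here is a minimal Python sketch of style-based auto-coding. It is not NVivo's implementation or the Word macro described in the post. It assumes a transcript has already been reduced to (style, text) pairs, that speaker names sit in "Heading 1" paragraphs, and that speech sits in "Normal" paragraphs. The function name, the style names, and the participant code WSU-001 are all hypothetical.

```python
from collections import defaultdict


def auto_code_by_speaker(paragraphs, speaker_style="Heading 1", speech_style="Normal"):
    """Group speech under the most recent speaker heading.

    Returns (nodes, errors): nodes maps each speaker to their paragraphs,
    errors lists paragraphs whose style does not fit the assumed structure.
    """
    nodes = defaultdict(list)
    errors = []
    current = None
    for index, (style, text) in enumerate(paragraphs):
        if style == speaker_style:
            # A correctly styled heading switches the active speaker.
            current = text.strip()
        elif style == speech_style and current is not None:
            nodes[current].append(text)
        else:
            # Unexpected style, or speech before any speaker heading.
            errors.append((index, style, text))
    return dict(nodes), errors


# Hypothetical transcript: the second interviewer heading has the wrong style.
transcript = [
    ("Heading 1", "INTERVIEWER"),
    ("Normal", "How long have you been claiming?"),
    ("Heading 1", "WSU-001"),
    ("Normal", "About two years."),
    ("Heading 2", "INTERVIEWER"),
    ("Normal", "And before that?"),
]

nodes, errors = auto_code_by_speaker(transcript)
print(nodes["WSU-001"])  # the interviewer's follow-up is wrongly included here
print(errors)            # the mis-styled heading is reported with its position
```

In this sketch the mis-styled heading is never recognised as a change of speaker, so the interviewer's follow-up question ends up coded to the participant. That is the kind of error that fixing the paragraph styles before import is meant to prevent.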
The challenges of data management and analysis on a large longitudinal qualitative research project
Computer aided qualitative data analysis has the potential to revolutionise both the scale of research and possible analysis techniques. Yet, the software itself still imposes limits that hinder and prevent this full potential from being realised. This post looks at the large and complex dataset created as part of the Welfare Conditionality research project, the analytical approach adopted, and the challenges QDAS faces.
The Welfare Conditionality project has two broad research questions in setting out to consider the issues surrounding sanctions, support, and behaviour change. Firstly, is conditionality ‘effective’ – and if so for whom, under what conditions, and by what definition of effective? And, secondly, is welfare conditionality ‘ethical’ – how do people justify or criticise its use and for what reasons? To answer these questions, we have undertaken the ambitious task of collecting a trove of qualitative data on conditional forms of welfare. Our work spans nine policy areas, each of which has a dedicated ‘policy team’ that is responsible for the research. The policy areas are: unemployed people, Universal Credit claimants, lone parents, disabled people, social tenants, homeless people, individuals/families subject to antisocial behaviour orders or family intervention projects, (ex-)offenders, and migrants. Research has consisted of 45 interviews with policy stakeholders (MPs, civil servants, heads of charities), 27 focus groups with service providers, and three waves of repeat qualitative interviews with 481 welfare service users across 10 interview locations in England and Scotland.
The core component of the fieldwork for the Welfare Conditionality research project is an ongoing series of three waves of qualitative interviews with 481 welfare service users sampled across nine different policy areas. In order to assist with descriptive statistics and finding subgroups amongst our sample, we have a set of key attributes such as the participant’s age, household, benefits received, etc. Furthermore, we have additional attributes specific to each policy area. As a result, we have around fifty attributes in total that need values entered for them after each interview. By default, NVivo offers three main ways to add attribute values, none of which are ideal for working with this amount of data entry.
The primary means of adding attribute data in NVivo is through the Attribute Values tab of the Node Properties dialogue window. This presents a list of drop-down menus, one for each attribute, and can be laborious to work through. Similar to this is opening the Classification sheet and working along the row for the participant. In addition to carrying the same risk of developing RSI as the first method, this method has become nearly impossible to use as our project file has grown larger. Any change to an attribute value with the Welfare Service User classification sheet open now results in a 1-2 minute wait for NVivo to process the change. The third option is to save attribute data to an Excel sheet and import it into NVivo. This introduces its own problems, either with ensuring values are typed correctly or with setting up the Excel sheet with acceptable values defined for each column, and it still does not make any real time savings in the data entry process.
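One way to catch mistyped values before attempting the spreadsheet import is to check the file against a list of acceptable values. The following is a minimal Python sketch of that idea, not part of the project's actual workflow. It assumes the sheet has been exported to CSV, and the attribute names, allowed values, and participant codes are hypothetical placeholders.

```python
import csv
import io

# Hypothetical attributes and their acceptable values.
ALLOWED = {
    "Employment": {"Employed", "Unemployed", "Retired"},
    "Household": {"Single", "Couple", "Lone parent"},
}


def find_invalid_values(csv_file, allowed):
    """Return (row_number, attribute, value) for every unacceptable value."""
    problems = []
    reader = csv.DictReader(csv_file)
    # Data rows start at 2 because row 1 of the sheet is the header.
    for row_number, row in enumerate(reader, start=2):
        for attribute, valid_values in allowed.items():
            value = (row.get(attribute) or "").strip()
            if value not in valid_values:
                problems.append((row_number, attribute, value))
    return problems


# In-memory stand-in for an exported sheet; the second data row has a typo.
sample = io.StringIO(
    "Participant,Employment,Household\n"
    "P001,Unemployed,Single\n"
    "P002,Unemployd,Couple\n"
)

problems = find_invalid_values(sample, ALLOWED)
print(problems)
```

A check like this only flags typing errors. It does nothing to reduce the time spent on the data entry itself, which is the larger problem described above.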
The above video is an example of using a script I wrote in AutoHotkey in order to provide another alternative. The script translates keypresses on the numpad into a series of keypresses that select the desired attribute value and then move focus to the next attribute. For example, if the second value for the selected attribute is ‘Unemployed’, pressing ‘2’ on the numpad would set the value to ‘Unemployed’ and move the focus to the next attribute so the user can press another numpad key to input the next attribute value. Used alongside post-interview checklists that have the number written next to each value, it greatly reduces the amount of time required for data entry. Further details about the script and how to use it are included below. The script file and an executable version of it are available from a GitHub repository.
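The translation step can be sketched in Python as a pure function that turns a numpad digit into a list of key names. This is an illustration of the idea only, not the AutoHotkey script itself. The specific keys used here (Down to step through a drop-down's values, Tab to move to the next attribute) are assumptions, and the function names are hypothetical. The real script may send a different sequence.

```python
def keys_for_numpad(digit, select_key="Down", next_key="Tab"):
    """Translate one numpad digit into an assumed sequence of key names.

    Assumption: pressing select_key n times picks the nth value in the
    drop-down, and next_key moves focus to the next attribute.
    """
    if not 1 <= digit <= 9:
        raise ValueError("digit must be between 1 and 9")
    return [select_key] * digit + [next_key]


def keys_for_checklist(digits):
    """Expand the numbers from a post-interview checklist into one key list."""
    sequence = []
    for digit in digits:
        sequence.extend(keys_for_numpad(digit))
    return sequence


# Pressing '2' selects the second value and moves on to the next attribute.
print(keys_for_numpad(2))
# Three attributes in a row, with checklist values 2, 1, and 3.
print(keys_for_checklist([2, 1, 3]))
```

Keeping the mapping as a plain function separate from whatever sends the keystrokes makes it easy to check that each number on the checklist expands to the intended sequence.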