How to use a Word macro to fix interview transcripts for auto-coding in NVivo



Within NVivo, and likely other QDAS packages as well, it is possible to use the structure of interview transcripts for auto-coding. Auto-coding goes through the transcript and, using criteria specified by the user, assigns text to chosen nodes (further explanation of auto-coding and how to do it in NVivo is available on the NVivo help website). This can be useful for separating out the different speakers within a transcript, so that everything each speaker says is coded to a node with their participant code number. Even in one-to-one interviews this can be worth doing so that any word frequency queries, word clouds, etc. can be limited to sections of the transcripts where a participant is speaking. However, any mistakes in the structure of the interview transcripts can result in them being incorrectly auto-coded. Depending on the extent and nature of the errors, this can be a headache to fix manually. This post briefly covers what types of error can arise and provides a step-by-step guide to creating a Visual Basic macro within Microsoft Word that automates the process of fixing the paragraph styles in transcripts so they can be auto-coded without error.

Auto-coding requires transcripts to be structured in a particular way, and that structure must remain consistent throughout the whole document. One of the easiest ways to structure transcripts for auto-coding is to use headings in the text to signal who is speaking, for example by applying “Heading 1” to the “Interviewer” and “Participant” labels that precede what was said. Such paragraph styles (e.g. Normal, Title, Quote, Heading 1, Heading 2, etc.) can then be used by NVivo as signals for which node to code the text that follows them. The majority of transcribers are happy to produce transcripts based on a specified style. However, within Word it is not always clear which paragraph style is applied to each section of text. A section that looks like plain text can in fact have had a heading style applied to it and then been reformatted to look like plain text without the paragraph style being changed. Similarly, empty lines can still have paragraph styles applied to them, so any that accidentally have a heading style applied can lead to the text following them being coded to the wrong node. Within NVivo this results in a node with the name ‘—’ being created, with any text following an empty line with a heading being coded to it.
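To make the failure mode concrete, here is a minimal Python sketch of heading-based auto-coding. It is a deliberately simplified model, not NVivo's actual implementation: the function name, the list-of-tuples representation of a transcript, and the sample sentences are all assumptions made up for illustration. The only behaviour taken from the description above is that each heading starts a new node and that an empty heading ends up as a node named with a dash.

```python
from collections import defaultdict

# Name of the node the post reports for an empty heading (a single long dash).
EMPTY_HEADING_NODE = "\u2014"


def autocode_by_heading(paragraphs, heading_style="Heading 1"):
    """Assign each non-heading paragraph to the node named by the last heading.

    `paragraphs` is a list of (style, text) tuples in document order.
    This is an illustrative model only, not how NVivo is implemented.
    """
    nodes = defaultdict(list)
    current = None
    for style, text in paragraphs:
        if style == heading_style:
            # Any paragraph with the heading style starts a new node,
            # even if the paragraph is empty.
            current = text.strip() or EMPTY_HEADING_NODE
        elif current is not None and text.strip():
            nodes[current].append(text)
    return dict(nodes)


transcript = [
    ("Heading 1", "IV1"),
    ("Normal", "How long have you lived here?"),
    ("Heading 1", "WSU-DU-PF-012"),
    ("Normal", "About ten years."),
    ("Heading 1", ""),  # empty line that accidentally kept a heading style
    ("Normal", "Before that I was in Leeds."),
]

coded = autocode_by_heading(transcript)
```

In this example `coded` ends up with three nodes rather than two: the participant's second sentence is attached to the dash-named node instead of to WSU-DU-PF-012, which is the kind of error that then has to be recoded by hand.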

Pressing Ctrl and F on the keyboard at the same time in Word brings up the Navigation pane. The first tab, “Headings”, shows the structure of the headings used in the document. As can be seen in the screenshot above, although the text following the first IV1 looks like plain text, both the paragraph “Lorem ipsum…” and the empty line are included in the Headings structure.

On the Welfare Conditionality project that I am part of, we currently have over 1000 interview and focus group transcripts. With each interview and focus group lasting roughly an hour, we have thousands of pages of text. All transcripts were specified to use ‘Heading 1’ for the participant and interviewer labels, with the rest in plain text. Whilst only a small percentage of transcripts have errors in the paragraph styles, these were frequent enough to cause headaches when trying to auto-code them. The picture below shows a mock-up example of the results from auto-coding some of our transcripts based on paragraph styles. As part of the anonymisation process we give each participant a code number; in this example ‘WSU-DU-PF-012’. For the interviewer we use ‘IV1’ so that the label is distinct from the rest of the transcript, which, as will be shown later, also makes creating a macro easier. The lorem ipsum entries are sections of the interview where the interviewer or participant was speaking but ‘Heading 1’ had been wrongly applied. It is possible to open each of the nodes and manually recode the sections. However, depending on the number of transcripts and how often errors in the paragraph styles occur, this can be a lengthy process.

The second problem with auto-coding is that a researcher may only learn about this capability of QDAS once they start analysis. Their existing collection of transcripts is then not in a format that lends itself to being auto-coded. Again, it looks like they will either have to go back through their transcripts manually changing the styles, or not bother and lose the ability to do certain types of analysis. The guide below assumes that your transcript is structured with the Interviewer and Participant labels on the lines above the paragraphs where they speak. If your transcript has a different structure, it is also possible to add steps to the macro that modify the structure as well as fixing the Heading formatting. If you need any help with doing so, leave a comment detailing the structure of your transcript and I’ll add further details at the end of the post.

There is an easy solution to the first problem, and it is one we have used on the Welfare Conditionality project to quickly correct our transcripts before they are imported into our NVivo project file. Within Word it is possible to create what is known as a ‘macro’. Similar to auto-coding, a macro automates certain tasks to save the user the tedium of manually repeating the same actions. By saving a sequence of actions and instructions together as a macro, the user can in future perform the whole task as one action by running the macro. Macros can be anything from a simple list of step-by-step instructions to miniature programs in themselves that modify the way actions are performed based on further information provided by the user. One of the few positive things I will say about Microsoft Office is that the inbuilt macro support is good. The main caveat is that macros for Office are unfortunately written in Visual Basic, a programming language developed by Microsoft. Also, given the power of macros, there was a trend in the late 90s and early 2000s for writing macro viruses that were distributed by attaching an innocent-looking Word document containing the virus to an e-mail. Due to this, there have been numerous changes between Office versions in the way macros are supported and how to access them. The same security features added to prevent malicious use of macros have also tended to make it a pain to share macros with other people.

The reason macro support in Office remains handy, despite macros having to be written in Visual Basic for Applications (VBA), is that it is possible to create them without knowing much, if anything at all, about the language. This is thanks to the ability to record macros: the user carries out the actions they want the macro to repeat and Office translates them into the corresponding VBA. This code can then be modified to simplify it, add in variables, or extend it in other ways. To be honest, although I learnt Visual Basic at school, I tend to just record macros and then modify and clean the code afterwards to save me the hassle of working with VBA. The simplicity of creating them also makes up for some of the stress encountered when trying to share macros with someone else. The instructions below are based on Office 2016. Macro support has been improved in Office 2016 in that message prompts are silenced by default. Message prompts are the small windows that can pop up with information such as ‘Spell check complete’ and require the user to hit OK to close them. In previous versions of Office, macros would retain these prompts in the code and any subsequent actions would be halted until the user hit OK. This is easily fixable by removing a single line from the code. Off the top of my head I cannot remember the line. However, I am happy to update the guide with additional details if anyone has difficulties with it for the version of Office that they have.

Creating a macro

In the Ribbon Bar, the collection of tabs with various toolbars that appears at the top of the window, the Macros dropdown within the View tab has the option to “Record Macro…”. After specifying a few options, which we will come to in a second, all subsequent actions the user makes until hitting ‘Stop Recording’ are saved as a macro. All that is needed to create a macro that fixes the paragraph styles in an interview transcript is for the user to perform, once, a series of corrective actions that will work on any transcript. A nice and easy way to achieve this when Headings have been wrongly applied is by clearing all the formatting in the document and then using search and replace to selectively change the speaker labels to the desired paragraph style. In the instructions below I am assuming this is ‘Heading 1’. (Side note: my Office UI is black rather than the default, as I despise user interfaces that overly abuse blindingly white elements.)

After hitting record macro, a prompt opens asking for a name, what to assign the macro to, where to store it, and an optional description. For the purposes of this guide we are going to call the macro “Heading_correction”, assign it to a button, store it in “All Documents (Normal.dotm)”, and give it a short description. Choose the button setting last, as it opens a new window and recording starts as soon as that window is completed. Assigning the macro to a button means that it can be added to the Ribbon Bar or the Quick Access Toolbar, the line of buttons along the very top of the Word window. Every time the button is pressed the macro will run. In the next step, I’ll show you how to add the macro to the Quick Access Toolbar from the prompt that opens after clicking Button. Alternatively, it is possible to assign the macro to the keyboard, so that every time the user presses a chosen combination of keys, say Ctrl + Shift + M, the macro will run. Storing the macro in “All Documents (Normal.dotm)” means that the macro will be available within any document the user opens on that machine. It is also possible to save a macro to the currently open document, which is preferable in some use cases but means it will not be available when working in another document. It is normally best practice to give a description, especially when planning to share a macro with other users, and as a reminder to yourself if you do not use the macro frequently.

Hitting Button displays a new window, as seen in the screenshot below. By default it opens on the Quick Access Toolbar – though you can also add the macro to the ribbon by navigating to ‘Customize Ribbon’ in the list on the left. Click the name of the macro, “Normal.NewMacros.Heading_corre…”, in the left-hand column. Then hit “Add>>” to move it to the right-hand column, which is a list of all the buttons that appear on the Quick Access Toolbar. Don’t worry if your list is not the same as mine; I have already customised mine to include the actions I use most frequently. After moving the macro to the Quick Access Toolbar list, hit OK.

Be careful after hitting OK: every input you make is now being recorded. First of all, we want to make all existing text ‘Normal’. To do this, click anywhere in the text and press Ctrl + A, which selects everything. On the Ribbon Bar, open the Home tab and select the ‘Normal’ style.

Now that all the formatting has been set to Normal, we want to apply Heading 1 to all labels signalling that the text following them is spoken by the interviewer. Press Ctrl + H, which opens the “Find and Replace” prompt. In this example, we are using “IV1” as the interviewer label, so this is entered in the “Find what:” text box. Enter the same again in “Replace with:”. Next, hit the “More>>” button on the bottom left.

This adds “Search Options” and “Replace” sections to the bottom of the prompt. In “Search Options” change “Search: Down” to “Search: All”. In “Replace” click “Format” to open a drop-down list. From the list select Style.

This opens another prompt from which it is possible to select “Heading 1”. Select it and hit OK to return to the “Find and Replace” prompt.

You should now see “Style: Heading 1” under the “Replace with:” text box. Finally, hit “Replace All”. All instances of IV1 will now be formatted as Heading 1. This is why it is beneficial to use a unique code for the label rather than simply “Interviewer”. If using the latter, any time someone says “interviewer” during the transcript the whole paragraph will be formatted as Heading 1. It is possible to avoid this with extra steps, such as adding a paragraph mark (which denotes a new line) to the end of the “Find what:” text box. This can be done by clicking the “Special” button that appears next to the “Format” one.
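The difference between a unique label and a common word can be illustrated with a small Python sketch. It only models which paragraphs a search would hit; it does not talk to Word. The function name `paragraphs_matching` and the sample sentences are invented for the example, and requiring the label at the very end of a paragraph is used here as a rough stand-in for adding a paragraph mark to “Find what:”. The comparison ignores case, as Word's search does unless “Match case” is ticked.

```python
def paragraphs_matching(paragraphs, label, require_paragraph_end=False):
    """Return the indexes of paragraphs that a search for `label` would hit.

    With require_paragraph_end=True the label must be the last thing in the
    paragraph, approximating a search for the label followed by a paragraph mark.
    """
    needle = label.lower()
    hits = []
    for index, text in enumerate(paragraphs):
        haystack = text.lower()
        if require_paragraph_end:
            matched = haystack.endswith(needle)
        else:
            matched = needle in haystack
        if matched:
            hits.append(index)
    return hits


paragraphs = [
    "Interviewer",
    "Can you tell me about your last appointment?",
    "Participant",
    "The interviewer there was nice, a bit like you.",
]

# Plain search: also hits the participant's speech in paragraph 3.
loose = paragraphs_matching(paragraphs, "Interviewer")
# Label must end the paragraph: only the real speaker label is hit.
strict = paragraphs_matching(paragraphs, "Interviewer", require_paragraph_end=True)

print(loose)   # [0, 3]
print(strict)  # [0]
```

Even the stricter search would still catch a spoken paragraph that happens to end with the bare word and no punctuation, which is another reason a code such as IV1 that never occurs in speech is the safer choice.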

After hitting “Replace All”, a message prompt should display the number of changes made. In the main window displaying the document you should also notice every instance of IV1 change to Heading 1 formatting. Hit OK to close the message prompt. This is one of the prompts that in earlier versions of Office would also display when running the macro, but thankfully it is silenced by default in the latest versions. With the “Find and Replace” prompt still open, change the “Find what:” and “Replace with:” text boxes to a unique code used at the start of all your participant code numbers. In the Welfare Conditionality project we use WSU in our transcripts to denote “Welfare Service User”. The rest of the code denotes the interviewer initial, the first two letters of the city where the interview took place, and which number of interview it is for that interviewer. On your own project you’ll obviously want your own code number system. However, make sure it has a common unique element, such as “WSU-”, that appears in all the code numbers. What is important in creating the macro is that all your actions are repeatable within any of your transcripts. This is why we only specify the common element of all participant code numbers, so that the same find and replace will work in any of the transcripts. For example, searching for “WSU-” would find “WSU-DA…”, “WSU-FR..”, or any other series of characters as long as it follows “WSU-”.

Hit “Replace All” again. Another message prompt will appear to confirm the number of changes. Click OK to close it. This may be all the steps you need to fix your transcripts. If that is the case, click “Close” on the “Find and Replace” prompt. On the Welfare Conditionality project we have two further steps. All our transcripts contain metadata at the start, such as the name of the interviewer, date of interview, name of transcriber, and interview code number. Given that the code number appears here too, the find and replace operations formatted it as Heading 1 as well. A quick find and replace of “Code number:” with “Code number:” in Normal formatting fixes that. Similarly, to ensure all interviews have the correct code number assigned to them, interviewers read it out at the start of the interview and it is included as part of the transcription. To fix this, another quick find and replace of “the code number for this interview is” with the same text in Normal formatting is all that is needed. As with the other steps, if you need help adding any additional steps, leave a comment.
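The whole recorded sequence can be summarised as an ordered list of rules. The Python sketch below models that logic on a list of (style, text) pairs; it is not the VBA that Word records and it does not read or write Word files. The names `RULES` and `fix_styles` and the sample transcript are assumptions for illustration, and the model takes at face value the behaviour described above, namely that a replace restyles the whole paragraph containing the match.

```python
NORMAL = "Normal"
HEADING = "Heading 1"

# (text to find, style to apply), in the order the steps were recorded.
RULES = [
    ("IV1", HEADING),                                   # interviewer label
    ("WSU-", HEADING),                                  # common start of participant codes
    ("Code number:", NORMAL),                           # metadata line at the top
    ("the code number for this interview is", NORMAL),  # code read out loud
]


def fix_styles(paragraphs, rules=RULES):
    """Return a new list of (style, text) pairs with corrected styles."""
    fixed = []
    for _old_style, text in paragraphs:
        style = NORMAL  # first step: select everything and apply Normal
        lowered = text.lower()
        for needle, new_style in rules:
            # Later rules override earlier ones, as later replaces would.
            if needle.lower() in lowered:
                style = new_style
        fixed.append((style, text))
    return fixed


transcript = [
    ("Heading 1", "Code number: WSU-DU-PF-012"),
    ("Heading 1", "IV1"),
    ("Heading 1", "So the code number for this interview is WSU-DU-PF-012."),
    ("Normal", "WSU-DU-PF-012"),
    ("Heading 1", "Lorem ipsum dolor sit amet."),
    ("Heading 1", ""),
]

fixed = fix_styles(transcript)
for style, text in fixed:
    print(f"{style:10} | {text}")
```

Only the two bare speaker labels come out as Heading 1. The order of the rules matters: the two Normal rules have to run after the “WSU-” rule, otherwise the metadata line and the read-out code number would be turned back into headings.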

Once all the steps are complete, on the Ribbon Bar go to the View tab again and select Macros and hit “Stop Recording”.

In your Quick Access Toolbar, you should see a button that looks like three squares arranged at the points of a triangle. This is the macro we have created. Now if you open any of your transcripts and click this button, all the formatting will be fixed to a structure that will work with auto-coding in NVivo.

The screenshot below shows the Heading structure from the same document as shown at the beginning after running the macro.

And that’s it! Every time you finish transcribing an interview, or a transcript comes back from the transcribers, open it in Word and hit the macro button we have created on the Quick Access Toolbar, and all of the formatting will be fixed so that there will be no errors when auto-coding it in NVivo. Macros are really that simple, and once you get the hang of thinking about how to translate repetitive tasks into a series of reproducible actions they are incredibly powerful. As always, if anything is unclear or you run into any problems, please leave a comment and I’ll update the guide with further info. As an advocate for open source software, I plan to write up a future guide showing how to achieve the same using a short Python script – with the added bonus of being able to apply it to every text file in a folder at the same time.

