EdTech Tools and Privacy

Generative AI Tools & Privacy

Generative AI applications generate new content, such as text, images, videos, music, and other forms of media, based on user inputs. These systems learn from vast datasets containing millions of examples to recognize patterns and structures, without needing explicit programming for each task. This learning enables them to produce new content that mirrors the style and characteristics of the data they trained on.

AI-powered chatbots like ChatGPT can replicate human conversation. Specifically, ChatGPT is a sophisticated language model that understands and generates language by identifying patterns of word usage. It predicts the next words in a sequence, which proves useful for tasks ranging from writing emails and blogs to creating essays and programming code. Its adaptability to different writing and coding styles makes it a powerful and versatile tool. Major tech companies, such as Microsoft, are integrating ChatGPT into applications like MS Teams, Word, and PowerPoint, indicating a trend that other companies are likely to follow.

Despite their utility, these generative AI tools come with privacy risks for students. As these tools learn from the data they process, any personal information included in student assignments could be retained and used indefinitely. This poses several privacy issues: students may lose control over their personal data, face exposure to data breaches, and have their information used in ways they did not anticipate, especially when data is transferred across countries with varying privacy protections. To maintain privacy, it is crucial to handle student data transparently and with clear consent.

Detection tools like Turnitin now include features to identify content generated by AI, but these tools also collect and potentially store personal data for extended periods. While Turnitin has undergone privacy and risk evaluations, other emerging tools have not been similarly vetted, leaving their privacy implications unclear.

The ethical landscape of generative AI is complex, encompassing data bias concerns that can result in discriminatory outputs, and intellectual property issues, as these models often train on content without the original creators’ consent. Labour practices also present concerns: for example, OpenAI has faced criticism for the conditions of the workers it employs to filter out harmful content from its training data. Furthermore, the significant environmental impact of running large AI models, due to the energy required for training and data storage, raises sustainability questions. Users must stay well-informed and critical of AI platform outputs to ensure responsible and ethical use.


This article is part of a collaborative Data Privacy series by Langara’s Privacy Office and EdTech. If you have data privacy questions or would like to suggest a topic for the series, contact Joanne Rajotte (jrajotte@langara.ca), Manager of Records Management and Privacy, or Briana Fraser, Learning Technologist & Department Chair of EdTech.

A.I. Detection: A Better Approach 

Over the past few months, EdTech has shared concerns about A.I. classifiers, such as Turnitin’s A.I. detection tool, AI Text Classifier, GPTZero, and ZeroGPT. Both in-house testing and statements from Turnitin and OpenAI confirm that A.I. text classifiers cannot reliably differentiate between A.I.-generated and human-written text. Given that these tools are unreliable and easy to manipulate, EdTech discourages their use. Instead, we suggest using Turnitin’s Similarity Report to help identify A.I.-hallucinated and fabricated references.

What is Turnitin’s Similarity Report?

The Turnitin Similarity Report quantifies how similar a submitted work is to other pieces of writing, including works on the Internet and those stored in Turnitin’s extensive database, highlighting sections that match existing sources. The similarity score represents the percentage of writing that is similar to other works. 
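Turnitin’s matching algorithm is proprietary, but the idea of a similarity score can be illustrated with a toy sketch. The n-gram approach and the five-word window below are assumptions for illustration only, not Turnitin’s actual method:

```python
import re

def similarity_score(submission: str, sources: list[str], n: int = 5) -> float:
    """Toy similarity score: the percentage of the submission's word
    n-grams that also appear in any of the source documents."""
    def ngrams(text: str) -> set[tuple[str, ...]]:
        words = re.findall(r"[a-z']+", text.lower())
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    sub = ngrams(submission)
    if not sub:
        return 0.0
    # Pool the n-grams of every source into one reference set.
    corpus = set().union(*(ngrams(s) for s in sources))
    return 100.0 * len(sub & corpus) / len(sub)
```

Under this sketch, a passage copied verbatim from a source scores 100, while entirely original wording scores near 0; a real service compares against billions of documents rather than a small list.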

AI Generated References 

A.I. researchers call the tendency of A.I. to make stuff up a “hallucination.” A.I.-generated responses can appear convincing but may include irrelevant, nonsensical, or factually incorrect information.

ChatGPT and other natural language processing programs do a poor job of referencing sources and often fabricate plausible references. Because the references seem real, students often mistake them for legitimate sources.

Common reference or citation errors include: 

  • Failure to include a Digital Object Identifier (DOI) or incorrect DOI 
  • Misidentification of source information, such as journal or book title 
  • Incorrect publication dates 
  • Incorrect author information 
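Some of these errors can be screened for mechanically. As a hypothetical illustration, a well-formed DOI follows the pattern 10.<registrant>/<suffix>, so a simple syntactic check can catch malformed DOIs (though a string that passes this check may still be fabricated; only resolving it at https://doi.org confirms it points to a real work):

```python
import re

# Common modern DOI form: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$", re.IGNORECASE)

def doi_looks_valid(doi: str) -> bool:
    """Syntactic check only: a well-formed string can still be a
    fabricated DOI that resolves to nothing or to the wrong paper."""
    return bool(DOI_PATTERN.match(doi.strip()))
```

For example, doi_looks_valid("10.7759/cureus.37432") passes the format check, while a reference whose "DOI" contains spaces or lacks the 10. prefix fails immediately.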

Using Turnitin to Identify Hallucinated References 

To use Turnitin to identify hallucinated or fabricated references, do not exclude quotes and bibliographic material from the Similarity Report. Genuine quotes and bibliographic information will be flagged as matching or highly similar to existing sources. Fabricated quotes, references, and bibliographic information will have zero similarity because they do not match any existing source.

Quotes and bibliographic information with no similarity to existing works should be investigated to determine whether they are fabricated.
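The triage step above is straightforward to express in code. Assuming you have already pulled a per-reference similarity score out of the report (the function name and the dictionary input here are hypothetical, not part of Turnitin’s API), the entries that match nothing in the database are the ones to verify by hand:

```python
def flag_suspect_references(ref_scores: dict[str, float],
                            threshold: float = 0.0) -> list[str]:
    """Return references whose similarity score is at or below the
    threshold: they match nothing on record and warrant manual checking."""
    return [ref for ref, score in ref_scores.items() if score <= threshold]
```

A zero score is a prompt for investigation, not proof of fabrication; a legitimate but obscure source can also fail to match.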

References

Athaluri, S., Manthena, S., Kesapragada, V., et al. (2023). Exploring the boundaries of reality: Investigating the phenomenon of artificial intelligence hallucination in scientific writing through ChatGPT references. Cureus, 15(4), e37432. https://doi.org/10.7759/cureus.37432

Metz, C. (2023, March 29). What makes A.I. chatbots go wrong? The curious case of the hallucinating software. New York Times. https://www.nytimes.com/2023/03/29/technology/ai-chatbots-hallucinations.html 

Aligning language models to follow instructions. (2022, January 27). OpenAI. https://openai.com/research/instruction-following 

Weise, K., & Metz, C. (2023, May 1). What A.I. chatbots hallucinate. New York Times. https://www.nytimes.com/2023/05/01/business/ai-chatbots-hallucination.html

Welborn, A. (2023, March 9). ChatGPT and fake citations. Duke University Libraries. https://blogs.library.duke.edu/blog/2023/03/09/chatgpt-and-fake-citations/ 

[Image: screenshot of a Turnitin Similarity Report, with submitted text on the left and the report panel on the right]

AI Classifiers — What’s the problem with detection tools?

AI classifiers don’t work!

Natural language processor AIs are meant to be convincing. They are creating content that “sounds plausible because it’s all derived from things that humans have said” (Marcus, 2023). The intent is to produce outputs that mimic human writing. The result: The world’s leading AI companies can’t reliably distinguish the products of their own machines from the work of humans.

In January, OpenAI released its own AI text classifier. According to OpenAI: “Our classifier is not fully reliable. In our evaluations on a ‘challenge set’ of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time (false positives).”

A bit about how AI classifiers identify AI-generated content

GPTZero, a commonly used detection tool, identifies AI-created works based on two factors: perplexity and burstiness.

Perplexity measures the complexity of text. Classifiers identify predictable, low-complexity text as AI-generated and highly complex text as human-generated.

Burstiness compares variation between sentences. It measures how predictable a piece of content is by the homogeneity of the length and structure of sentences throughout the text. Human writing tends to be variable, switching between long and complex sentences and short, simpler ones. AI sentences tend to be more uniform with less creative variability.

The lower the perplexity and burstiness scores, the more likely it is that the text is AI-generated.
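Burstiness, at least, is simple enough to approximate without a language model. Here is a minimal sketch using the coefficient of variation of sentence lengths as a rough stand-in; real classifiers such as GPTZero use their own proprietary formulas, and perplexity additionally requires a trained language model:

```python
import re
from statistics import mean, pstdev

def burstiness(text: str) -> float:
    """Rough burstiness proxy: variation in sentence length relative to
    the average. Higher values suggest the human-like mixing of short
    and long sentences; uniform sentences score near zero."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)
```

A passage of identically sized sentences scores 0.0, while prose that alternates short statements with long, complex ones scores noticeably higher.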

Turnitin is a plagiarism-prevention tool that helps check the originality of student writing. On April 4th, Turnitin released an AI-detection feature.

According to Turnitin, its detection tool works a bit differently.

When a paper is submitted to Turnitin, the submission is first broken into segments of text that are roughly a few hundred words (about five to ten sentences). Those segments are then overlapped with each other to capture each sentence in context.

The segments are run against our AI detection model, and we give each sentence a score between 0 and 1 to determine whether it is written by a human or by AI. If our model determines that a sentence was not generated by AI, it will receive a score of 0. If it determines the entirety of the sentence was generated by AI it will receive a score of 1.

Using the average scores of all the segments within the document, the model then generates an overall prediction of how much text (with 98% confidence based on data that was collected and verified in our AI innovation lab) in the submission we believe has been generated by AI. For example, when we say that 40% of the overall text has been AI-generated, we’re 98% confident that is the case.

Currently, Turnitin’s AI writing detection model is trained to detect content from the GPT-3 and GPT-3.5 language models, which includes ChatGPT. Because the writing characteristics of GPT-4 are consistent with earlier model versions, our detector is able to detect content from GPT-4 (ChatGPT Plus) most of the time. We are actively working on expanding our model to enable us to better detect content from other AI language models.
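The pipeline Turnitin describes (overlapping segments, per-sentence scores between 0 and 1, then an average) can be sketched as follows. The scoring model itself is proprietary, so sentence_scorer below is a placeholder, and the window and step sizes are assumptions for illustration:

```python
import re
from statistics import mean
from typing import Callable

def overall_ai_score(text: str,
                     sentence_scorer: Callable[[str], float],
                     window: int = 8,
                     step: int = 4) -> float:
    """Sketch of the described pipeline: split the text into overlapping
    windows of sentences, score each sentence from 0 (human) to 1 (AI)
    with a model, and average to estimate the AI-generated share."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    # Overlapping windows so each sentence is scored in context;
    # overlapped sentences count more than once in this simple average.
    segments = [sentences[i:i + window]
                for i in range(0, max(len(sentences) - window + 1, 1), step)]
    scores = [sentence_scorer(s) for seg in segments for s in seg]
    return 100.0 * mean(scores) if scores else 0.0
```

Plugging in a model that scores every sentence 1.0 yields 100% AI-generated; a model that scores everything 0.0 yields 0%. The real system’s confidence claim (98%) comes from Turnitin’s own lab validation, not from anything in this sketch.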

The Issues

AI detectors cannot prove conclusively if text is AI generated. With minimal editing, AI-generated content evades detection.

L2 (second-language) writers tend to write with less “burstiness.” Concern about this bias is one of the reasons why UBC chose not to enable Turnitin’s AI-detection feature.

ChatGPT’s writing style may be harder to spot than some think.

Privacy violations are a concern with both generators and detectors as both collect data.

Now what?

Langara’s EdTech, TCDC, and SCAI departments are working together to offer workshops on four potential approaches: Embrace it, Neutralize it, Ban it, Ignore it. Interested in a bespoke workshop for your department? Complete the request form.


References
Marcus, G. (2023, January 6). Ezra Klein interviews Gary Marcus [Audio podcast episode]. In The Ezra Klein Show. https://www.nytimes.com/2023/01/06/podcasts/transcript-ezra-klein-interviews-gary-marcus.html

Fowler, G. A. (2023, April 3). We tested a new ChatGPT-detector for teachers. It flagged an innocent student. Washington Post. https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-Turnitin/

AI Detection Tool Testing — Initial Results

We’ve limited our testing to Turnitin’s AI detection tool. Why? Turnitin has undergone privacy and risk reviews and is a college-approved technology. Other detection tools haven’t been reviewed and may not meet recommended data privacy standards.

What We’ve Learned So Far

  • Unedited AI-generated content often receives a 100% AI-generated score, although more complex writing by ChatGPT4 can score far less than 100%.
  • Adding typos and grammar mistakes, or prompting the AI generator to include errors throughout a document, can change the AI-generated score from 100% to 0%. 
  • Adding I-statements throughout a document has a dramatic impact in lowering the AI score. 
  • Interrupting the flow of text by replacing one word every couple of sentences with a less likely word increases the perplexity of the wording and lowers the AI-generated percentage. AI text generators act like text predictors, creating text by adding the most likely next word. If the detector is perplexed by a word because it is not the most likely choice, the word is judged to be human-written.
  • Unlike human-generated writing, AI sentences tend to be uniform. Changing the length of sentences throughout a document, making some sentences shorter and others longer and more complex, alters the burstiness and lowers the generated-by-AI score. 
  • By replacing one or two words per paragraph and modifying the length of sentences here and there throughout a chunk of text — i.e. by doing minor tweaks of both perplexity and burstiness — the AI-generated score changes from 100% to 0%. 

To learn more about how AI detection tools work, read AI Classifiers — What’s the problem with detection tools?

AI tools & privacy

ChatGPT is underpinned by a large language model that requires massive amounts of data to function and improve. The more data the model is trained on, the better it gets at detecting patterns, anticipating what will come next and generating plausible text.

Uri Gal notes the following privacy concerns in The Conversation:

  • None of us were asked whether OpenAI could use our data. This is a clear violation of privacy, especially when data are sensitive and can be used to identify us, our family members, or our location.
  • Even when data are publicly available their use can breach what we call contextual integrity. This is a fundamental principle in legal discussions of privacy. It requires that individuals’ information is not revealed outside of the context in which it was originally produced.
  • OpenAI offers no procedures for individuals to check whether the company stores their personal information, or to request it be deleted. This is a guaranteed right in accordance with the European General Data Protection Regulation (GDPR) – although it’s still under debate whether ChatGPT is compliant with GDPR requirements.
  • This “right to be forgotten” is particularly important in cases where the information is inaccurate or misleading, which seems to be a regular occurrence with ChatGPT.
  • Moreover, the scraped data ChatGPT was trained on can be proprietary or copyrighted.

When we use AI tools, including detection tools, we are feeding data into these systems. It is important that we understand our obligations and risks.

When an assignment is submitted to Turnitin, the student’s work is saved as part of Turnitin’s database of more than 1 billion student papers. This raises privacy concerns that include:

  • Students’ inability to remove their work from the database
  • The indefinite length of time that papers are stored
  • Access to the content of the papers, especially personal data or sensitive content, including potential security breaches of the server

AI detection tools, including Turnitin, should not be used without students’ knowledge and consent. While Turnitin is a college-approved tool, using it without students’ consent poses a copyright risk (Strawczynski, 2004).  Other AI detection tools have not undergone privacy and risk assessments by our institution and present potential data privacy and copyright risks.

For more information, see our Guidelines for Using Turnitin.

Turnitin and Student Privacy

Turnitin is a text matching tool that compares students’ written work with a database of student papers, web pages, and academic publications. The two main uses for Turnitin are: 1) for formative or low-stakes assessment of paraphrasing or citation; and 2) for prevention and identification of plagiarism.

Privacy Concerns

When an assignment is submitted to Turnitin for a text matching report, the student’s work is saved as part of Turnitin’s database of more than 1 billion student papers. This raises privacy concerns that include:

  • Students’ inability to remove their work from the database
  • The indefinite length of time that papers are stored
  • Access to the content of the papers, especially personal data or sensitive content, including potential security breaches of the server

Copyright Concerns

In addition, saving a student’s work on Turnitin’s database without their consent may put an institution at risk for legal action based on Canadian copyright law (Strawczynski, 2004). 

Guidelines for Using Turnitin

To mitigate these concerns, we recommend the following guidelines for all instructors using Turnitin:

  1. Be clear and transparent that you will be using Turnitin. Even if a course outline includes a statement indicating that Turnitin will be used in a course, we recommend not relying on that statement alone. Ideally, instructors should also explain to students that their papers will be stored on the company’s database and ask for their consent. If they don’t provide consent, have an alternate plan (see below).
  2. Decide whether or not students’ work needs to be saved on Turnitin’s database. The default is for all papers to be saved, but this can be changed. Not saving papers to the database means that those papers can’t be used to generate future similarity reports, but it does remove the privacy and copyright concerns.
  3. Coach students to remove identifying details. If the students’ submissions will be added to Turnitin’s database, make sure they remove any personal information from their assignment, including their name, student number, address, etc. Embedded meta-data (e.g. in track changes or file properties) should also be removed. If students submit to an assignment folder on Brightspace, their name will accompany the submission, so omitting it from the paper itself is not a problem.
  4. Don’t run a similarity report for an individual student without their knowledge. Ethical use of Turnitin occurs when it is transparently and equally used for all students. Running a report only on a specific student’s work without their knowledge or consent is not transparent or equal.
  5. Consider whether or not the assignment is appropriate for Turnitin. If the students need to include personal or sensitive information in the assignment, Turnitin should probably not be used. If you do decide to use it, the students’ papers should not be stored in the database.
  6. If contacted by another institution, be cautious about revealing student information. If at some point in the future there is a match to one of your student’s papers in Turnitin’s database, Turnitin does not give the other institution access to the text of the paper but will provide the instructor at the other institution with your email. If you are contacted about a match, consider carefully before forwarding the paper or any identifying details about the student to the other institution. If you do want to forward the paper, you should obtain the student’s consent.

Alternatives to Confirm Authorship When Turnitin is Not Used

If a student objects to having their paper submitted to Turnitin, or if the assignment is not appropriate for submission to Turnitin because it includes personal or sensitive content, you can increase confidence that the students are doing their own work in other ways. For example, an instructor can require any or all of the following:

  • submission of multiple drafts
  • annotation of reference lists
  • oral defence of their work

Requiring students to complete any or all of these will increase their workload, which helps ensure that students who opt out of Turnitin aren’t at an advantage over students who opt in.

Helping Students Make Turnitin Work for Them

If you’re using Turnitin, it’s highly recommended that you adjust the settings to allow the students to see their similarity reports. You may need to teach students how to interpret the reports if they haven’t learned how to do so from a previous course. Turnitin’s website has resources if you need them (https://help.turnitin.com/feedback-studio/turnitin-website/student/student-category.htm#TheSimilarityReport) and you can also point your students to the Turnitin link on Langara’s Help with Student Learning Tools iweb (https://iweb.langara.ca/lts/brightspace/turnitin/). Finally, remember that these reports won’t be helpful to a student if they’re not given the chance to revise and resubmit after they see the report. In Brightspace, we recommend that instructors set up two separate assignment folders with Turnitin enabled: one for their draft and one for the final submission.

Have questions?

If you need support with Turnitin, please contact edtech@langara.ca

References

Strawczynski, J. (2004). When students won’t Turnitin: An examination of the use of plagiarism prevention services in Canada. Education & Law Journal 14(2), 167-190. 

Vanacker, B. (2011). Returning students’ right to access, choice and notice: a proposed code of ethics for instructors using Turnitin. Ethics & Information Technology, 13(4), 327-338.

Zaza, C., & McKenzie, A. (2018). Turnitin® Use at a Canadian University. Canadian Journal for the Scholarship of Teaching and Learning, 9(2). https://doi.org/10.5206/cjsotl-rcacea.2018.2.4

Zimmerman, T.A. (2018). Twenty years of Turnitin: In an age of big data, even bigger questions remain. The 2017 CCCC Intellectual Property Annual. Retrieved from https://prod-ncte-cdn.azureedge.net/nctefiles/groups/cccc/committees/ip/2017/zimmerman2017.pdf

Turnitin Now Available


Langara has purchased a campus-wide license for Turnitin to support faculty in teaching research and writing skills to their students while also encouraging academic integrity. Turnitin is a similarity checker which allows students and faculty to check assignments for matches in Turnitin’s database of papers, articles, and websites.

All Langara faculty have access to Turnitin through their Brightspace courses.

We hope that Turnitin will be used as an instructional tool to help students understand the College’s expectations for academic integrity and to practice their skills in summarizing, paraphrasing, quoting and citing their sources appropriately.

While Turnitin is a useful tool, it cannot detect all forms of plagiarism. However, if used in well-designed assignments and learning activities, Turnitin can play a valuable role in educating our students and emphasizing the importance of academic integrity.

Register for an information session: Turnitin Brown Bag, Sept. 14, 2017, 1:00–1:45 pm

More sessions will be scheduled throughout the fall semester.

Thanks to members of the Langara School of Management, EdTech, and IT for piloting, implementing and administering this new tool.

For more information about Turnitin and suggestions for its use, see https://iweb.langara.ca/edtech/learning-tools-and-technologies/turnitin/

For instructions on using Turnitin with Brightspace, see https://iweb.langara.ca/edtech/learning-tools-and-technologies/turnitin/using-turnitin-with-d2l/

For help designing assessments to encourage academic integrity, contact tcdc@langara.ca.

For setting up assignments with Turnitin in Brightspace, contact edtech@langara.ca.