k-Anonymity and database encryption

In ADAM, we have a parent portal which allows parents to see some of the academic information we store about their children. This information needs to be secure while still allowing untrained individuals easy access to their portal. We also want parents to need minimal interaction with the school, to lower its support burden.

The task is made easier by the fact that we already know a great deal of information about our parents. We identify them by asking for information that we hold on record and that, because they gave it to us in the first place, they should also know. Our approach uses a three-pronged registration process: they are asked for their national identification number and their cellphone number. If we are able to match this information to a parent, we send a password-reset link to the email address stored on record. Subsequent logins require their ID number and this password. Forgotten passwords are reset via email.

While that is all good, it does mean that we are storing their ID numbers in our database. It would be prudent to encrypt all of these, but that leaves us with a problem when parents log in: decrypting all of the ID numbers to find out which one matches is a computationally expensive task.

I am looking at using a k-anonymity model, which I learned about from Troy Hunt and his Pwned Passwords API. In simple terms, k-anonymity uses the principle of safety in numbers.

An example of this: instead of decrypting every ID number in the database, let’s find those that match on the first four digits and check just those. To allow this, we would store the ID number in encrypted form plus an additional field containing just the first four digits in plain text. We can search on that field to find the likely matches. If someone were to discover the first four digits, the argument goes, they still wouldn’t know the whole ID number.
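As a sketch of how that lookup might work (the table, column names and the decryptIdNumber () helper here are hypothetical, not ADAM’s actual schema; $db is assumed to be a PDO connection):

$stmt = $db->prepare ("SELECT id, id_number_encrypted FROM parents WHERE id_number_prefix = ?");
$stmt->execute ([substr ($suppliedIdNumber, 0, 4)]);

// Decrypt only the candidates that share the four-digit prefix.
$matchedParentId = null;
foreach ($stmt->fetchAll (PDO::FETCH_ASSOC) as $row)
{
    if (decryptIdNumber ($row ['id_number_encrypted']) === $suppliedIdNumber) // assumed helper
    {
        $matchedParentId = $row ['id'];
        break;
    }
}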

That decreases the number of ID numbers we need to decrypt by a factor of about 400. (You might expect a factor of 10^4 = 10 000, but there is limited variation in the first four digits of a parent’s ID number: most parents were born in the 60s or 70s, so the first digit is usually a 6 or a 7, and the third digit is the leading 0 or 1 of the month of birth. That leaves roughly 2 × 10 × 2 × 10 = 400 likely combinations.)

Instead of having to decrypt, say, 1000 ID numbers, we need only decrypt, on average, 2.5 on each login.

However, storing the first four digits in plain text still reveals the year and month of birth of a parent. This still potentially leaks personal information, so we need to go further. Enter hashing.

Hashing is a one-way operation which allows us to verify the original information without necessarily knowing what it was. When people hear “one-way mathematical operation”, they tend to respond in disbelief: all of algebra, for example, operates on the idea of reversible operations! To illustrate a very simple one-way operation, consider a four-digit number: 4371. We could “hash” this in many different ways. Let’s use the trivial method of adding all the digits: 4 + 3 + 7 + 1 = 15. If you knew only that the hash was “15”, there is no way that you could know for certain what the original number was. (An academic aside: this is meant to illustrate a one-way operation only and is not meant to suggest a good option for hashing! It is left as an exercise for the reader to decide why this algorithm is bad.)
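In PHP, this toy digit-sum “hash” is a one-liner:

// Toy digit-sum "hash": many inputs collide on the same value, so the
// original number cannot be recovered from the hash alone.
echo array_sum (str_split ('4371')); // 15 – as would 5361, 9600 or 177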

A commonly used hashing algorithm, such as SHA-1, could be used to hash the ID numbers. The resulting hash is hexadecimal, so by storing just the first three characters of the hash we can reduce the number of ID numbers that need to be checked by approximately 16 × 16 × 16 = 4096-fold. In most cases this means we will need to decrypt only one ID number, at most, to check whether it’s valid.
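A sketch of this scheme (again with hypothetical column names): we store the encrypted ID number alongside the first three characters of its SHA-1 hash, and filter on that prefix at login.

// On save: keep the encrypted ID number plus a three-character hash prefix.
$hashPrefix = substr (sha1 ($idNumber), 0, 3);

// On login: filter on the hash prefix, then decrypt only the (usually single)
// candidate row to confirm an exact match.
$stmt = $db->prepare ("SELECT id, id_number_encrypted FROM parents WHERE id_number_hash_prefix = ?");
$stmt->execute ([substr (sha1 ($suppliedIdNumber), 0, 3)]);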

We would only store the first three characters of the hash. There are limited ways of using this information to reverse-engineer the ID numbers without generating a list of every ID number and its hash. Even then, with only 4096 possible three-character hashes and 1.28 billion (realistic) ID numbers, approximately 310 000 ID numbers would generate a hash with the same first three characters. So even if we knew the truncated hash, our ID number is one of some 310 000 possibilities that reduce to the same hash: a proverbial needle in a haystack with no way to tell which it is.

With most schools having only a few thousand parents in their database, the three-character hash is an effective way to filter the number of possibilities down to an acceptably low level.

Using Google Docs as a Web Publishing Platform

When a system becomes complex enough, documentation is unavoidable. I used to believe that our product was simple enough for anyone to understand, but with continued development comes complexity.

To start with, I created an MS Word document to serve as the manual, but this suffered compatibility problems, and even saving to PDF wasn’t ideal: asking a person to turn to page 63 of a PDF when their copy was out of date led to one or two near disasters.

The second iteration was a WordPress site – not unlike this one – on which I could publish the updated help information. However, at that point, including screenshots and the like was a tedious affair: save, upload, insert… The result was that the documentation was slow to update and often out of date.

Iteration three was a progression of the Word document: it was simple enough to convert that document to a Google Doc and use that platform, with a read-only shared copy, as the documentation. This had a few advantages. Editing is remarkably simple, and the document can be downloaded and even printed in a sensible format if required. Importantly, I could send people a URL that would jump them to a specific heading in the document, so it was easy to refer people to the specific help they needed.

The Google Doc also allows for intra-document linking: referring to other headings in the document and linking to them so that the user can jump straight to them. The links are web-standard #-anchors, but in a Google Doc they are a random series of characters, so you wouldn’t know where you’re jumping to if you saw the URL on its own.

That was one thing when the document was 100 pages long, but we’re now approaching 300 pages. The document loads reasonably quickly on my 100 Mbps line, but with the browser set to “3G” network emulation it is painful to say the least.

G Suite has a “Publish to Web” feature which trims out all the JS required for editing and allows a much quicker download and rendering of the document. Its formatting is a bit off, but it works, and the static web document is automatically updated whenever changes occur in the original. Frustratingly, it includes things like the page numbers in the table of contents; if one prints from this web view, the page numbers don’t correspond to anything close. The images still take forever to load on a slow connection because we’re loading everything from a single document.

One irritation with both of the Google options is that the URLs I would typically email to people are very long, very ugly, and don’t provide any context as to what the recipient is about to click on.

This led me to write my own script that fetches this static document, splits it up into sections, and spits out a host of smaller HTML files.

Step 1: Define a bunch of constants. This will make life easier later. The important ones are the link to the documentation, the template folder and the output folder. I’ve also included a sitemap, which can be generated, and this needs a URL prefix.

 define ("DOCUMENTATION_URL", "https://docs.google.com/document/d/e/XXXXXXXXXX/pub");
define ("TEMPLATE_FOLDER", __DIR__ . DIRECTORY_SEPARATOR . '..' . DIRECTORY_SEPARATOR . 'template' . DIRECTORY_SEPARATOR);
define ("OUTPUT_FOLDER", __DIR__ . DIRECTORY_SEPARATOR . '..' . DIRECTORY_SEPARATOR . 'html' . DIRECTORY_SEPARATOR);
define ("SITEMAP_URL", "https://help.adam.co.za/");

Now that we have the constants, we can fetch the HTML. We do a bit of error checking to make sure that we aren’t about to stuff up the whole site.

$html = @file_get_contents (DOCUMENTATION_URL);

if ($html === false)
{
    echo "Could not retrieve URL " . DOCUMENTATION_URL . ". Aborting.";
    exit ();
}

Broadly speaking, the HTML is formatted as follows:

<html>
  <head>
    <title>Document Title</title>
    <style>/* Some arb styling */</style>
  </head>
  <body>
    <div id="header">Document Title</div>
    <div id="contents">
      <style>/* All Google's CSS styling needed */</style>
      <!-- Everything your document needs... -->
    </div>
    <div id="footer"><!-- Some stuff... --></div>
    <script>/* some JS used to render and sanitize links */</script>
  </body>
</html>

I’ll refer to this later, but for now let’s just read this into a DOMDocument object which allows for much easier manipulation of the DOM.

$dom = new DOMDocument();
if (@$dom->loadHTML ('<?xml encoding="utf-8" ?>' . $html) === false)
{
    echo "Could not parse HTML. Aborting.";
    exit ();
}

Three important things are happening. The first is that we are prepending an encoding tag which forces DOMDocument to interpret the characters as UTF-8. Without this, any extended characters, including curly quotes, will look horrific. The second is that we are @suppressing the warning messages. This document seems to cause problems for DOMDocument, even though browsers are happy with it. The issues come in while reading the JavaScript that Google appends to the document, so the errors, in my experience, are not important. Lastly, we check that the HTML was read successfully. If it wasn’t, we abort!

$xPath = new DOMXPath ($dom);

$style = $xPath->query ('/html/body/div[@id="contents"]/style')->item (0)->nodeValue;
$styleHash = md5 ($style);
file_put_contents (OUTPUT_FOLDER . 'default.css', $style);

Using a DOMXPath object, we navigate the DOM to extract the <style> tag within the div#contents tag. Its contents are stored in the $style variable. I generate a hash so that, if the style contents change, we can signal this to browsers that might otherwise cache the CSS file.

The CSS styles are all class-based and are named c0, c1, c2 and so on. A small change in the document seems to change this numbering arbitrarily, so it is important that a document depending on a new style sheet prompts browsers to fetch it!

Finally, I write the styles to a CSS file in the output folder which I’ll make use of later.

$menuchange = [];
$outline = [];
$h1id = null;
$h2id = null;
$h3id = null;
$nodes = $xPath->query ('/html/body/div[@id="contents"]/*');

One ugliness of the Google document is that its anchors are not intuitively named. I am going to store the existing anchors in the $menuchange array, generate new anchors, and use this array as a lookup to replace the ugly with the readable!

The $outline variable is going to be used to store our document in a usable format. I’ll talk about this in some detail just now.

The three heading ID variables are used to keep track of which headings we are under. This is important for document hierarchy and bears some additional discussion at this point.

An HTML document is ultimately flat in its structure but we interpret that differently. An HTML document might be structured as follows:

  • <h1>Heading</h1>
  • <p>Paragraph 1</p>
  • <p>Paragraph 2</p>
  • <h2>Subheading</h2>
  • <p>Paragraph 3</p>
  • <h1>Second Heading</h1>
  • <h2>Subheading 2</h2>
  • <p>Paragraph 4</p>
  • <h3>Sub Sub Heading</h3>
  • <p>Paragraph 5</p>

The structure does not convey the hierarchy. We interpret the hierarchy as follows:

  • <h1>Heading</h1>
    • <p>Paragraph 1</p>
    • <p>Paragraph 2</p>
    • <h2>Subheading</h2>
      • <p>Paragraph 3</p>
  • <h1>Second Heading</h1>
    • <h2>Subheading 2</h2>
    • <p>Paragraph 4</p>
      • <h3>Sub Sub Heading</h3>
      • <p>Paragraph 5</p>

Each non-heading element is a child of the last heading element that came before it. Each heading, in turn, nests under the most recent heading of a higher level.

I want to treat <h1>s differently: each gets its own file, to reduce the amount of data that gets downloaded.

In the Google HTML, each heading has its own anchor assigned to it. In my remapping, each <h1> gets its own file, and <h2> and <h3> tags point into the most recent <h1>’s file with a newly defined anchor for them.

All this leads me to the central worker of this process, where we iterate over the nodes.

foreach ($nodes as $node)
{
    if ($node->nodeName == 'h1' && $node->attributes->getNamedItem ('id') instanceof DOMNode)
    {
        $h1id = $node->attributes->getNamedItem ('id')->textContent;
        $filename = count ($outline) == 0 ? "index.html" : sanitiseTextForLink ($node->textContent) . ".html";

        $outline [$h1id] ['heading'] = $node->textContent;
        $outline [$h1id] ['newlink'] = $filename;
        $outline [$h1id] ['content'] = [];

        $menuchange [$h1id] = $outline [$h1id] ['newlink'];
        $h2id = null;
        $h3id = null;
    }
    elseif ($node->nodeName == 'h2' && $node->attributes->getNamedItem ('id') instanceof DOMNode)
    {
        $h2id = $node->attributes->getNamedItem ('id')->textContent;
        $outline [$h1id] [$h2id] ['heading'] = $node->textContent;
        $outline [$h1id] [$h2id] ['id'] = sanitiseTextForLink ($node->textContent);
        $outline [$h1id] [$h2id] ['newlink'] = $outline [$h1id] ['newlink'] . "#" . $outline [$h1id] [$h2id] ['id'];
        $outline [$h1id] [$h2id] ['content'] = [];

        $menuchange [$h2id] = $outline [$h1id] [$h2id] ['newlink'];
        $h3id = null;
    }
    elseif ($node->nodeName == 'h3' && $node->attributes->getNamedItem ('id') instanceof DOMNode)
    {
        $h3id = $node->attributes->getNamedItem ('id')->textContent;
        $outline [$h1id] [$h2id] [$h3id] ['heading'] = $node->textContent;
        $outline [$h1id] [$h2id] [$h3id] ['id'] = sanitiseTextForLink ($node->textContent);
        $outline [$h1id] [$h2id] [$h3id] ['newlink'] = $outline [$h1id] ['newlink'] . "#" . $outline [$h1id] [$h2id] [$h3id] ['id'];
        $outline [$h1id] [$h2id] [$h3id] ['content'] = [];

        $menuchange [$h3id] = $outline [$h1id] [$h2id] [$h3id] ['newlink'];
    }
    elseif ($h1id === null)
    {
        // do nothing: discard content before the first <h1>.
    }
    elseif ($h2id === null)
    {
        $outline [$h1id] ['content'] [] = $node->ownerDocument->saveHTML ($node);
    }
    elseif ($h3id === null)
    {
        $outline [$h1id] [$h2id] ['content'] [] = $node->ownerDocument->saveHTML ($node);
    }
    else
    {
        $outline [$h1id] [$h2id] [$h3id] ['content'] [] = $node->ownerDocument->saveHTML ($node);
    }
}

Let me try and explain what is going on in this block. It’s a lot!

The first ‘if’ block is interested in <h1> tags with an ID. We need to remember this ID so that we can replace it if there are any other references to it in the document. We read the ID into the variable $h1id.

Because we’re dealing with an <h1> tag, we know that this is going to be in a new file. If there are no previous <h1> tags, then this one must be called “index.html” to provide a default page for our site. If it’s not the first <h1> we’ve come across, then we create a filename based on a sanitised version of its text. I’ve included the sanitising function later.

We remember the heading text, the new filename and initialise an array for any content that might appear beneath it.

The lookup for the old to the new anchor link is added to the $menuchange array.

Finally, because we have just seen an <h1> tag, we know that we are not “under” an <h2> or <h3> tag and so we set those values to null.

The next two blocks are similar and deal with <h2> and <h3> tags. There are some differences.

In both these blocks, our $newlink is set to the <h1> filename plus the (new) anchor for the heading. This is so that if we click on a link it will automatically reference anchors in a different file.

The second difference is that we record, additionally, the new sanitised ID. This probably could be extracted from the “newlink” property of the array. I’ve just saved it while we had it.

The last difference is that in the <h2> block, we set the $h3id variable to not reference anything, but we don’t need to do this in the <h3> block since we aren’t worried about any heading levels lower than this.

You will, of course, have noticed the additional levels in the arrays in the latter two blocks. This is what generates the hierarchy that we wanted.

Those first three if-blocks take care of the heading tags. Every other tag can now be filed under the most recent heading.

The fourth if-block checks to see if $h1id is null. As the comment suggests, this block does nothing, effectively discarding any content that appears before the first <h1> tag, such as the cover page and table of contents.

When we reach the fifth if-block, we are at a point where we have seen an <h1> but not yet an <h2>. The content must therefore belong to the <h1>, so we add it at that level in our $outline.

Similarly, we repeat the process if $h3id is null, adding content to the current <h2> tag.

Finally, we add whatever content is left to the <h3> tag.

This leftover content could include other heading tags (<h4>, <h5>, etc.), but these are not important to our menu structure and so we ignore them. It does mean that any links to heading levels 4 and 5 will not be remapped, but I’m OK with that for this particular project.

Now that we have our document hierarchy, we can begin generating our HTML documents.

$menu = getMenuStructure ($outline, "");

$sitemap = [];
$template = file_get_contents (TEMPLATE_FOLDER . 'template.html');

foreach ($outline as $h1 => $content)
{
    $filename = $outline [$h1] ['newlink'];
    $sitemap [] = SITEMAP_URL . $filename;

    $html = str_replace ("#title", $outline [$h1] ['heading'], $template);
    $html = str_replace ("#cssfile", "default.css?{$styleHash}", $html);
    $html = str_replace ("#menu", $menu, $html);

    $content = "<h1>" . $outline [$h1] ['heading'] . "</h1>\n";
    foreach ($outline [$h1] ['content'] as $htmlSnip)
    {
        $content .= relink ($htmlSnip);
    }
    foreach ($outline [$h1] as $h2 => $subheadings)
    {
        if (substr ($h2, 0, 2) == 'h.')
        {
            $content .= "<h2 id=\"" . $outline [$h1] [$h2] ['id'] . "\">" . $outline [$h1] [$h2] ['heading'] . "</h2>\n";
            foreach ($outline [$h1] [$h2] ['content'] as $htmlSnip)
            {
                $content .= relink ($htmlSnip);
            }
            foreach ($outline [$h1] [$h2] as $h3 => $subsubheadings)
            {
                if (substr ($h3, 0, 2) == 'h.')
                {
                    $content .= "<h3 id=\"" . $outline [$h1] [$h2] [$h3] ['id'] . "\">" . $outline [$h1] [$h2] [$h3] ['heading'] . "</h3>\n";
                    foreach ($outline [$h1] [$h2] [$h3] ['content'] as $htmlSnip)
                    {
                        $content .= relink ($htmlSnip);
                    }
                }
            }
        }
    }
    $html = str_replace ("#body", $content, $html);

    $file = fopen (OUTPUT_FOLDER . $filename, "w");
    fwrite ($file, $html);
    fclose ($file);
}

We begin here by creating a menu structure. This function is described later, but essentially generates a list of links to include in each file.

We reset our sitemap and begin by reading in our template file. Then we loop through our outline.

We get the filename from our outline property and add the file, with the site URL, to the future site map.

We do a bunch of substitutions into our template: the title (which, because each <h1> has its own file, is the text of the <h1>) and the link to the Google CSS file, including the hash so that browsers get a hint when it’s time to refresh their cached version. We also substitute the menu structure into the template file.

I was considering customising the menu for each file, but I’ve relied on JavaScript instead to format the menu depending on which file it is in. It would, of course, be more efficient to trim it down since each file has the entire menu structure. This might give way to expandable menus in the future. For now, they are just hidden.

The $content variable has the <h1> tag added, followed by any <h1> content and, finally, any <h2> sections. There is possibly a bit of recursion that I could have employed here, since there are some fairly similar repeated processes for the three heading levels.

The relink function, also described later, does some regex magic to substitute old anchors with new anchors and update any links to those anchors. In addition, it reformats the external URLs (so that they don’t go via Google’s “you’ve clicked on a link… are you sure” service). This also finds YouTube videos and replaces the paragraph they’re in with an embedded video. That’s quite smart, I thought!

We end this block of code by substituting our body content into our template and then writing the file into the output folder. Next!

copy (TEMPLATE_FOLDER . 'custom.css', OUTPUT_FOLDER . 'custom.css');
copy (TEMPLATE_FOLDER . 'display.js', OUTPUT_FOLDER . 'display.js');
copy (TEMPLATE_FOLDER . 'logo.png', OUTPUT_FOLDER . 'logo.png');

file_put_contents (OUTPUT_FOLDER . "sitemap.txt", implode ("\n", $sitemap));

The last important step of this process is to copy the assets into the output folder, including the JavaScript file that is ultimately responsible for the menu formatting, and to write out the sitemap.

function relink ($html)
{
    global $menuchange;

    // 1. Replace Google's "h.xxxx" anchors with our new file#anchor links.
    $matches = [];
    preg_match_all ('/<a [^>]+href="#(h\.[a-zA-Z0-9]+)"/s', $html, $matches);
    foreach ($matches [1] as $match)
    {
        if (isset ($menuchange [trim ($match)]))
        {
            $html = str_replace ("#{$match}", $menuchange [trim ($match)], $html);
        }
    }

    // 2. Unwrap external URLs from Google's redirect service.
    $matches = [];
    preg_match_all ('/<a [^>]*href="(https?:\/\/www\.google\.com\/url\?q=([^"]*)&sa=D&ust=[0-9]+)"[^>]*>/s', $html, $matches);
    foreach ($matches [0] as $key => $match)
    {
        if (isset ($matches [1] [$key]) && isset ($matches [2] [$key]))
        {
            $html = str_replace ($matches [1] [$key], $matches [2] [$key], $html);
        }
    }

    // 3. Replace each paragraph containing a YouTube link with an embedded video.
    $matches = [];
    preg_match_all ('/<p[^>]*>.+?<a [^>]*href="https?:\/\/(www\.)?youtu(be\.com|\.be)\/(watch\?v=)?([a-zA-Z0-9-_]+)"[^>]*>.+?<\/p>/', $html, $matches);
    foreach ($matches [0] as $key => $match)
    {
        if (!empty ($matches [4] [$key]))
        {
            $html = str_replace ($match, "<iframe width=\"640\" height=\"360\" src=\"https://www.youtube.com/embed/{$matches [4] [$key]}\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>", $html);
        }
    }

    return $html;
}

This block of code does my link substitution. I’m a bit worried about the middle part – the substitution of Google’s links – since that could change without notice by Google. The YouTube embedding should be a little more predictable.

The anchor linking is done in the first part, using a global (yuck) variable to access our substitution lookup.

function getMenuStructure ($outline, $current, $level = 1)
{
    if ($level > 3)
    {
        return '';
    }

    $menu = "";
    foreach ($outline as $heading1 => $details)
    {
        if (isset ($outline [$heading1] ['heading']))
        {
            $menu .= '<li><a href="' . $outline [$heading1] ['newlink'] . '">' . $outline [$heading1] ['heading'] . "</a></li>\n";
        }
        if (is_array ($outline [$heading1]))
        {
            $menu .= getMenuStructure ($details, $current, $level + 1);
        }
    }

    if ($menu == '')
    {
        return '';
    }

    $final = "<ul class='menu{$level}'>" . trim ($menu) . "</ul>\n";

    return $final;
}

Here, I am looking through the outline to get any ‘heading’ elements of the current array. There is a recursive call to get headings further down the hierarchy. A nested, unordered list (<ul>) takes care of the indentations for me, which are styled using CSS.

Finally, the sanitiseTextForLink function:

function sanitiseTextForLink ($text)
{
    $text = preg_replace ('/[^a-zA-Z0-9 ]/', '', $text); // strip anything that isn't alphanumeric or a space
    $text = str_replace (' ', '-', $text);               // spaces become hyphens
    $text = str_replace ('--', '-', $text);              // collapse doubled hyphens
    return strtolower ($text);
}
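For example:

echo sanitiseTextForLink ("Adding & Editing Pupils"); // adding-editing-pupils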

I’m not sharing the CSS or JS with you for the final product since this post has gone on for long enough!

I am also aware that this is not pleasant code to look at, but it seems to work really well. Any changes that I make appear on the help site within 10 minutes with no additional work from me.

Are you a teacher registered on the SACE CPTD website?

UPDATE: The SACE website has undergone some changes which include having the site under HTTPS. There is no indication that the password and username situation on the site has improved. Please use a random password generated by a password manager.

Yesterday, I logged in to the South African Council of Educators (SACE) Continuous Professional Training and Development (CPTD) website to check the status of my CPTD points. I was sure I had registered with them before, but couldn’t find a record of my password in my password manager.

I clicked on the “Forgotten Password” link and was prompted for my mobile number. After several minutes of waiting, I had not received the SMS I imagined would follow. The only other option was to see if I could re-register as an educator on their website. I was asked for my names, my SACE registration number and my South African ID number. Immediately, the website responded to say that my username was my SACE registration number, and displayed my password to me – my surname.

Slightly confused, I wondered how my password was ever set to something like that. Perhaps filling in the form had reset it? I nevertheless began the process of changing the password.

Changing a password on the SACE site is not straightforward – or at least it is done very differently from the generally accepted way of doing it. Unlike nearly every website on the internet, one has to log out before one can change one’s password. On the login screen, enter your username and password and then, just above the “Log in” button, there is some text with the link “Change password”. Clicking on that reveals two new password fields which can be completed with your new password. You thus perform a login and change your password at the same time.

I immediately went to my password manager to have it generate a random 20 character password for me. I pasted this into the new password field and clicked on the “Log in” button. An error was displayed that my password had to be between 8 and 12 characters long.

Now most people don’t use password managers (that is a problem!) and so it is unlikely that they would often run into the problem of a password being too long. But when a website does complain about a maximum password length, it usually means that it is storing your password in plain text: a hashed password is the same size no matter how long the original is, so a length limit suggests the password itself is being kept.

Websites have a number of options when storing passwords. But the best way to store a password is not to store it at all. But, you’ll say, if a website doesn’t store your password, how can it know if you’ve provided the correct password when you log in? The answer is technical – and mathematically too sophisticated for my mind – but is based around the idea of an irreversible algorithm: something that can happen in only one direction. Now, in this instance, we’re talking about mathematical operations, but there are some very obvious real-world analogies: unringing a bell, unscrambling an egg, unbaking a cake… These all represent things that, once done, can’t be undone without a certain amount of guesswork.

The result of putting something through an algorithm like this is known as a “hash”. If we put the same password through the same algorithm, we must end up with the same hash. Thus when you log in, the password is put through the algorithm, its hash is generated and compared to the hash that is stored on file. If that matches, then you must have provided the same password.
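In PHP, for instance, this is built in; a minimal sketch of the idea (and evidently not what the SACE site does):

// At registration: store only the salted hash, never the password itself.
$hash = password_hash ($password, PASSWORD_DEFAULT);

// At login: hash the supplied password and compare it to the stored hash.
if (password_verify ($suppliedPassword, $hash))
{
    // the password matches
}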

Back to the SACE site, I was now disturbed. A number of issues started jumping out at me.

The site is not HTTPS protected. In fact, the Google Chrome web browser reports the login screen as “Not secure”. In coming months, this browser will display this warning in an alarming red colour. This will be displayed whenever you are about to send a password over a non-encrypted channel. This means that your password is transmitted in plain text for anyone to intercept. The first rule of hacking is that people don’t like to remember different passwords. If you find out someone’s password, you probably have their password to more than just the SACE site. With a bit of investigation, they may have access to your email, your banking and who knows what else.

I logged out again and returned to the educator registration function on the front page. By supplying my SACE registration number and ID number once more, I was again shown my password in plain view – this time the one I had just set. This is horrific.

This means that your (potentially favourite) password is sitting in plain text in a database on a server that does not have adequate protections on it. While there is no evidence of foul play here, it does mean that if this database is ever compromised (and their current security practices indicate that it might be easier than we imagine to do so), teachers’ email addresses, passwords, and schools that they teach at are compromised.

In addition to having their online profiles put at risk, many schools who use online administration systems (such as the one I develop!) may well use their same passwords for those systems. And access to those systems means access to confidential and personal information about the minors entrusted into their care.

Conclusions

  1. If you’ve registered on the SACE CPTD website and have used a password that you use anywhere else, change it now!
  2. SACE needs a serious wake up call in dealing with information that has the potential to jeopardise the safety of teachers and pupils.
  3. Teachers and schools who use online systems (whether for school administration or just email) need proper online safety training. Schools should insist on unique, complex passwords for their staff.
  4. While we are forced to use passwords, use a password manager!

Distance Learning in the Classroom

I went Matric marking earlier this month and used the time to speak to some very forward thinking teachers. One such discussion linked closely with the idea of a “cyber school” day that we began talking about at Beaulieu College. The idea was that teachers would post work online for pupils who would stay home to complete it. The thought was to allow teachers uninterrupted time for professional development.

With the advent of MOOC platforms such as Coursera, EdX and others, it is apparent that many of today’s students will gain exposure to, and experience with, some form of distance learning before long. I therefore believe it is imperative that we start getting our pupils ready for this reality.

But how could one do this in a Maths classroom? The idea of self-study is attractive given an already pressurised syllabus.

One such idea is to use a tool like Geogebra – which does a number of things exceptionally well, allowing us to cover quite a range of the syllabus – to set up simulations which pupils can then play with. Here is my first attempt. Can you guess the lesson here?

That last question is not an idle one. As an educator – or at least someone who has gone through the school system – it’s all well and good to say that “the angle at the centre is twice the angle at the circumference”, but a Grade 10 pupil is not going to come up with that by themselves.

This is where Google Classroom comes in. Using an assignment, everyone receives a copy of their own Google Doc. We can conduct a guided tour through the simulation and we have effectively got an electronic worksheet to fill in. First fiddle with D, now fiddle with C. Do the same things happen?

The goal here is to then provide a longer time for them to work through the content – breaking it up into bite-sized chunks. There aren’t too many right and wrong answers and so when it comes time to assess the work done, I plan to take a leaf out of Carol Dweck’s research and use a scale of “Not yet”, “Achieved” and, for a very few, “Outstanding”.

The work won’t be discussed in class, but I do plan to have, towards the end of the section, a somewhat more rigorous assessment of the work covered. Again, this will be graded on the same “not yet”, “achieved” and “outstanding” scale. I would like it to be all covered online and so, ideally, they won’t have any exposure to the work in class until the mid-year examinations – just like it happens in real life.

These are my ideas for potential areas to include self-study in the CAPS syllabus:

  • Grade 8
    • Learning to use a scientific calculator (specifically, the Casio 991)
    • Data handling
    • Parallel line geometry
  • Grade 9
    • Data handling
    • Functions (straight line)
    • Finance (simple interest, annual compound interest, depreciation, HP calculations)
  • Grade 10
    • Functions (vertical and horizontal shift)
    • Circle geometry
    • Trigonometry intro (definitions of ratios, right-angled triangles)
  • Grade 11
    • Geometry
    • Data handling
  • Grade 12
    • Data handling

Can you tell I don’t enjoy teaching data handling? What are your ideas?

An Ode to Google Drive

The last two weeks have been quite an interesting two weeks in the world of Google Apps for Education with the release of their newest product, Google Classroom.

There is a very tight integration with Google Drive that I love. And so, in spite of Classroom being the flavour of the month, it is Drive that is the hero here.

Google Drive’s sharing model means that I can link to a document in the middle of a folder full of other documents that I don’t want students to see, without having to panic about sharing permissions or the integrity of the rest of the folder. Sharing it in Classroom is easy, too: instead of pupils getting a myriad of links to manage and documents to find, one of Classroom’s primary functions is to provide an interface where those links and documents are centrally stored.

One of my favourite features of the mobile app version of Google Drive is that it has a great document scanner. This has meant that uploading worked solutions to the notes and worksheets that I distribute using Classroom is as easy as taking a few photos with my phone and then simply pointing to the PDF that Drive created from within Classroom.

I will confess that I’ve not yet made use of the Assignments feature in Classroom. I haven’t rushed to it because it is apparent that the pupils need to be brought on slowly. I’ve been surprised by some of the technical obstacles that the students have faced, but mostly these have been minor insecurities, and they have quickly adapted to the system. As an effective pilot class, I want their experience to be positive. This will help when it comes to encouraging and driving staff and other classes to use the system.

Because Maths doesn’t lend itself to pupils typing up their homework, I intend to make use of the Drive scanner for assignment handins. Pupils will effectively submit a PDF for marking. With iPad apps such as PDFExpert having built-in Google Drive capabilities, I can annotate and synchronise back marked documents, and then “grade” and return the assignment in Classroom. If I were in charge, the next feature that Google would bring to Drive is the online annotation of PDF files…

One issue that has caught me out more than a few times is that I have multiple Google accounts and share folders between them. Much of the content originates from my home PC, which is signed in with a personal account. This means that while the file is accessible in my school Google Drive folders, my school account doesn’t own the document, and Classroom won’t share a document that isn’t owned by someone within the school. I guess that this has to do with the sharing model and the fact that document ownership cannot be transferred between domains. The solution isn’t complex, but it is irritating: make a copy of the document and share that. My workflow is to right-click and copy, delete the “personally” owned version, and then rename the “school” owned version.

While people are raving about Classroom, I’m quietly aware that without Google Drive, Classroom would be nothing. Tools like Moodle provide much more comprehensive features (including linking directly to – although not automatic sharing of – Google Drive documents), but it is the power of Drive that makes the workflow in Classroom so much more appealing.