## Course structure

This course is available for entry:

- Semester 1 (February)

### 200 credit points taken over 2 years full-time.

This course is available full-time or part-time for domestic students.

Your course will comprise:

- Compulsory subjects (125 points): four core subjects in statistics (50 points), four core subjects in computer science (50 points) and a 25-point capstone project
- Elective subjects (25 points)

*Students can enter the degree with a background in data science, or with either a statistics or computer science background. Students with either a statistics or computer science background will need to enrol in up to 50 points of prerequisite subjects to acquire the background skills in the other discipline.*

## For students satisfying the statistics prerequisites

Year 1 - Semester 1 | Code | Type | Points |
---|---|---|---|
**Prerequisite Subjects - Computer Science** | | | |
Programming and Software Development | COMP90041 | Core | 12.5 |
Algorithms and Complexity | COMP90038 | Core | 12.5 |
**Core Subjects - Statistics** | | | |
Mathematical Statistics | MAST90082 | Core | 12.5 |
Statistical Modelling | MAST90084 | Core | 12.5 |

### Programming and Software Development

AIMS: The aim of this subject is for students to develop an understanding of approaches to solving moderately complex problems with computers, and to be able to demonstrate proficiency in designing and writing programs. The programming language used is Java.

INDICATIVE CONTENT: Topics covered will include: Java basics; Console input/output; Control flow; Defining classes; Using object references; Programming with arrays; Inheritance; Polymorphism and abstract classes; Exception handling; UML basics; Interfaces; Generics.

### Algorithms and Complexity

AIMS: The aim of this subject is for students to develop familiarity and competence in assessing and designing computer programs for computational efficiency. Although computers manipulate data very quickly, to solve large-scale problems, we must design strategies so that the calculations combine effectively. Over the latter half of the 20th century, an elegant theory of computational efficiency developed. This subject introduces students to the fundamentals of this theory and to many of the classical algorithms and data structures that solve key computational questions. These questions include distance computations in networks, searching items in large collections, and sorting them in ord...

### Mathematical Statistics

The theory of statistical inference is important for applied statistics and as a discipline in its own right. After reviewing random samples and related probability techniques, including inequalities and convergence concepts, the theory of statistical inference is developed. The principles of data reduction are discussed and related to model development. Methods of finding estimators are given, with an emphasis on multi-parameter models, along with the theory of hypothesis testing and interval estimation. Both finite and large sample properties of estimators are considered. Applications may include robust and distribution-free methods, quasi-likelihood and generalized estimating equations. ...

### Statistical Modelling

Statistical models are central to applications of statistics and their development motivates new statistical theories and methodologies. Commencing with a review of linear and generalized linear models, analysis of variance and experimental design, the theory of linear mixed models is developed and model selection techniques are introduced. Approaches to non- and semiparametric inference, including generalized additive models, are considered. Specific applications may include longitudinal data, survival analysis and time series modelling.

Year 1 - Semester 2 | Code | Type | Points |
---|---|---|---|
**Prerequisite Subjects - Computer Science** | | | |
Knowledge Technologies | COMP90049 | Core | 12.5 |
Database Systems & Information Modelling | INFO90002 | Core | 12.5 |
**Core Subjects - Statistics** | | | |
Computational Statistics and Data Mining | MAST90083 | Core | 12.5 |
Multivariate Statistical Techniques | MAST90085 | Core | 12.5 |

### Knowledge Technologies

AIMS: Much of the world's knowledge is stored in the form of unstructured data (e.g. text) or implicitly in structured data (e.g. databases). In this subject students will learn algorithms and data structures for extracting, retrieving and analysing explicit knowledge from various data sources, with a focus on the web. Topics include: data encoding and markup, web crawling, regular expressions, document indexing, text retrieval, clustering, classification and prediction, pattern mining, and approaches to evaluation of knowledge technologies.

INDICATIVE CONTENT: Introduction to Knowledge Technologies; String search; Genomics; Text processing and search; Web search and retrieval; Introduction...

### Database Systems & Information Modelling

AIMS: The subject introduces key topics in modern information organization, particularly with regard to structured databases. The well-founded relational theory behind modern structured query language (SQL) engines has given them as much a place behind the web site of an organization and on the desktop as they traditionally enjoyed on corporate mainframes. Topics covered may include: the managerial view of data, information and knowledge; conceptual, logical and physical data modelling; normalization and de-normalization; the SQL language; data integrity; transaction processing; data warehousing; web services; and organizational memory technologies. This is a core foundation subject for b...

### Computational Statistics and Data Mining

Computing techniques and data mining methods are indispensable in modern statistical research and applications, where “Big Data” problems are often involved. This subject will introduce a number of recently developed statistical data mining methods that are scalable to large datasets and high-performance computing. These include regularized regression such as the Lasso; tree-based methods such as bagging, boosting and random forests; and support vector machines. Important statistical computing algorithms and techniques used in data mining will be explained in detail. These include the bootstrap, cross-validation, the EM algorithm, and Markov chain Monte Carlo methods including the Gibbs s...

### Multivariate Statistical Techniques

Multivariate statistics concerns the analysis of collections of random variables and has general applications across the sciences and, more recently, in bioinformatics. It overlaps machine learning and data mining, and leads into functional data analysis. Here random vectors and matrices are introduced along with common multivariate distributions. Multivariate techniques for clustering, classification and data reduction are given. These include discriminant analysis and principal components. Classical multivariate regression and analysis of variance methods are considered. These approaches are then extended to high-dimensional data, such as that commonly encountered in bioinformatics, mot...

Year 2 - Semester 1 | Code | Type | Points |
---|---|---|---|
**Core Subjects - Computer Science** | | | |
Advanced Database Systems | COMP90050 | Core | 12.5 |
Web Search and Text Analysis | COMP90042 | Core | 12.5 |
Cluster and Cloud Computing | COMP90024 | Core | 12.5 |
**Capstone Subjects** | | | |
Data Science Project Part 1 | MAST90106 | Capstone | 12.5 |

### Advanced Database Systems

AIMS: Many applications require access to very large amounts of data. These applications often require reliability (data must not be lost even in the presence of hardware failures), and the ability to retrieve and process the data very efficiently. The subject will cover the technologies used in advanced database systems. Topics covered will include: transactions, including concurrency, reliability (the ACID properties) and performance; and indexing of both structured and unstructured data. The subject will also cover additional topics such as: uncertain data; XQuery; the Semantic Web and the Resource Description Framework; dataspaces and data provenance; datacentres; and data archiving. ...

### Web Search and Text Analysis

AIMS: The aim of this subject is for students to develop an understanding of the main algorithms used in natural language processing and text retrieval, for use in a diverse range of applications including search engines, cross-language information retrieval, machine translation, text mining, question answering, summarisation, and grammar correction. Topics to be covered include text normalisation, sentence boundary detection, part-of-speech tagging, n-gram language modelling, and text classification. The programming language used is Python.

INDICATIVE CONTENT: Topics covered will include: Document classification, including gender detection, topic detection and language identification...

### Cluster and Cloud Computing

AIMS: The growing popularity of the Internet and the availability of powerful computers and high-speed networks as low-cost commodity components are changing the way we do parallel and distributed computing (PDC). Cluster and Cloud Computing are two approaches for PDC. Clusters employ cost-effective commodity components for building powerful computers within local-area networks. Recently, “cloud computing” has emerged as the new paradigm for delivery of computing as services in a pay-as-you-go model via the Internet. These approaches are used to tackle many research problems, with particular focus on "big data" challenges that arise across a variety of domains. Some examples of scient...

Year 2 - Semester 2 | Code | Type | Points |
---|---|---|---|
**Core Subjects - Computer Science** | | | |
Statistical Machine Learning | COMP90051 | Core | 12.5 |
**Elective Subjects** | | | |
Analysis of High Dimensional Data | MAST90110 | Elective | 12.5 |
Advanced Statistical Modelling | MAST90111 | Elective | |
**Capstone Subjects** | | | |
Data Science Project Part 2 | | Capstone | 12.5 |

### Statistical Machine Learning

AIMS: With exponential increases in the amount of data becoming available in fields such as finance and biology, and on the web, there is an ever-greater need for methods to detect interesting patterns in that data, and classify novel data points based on curated data sets. Learning techniques provide the means to perform this analysis automatically, and in doing so to enhance understanding of general processes or to predict future events. Topics covered will include: supervised learning, semi-supervised and active learning, unsupervised learning, kernel methods, probabilistic graphical models, classifier combination, and neural networks. This subject is intended to introduce graduate students...

## For students satisfying the computer science prerequisites

Semester 1 | Code | Type | Points |
---|---|---|---|
**Core Subjects - Computer Science** | | | |
Web Search and Text Analysis | COMP90042 | Core | 12.5 |
Advanced Database Systems | COMP90050 | Core | 12.5 |
**Prerequisite Package - Statistics** | | | |
Methods of Mathematical Statistics | | Core | 25 |

Semester 2 | Code | Type | Points |
---|---|---|---|
**Core Subjects - Computer Science** | | | |
Statistical Machine Learning | COMP90051 | Core | 12.5 |
**Elective Subjects** | | | |
Elective | | Elective | 12.5 |
**Prerequisite Package - Statistics** | | | |
A First Course in Statistical Learning | MAST90104 | Core | 25 |

Semester 3 | Code | Type | Points |
---|---|---|---|
**Capstone Subjects** | | | |
Data Science Project Part 1 | MAST90106 | Capstone | 12.5 |
**Core Subjects - Computer Science** | | | |
Cluster and Cloud Computing | COMP90024 | Core | 12.5 |
**Core Subjects - Statistics** | | | |
Mathematical Statistics | MAST90082 | Core | 12.5 |
Statistical Modelling | MAST90084 | Core | 12.5 |

Semester 4 | Code | Type | Points |
---|---|---|---|
**Capstone Subjects** | | | |
Data Science Project Part 2 | | Capstone | 12.5 |
**Core Subjects - Statistics** | | | |
Computational Statistics and Data Mining | MAST90083 | Core | 12.5 |
Multivariate Statistical Techniques | MAST90085 | Core | 12.5 |
**Elective Subjects** | | | |
Elective | | Elective | 12.5 |

## For students who satisfy both the statistics and the computer science prerequisites

Semester 1 | Code | Type | Points |
---|---|---|---|
**Core Subjects - Computer Science** | | | |
Web Search and Text Analysis | COMP90042 | Core | 12.5 |
Advanced Database Systems | COMP90050 | Core | 12.5 |
**Core Subjects - Statistics** | | | |
Mathematical Statistics | MAST90082 | Core | 12.5 |
Statistical Modelling | MAST90084 | Core | 12.5 |

Semester 2 | Code | Type | Points |
---|---|---|---|
**Core Subjects - Computer Science** | | | |
Statistical Machine Learning | COMP90051 | Core | 12.5 |
**Elective Subjects** | | | |
Elective | | Elective | |
**Core Subjects - Statistics** | | | |
Computational Statistics and Data Mining | MAST90083 | Core | 12.5 |
Multivariate Statistical Techniques | MAST90085 | Core | 12.5 |

Semester 3 | Code | Type | Points |
---|---|---|---|
**Capstone Subjects** | | | |
Data Science Project Part 1 | MAST90106 | Capstone | 12.5 |
**Core Subject - Computer Science** | | | |
Cluster and Cloud Computing | COMP90024 | Core | 12.5 |
**Elective Subjects** | | | |
Elective | | Elective | |
Elective | | Elective | |

Semester 4 | Code | Type | Points |
---|---|---|---|
**Capstone Subjects** | | | |
Data Science Project Part 2 | | Capstone | 12.5 |
**Elective Subjects** | | | |
Elective | | Elective | |
Elective | | Elective | |
Elective | | Elective | |