Data Mining assignment

The main aim of this coursework is to critically analyse data sources and data sets, critically evaluate possible data analytics challenges and solutions, choose, design and implement data mining algorithms for the chosen data, and apply data mining techniques to specific case studies. The coursework is worth 100 marks, and the distribution of marks is detailed in the marking scheme.

You are expected to explore one or two data sets of your choice from open data mining/machine learning resources, develop case studies, and apply data mining techniques to the data set(s) for supervised and/or unsupervised learning, choosing whichever is suitable for the data set characteristics and motivating that choice. Tasks A, B, and G are compulsory, and you must choose two tasks from C, D, E, and F:

Task A. [20 marks] Data Choice.

Name the chosen data set(s) (from module resources, the UCI ML Repository, other open data sources, or your own collection) and describe the data (e.g. attribute types and values, source of data).

[5 marks]

Describe the data mining problem (and its background) that you will address, e.g. as a classification, prediction, association, clustering, or text mining exercise.

[5 marks]

Introduce the specific data mining question(s) related to the problem, with specific reference to the dataset(s) and to the expected or proposed outcome of the data mining task upon completion.

[10 marks]

Task B. [20 marks] Data Analysis

Analyse the data. Describe the context and content of the data in light of the chosen data mining task and proposed outcomes, discussing characteristics of the data that will present opportunities/challenges for the task.

[10 marks]

Add functionality for descriptive statistics relevant to this problem: details of the programming in context and of the use of graphical representation and analysis of the data are expected. For example, sort the data by class; produce a line or bar plot of each feature individually, if applicable; for each feature, compute characteristics such as its minimum, maximum, mean, mode and standard deviation; and study the correlation between features for each class, or the distance matrix. If the attributes are not normalised, normalising them will also be considered as part of this step.

[10 marks]
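
As a minimal sketch of the per-feature statistics listed above, the following example uses only the Python standard library. The brief itself names Weka, R and KNIME as example tools, so Python, the toy records and the helper names (describe, pearson, min_max_normalise) are assumptions made purely for illustration.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical toy data: (feature_1, feature_2, class_label) records.
ROWS = [
    (1.0, 2.0, "a"),
    (2.0, 4.0, "a"),
    (3.0, 6.0, "a"),
    (1.0, 5.0, "b"),
    (2.0, 3.0, "b"),
    (3.0, 1.0, "b"),
]


def describe(values):
    """Return minimum, maximum, mean, mode and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # On Python 3.8+ this returns the first mode if there are several.
        "mode": statistics.mode(values),
        "std": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def min_max_normalise(values):
    """Rescale values linearly to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Group the records by class, then summarise each class separately.
by_class = defaultdict(list)
for f1, f2, label in ROWS:
    by_class[label].append((f1, f2))

for label, records in sorted(by_class.items()):
    f1 = [r[0] for r in records]
    f2 = [r[1] for r in records]
    print(label, describe(f1), round(pearson(f1, f2), 3))
```

The same grouping step can feed a plotting library if graphical output is wanted; which library to use is left open here.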

Task C. [20 marks] Classification Task

Pick an appropriate classifier (e.g. multilayer perceptron, decision tree, linear regression, Naïve Bayes etc.) and motivate your choice. Choose and motivate some default parameters for the classifier: the choice of classifier and parameters, and the training/testing strategy (e.g. percentage split, cross-validation etc.), could be chosen by the user of your application. Add functionality for training the classifier using your training/testing strategy.

[5 marks]
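
The two training/testing strategies mentioned above (percentage split and cross-validation) can be sketched as follows. This is an illustration in Python using only the standard library, not a prescribed implementation; the function names, the 70% default and the fixed seed are assumptions.

```python
import random


def percentage_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


rows = list(range(10))  # stand-in for ten labelled records
train, test = percentage_split(rows, train_fraction=0.7)
print(len(train), len(test))

for train_idx, test_idx in k_fold_indices(len(rows), k=5):
    # Each record appears in exactly one test fold.
    print(sorted(test_idx))
```

Exposing train_fraction and k as parameters is one way to let the user of the application choose the strategy, as the task suggests.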

Develop and describe your application by making use of existing data mining algorithms (available for example as Weka classes or R scripts) for the classification/prediction problem based on the chosen data set. Justify choices for the chosen approach such as the software used, nodes/scripts etc. and their appropriateness for the chosen task.

[10 marks]

Add one piece of functionality for visualising the performance of the model and briefly describe the results obtained (e.g. using performance metrics and/or comparison with the literature).

[5 marks]
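
As one possible reading of "performance metrics", the sketch below computes accuracy, precision and recall and prints a plain-text confusion matrix. It is written in Python with the standard library only; the label lists and function names are hypothetical and stand in for the output of whichever classifier is chosen.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of records whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def precision_recall(actual, predicted, positive):
    """Precision and recall with one label treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def print_confusion_matrix(actual, predicted):
    """Print a tab-separated matrix (rows: actual, columns: predicted)."""
    labels = sorted(set(actual) | set(predicted))
    counts = confusion_counts(actual, predicted)
    print("actual\\pred", *labels, sep="\t")
    for a in labels:
        print(a, *(counts[(a, p)] for p in labels), sep="\t")


# Hypothetical labels, standing in for real classifier output.
actual = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "yes", "no", "no", "no", "yes"]

print_confusion_matrix(actual, predicted)
print("accuracy:", accuracy(actual, predicted))
print("precision, recall:", precision_recall(actual, predicted, "yes"))
```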

Task D. [20 marks] Clustering Task

Pick (and justify your choice of) an appropriate clustering algorithm to be developed (e.g. using Weka classes, R scripts, KNIME nodes) with reference to the dataset and its characteristics.

[5 marks]

Develop your application by making use of the clustering algorithm. Add functionality/ies for descriptive statistics and/or visualisations in the view of this problem including the produced clusters.

[10 marks]
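
To make the clustering step concrete, here is a minimal k-means sketch that also reports the size and centroid of each produced cluster. The brief does not prescribe k-means or Python; both are assumptions, as are the toy points and the choice to use the first k points as initial centroids (established tools usually offer other initialisation schemes). It requires Python 3.8+ for math.dist.

```python
import math


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points are used as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical two-dimensional records.
POINTS = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]

centroids, assignment = k_means(POINTS, k=2)
for c, centre in enumerate(centroids):
    size = assignment.count(c)
    print(f"cluster {c}: size={size}, centroid={[round(v, 2) for v in centre]}")
```

Because k-means relies on Euclidean distance, features with large ranges dominate the result unless the data are normalised first.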

Interpret the results in light of the chosen settings and the initial data types and ranges.

[5 marks]

Task E. [20 marks] Association Rule Mining Task

Discuss the applicability of association rule mining (e.g. within the context of data types, size, and type of problem) and the pre-processing steps required to use the chosen dataset. Select an algorithm and justify the choice.

[5 marks]

Develop and describe your application by making use of existing algorithms (e.g. Weka classes, R scripts, KNIME nodes) for association rule mining, including choice of parameters. Present results in an appropriate format.

[10 marks]
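
The parameters usually set for association rule mining are minimum support and minimum confidence. The sketch below shows how these measures, together with lift, are computed for single-item rules. It is a brute-force illustration in Python, not Apriori or any other named algorithm, and the transactions, thresholds and function names are assumptions.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent); antecedent must occur."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """List single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


for lhs, rhs, s, c in pair_rules(TRANSACTIONS):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}, "
          f"lift={lift({lhs}, {rhs}, TRANSACTIONS):.2f}")
```

A lift above 1 suggests that the antecedent and consequent co-occur more often than independence would predict, which is one way to judge whether a rule is significant.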

Interpret the results and provide a critical analysis of key findings including interpretation of significant rules.

[5 marks]

Task F. [20 marks] Text Mining Task

Discuss the applicability of text mining to the chosen data (e.g. within context of data types, and type of problem) and propose an appropriate approach/algorithm.

[5 marks]

Develop your application by making use of existing algorithms (e.g. Weka classes, R scripts, KNIME workflows) for text mining. Describe the results and present them in an appropriate format.

[10 marks]
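
One common first step in text mining is turning documents into weighted term vectors. The sketch below computes TF-IDF weights in Python with the standard library only. TF-IDF is offered here as an example approach rather than the required one, and the documents, the tokenisation rule and the particular weighting formula are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical documents.
DOCS = [
    "Data mining finds patterns in data.",
    "Text mining applies mining methods to text.",
    "Clustering groups similar records.",
]


def tokenise(text):
    """Lower-case the text and keep runs of the letters a-z only."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dictionary per document.

    Uses term count divided by document length as tf and
    log(N / document_frequency) as idf; other weightings exist.
    """
    tokenised = [tokenise(d) for d in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


for doc, w in zip(DOCS, tf_idf(DOCS)):
    top = sorted(w.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(doc, "->", [(term, round(score, 3)) for term, score in top])
```

Terms that occur in every document receive a weight of zero under this formula, which is why stop words tend to drop out without an explicit stop list.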

Discuss and analyse the results within the context of the chosen dataset and the insight they provide into the data.

[5 marks]

Task G. [20 marks] Critical Review

Describe difficulties using the tools/techniques as above. Provide reflections focused on technical, interpretational and functional issues, and the steps/approach you took to overcome these challenges.

[5 marks]

Document your observations on the case studies and results. Present conclusions deduced/induced from each result. Describe and explain which techniques were most helpful to evaluate, explore and analyse the dataset. Describe how techniques were compared. Document techniques/activities that you could/should perform in the future to continue this work.

[10 marks]

Discuss the results of the case studies you developed for each technique and any interesting observations by comparing them with published work (use journal papers, conference proceedings, books and online resources available).

[5 marks]
