Sample of college admission paper: Data mining

This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.

Dividing the customers of a company according to their profitability. Yes, this is a data mining task because it requires data analysis to determine which customers bring the most business to the company.

Computing the total sales of the company. No, this is not a data mining task because no analysis is involved; this information can be pulled out of any bookkeeping program.

Sorting a student database based on student ID numbers. No, this is not a data mining activity because sorting by ID numbers does not involve any data mining task. This is a simple database query.

Predicting the future stock price of a company using historical records. Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modeling. We could use regression for this modeling, although researchers in many fields have developed a wide variety of techniques for predicting time series.

Monitoring the heart rate of a patient for abnormalities. Yes. We would build a model of the normal behavior of heart rate and raise an alarm when unusual heart behavior occurs. This would involve the area of data mining known as anomaly detection. This could also be considered a classification problem if we had examples of both normal and abnormal heart behavior.

For each of the following, identify the relevant data mining task(s): The Boston Celtics would like to approximate how many points their next opponent will score against them. A military intelligence officer is interested in learning about the respective proportions of Sunnis and Shias in a particular strategic region. A NORAD defense computer must decide immediately whether a blip on the radar is a flock of geese or an incoming nuclear missile.
A political strategist is seeking the best groups to canvass for donations in a particular county. A homeland security official would like to determine whether a certain sequence of financial and residence moves implies a tendency to terrorist acts. A Wall Street analyst has been asked to find out the expected change in stock price for a set of companies with similar price/earnings ratios.

Question 3
For each of the following meetings, explain which phase in the CRISP-DM process is represented:

Managers want to know by next week whether deployment will take place. Therefore, analysts meet to discuss how useful and accurate their model is. This is the Evaluation phase of the CRISP-DM process. In the Evaluation phase, the data mining analysts determine whether the model and technique used meet the business objectives established in the first phase.

The data mining project manager meets with the data warehousing manager to discuss how the data will be collected. This is the Data Understanding phase of the CRISP-DM process. The data warehouse is identified as a resource during the Business Understanding phase; however, the actual data collection takes place during the Data Understanding phase. In this phase, data is collected and accessed from the resources identified in the Business Understanding phase.

The data mining consultant meets with the vice president for marketing, who says that he would like to move forward with customer relationship management. This is the Business Understanding phase, in which the main business objectives are reviewed. After the meeting, it appears that the data mining consultant succeeded in convincing the VP of marketing to approve data mining on the customer relationship management system.

The data mining project manager meets with the production line supervisor to discuss implementation of changes and improvements.
Discussing the implementation of changes and improvements, that is, whether specific improvements or process changes are required to ensure that all important aspects of the business are accounted for, is part of the Evaluation phase.

The meeting is held with the objective of collecting and cleansing the data to ensure data quality.

The analysts meet to discuss whether the neural network or decision tree model should be applied.

Question 4 [10 points]
Describe the possible negative effects of proceeding directly to mine data that has not been preprocessed.

Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a data mart or data warehouse. Preprocessing is essential for analyzing multivariate data sets before data mining, and the target set is then cleaned. Mining data that has not been preprocessed risks producing misleading or invalid results, because missing values, noise, outliers, and inconsistent formats are carried directly into the model.

Question 5 [15 points]
Which of the three methods for handling missing values do you prefer? Which method is the most conservative and probably the safest, meaning that it fabricates the least amount of data? What are some drawbacks to this method? Methods for replacing missing field values with: (1) user-defined constants, (2) means or modes, (3) random draws from the distribution of the variable.

Question 6
Describe the differences between the training set, test set, and validation set.

The training set is used to build the model. It contains a set of data with preclassified target and predictor variables. Typically, a hold-out dataset or test set is used to evaluate how well the model does with data outside the training set.
The test set also contains preclassified data, but the known target values are not used when the test set is run through the model; only at the end are they compared against the model's results. The model is then adjusted to minimize error on the test set. Another hold-out dataset, the validation set, is used to evaluate the adjusted model: again, the validation set data is run against the adjusted model and the results are compared to the unused preclassified values. In short, we use the training set (seen data) to build the model (determine its parameters) and the test set (unseen data) to measure its performance (holding the parameters constant). Sometimes we also need a validation set to tune the model (e.g., for pruning a decision tree). The validation set can't be used for final testing, because it is no longer unseen.
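The three replacement methods listed in Question 5 and the training/validation/test partition described in Question 6 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names `impute` and `three_way_split`, the 60/20/20 proportions, the fixed seeds, and the sample income column are assumptions chosen for the example. The mean option applies to numeric columns, while the mode option suits categorical ones.

```python
import random
import statistics


def impute(values, method="mean", constant=0.0, rng=None):
    """Replace None entries in one column using one of the Question 5 methods.

    method: "constant" -> a user-defined constant
            "mean"     -> the mean of the observed values (numeric columns)
            "mode"     -> the most common observed value (categorical columns)
            "random"   -> a random draw from the observed values, i.e. from
                          the variable's empirical distribution
    """
    observed = [v for v in values if v is not None]
    if method == "constant":
        return [constant if v is None else v for v in values]
    if method == "mean":
        fill = statistics.mean(observed)
        return [fill if v is None else v for v in values]
    if method == "mode":
        fill = statistics.mode(observed)
        return [fill if v is None else v for v in values]
    if method == "random":
        if rng is None:
            rng = random.Random(0)  # fixed seed so the example is repeatable
        return [rng.choice(observed) if v is None else v for v in values]
    raise ValueError(f"unknown method: {method!r}")


def three_way_split(records, train=0.6, validation=0.2, seed=42):
    """Shuffle records, then cut them into training, validation and test sets.

    Whatever remains after the training and validation fractions is the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


if __name__ == "__main__":
    income = [52.0, None, 61.0, 48.0, None, 70.0]  # hypothetical column
    print(impute(income, "constant", constant=0.0))  # [52.0, 0.0, 61.0, 48.0, 0.0, 70.0]
    print(impute(income, "mean"))  # [52.0, 57.75, 61.0, 48.0, 57.75, 70.0]
    train_set, val_set, test_set = three_way_split(range(10))
    print(len(train_set), len(val_set), len(test_set))  # 6 2 2
```

Fixing the seed keeps the partition reproducible, so the same records stay out of the training data from one run to the next.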
Data Mining

Determine the benefits of data mining to businesses when employing:

1. Predictive analytics to understand the behavior of customers

Predictive analytics is business intelligence technology that produces a predictive score for each customer or other organizational element. Assigning these predictive scores is the job of a predictive model, which has, in turn, been trained over your data, learning from the experience of your organization.
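To make the idea of a per-customer predictive score concrete, here is a minimal Python sketch that trains a logistic regression model by batch gradient descent and returns a score between 0 and 1 for each customer. Logistic regression is only one common choice of predictive model, not necessarily the one the text has in mind; the two features (purchases in the past year, months since the last purchase), the tiny campaign-response data set, and the learning-rate and epoch settings are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predictive_score(x, weights, bias):
    """Score between 0 and 1: the model's estimated probability of a response."""
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


def train_logistic(rows, labels, lr=0.05, epochs=3000):
    """Fit weights and bias by batch gradient descent on the log-loss.

    rows:   list of feature lists, one per customer
    labels: 1 if the customer responded (the outcome to predict), else 0
    """
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            # Derivative of the log-loss with respect to the linear term.
            error = predictive_score(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


if __name__ == "__main__":
    # Hypothetical history: [purchases in past year, months since last purchase]
    history = [[5, 1], [8, 2], [1, 10], [0, 12], [6, 1], [2, 9]]
    responded = [1, 1, 0, 0, 1, 0]
    w, b = train_logistic(history, responded)
    for customer in ([7, 1], [1, 11]):
        print(customer, round(predictive_score(customer, w, b), 3))
```

A frequent, recent buyer receives a high score and a lapsed customer a low one; in a campaign, those scores would decide who is contacted first.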
Predictive analytics optimizes marketing campaigns and website behavior to increase customer responses, conversions, and clicks, and to decrease churn. Each customer's predictive score informs the actions to be taken with that customer.

2. Associations discovery in products sold to customers

The way in which companies interact with their customers has changed dramatically over the past few years. A customer's continuing business is no longer guaranteed. As a result, companies have found that they need to understand their customers better and to respond quickly to their wants and needs. In addition, the time frame in which these responses need to be made has been shrinking. It is no longer possible to wait until the signs of customer dissatisfaction are obvious before taking action. To succeed, companies must be proactive and anticipate what a customer desires. For example, in the old days, storekeepers would simply keep track of all of their customers in their heads and would know what to do when a customer walked into the store. Today's store associates face a much more complex situation: more customers, more products, more competitors, and less time to react mean that understanding customers is now much harder. A number of forces are working together to increase the complexity of customer relationships, such as compressed marketing cycles, increased marketing costs, and a stream of new product offers. There are many kinds of models, such as linear formulas and business rules, and for each kind of model there are weights, rules, or other mechanics that determine precisely how the predictors are combined. In fact, there are so many choices that it is literally impossible for a person to try them all and find the best one. Predictive analytics is data mining technology that uses the company's customer data to automatically build a predictive model specialized for the business. This process learns from the organization's collective experience by leveraging existing logs of customer purchases, behavior, and demographics. The wisdom gained is encoded in the predictive model itself. Predictive modeling software has computer science at its core, combining number crunching with trial and error.

3. Web mining to discover business intelligence from Web customers

Fast business growth has placed both the business community and customers in a new situation. Because of intense competition on the one hand and customers' ability to choose from a number of alternatives on the other, the business community has realized the importance of intelligent marketing strategies and relationship management. Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the Web access logs can help in understanding user behavior and the Web structure. From the business and applications point of view, knowledge obtained from Web usage patterns could be directly applied to efficiently manage activities related to e-business, e-services, and e-education. Accurate Web usage information could help to attract new customers, retain current customers, improve cross-marketing and sales, increase the effectiveness of promotional campaigns, track departing customers, and so on. The usage information can also be exploited to improve the performance of Web servers by developing proper prefetching and caching strategies that decrease server response time. According to Sonal Tiwari, user profiles could be built by combining users' navigation paths with other data features, such as page viewing time, hyperlink structure, and page content.

4. Clustering to find related customer information

Clustering is a typical unsupervised learning technique for grouping similar data points.
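K-means, a widely used partitioning algorithm, makes the idea of grouping similar data points concrete: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The Python sketch below is a minimal version, assuming hypothetical customer records with two features (annual spend in thousands of dollars and orders per year) and a number of clusters `k` chosen in advance. In practice the features would usually be scaled first so that neither one dominates the distance.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster a list of numeric tuples into k groups.

    Returns (labels, centroids): labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical customers: (annual spend in $1000s, orders per year)
    customers = [(1.0, 2), (1.2, 3), (0.8, 1), (9.0, 20), (10.0, 22), (9.5, 18)]
    labels, centroids = kmeans(customers, k=2)
    print(labels)
    print(centroids)
```

Which group receives index 0 depends on the random starting centroids, so the labels should be read as group identifiers rather than as meaningful numbers.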
A clustering algorithm assigns a large number of data points to a smaller number of groups such that data points in the same group share similar properties, while data points in different groups are dissimilar. Clustering has many applications, including part family formation for group technology, image segmentation, information retrieval, web page grouping, market segmentation, and scientific and engineering analysis. Many clustering methods have been proposed, and they can be broadly classified into four categories: partitioning methods, hierarchical methods, density-based methods, and grid-based methods. Customer clustering is one of the most important data mining methodologies used in marketing and customer relationship management (CRM). Customer clustering uses customer purchase transaction data to track buying behavior and create strategic business initiatives. Companies want to keep high-profit, high-value, and low-risk customers. This cluster typically represents the 10 to 20 percent of customers who create 50 to 80 percent of a company's profits. A company would not want to lose these customers, and the strategic initiative for this segment is obviously retention. A low-profit, high-value, and low-risk customer segment is also an attractive one, and the obvious goal here would be to increase profitability for this segment. Cross-selling (selling new products) and up-selling (selling more of what customers currently buy) to this segment are the marketing initiatives of choice.

Assess the reliability of the data mining algorithms. Decide if they can be trusted and predict the errors they are likely to produce.

Most methods for validating a data mining model do not answer business questions directly, but they provide metrics that can be used to guide a business or development decision. There is no comprehensive rule that can tell you when a model is good enough or when you have enough data.
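Although no single rule says when a model is good enough, one practical way to see whether a model behaves consistently on different data is k-fold cross-validation: train on all but one fold, measure accuracy on the held-out fold, repeat for every fold, and look at how much the scores vary. The Python sketch below assumes a deliberately simple one-feature threshold classifier and synthetic data; both are hypothetical stand-ins for a real model and data set, and the function names are illustrative.

```python
import random
import statistics


def fit_threshold(xs, ys):
    """Learn a one-feature rule: cut halfway between the two class means and
    predict the class whose mean lies on the same side of the cut."""
    mean0 = statistics.mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = statistics.mean([x for x, y in zip(xs, ys) if y == 1])
    cut = (mean0 + mean1) / 2
    high_class = 1 if mean1 > mean0 else 0
    return lambda x: high_class if x > cut else 1 - high_class


def fold_accuracies(xs, ys, folds=5, seed=0):
    """Accuracy of the threshold model on each held-out fold."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    scores = []
    for f in range(folds):
        held_out = set(order[f::folds])  # every folds-th shuffled index
        train = [i for i in order if i not in held_out]
        model = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(model(xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: class 0 centred at 2, class 1 centred at 5.
    xs = [rng.gauss(2, 1) for _ in range(50)] + [rng.gauss(5, 1) for _ in range(50)]
    ys = [0] * 50 + [1] * 50
    scores = fold_accuracies(xs, ys)
    print("per-fold accuracy:", [round(s, 2) for s in scores])
    print("mean:", round(statistics.mean(scores), 3),
          "stdev:", round(statistics.stdev(scores), 3))
```

A small spread across folds suggests the model's accuracy does not hinge on one particular test set; a large spread is a warning sign that it may not generalize.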
Accuracy is a measure of how well the model correlates an outcome with the attributes in the data that has been provided. There are various measures of accuracy, but all of them depend on the data that is used. In reality, values might be missing or approximate, or the data might have been changed by multiple processes. Particularly in the exploration and development phase, you might decide to accept a certain amount of error in the data, especially if the data is fairly uniform in its characteristics. For example, a model that predicts sales for a particular store based on past sales can be strongly correlated and very accurate even if that store consistently used the wrong accounting method. Therefore, measurements of accuracy must be balanced by assessments of reliability. Reliability assesses the way a data mining model performs on different data sets. A data mining model is reliable if it generates the same type of predictions or finds the same general kinds of patterns regardless of the test data that is supplied. For example, the model that you generate for the store that used the wrong accounting method would not generalize well to other stores, and therefore would not be reliable.

Analyze privacy concerns raised by the collection of personal data for mining purposes.

1. Choose and describe three (3) concerns raised by consumers.

The first concern, highlighted by recent privacy surveys, is the use of personal data for purposes other than the one for which the data was collected. The second concern is the handling of misinformation, which can cause serious and long-term damage; individuals should therefore be able to challenge the correctness of data about themselves, such as personal records. The third concern is granulated access to personal information: access should be limited to what is relevant, so that, for example, information about someone's health is not available when they apply for a job.

2. Decide if each of these concerns is valid and explain your decision for each.
All three concerns are valid. Regarding the first concern, an extreme case occurred in 1989: after collecting over $16 million by selling the driver-license data of roughly 19 million California residents, the California Department of Motor Vehicles revised its data-selling policy after Robert Bardo used its services to obtain the address of actress Rebecca Schaeffer and later killed her in her apartment. While it is very unlikely that Knowledge Discovery and Data Mining (KDDM) tools will directly reveal precise confidential data, exploratory KDDM tools may correlate or disclose confidential, sensitive facts about individuals, resulting in a significant reduction of the possibilities. The second concern is valid because of an incident in Washington: Cablevision fired an employee, James Russell Wiggings, on the basis of information obtained from Equifax, Atlanta, about Wiggings' conviction for cocaine possession; the information was actually about James Ray Wiggings, and the case ended up in court. This illustrates a serious issue in defining the ownership of data containing personal records. The third concern is valid as well. For example, employers are obliged to perform a background check when hiring a worker, but it is widely accepted that information about diet and exercise habits should not affect hiring decisions.

3. Describe how each concern is being allayed.

KDDM revitalizes some issues and poses new threats to privacy. Some of these can be directly attributed to the fact that this powerful technique may enable the correlation of separate data sets in order to significantly reduce the possible values of private information. Others can be attributed more to the interpretation, application, and actions taken from the inferences obtained with the tools.
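The point about correlating separate data sets can be shown with a toy linkage example. The Python sketch below joins a hypothetical "de-identified" medical table with a hypothetical public list on shared quasi-identifiers (ZIP code, birth year, sex) and reports which diagnoses remain possible for each named person. It also computes k-anonymity, a measure from the statistical disclosure control literature equal to the size of the smallest group of records that share one quasi-identifier combination. All names, records, and field names are invented for illustration.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

# Hypothetical "de-identified" records: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02138", "birth_year": 1945, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1962, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public list (e.g. a membership roster) with names attached.
public = [
    {"name": "Alice", "zip": "02138", "birth_year": 1945, "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_year": 1962, "sex": "M"},
    {"name": "Carl", "zip": "02139", "birth_year": 1962, "sex": "M"},
]


def qi_key(record):
    """The combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, named):
    """Map each named person to the diagnoses consistent with their quasi-identifiers."""
    candidates = {}
    for record in deidentified:
        candidates.setdefault(qi_key(record), set()).add(record["diagnosis"])
    return {person["name"]: candidates.get(qi_key(person), set()) for person in named}


def k_anonymity(records):
    """Size of the smallest group of records sharing a quasi-identifier combination."""
    return min(Counter(qi_key(r) for r in records).values())


if __name__ == "__main__":
    for name, diagnoses in link(medical, public).items():
        print(name, sorted(diagnoses))
    # Alice ['hypertension']
    # Bob ['asthma', 'diabetes']
    # Carl ['asthma', 'diabetes']
    print("k =", k_anonymity(medical))  # k = 1
```

Alice's diagnosis is fully revealed because her combination of quasi-identifiers is unique (k = 1), while Bob's and Carl's are narrowed to two possibilities, the kind of reduction of possible values the text describes. Generalizing values (for example, truncating ZIP codes) until k reaches a chosen threshold is one protection technique in this family.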
While KDDM raises these concerns, there is a body of knowledge in the field of statistical databases that could potentially be extended and adapted to develop new techniques that balance the right to privacy against the need for knowledge and analysis of large volumes of information. Some of these new privacy protection methods are emerging as the application of KDD tools moves to more controversial datasets.

Provide at least three (3) examples where businesses have used predictive analysis to gain a competitive advantage and evaluate the effectiveness of each business's strategy.

The first advantage is that analysis helps establish the validity of a product by distinguishing between the positioning of a product and its ability to satisfy customer requirements. Other important attributes include ease of use, innovation, and how well the product integrates with other technologies that customers need. The second advantage is the value the technology provides to customers. Even if a product is well designed, it must be able to help businesses achieve their business goals. Goals range from gaining insight about customers in order to be more competitive to using the technology to increase revenue. A key attribute measured in this dimension is how well the product supports companies in meeting their objectives. The third advantage is the strength of the company's strategy. It is not enough to simply have a good vision; a company must also have a well-designed road map that can support this vision. Vision attributes also include more tactical aspects of the company's strategy, such as a technology platform that can scale, well-articulated messaging, and positioning. A key component of this dimension is clarity: it must be clear what business problem the company is solving for which customer.


Thursday, August 1, 2019
