If you stop almost anyone on the street and ask what Machine Learning (ML) is, their first response will usually be something along the lines of “machines learning to do stuff”. Oversimplified as that definition is, it is essentially correct. A more complete explanation would describe ML as an application of artificial intelligence (AI) that gives systems the ability to learn automatically and improve from experience or past data without being explicitly programmed. ML is therefore a subset of artificial intelligence; in fact, it is one technique for realizing AI. ML focuses on developing computer programs that can access past data and use it to learn for themselves. It is a means of training algorithms so that they learn how to make choices. Training an ML program involves feeding the algorithm a lot of data and letting it learn from the information it processes. Note that the system is not specifically programmed to perform the given task; instead, it is given a large enough data set, which it uses to figure out the right (or best) answer to a problem.
You might have noticed that we said “given a large enough data set”. This was intentional. Take humans, for example: we do not learn in a vacuum. Growing up, we learned various things simply by interacting with them through trial and error. The more we repeat a certain task, the more familiar we become with it and the better we get at performing it. The same principle applies to Machine Learning. Learning for an ML program begins with observations or data, such as examples, direct experience, or instruction; the program looks for patterns in that data so that it can make better decisions based on the examples we provide. The main aim is to allow the computer to learn automatically, without human intervention or assistance, and adjust its actions accordingly.
Types of ML Algorithms
We now know that Machine Learning needs both an end goal and data to be of any use to us. To get there, we can employ several types of ML algorithm, each described below.
Supervised ML algorithms learn from a previously labelled data set, in which every example comes with its known correct answer, and then apply what they learned to new, raw data to produce a matching result. This is great if you are trying to predict upcoming trends based on previous data. A few key points to note: there must be previously known data to work from, and the right answer for each training example is already known, so the algorithm’s job is to fit the current problem into that mould.
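To make the idea of learning from known answers concrete, here is a minimal sketch of one very simple supervised method, a 1-nearest-neighbour classifier. The features (height and weight), the values, and the names TRAINING and predict are all made up for illustration; real projects would normally use a library and far more data.

```python
import math

# Labelled training data (hypothetical values): (height_cm, weight_kg) -> label.
TRAINING = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((55.0, 25.0), "dog"),
    ((60.0, 30.0), "dog"),
]


def predict(features, training=TRAINING):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    nearest = min(training, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]


print(predict((22.0, 4.5)))   # cat
print(predict((58.0, 28.0)))  # dog
```

The program never contains a rule such as “small animals are cats”; it only compares new data with examples whose answers are already known.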
Unsupervised ML algorithms are like searching in the dark and seeing what patterns emerge. The algorithm is given no prior information and no classified or labelled data from which to make inferences; instead, we let the program figure out what patterns can be gleaned from seemingly random data. This often surfaces otherwise unexplored or unexpected results. Note that there is no fixed or right answer in this type of ML algorithm, just an exercise in finding structure and correlations in seemingly random data sets.
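As a rough illustration of finding patterns without labels, the sketch below implements a bare-bones version of k-means clustering, one common unsupervised technique. The points and the function name k_means are invented for this example, and the naive choice of starting centroids (simply the first k points) is an assumption made to keep the run reproducible.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; no labels are supplied at any stage."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

Nothing tells the algorithm what the two groups mean; it only discovers that the points fall into two clumps, and interpreting them is left to us.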
Semi-supervised ML algorithms sit between the two. Let’s say you have a large stockpile of data, but you have only been able to label or classify a small subset of it. You tell the ML program what this labelled subset is, and it uses that information to try to classify the rest of the data. Show it a few labelled pictures of cats and dogs, then feed it lots of unlabelled images of animals, and it should be able to figure out which are cats and which are dogs. This method can reach a high degree of accuracy, but it still needs a fair bit of processing power to get through large volumes of raw data.
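One simple way to picture this is self-training: start from the few labelled items and repeatedly label the unlabelled item that sits closest to something already labelled. The sketch below assumes each animal is described by a single made-up number (say, body weight in kg); the function name self_train and the data are hypothetical, and image classification in practice is far more involved.

```python
def self_train(labelled, unlabelled):
    """Spread labels from a small labelled set to the unlabelled values."""
    labelled = dict(labelled)      # value -> label, copied so the input is untouched
    remaining = list(unlabelled)
    while remaining:
        # Find the unlabelled value that is closest to any labelled value.
        value, neighbour = min(
            ((u, known) for u in remaining for known in labelled),
            key=lambda pair: abs(pair[0] - pair[1]),
        )
        labelled[value] = labelled[neighbour]  # adopt the neighbour's label
        remaining.remove(value)
    return labelled


seed = {1.0: "cat", 10.0: "dog"}               # the small labelled subset
result = self_train(seed, [2.0, 3.0, 4.0, 7.0, 8.0, 9.0])
print(result)
```

Only two items were labelled by hand, yet every value ends up with a label because each newly labelled item helps classify its neighbours.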
Reinforcement machine learning algorithms are a more hands-on form of ML, in which you reward the program for a correct answer and penalize it for an incorrect one. Over time, the algorithm strives to earn more rewards until it reaches a point where it generates the right result very reliably. This is more involved than the other types of Machine Learning, but it has its own advantage: once the algorithm has a good grasp of what the developer wants, it tends to make very few errors.
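The reward-and-penalty loop can be sketched in a few lines. In this toy example the program tries each action in turn, receives +1 for the action we want and -1 for the others, and keeps a running estimate of how good each action is. The action names, the reward function, and the round-robin way of trying actions are assumptions chosen to keep the example small and reproducible; real reinforcement learning usually explores at random and deals with far richer environments.

```python
def train(actions, reward_fn, episodes=30, learning_rate=0.5):
    """Learn a value estimate for each action from rewards alone."""
    values = {action: 0.0 for action in actions}
    for episode in range(episodes):
        action = actions[episode % len(actions)]  # try the actions in turn
        reward = reward_fn(action)
        # Nudge the estimate towards the reward that was just observed.
        values[action] += learning_rate * (reward - values[action])
    return values


def reward(action):
    """Hypothetical feedback: +1 for the desired action, -1 otherwise."""
    return 1.0 if action == "right" else -1.0


values = train(["left", "right", "stay"], reward)
best = max(values, key=values.get)
print(best)  # right
```

The program is never told which action is correct; it only sees rewards and penalties, and its preference for the rewarded action grows with each episode.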
Most of the types of Machine Learning algorithm above might seem familiar. This is no coincidence; most ML approaches mirror the way we humans learn. Self-discovery, trial and error, and guided tutorials are all methods by which humans learn, so if we are building intelligent programs, it is natural to model them after the only intelligent life-form we know: ourselves. But let’s not forget that not every type of ML suits every situation. A proper understanding of the problem or requirements usually dictates which type of Machine Learning algorithm should be implemented.
Pros of Machine Learning
ML comes with a lot of advantages, such as allowing us to review large volumes of data and discover specific trends and patterns that would not be apparent to humans. An example is the e-commerce website Amazon, where ML algorithms are used to understand the browsing behaviours and purchase histories of its users and to surface the products, deals, and reminders relevant to each visitor. It then uses this to serve relevant advertisements to them. Also, with ML, you don’t need to babysit your project every step of the way. Giving machines the ability to learn lets them make predictions and improve their algorithms on their own. You find the same thing in anti-virus applications, where learning allows them to detect new threats.
ML technology typically improves in accuracy and efficiency over time thanks to the ever-increasing amounts of data that are processed and made available. This gives the algorithm an acquired “experience”, which can in turn be used to make better decisions or predictions. Weather predictions, for example, are made by looking at past weather patterns and events; this data is then used to determine what is most likely to occur in a given scenario. In general, the more relevant data you have in your data set, the more accurate a given forecast tends to be. The same concept holds true for algorithms that are used to make decisions or recommendations. ML can likewise be applied to predicting stock prices and fashion trends, and a lot of people are already using it for just that.
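As a very rough picture of forecasting from past data, the sketch below fits a straight line to a handful of past temperature readings using ordinary least squares and extends it by one day. The readings are invented and deliberately follow a perfect trend, and the name fit_line is hypothetical; real weather forecasting relies on far more sophisticated models and data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past observations: day number -> temperature in degrees Celsius.
days = [1, 2, 3, 4, 5]
temps = [15.0, 17.0, 19.0, 21.0, 23.0]

slope, intercept = fit_line(days, temps)
forecast = slope * 6 + intercept  # prediction for day 6
print(forecast)  # 25.0
```

With noisy real-world readings the fitted line would only approximate the trend, which is where having more (and more relevant) data helps.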
Cons of Machine Learning
This brings us to the drawbacks of using Machine Learning systems. The obvious appeal of ML is its ability to go through a large amount of data much faster than humans could, saving both time and resources while still providing decent results. But once you take a closer look, you notice a problem. To get a well-adjusted ML system, you must train it. It must go through a large amount of data to be able to make correct choices, and getting relevant data is the major challenge. Depending on the algorithm, the data also needs to be processed before it is provided as input, and this preprocessing has a significant impact on the results achieved. But that’s not all: another major challenge is accurately interpreting the results the algorithms generate, because without understanding the results you cannot determine how effective a machine learning algorithm really is. This is also why you must carefully choose the algorithm that best matches your purpose. Getting all the answers right is good, but the right answers are not always the best answers.
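One small example of the processing that data may need before it reaches an algorithm is rescaling every value into the range 0 to 1, often called min-max scaling. This is only one of many possible preprocessing steps, and the function name min_max_scale is made up for this sketch.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

Distance-based algorithms in particular can behave very differently depending on whether inputs were scaled like this, which is one reason preprocessing affects the final results.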
There’s also a high level of error susceptibility in ML systems. An error can cause havoc within a machine learning system, as everything that follows the error may be flawed, skewed or just plain undesirable. Errors do occur, and developers have so far been unable to anticipate and prevent them consistently. These errors come in a variety of forms, which vary according to the way in which you’re using machine learning technology. For instance, you might have a faulty sensor that generates a flawed data set. The incorrect data may then be fed into the machine learning program, which uses it as the basis of an algorithm update. This would skew the algorithm’s output: when you put junk in, you get junk out, a basic principle of computing. In practice, such an error could create a situation where related product recommendations are not actually related or similar. So you might have fresh vegetables, detergent and sporting goods included in the same batch of “related” product recommendations when you do a simple search for a leather wallet. In a more mission-critical environment, the same kind of error could be deadly. A computer lacks the ability to grasp that these items are not in any way related, so human intelligence is required to clarify or correct the result.
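The faulty-sensor scenario is easy to reproduce in miniature. In the sketch below, a single bad reading drags the average far away from reality, and a simple plausibility check removes it before the data is used. The readings, the valid range, and the function names are all assumptions for illustration; deciding what counts as “implausible” is itself a human judgement.

```python
def mean(values):
    return sum(values) / len(values)


def drop_implausible(readings, valid_range=(-40.0, 60.0)):
    """Keep only readings inside a range we assume to be physically plausible."""
    low, high = valid_range
    return [r for r in readings if low <= r <= high]


# Hypothetical temperature readings; 999.0 simulates a faulty sensor.
raw = [21.0, 22.0, 20.0, 999.0, 21.0]
clean = drop_implausible(raw)

print(mean(raw))    # 216.6, junk in and junk out
print(mean(clean))  # 21.0
```

A check like this only catches errors we thought of in advance, which is why such failures remain hard to rule out completely.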
As Machine Learning stands, it still has a long way to go, and one of the avenues of advancement is what we call Deep Learning. Simply put, Deep Learning is a subset of Machine Learning; in fact, it is a technique for realizing machine learning, and many consider it the next evolution of the field. Deep Learning algorithms are roughly inspired by the information processing patterns found in the human brain: imagine taking the process by which our brain works and using that to construct an algorithm. Just as we use our brains to identify objects and patterns and to classify various types of information, deep learning algorithms can be taught to accomplish the same tasks for machines. And just as our brain tries to decipher the information it receives by labelling items and assigning them to various categories, Deep Learning algorithms are designed to function in the same way.
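To give a feel for the brain-inspired, layered structure, here is a toy network of three artificial “neurons” arranged in two layers. Each neuron adds up its weighted inputs and fires (outputs 1) if the total is positive. The weights below are picked by hand purely for illustration; in real deep learning they are learned from data, and networks contain vastly more neurons and layers.

```python
def step(x):
    """Fire (1) if the combined input is positive, otherwise stay silent (0)."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the step function."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def tiny_network(x1, x2):
    # Hidden layer: one neuron detects "either input on", the other "both on".
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)
    h_and = neuron([x1, x2], [1.0, 1.0], -1.5)
    # Output layer: fires when exactly one input is on.
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, tiny_network(a, b))
```

No single neuron here can decide “exactly one input is on” by itself; the answer only emerges from stacking layers, which is the core idea behind going “deep”.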
For now, we should note that Machine Learning is making its way into our everyday lives, and as technology grows and is able to recognize and access more data, we will be able to do much more with it. But while Machine Learning can be very powerful if used appropriately and in the right places (where massive training data sets are available), it certainly isn’t for everyone. Proper consideration is therefore required before implementing it in a project or business.