My Journey as a Data Scientist / IT Engineer

It started long ago, when I was a kid curious about how the world works. I liked the science experiments in school and watched Beakman’s World. At times I wondered what the world would be like if no one had invented the light bulb, the motor engine, and many other things like the cellphone and the computer.

I first encountered a computer in elementary school. My uncle left me a computer running Windows 3.0. I explored it, used Microsoft Office, read a book about DOS (Disk Operating System), saved tons of diskettes full of Microsoft Word documents and PowerPoint files, and played with Visual Basic for Applications. In high school, I met two programmers. One of them taught me DOS-based programming languages such as C, Pascal, and QBASIC, along with DOS itself and Visual Studio, especially Visual Basic. I read the book they were using, Teach Yourself Visual Basic in 21 Days; it was fun and easy to understand, so I tried building some of its exercises. I also wrote a couple of DOS batch file programs.

In my fourth year of high school, I took the IT Engineer examination, a Japanese-standard certification administered by JITSE (Japanese IT Standards Examination), now PhilNITS (Philippine National Information Technology Standards), which falls under ITPEC (IT Professional Examination Council), headed by JITEC (Japan IT Engineers Examination Center). I took it just to try, and without a mentor my score was close to, but short of, passing. At 16, I was one of the youngest takers. A few years later, while taking up a BS in IT in college, I passed the exam at the age of 20.

In 2007, I was sent to Yokohama, Japan for the Training Program for IT Engineers in Asian Countries. At 20, I was the youngest in the batch. My specialization was COBOL (Common Business Oriented Language). I also learned Linux, JSP, and other technologies, as well as Japanese culture.

As a Project Manager, I helped the OLPC (One Laptop Per Child) community grow here in the Philippines. Today, e-Kindling handles the project and has deployed OLPC pilots in some parts of the Philippines.

In 2010, I was sent to Tokyo again, this time for Bridge Systems Engineer training (coordinating Systems Engineer).

In the past, I handled the full Software Development Life Cycle as a Systems Engineer. I have also worked as a Network Surveillance Engineer in a contact center. More recently, I worked as a Systems Administrator at the social networking site Friendster.com. My time at Friendster helped me learn more about monitoring systems, automation, testing, and Quality Assurance work on systems.

I have also worked as a consultant for several organizations.

I learned how to “cook” servers with Chef and other tools, and worked with Linux, Python, and Perl. From there I started to love data: MySQL, MongoDB, CouchDB, logs, AWS cloud services, and monitoring with Munin and Nagios.

Later I became interested in learning R, which led me to the Data Science Specialization from Johns Hopkins University on Coursera. I also took several courses on Data Analysis, Analytics, Process Mining, and more.

Today, what I have learned, combined with my experience in systems engineering, data, and consulting and what I learned in business school (MBA) and marketing, has helped me become a Data Scientist / IT Engineer. I also learned online marketing and virtual professional work from Jomar Hilario, a guru in the Philippines, so I know how to work well remotely.

Nowadays, I’m learning more mathematics, from pure to applied, and doing research in genomics, bioinformatics, and other applications of data science, science, and technology.

https://sway.com/MDvT9CFp9a1bvIvu – Sway Data Scientist Ri
https://sumry.me/wenmi01 – Sumry.me Rowen

  • Rowen Remis R. Iral

Is It OK to Allow Data Scientists / IT Engineers to Work from Home?

Running a company or business requires technical expertise that you may not be able to find nearby. Technical talent and skills are sometimes hard to find, and some work needs to be done sooner than a hiring cycle allows. Thanks to technology, work can be done remotely from anywhere: work-from-home arrangements, Virtual Assistant (VA) jobs, road warriors, mobile workers, and telecommuters, to name a few. As advances in communication and collaboration technology flatten the world, these arrangements can support your business or company while minimizing overhead such as office space costs.

Selecting the best talent should guide your search for the remote worker or remote expert you are looking for. Consultants in these fields are usually evaluated by their resumes and work portfolios, so you know what quality of work to expect. Remote work has helped many businesses, including online marketers. Today, appointment setting, writing, graphic design, social media management, mail, booking, telemarketing, programming or development, systems engineering, systems administration, and DevOps can all be done remotely.

With the growing demand for analytics talent, there are many data scientists ready to be tapped; they are out there, but many of them live in remote areas. Experts such as Data Scientists, Analysts, and IT / Systems Engineers are hard to find, but if you look around, you can find them online. Their work can be of high quality, and if you want them to help you even more, let them take courses and pay for their education. Online courses have made quality education reachable for anyone with an internet connection who can submit work online.

Allowing knowledge workers to work online gives them flexibility, room to think, quality time, and freedom, which is why it makes sense to let Data Scientists / Engineers work remotely or from home.

Do you need some Data Science for Day Service? Contact me now.

Finding the Unicorn Data Scientist

Nowadays, with more career shifters around, many people have declared themselves Data Scientists. This abundance is what we might call raw skill: they may possess skills at some level, but not yet enough. To cut through the uncertainty, let us look at what it takes to be a data scientist. There are three major skill areas: Computer Science / Hacking skills, Mathematics & Statistics skills, and Substantive knowledge such as research, business, medicine, biostatistics, and other domain skills and knowledge. Machine learning, a part of Artificial Intelligence, also comes into play; it is where we model the world in order to forecast and predict.

Knowledge of programming (especially functional programming), database systems, and big data is necessary for the technical skills. Statistics and mathematics give a Data Scientist the expertise to look at and solve problems and to express them in figures and numbers. With these, data scientists can model the world in numbers, quantifying the world we live in so that they can draw inferences. Substantive knowledge, on the other hand, comes from research, business, biostatistics, medicine, and other fields that supply knowledge of the problem being observed and analyzed through data. A Data Scientist also needs the ability to present work to non-technical people and to visualize findings, relaying information in simpler forms so that everyone can grasp what is being communicated. Unicorn Data Scientists are those who have gained a range of different skills, often by accident.

In my case, I am a certified IT engineer, skilled in marketing and digital marketing, with a Master’s Degree in Business Administration (MBA). I have also done (cloud) Solutions Architecture and other kinds of research, such as biotechnology research on biophotonics, bioinformatics, and more. Becoming a Unicorn Data Scientist requires a vast array of experience and exposure to different fields, and the ability to integrate, or at least look at, different stages of industry, business, medicine, the sciences, the arts, and technology. They are somewhat hard to find, and most of them hold a Master’s or Doctorate (PhD). Many also have a mix of marketing and economics and have worked on Research and Development projects; some have shifted careers several times, and others are lifelong learners who have immersed themselves in Computer Science, Math & Stats, and Substantive Knowledge. Unicorn Data Scientists are products of experience, often not of a neatly planned career path but of a path that gave them skills in mixed and different fields. Most of them are fast learners and have even looked into systems or processes, which is one of the applications of data science: we can apply data science and engineering not just to data but also to business processes. If you are lucky enough to find a Unicorn Data Scientist, be sure to take care of them, as they are rare and hard to find.

Getting and Cleaning Data – A Data Scientist’s Perspective

Getting and cleaning data is easier today thanks to the tools we have and the technology available for sharing datasets. We have tools such as git, svn, zip archives, and more. However, these tools must be installed on your machine before you can retrieve and read the data.

Most resources today are broadly compatible, but if the data is large or sensitive, you may have to authenticate to access it. Once you have the data, check whether a corresponding data dictionary is available or, for a CSV file, whether the columns are labeled. In some cases every field of every record carries its own label, which is easy to read in a quick linear view: you know what each value means without looking at a header. The downside, which will cost you later, is that you have to clean out the labels. For example, this line:
name: bob pet_type: cat age: 3

You will have to clean it up by first removing the labels, and in some cases you will also have to transform the cleaned data to make use of it at a later stage of the analysis.
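
As a minimal sketch of the label-stripping step, assuming each value is a single word without spaces (as in the example line) and that `age` should become an integer, the Python below uses a regular expression to split the line into label/value pairs and then applies one transformation. The function names `parse_labeled_line` and `transform` are hypothetical, chosen for illustration.

```python
import re

# Assumption: every value is one word without spaces, as in the example line.
LABELED_FIELD = re.compile(r"(\w+):\s*(\S+)")

def parse_labeled_line(line):
    """Strip the 'label:' prefixes and return the values keyed by their labels."""
    return {label: value for label, value in LABELED_FIELD.findall(line)}

def transform(record):
    """Example transformation for later analysis: store 'age' as an integer."""
    cleaned = dict(record)
    if "age" in cleaned:
        cleaned["age"] = int(cleaned["age"])
    return cleaned

raw = "name: bob pet_type: cat age: 3"
record = transform(parse_labeled_line(raw))
print(record)  # {'name': 'bob', 'pet_type': 'cat', 'age': 3}
```

Values that contain spaces would need a different pattern, for example one that splits on the known label names instead.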

If you write functions to process data, prefer efficient low-level tools and built-in functions; this can greatly reduce processing time. It is also better to have a separate script or tool dedicated to cleaning, run before the analysis, to save time. After that, you can create another set of intermediate data for use in the visualization step.
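
A minimal sketch of keeping cleaning separate from analysis, assuming a small comma-separated dataset with hypothetical columns `name`, `pet_type`, and `age`: the cleaning stage drops incomplete rows and normalizes text, then writes intermediate CSV data, and the analysis stage reads only that intermediate data. An in-memory `io.StringIO` buffer stands in for the intermediate file on disk; the stage functions are illustrative names, not a standard API.

```python
import csv
import io

def clean_stage(raw_csv_text):
    """Cleaning step, kept separate from analysis: drop rows with a missing
    value and normalize text (trim spaces, lowercase). Assumes well-formed rows."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv_text)):
        if any(value is None or value.strip() == "" for value in row.values()):
            continue  # skip incomplete records
        cleaned.append({key: value.strip().lower() for key, value in row.items()})
    return cleaned

def write_intermediate(rows, fileobj):
    """Save cleaned rows as intermediate data for the analysis/visualization steps."""
    if not rows:
        return
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

def analysis_stage(intermediate_text):
    """Analysis step that reads only the cleaned intermediate data."""
    counts = {}
    for row in csv.DictReader(io.StringIO(intermediate_text)):
        counts[row["pet_type"]] = counts.get(row["pet_type"], 0) + 1
    return counts

# Hypothetical raw data: inconsistent case, stray spaces, and one missing value.
raw = "name,pet_type,age\nBob,Cat,3\nAmy, dog ,5\nJoe,,2\n"
rows = clean_stage(raw)
buffer = io.StringIO()  # stands in for an intermediate CSV file on disk
write_intermediate(rows, buffer)
print(analysis_stage(buffer.getvalue()))  # {'cat': 1, 'dog': 1}
```

Because the analysis stage reads only the intermediate output, you can rerun or change the analysis and visualization without repeating the cleaning.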

At the end of your analysis, you should be able to present your analytics report in a form such as a PDF or a presentation available to your audience. When writing reports, assume the audience is not technical and needs an easy-to-understand format, and include enough data for them to see the same insight you want to communicate.

5 D’s of Data Science

Here are the 5 D’s:

  1. Data
  2. Digitalization
  3. Description
  4. Depiction
  5. Discovery

Data

In data science, what we need most is data: the observations or examples. With data, we can describe how much, how strong, or what value or measurement applies to a situation or thing. Data comes into existence whenever we describe an event or measure something. It is the most important building block for Data Science tasks. With data, we can show quantity and quality, and it becomes the basis of our equations and statistics. We observe, or sometimes use instruments or probes, to gather data for our analysis or research.

Digitalization

We cannot process raw data until it is digitized: entered into a computer system or encoded in a form that can be processed. The format is not limited to text, graphics, spreadsheets, vectors, audio, or video; we can use any suitable digital format. Digitization speeds up the analysis and the procedures we apply to compute statistical measures. We can then draw inferences from the findings and create more insight. Digitization also makes sharing information easier, since the data can be stored and retrieved for future use.

Description

With the tools we have, mathematical equations and statistics, we can describe our data. We can check whether our assumptions are right or wrong by formulating and testing hypotheses. We can then draw conclusions from what we have gathered, which helps us understand more and guides our next steps: using data to solve a problem, understand a situation, or teach machines and computers. These machines, in turn, are put to practical use, aiding human abilities in many aspects of our lives, including traffic, medicine, marketing, economics, planning, production, operations, understanding behavior, and more.
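
To illustrate describing data and testing an assumption, here is a minimal Python sketch on hypothetical measurements. It computes descriptive statistics with the standard `statistics` module and a one-sample t statistic for an assumed hypothesis that the true mean is 12.0. Deciding whether to reject that hypothesis still requires comparing the statistic against a t distribution with n - 1 degrees of freedom (from a table or a statistics library), which this sketch leaves out.

```python
import math
import statistics

# Hypothetical sample measurements, used only for illustration.
sample = [12.1, 11.8, 12.4, 12.0, 12.6, 11.9, 12.3, 12.2]

mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Assumed hypothesis H0: the true mean equals mu0. The one-sample t statistic,
# t = (mean - mu0) / (sd / sqrt(n)), counts how many standard errors the
# sample mean lies from mu0.
mu0 = 12.0
t_stat = (mean - mu0) / (sd / math.sqrt(len(sample)))

print(f"mean={mean:.3f} median={median:.3f} sd={sd:.3f} t={t_stat:.3f}")
```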

Depiction

In Data Science, when we do machine learning, we mine information and create training and testing sets; we can then depict, or predict, the future. With visualization, we can also explain what we have found through insights. We can share the information with a wide range of audiences in academia, the professions, medicine, science, and the like. Depiction and visualization help different people understand what we have found. This is where data science becomes an art: a place of creativity aimed at a broad audience.

Discovery

At the end of most of a Data Scientist’s research, a discovery emerges from the different insights, or clarity comes through the process as the prize for hard work. These discoveries can help predict reality, give warnings, and inform people. Key stakeholders include pharmaceutical companies, doctors of medicine (through biostatistics and analysis), and businesses or enterprises. The information uncovered can greatly help future decisions on improving medicine, processes, products, or strategies, such as those used in marketing campaigns, in designing educational materials, and in providing new products and services for the benefit of people.

Machine Learning, A Look in the Past

Before Big Data became popular, back in the Web 1.0 era, there was the machine learning of the past, which used Market Basket Analysis. It was dominant in advanced e-commerce stores and online shops. Job sites also used this technology, and how did they implement it? With cookies: not the ones in your kitchen jar, but the small text files that remember your preferences, the sites you have visited, and the things you have clicked on the internet.
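
Market Basket Analysis commonly scores association rules such as “customers who buy A also buy B” with support, confidence, and lift. The sketch below computes these three measures for item pairs over a few hypothetical baskets. It is a textbook illustration, not the method any particular store or job site used; algorithms such as Apriori add pruning so that larger item sets stay tractable.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (transactions), for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def rule_stats(a, b):
    """Support, confidence and lift for the association rule {a} -> {b}."""
    pair = tuple(sorted((a, b)))
    support = pair_counts[pair] / n                   # P(a and b)
    confidence = pair_counts[pair] / item_counts[a]   # P(b | a)
    lift = confidence / (item_counts[b] / n)          # P(b | a) / P(b)
    return support, confidence, lift

print(rule_stats("diapers", "beer"))
```

A lift above 1 means the two items appear together more often than they would if purchases were independent, which is the kind of signal used to suggest related items.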

And what was that? Machine Learning, part of the work of today’s so-called Data Scientists. Today Facebook analyzes all of our likes, shares, and streams, and Twitter can do the same; I have even tried sentiment analysis of tweets using Python. There is Google with its intelligent algorithms, and Yahoo, an early adopter of Hadoop and its distributed file system, HDFS (a Big Data system). Many SQL-based database management systems are also in widespread use. In those days, MATLAB, SPSS, SAS, and S-PLUS were the most widely used tools, and now there is R. Nowadays there is also Pig, whose Pig Latin language simplifies writing MapReduce jobs for Hadoop.
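
To show what MapReduce means, here is a single-process Python sketch of the classic word-count job, split into map, shuffle, and reduce phases. It does not use Hadoop’s API: on a real cluster the framework runs many map and reduce tasks in parallel over data stored in HDFS, and tools like Pig generate such jobs from higher-level scripts. The documents and function names are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["big data big ideas", "data science uses big data"]
mapped = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)  # {'big': 3, 'data': 3, 'ideas': 1, 'science': 1, 'uses': 1}
```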

But who benefited from data science in the past? Amazon, the online bookstore, used data science, data mining, and data analysis to show you the most relevant products to buy; it has since grown into an online store for nearly everything and has even become a Cloud Service Provider. Its algorithms help upsell and show you items related to what you have already bought.

One of the most successful users of Big Data and Data Science is Walmart. They know how much to display in store, how much to carry in inventory, and even when you will buy your next coffee beans, sugar, and the infant milk and cereals you buy on your scheduled shopping trips. Forecasting sales like this is part of why Walmart grew. This so-called business intelligence is data science: they use algorithms, mathematical equations, and operations research tools to manage their business and understand consumer behavior.
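
As a simple illustration of how a sales forecast can feed an inventory decision, here is a sketch using a moving-average forecast on hypothetical weekly sales. This is not Walmart’s method, which the text does not describe; the window size, safety stock, and function names are assumptions, and real retail forecasting also accounts for seasonality, promotions, and trends.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return sum(history[-window:]) / window

def reorder_quantity(forecast, on_hand, safety_stock):
    """Units to order so stock covers the forecast plus a safety buffer."""
    return max(0, round(forecast + safety_stock - on_hand))

# Hypothetical weekly unit sales of one product (e.g. coffee beans) in one store.
weekly_sales = [120, 135, 128, 140, 150, 145]
forecast = moving_average_forecast(weekly_sales, window=3)
print(forecast)  # 145.0
print(reorder_quantity(forecast, on_hand=60, safety_stock=20))  # 105
```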

So what Data Scientists do today has roots in the past, but now a successful e-scientist must have skills in diverse fields (multidisciplinary skills) such as business / marketing, economics, mathematics, statistics, operations research, some IT skills, big data, and creativity. Yes, creativity: without it there is no spark of wisdom, and it is largely intuition, insight, and looking at the world or the data from a different angle in order to predict, deduce, and induce.

Becoming a Data Scientist

Many of you are probably looking into becoming a data scientist or e-scientist. With technological advances, the way we manage data is now digital: we use computers and large storage systems to store our data. In this information era, we know well that Big Data, or very large data, is now handled by servers in the cloud or by clusters of many computers that store and process the data.

People in biostatistics, informatics, statistics, and mathematics have an edge on the core of Data Science. Numbers, as we know, are quantitative descriptions of our environment and the world we live in. We also quantify qualitative data, a long-held practice that lets us apply mathematics to everything we do. Being a data scientist requires skills in statistics, some basic mathematical foundations, and a love of insights and intuition.

The creative part of being a data scientist lies in insight, data exploration, and intuition. You cannot explore unknown data without being creative, and that is part of what defines data science. In data science, you are a scientist dealing with data, with the goal of gaining insights or ideas from a given set of information. The hard part is cleaning out the bad parts of the data and making it neat so that you can process the information further.

Programming skill is also needed; it helps you automate parts of your work, such as applying functions, summations, or complex formulas to a million records or to your big data. You must also be familiar with Information Technology, since most of your tools in e-science or data science involve commercial and open-source statistical software, programming languages, database systems, and other storage systems that handle massive amounts of information.
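
As an example of applying a formula to a million records without holding them all in memory, here is a sketch of Welford’s single-pass algorithm for the mean and sample variance, run over one million hypothetical, lazily generated readings. The data and the function name are illustrative assumptions.

```python
import random

def running_mean_variance(values):
    """Single-pass (Welford) mean and sample variance, so a large stream
    never has to be held in memory at once."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count          # update the running mean
        m2 += delta * (x - mean)       # accumulate squared deviations
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance

# Hypothetical stream of one million readings, generated lazily.
rng = random.Random(0)
readings = (rng.gauss(50.0, 5.0) for _ in range(1_000_000))
count, mean, variance = running_mean_variance(readings)
print(count, round(mean, 2), round(variance, 2))
```

Because it reads each value once, the same function works on a generator that streams records from a file or database cursor.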

We aren’t done yet: you must also be versed in the systems, topic, or field you are researching. These include, but are not limited to, genomics, linguistics, disease prediction, the medical field, and many other fields where you are asked to predict. To predict properly, you must understand statistics and machine learning, which can turn a system into an artificial intelligence powerhouse. Much of the big data we have today can be used to power new robots connected to cloud-based computers, built on the work of data scientists using supercomputers.

Business skills are also part of data science, as they relate to visualizations and to the profit of the stakeholders supporting your work as a data scientist. Overall, data science is a diverse field that requires a mixture of skills.

The related article linked below is helpful for assessing what type of data scientist you are, or whether you have a future as a data scientist because you already have the basic skills needed in the world of data analysis, mining, and science.

My own skills fall evenly across those of a Data Scientist.

Related article: http://data-magnum.com/how-to-become-a-data-scientist/

analyzing the analyzers