Date with Freedom 2017 (Virtual Careers Summit)


 

I was invited by my marketing, virtual-profession, and success mentor Jomar Hilario to his event Date with Freedom: Virtual Careers Summit, held on October 14, 2017.  I met many virtual professionals and aspiring remote workers there.

Yes, I work remotely, but securely.  I talked about “How to Negotiate with 5 Clients”.  Everyone was interested in that topic, but the questions I got were more about “What is Data Science and Analytics?”

[Photo: IMG_20170919_080434]

Above is my secure remote machine, located in my client’s office.  I use that computer to log in to my client’s own clients.  It looks like an unmanned computer, but if you turn on the screen you will see what I am working on remotely.

To simplify data science, consider this Venn diagram:

This Venn diagram (a diagram that shows overlaps) shows that data science is the intersection of mathematics/statistics, computer science/information technology, and a domain expertise: business, research, the sciences, or marketing.  Yes, marketing.  Sounds familiar?

Let me show you what an analyst or data scientist does for the digital marketing industry:

I know most of you already do the things above.  The secret with this data is to measure it, or keep track of the values over time.  Whether in spreadsheets (Excel), text files, or tools like Instagram Insights, FB Insights, and Google Analytics, tracking those numbers can help you understand your client’s customers.
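As a rough sketch of what "tracking values over time" looks like once the numbers leave the spreadsheet, here is a tiny Python example; the metric name and the numbers are invented for illustration, not from any real account:

```python
from datetime import date

# Hypothetical weekly follower counts, as you might export from
# Instagram Insights or Google Analytics (the numbers are invented).
followers = [
    (date(2017, 9, 4), 1200),
    (date(2017, 9, 11), 1260),
    (date(2017, 9, 18), 1350),
    (date(2017, 9, 25), 1480),
]

def weekly_growth(series):
    """Percent change between consecutive measurements."""
    growth = []
    for (_, prev), (_, cur) in zip(series, series[1:]):
        growth.append(round((cur - prev) / prev * 100, 1))
    return growth

print(weekly_growth(followers))  # percent change, week over week
```

The same computation works for likes, reach, or sales; the point is simply that keeping the history lets you see the trend, not just the latest number.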

I know most of you will say, “I’m not techie, I’m not a geek nor a nerd.”  Yes, you can!  Did you know that businesses would like reports that are not raw math or numbers?  Did you know that clients would love to hear stories from their data?  Did you know that visuals, graphs, charts, and INFOGRAPHICS simplify the dissemination of information?  Does all of that sound familiar?  I know it does.  Jomar taught these in his skills courses, storytelling courses, and others.

My analyst and data scientist colleagues used to laugh about presenting a mathematical formula versus a well-defined story with some visuals.  A presentation with a story and visualizations or graphs is far better appreciated by the client.

I have heard some people say SQL is hard.  You don’t need that geek language; there are tools that can build queries for you with point-and-click, drag-and-drop interfaces.  Nothing is inherently easy or hard.  I went the hard route because of my background: I was accidentally exposed to it through IT engineering, which is another story.

In the Philippines, we started out with Data Science Philippines on Google Groups, then on a Facebook page.  Today we have Data Science Philippines on Meetup.  I am a regular attendee of the R Users Group Philippines because of the value the people there live by: humility.  They know how hard the numbers are, but they are still willing to help others get started.

URL List of Live Meetups:
https://www.meetup.com/Data-Science-Philippines/
https://www.meetup.com/Manila-Analytics-Freelancers/
https://www.meetup.com/Visayas-Analytics-Freelancers/
https://www.meetup.com/R-Users-Group-Philippines/
https://www.meetup.com/Manila-Excel-Ninjaz-Meetup/

Thank you to all the attendees of Jomar Hilario’s 2017 Date with Freedom.

Jomar Hilario Courses: Virtual Careers Academy 


R BioConductor Package Basics

The BioConductor project provides R packages used in genomics, computational biology, and bioinformatics.  Sequences of DNA, proteins, and other biological data are handled by its wide array of packages.

BioConductor is a collection of packages for bioinformatics, much like BioPython and BioPerl in their respective languages.
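BioConductor itself is R, but since the comparison with BioPython comes up, here is a minimal plain-Python sketch of the kind of sequence operation these packages provide out of the box: the reverse complement of a DNA strand. Real packages (Biostrings in BioConductor, `Seq` in BioPython) do this and much more.

```python
# Map each base to its complement (upper- and lowercase).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # -> GCAT
```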

Here is my rpubs page for R BioConductor Basics: http://rpubs.com/wenmi01/r_bioconductor_basics

 

Finding the Unicorn Data Scientist

Nowadays, as more career shifters enter the field, many people declare themselves data scientists.  What they often have is raw skill: they may possess some abilities, but not yet at the level required.  To cut through the uncertainty, let us look at what it takes to be a data scientist.

A data scientist needs three major skill sets: computer science / hacking skills, mathematics and statistics, and substantive knowledge in a domain such as research, business, medicine, or biostatistics.  Machine learning, a branch of artificial intelligence, also comes into play: it is where we model the world so that we can forecast and predict.  On the technical side, knowledge of programming (especially functional programming), database systems, and big data is necessary.  Mathematics and statistics give the data scientist the expertise to frame and solve problems in figures and numbers, to model or quantify the world we live in, and to draw inferences from it.  Substantive knowledge, on the other hand, comes from research, business, biostatistics, medicine, and other fields, and provides an understanding of the problem being observed and analyzed through data.  Finally, a data scientist must be able to present work to non-technical people and visualize findings, relaying the information in simple enough form that everyone can grasp what is being communicated.

Unicorn data scientists are those who have accidentally gained all of these different skills.
In my case, I am a certified IT engineer, have built skills in marketing and digital marketing, took up a Master’s degree in Business Administration (MBA), have done cloud solutions architecture, and have worked on research in biophotonics, bioinformatics, and more.  A vast array of experience across industry, business, the medical field, the sciences, the arts, and technology, together with the ability to integrate them, is what it takes to be a unicorn data scientist.

They are hard to find, and most hold a Master’s or Doctoral degree (PhD).  They have a mixed bag of marketing and economics, have been around research and development projects, have shifted careers more than once, or are lifelong learners immersed in computer science, math and statistics, and substantive knowledge.  Unicorn data scientists are products of experience: not always a well-defined career path, but a path through mixed and different fields.  Most are fast learners and have even looked into systems and processes, which is one of the applications of data science; we can apply data science and engineering not just to data but to business processes as well.

If you are lucky enough to find a unicorn data scientist, be sure to take care of them, as they are rare and hard to find.

Getting and Cleaning Data – A Data Scientist’s Perspective


Getting and cleaning data is easier now, with the tools currently available and the technology for sharing datasets.  We have tools such as git, svn, zip archives, and more.  However, these tools must be installed on the machine before you can read the data.

Most data sources today are broadly compatible, but if the data is large or security-sensitive, you will have to authenticate to access it.  After obtaining the data, inspect whether a data dictionary is available, or whether the data is labeled (for example, a CSV file with a header row).  In some cases data is labeled per field on each record, which makes it easy to skim: without looking at a header, you can tell what each value means.  The hard part, which takes its toll later, is that you then have to clean away the labels.  For example, this line:
name: bob pet_type: cat age: 3

You will have to clean it up by first removing the labels; in some cases you will also have to transform the cleaned data to make use of it at a later stage of analysis.
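A minimal sketch of that cleanup step in Python; the field names come from the example line above, and a real dataset would need more robust handling (quoted values, missing fields, type casting):

```python
import re

def parse_labeled_line(line: str) -> dict:
    """Pull 'label: value' pairs out of a flat line into a dict."""
    pairs = re.findall(r"(\w+):\s*(\S+)", line)
    return dict(pairs)

record = parse_labeled_line("name: bob pet_type: cat age: 3")
print(record)  # {'name': 'bob', 'pet_type': 'cat', 'age': '3'}
# A later transform step might cast types, e.g. int(record["age"]).
```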

If you write functions to process data, use low-level tools and functions where possible; this greatly reduces processing time.  It is also better to keep a separate script or tool dedicated to the cleaning step, run before the analysis, to save time.  After that, you can create an intermediate dataset for use in the visualization step.

At the end of your analysis, you should be able to present your report in a form such as a PDF or a slide deck for your audience.  When writing reports, assume the audience is not technical and needs an easy-to-understand format, with enough supporting data for them to see the same insight you want to communicate.

ScienceOps or SciOps Tasks

Helping other businesses grow through technology is a real need, both for start-ups and for growing businesses that want continuity.  Technology today lets services be delivered faster thanks to offerings like cloud services.  Amazon Web Services, OpenStack, and many more offer infrastructure as a service to help technology businesses deliver high-quality applications.

Users of apps are now mobile: they are on smartphones, laptops, tablets, and many other gadgets ready to run apps.  Apple devices are everywhere too, and all those user apps are best paired with data on a server, the cloud.  The cloud serves as the authentication layer, the knowledge base, and even the data storage of this infrastructure.  Regardless of the application, a good system and network infrastructure is best to have.

These infrastructures are built on top of on-premise computers and servers, and even in the cloud.  Cloud services reduce costs in that you don’t have to spend time waiting for servers to arrive; you can scale up and down, and you can even scale automatically.  But what is automatic scaling?  A properly managed infrastructure can be designed to grow its instances or servers when there is high demand for resources and to shrink when only a few instances are needed.  All of this is beneficial.
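The grow-when-busy, shrink-when-idle decision described above can be sketched roughly like this; the thresholds and instance bounds are invented for illustration, and real autoscalers (such as AWS Auto Scaling) express the same idea as policies:

```python
def desired_instances(current: int, cpu_load: float,
                      lo: int = 2, hi: int = 10) -> int:
    """Grow when load is high, shrink when low, within [lo, hi] bounds."""
    if cpu_load > 0.80:          # high demand: add an instance
        return min(current + 1, hi)
    if cpu_load < 0.20:          # low demand: remove an instance
        return max(current - 1, lo)
    return current               # steady state: no change

print(desired_instances(4, 0.95))  # scales up to 5
print(desired_instances(4, 0.10))  # scales down to 3
```

The lower bound keeps a minimum fleet alive for availability; the upper bound caps cost.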

With all of these things comes infrastructure security.  Securing, maintaining, and keeping up to date with the latest software, fixes, and patches is necessary for your cloud infrastructure.  However, we cannot be sure what we are doing right, or what we are missing, if we don’t audit.

Designing a good, secure infrastructure takes more than that.  We have to monitor things in order to detect downtime, and we also have to measure.  In measuring infrastructure performance, we create metrics that track the status or value of each parameter of the server.  We then have to collect these metrics and build the capability to analyze them.

The data scientists for this kind of work do ScienceOps.  They mostly build good infrastructure, but that is not all.  Even after seeing what is good in your infra, you need to know what is in it for your business.  ScienceOps may require some background in business, marketing, and strategic planning, as well as in the apps you are targeting for measurement.  SciOps will then need to analyze your user data on top of the server logs.  With user data, they can show you how you are faring in the market: whether you are growing over time or stuck with stagnant users who no longer interact with your app.

Your marketing success can be measured, along with your customers’ choices of products and services and their behavior in your app.  With proper metrics, the right analysis, and more insight gained, the management and product teams can design, update, and steer your product or service in the direction that satisfies your customers and service users.

5 D’s of Data Science

Here are the 5 Ds of Data Science:

  1. Data
  2. Digitalization
  3. Description
  4. Depiction
  5. Discovery

 

Data

In data science, the thing we need most is data: the observations or examples.  With data we can describe how much, how strong, or what value or measurement there is about a situation or a thing.  Data comes into existence when we describe an event or measure something.  It is the most important building block for doing data science tasks.  With data, we can show quantity and quality, and it becomes the basis of our equations and statistics.  We observe, and sometimes use instruments or probes, to gather data for our analysis or research.

 

Digitalization

We cannot process raw data that has not been digitized: put into a computer system or encoded into forms that can be processed.  The format is not limited to text, graphics, spreadsheets, vectors, audio, or video; we can use any digital format we like.  Through digitization we speed up analysis and the procedures applied to compute statistical measures.  We can then infer from the findings and create more insight.  Digitization also makes sharing information easier, as data can be stored and retrieved for future use.

 

Description

Through the tools we have, mathematical equations and statistics, we can describe our data.  We can determine whether assumptions are right or wrong through the hypotheses we formulate.  We can then deduce from what we have gathered; this helps us understand more and guides the next steps on what to do with the data, whether to solve a problem, understand a situation, or use it to teach machines and computers.  These machines, in turn, are put to practical use, aiding human ability in many aspects of our lives: traffic, medicine, marketing, economics, planning, production, operations, understanding behavior, and many more.
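As a small sketch of "describing data" with basic statistics, using Python's standard library (the sample values are invented, and the assumption being checked is a made-up example):

```python
import statistics

# Hypothetical daily page-load times in seconds (invented sample).
samples = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.4]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)

# A simple descriptive check of an assumption, e.g.
# "average load time is under 1.5 seconds".
assumption_holds = mean < 1.5
print(round(mean, 2), round(stdev, 2), assumption_holds)
```

Real hypothesis testing adds significance levels and test statistics on top of these descriptive measures, but the starting point is the same: summarize, then check the claim against the summary.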

 

Depiction

In data science, where we use machine learning, we mine information and create training and testing sets, and from those we can depict or predict the future.  With visualization, we can also explain what we have just found through insights.  We can share that information for consumption by a wide range of audiences across academia, the professions, medicine, science, and the like.  Depiction and visualization help different people understand what we have found.  This is where data science becomes an art: a place of creativity, aimed at a wide audience.
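The training and testing sets mentioned above can be made with a simple reproducible split; this is only a sketch (in practice, libraries such as scikit-learn provide `train_test_split` with stratification and more):

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle reproducibly, then hold out the last portion for testing."""
    shuffled = rows[:]                      # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

The held-out test set is what lets us check whether a model predicts well on data it never saw during training.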

 

Discovery

At the end of most of a data scientist’s research, a discovery usually emerges from the different insights, or clarity comes through the process as the prize for hard work.  The discovery from the work conducted can help predict reality, give warnings, and inform people.  Typical stakeholders are pharmaceutical companies, doctors of medicine (through biostatistics and analysis), and businesses or enterprises.  The information uncovered can be a great help in making future decisions: improving medicine, a process, a product, or a strategy such as a marketing campaign, designing educational materials, and providing new products and services for the benefit of the people.