Thursday, January 11, 2018

Handling Data And What To Take Note - Survival Guide (Part 2)



Let me remind everyone that data engineering is not equal to data science but both are part of "Big Data". This article mainly focuses on data engineering and how to store data to be more useful for analysis.



The process of storing data in a single place is called warehousing, and data warehousing is within the scope of data engineers. Preparing data so it gets served when it's needed is crucial for data-driven companies. Making sure nothing is missed or messed up is therefore the top priority in data engineering.

The realm of databases:
Modern applications rely heavily on databases to store information.

Before even starting to build your platform, developers need to evaluate which database they should be using. A database plays an important role not only in data storage but also in integration, so missing the key elements (optimization and scalability) in the implementation will have a negative impact on the entire operation.

After the selection process, what do you need to be aware of?
  • Normalize data - Unless you have computing power to spare, always normalize the data before you store it in the database. This makes sure that records are "uniform" as they get warehoused.
  • Treat "numbers" correctly - When you store "numbers" in the database, make sure to classify what they represent. Numbers can be money, coordinates, occurrences, and so on. Set the data types accordingly.
  • Single format for "dates" - A database often has multiple tables that handle "dates", and if those tables were not created at the same time, the "date" data type may differ between them (normally because of negligence, lack of documentation, or wrong documentation). Pick one format and use it everywhere.
  • Using TEXT over VARCHAR(xxx) - While there's nothing wrong with using VARCHAR, you should be aware of its limitations and usage. Say you have a field named "notes" or "reminder" and you set it to VARCHAR(255). If a user is very thorough and jots down every detail, but your field only accepts 255 characters, who's to blame for the lost data? The user who is too keen on the details? The application that only accepts 255 characters? The developer who set the limit on the field?
  • Be careful of using "id" - While "id" is human-readable, once it is stored in the database it can confuse whoever reads it later. Be more specific when you store IDs (use a name like internal_app_id or app_id, and avoid naming conventions like id1 or id_1).
  • Follow the basics - Do not name any field with a "reserved keyword". Guidelines are set to be followed, not ignored. This is the best part of learning your database of choice.
  • Support JSON - Modern databases, including SQL databases, support JSON. This is powerful when used properly. At the same time, if the data is not well structured, it will turn your records into a whole lot of junk.
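As a rough illustration of the first three points, here is a minimal Python sketch that normalizes a raw record before it is stored. The field names (app_id, amount, visits, created_on and the coordinates) and the list of accepted date formats are assumptions made up for this example, not part of any real schema.

```python
from datetime import datetime, date
from decimal import Decimal

# Hypothetical date formats found across older tables (an assumption).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")


def parse_date(raw: str) -> date:
    """Try each known format so every table ends up with one date format."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def normalize_record(raw: dict) -> dict:
    """Return a uniform record: exact money, float coordinates, ISO 8601 date."""
    return {
        "app_id": int(raw["app_id"]),
        # Money is kept as an exact decimal with two places, not a float.
        "amount": Decimal(raw["amount"]).quantize(Decimal("0.01")),
        # Coordinates are approximate by nature, so float is acceptable here.
        "latitude": float(raw["latitude"]),
        "longitude": float(raw["longitude"]),
        # Occurrences are counts, so they are stored as integers.
        "visits": int(raw["visits"]),
        "created_on": parse_date(raw["created_on"]).isoformat(),
    }
```

The point of the sketch is that each "number" gets a type that matches what it represents, and every date leaves the function in the same format regardless of how it arrived.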

The realm of files/documents:
Support legacy systems too! Flexibility is the key to every modern application.

Storing data in a file/doc format is still rampant even in today's web era, not because of a lack of innovation but because of complexity and overhead. All data is valuable, so every source (e.g. CSV, Excel, XML) should be supported.

If you plan to support these on your platform, what do you need to be aware of?
  • Increased limit - Most legacy systems keep everything in a single file. The file might be larger than 10 GB, and your app might time out while acquiring the data from the source "file/doc".
  • Parsing mechanism - It's vital to play it safe when supporting files. Tell your application when to treat a "blank" as null and when to treat it as an empty string (""). Tell your application when to remove a space and when to leave it as is.
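Both points can be sketched in a few lines of Python. This is only one possible approach, assuming a CSV source: the file is read in batches so it never has to fit in memory, and the blank/space handling is an explicit policy instead of an accident. The names clean_cell and read_in_batches, and the default batch size, are hypothetical. Note that Python's csv.reader, by default, reports a quoted empty field and an unquoted empty field the same way, so this sketch applies one policy per file.

```python
import csv
from typing import Iterator, List, Optional, TextIO


def clean_cell(value: str, blank_as_null: bool = True, strip: bool = True) -> Optional[str]:
    """Apply an explicit policy for surrounding spaces and blank cells."""
    if strip:
        value = value.strip()
    if value == "" and blank_as_null:
        return None
    return value


def read_in_batches(
    handle: TextIO, batch_size: int = 1000, **policy: bool
) -> Iterator[List[List[Optional[str]]]]:
    """Yield lists of cleaned rows so a huge file is never loaded at once."""
    batch: List[List[Optional[str]]] = []
    for row in csv.reader(handle):
        batch.append([clean_cell(cell, **policy) for cell in row])
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield batch
```

A caller would open the file with open(path, newline="") and store each batch as it arrives, which keeps every unit of work small enough to finish before a timeout.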

The realm of data analytics:
While databases are a great place to store data, they are not sufficient for the massive queries batched by "analytics tools", especially tools that support real-time data streaming.

Setting up a database engine to run data analytics tasks has been made easier by cloud providers. Offerings like Google's Spanner, Amazon's Redshift, and Azure's Data Warehouse are widely used and widely supported by most analytics tool providers.

When using a cloud solution, what do you need to be aware of?

  • Check for data integrity - When syncing data from the database engine (e.g. Postgres/MySQL) to these cloud services, it's important that the data remains as it is in terms of structure, size, and format. Using a native service such as AWS Database Migration Service (DMS) to migrate RDS data to Redshift is an advantage (rather than doing it manually or via 3rd party tools).
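One simple way to check this after a sync is to compare a row count and a checksum computed on both sides. The sketch below is an assumption about how that could look, not a feature of any migration service: table_fingerprint is a hypothetical helper, and it expects both sides to hand back rows with the same Python types (a driver that returns Decimal on one side and float on the other would need conversion first).

```python
import hashlib
from typing import Iterable, Sequence, Tuple

_MODULUS = 1 << 256  # keeps the running sum the same width as a SHA-256 digest


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, str]:
    """Return (row_count, checksum); the checksum ignores row order."""
    count = 0
    combined = 0
    for row in rows:
        # repr() keeps types visible, so 1 and "1" produce different hashes.
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Summing digests makes the result independent of row order.
        combined = (combined + int.from_bytes(digest, "big")) % _MODULUS
        count += 1
    return count, f"{combined:064x}"
```

If the fingerprint of the source rows equals the fingerprint of the rows read back from the warehouse, the count and the content match; any difference means the migration needs a closer look.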

I've seen many lapses while performing data migrations. I hope these pointers will matter to you the next time you set up data storage for your application.

    Saturday, January 6, 2018

    It's All About Self Motivation

    What can I become with what I have?

    This is the question that has kept me rolling since the very moment my eyes opened to the world of hardship, struggle, shortcomings, pain, rejection, disrespect and humiliation.

    Everyone has their story to tell, their story to sugarcoat, their story to mask, their story to embrace. Every time I look back and see myself in the mirror, I can't hide the fact that I still feel "impostor syndrome". Almost 9 years in the IT industry (2 years of call center experience, 4 years of system administration and 3 years of devops) sum up my skill set. On the other hand, 25 years of hustling (3 years in an orphanage, 1 year of schooling that ended with me dropping out, 7+ years of learning the street language and 5 years of being a father) sum up my attitude.

    No matter how you look at life, I say the proper way of looking at it will always be moving forward. Life doesn't care what you are today or who you were way back, because by tomorrow it will all be part of the past. Life is all about what you want to become and how badly you want it.

    People always appreciate you when they are (1) amazed, (2) thankful, (3) motivated, or (4) inspired by your acts, your words or your ideas. I have no right to tell you which path leads to greatness, but I know how you could get started on your journey. I applied this myself, so if you see me as someone who is successful (which means it somehow works), then it should work for you too.

    Again, life is all about what you want to become and how badly you want it. On top of this, you should be aware that there's no such thing as "something for nothing", so you should be willing to sacrifice whatever it takes to achieve what you've been wanting.



    Don't give unacceptable reasons
    Excuses are a big "no-no", but most of us love to reason our way out.

    Here's one scenario I hope will open your eyes to opportunities. The "→" represents me telling you what your option is.

    I cannot learn programming because:
    1. I have no internet connection at home → Use office resources
    2. I don't have a computer/laptop → Use your smartphone
    3. I have no smartphone → Read books
    4. I don't have a book / can't borrow one → Print out e-books
    5. I have no money for the printout → Go back to #1 (Use office resources)
    NOTE: If you can't use company resources, ask someone to do a printout for you. Among your friends, I am sure there's someone who can help with that.

    Resources for learning "programming" are already on the internet. Most of them are free. Learning programming only requires you to invest one thing, and that's your "time".


    Imagination is the key
    Einstein said "Logic gets you from A to Z, imagination gets you everywhere".

    As you start reading, there will be times when you find yourself lost rather than enlightened. Well, that's normal! The things you're reading about are things you don't have any experience with yet.

    Prepare all your theories and gather as many as you can, because the "application" stage is where you test which ones are right and, from those correct theories, which one is best. The application stage is also where you ask people about your questions and doubts.


    Simplify what you've learned
    Einstein said "If you cannot simply explain it, you don't understand it well".

    The test of knowledge will not be what you've read or what you know. Acquiring the learning is just one half of knowledge; imparting it is the other. You might ask, why should I simplify what I know for others? It's so they can understand the "thing" the way you understand it. Verify your knowledge with those you teach.


    Always give back
    Einstein said "Don't be a man of success, be a man of value"

    What you've learned is yours, but it's always good to keep emptying your cup. Give your learning to those who want it, and invest in others so they will do the same. Learning is a continuous process and you'll never run out of topics in your lifetime.


    Don't be the guy who knows-it-all
    When Einstein was asked "How does it feel to be the smartest person alive?", he replied "Ask Nikola Tesla".

    Even the smartest will not claim that he is. Don't think you're smart enough, don't think you're good enough, don't think you're tough enough. Life has so many aspects, and there's always that "someone" who is better than you.



    I hope this gives you the fuel you need to jumpstart your career growth, personal development and goal-centric life for 2018. Be better each day, not by comparing yourself to others but to who you were yesterday.

    Monday, January 1, 2018

    Reminders And Updates - A UX/UI Note On Security

    The prediction for 2018 is all about security and the use of AI/ML/DL to enhance the countermeasures against different exploits and threats.

    Security is a broad topic and has been innovated on throughout the entire Web 2.0 era. In today's web, it is vital that "users" take part in the responsibility of securing their information in the public domain (internet/web).

    Providers should be aware of this...
    Their job is not limited to acquiring information from users; it also includes making sure that the data on record is up to date.

    What are some of the approaches you can use?


    Experiment #1:
    To avoid bugging and annoying your users, raise an alert or notification on an event basis. This way, you'll be able to send your users a personal note that wraps a probing question.



    Credits to: https://dribbble.com/shots/1315388-Dashboard-Web-App-UI-Job-Summary


    If you're somewhat of a minimalist, you can try adding a symbol that catches one's attention, with a "mouse hover" function that pops up a text bubble asking the same question.



    Credits to: https://dribbble.com/shots/1315388-Dashboard-Web-App-UI-Job-Summary


    Experiment #2:
    Amazon is very good at this. If you've seen the AWS Dashboard (Console), under the IAM service there's a section that tells you how old the credentials attached to a particular account are, giving users/admins a heads-up on what needs to be done.
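The same idea can be sketched for your own application in a few lines of Python. This is not how AWS implements it; stale_credentials is a hypothetical helper, and the 90-day threshold is an assumption you would replace with your own policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

MAX_AGE = timedelta(days=90)  # assumed rotation policy, adjust as needed


def stale_credentials(
    last_rotated: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> List[str]:
    """Return the names of credentials older than max_age, sorted by name."""
    return sorted(
        name for name, rotated in last_rotated.items() if now - rotated > max_age
    )
```

The returned names can then drive the heads-up shown to the user, for example stale_credentials(records, datetime.now(timezone.utc)), where records maps each credential name to a timezone-aware timestamp of its last rotation.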






    Experiment #3:
    Who says email is dead? For something this important, sending users an email is better than sending nothing. Just make sure you use proper wording and briefly explain what the email is about.





    In the modern web, security is a shared responsibility. Providers should be the ones initiating what needs to be done and making sure users take part in it. There's no perfect system, but there can be a near-perfect security protocol.

    Remember the basics of security: "A chain is only as strong as its weakest link."