Wednesday 3 October 2018

Outreachy experience and application tips

One of the best experiences of my student life was making it onto this list:
https://wiki.gnome.org/Outreachy/2015/DecemberMarch#Accepted_Participants
(This is the list of accepted participants for the December-March 2015 round of Outreachy.)

Around this time of the year, the applications have just opened for Outreachy's 2018-2019 program and I've been getting queries on how to apply and what to look for in the projects.

So, here are some quick tips and approaches you can use for the program.
PS: These are simply suggestions. In the end, go with what works for you.

Before we begin with the tips, if you don't know what Outreachy is, please check out:
https://www.outreachy.org/
It is a wonderful platform that encourages people from groups underrepresented in tech to get acquainted with the open source world under the guidance of highly skilled mentors.

Open source is not just an "access specifier" for a codebase; it brings along a whole different culture and attitude within the community of programmers.
It is a culture that is inclusive and open. The people you usually find here are not snobs, and they are not highly defensive or protective of their own ideas and code. There are no rivalries, no hidden intentions and no competition. Programmers come together with the common motive of building on the source code they have, being open to discussions and suggestions, taking help from the community and giving it back to others as the need arises.
It is one of the most liberal communities I have ever stumbled upon. This is a must-have life experience.
Whether via Outreachy, GSoC or just on your own, do explore this culture once.




Having said that, here are the answers to the most frequently asked questions about applying for Outreachy.

Q: What kind of project should I take up? Should I target more than two projects?
This depends heavily on your skill set and interests. The available projects are spread across different domains. Some are purely frontend work or plugins using JavaScript, HTML, CSS, jQuery, AngularJS, React and so on; if you enjoy and understand these technologies, picking up projects from Mozilla and similar organizations would be great. Frontend codebases are usually not that huge, so if you are a beginner or have no experience handling large codebases, I'd suggest going for one of these. Then there are projects built around distributed computing, say Mesos. These require insight into the language they use as well as network programming. Don't dig deep into them if you lack the interest or the prior knowledge, as they consume a lot of time and are usually difficult to understand. You can also pick your project based on what you want to learn or which skills you want to improve.
Given the time frame the program operates in, more than two projects is usually not feasible.
Understand what you're trying to do here: understand a codebase, solve bugs and contribute back to it. This is very time-consuming and requires dedicated focus and patience. In my opinion, even two projects are on the higher side, and are enough.


Q: How should I go about contributing? Should I start by solving "newbie" bugs?
Not as the very first step, so the answer is a big NO. Jumping straight to bugs is a very common mistake we make given the deadlines and the rush. Before touching the codebase, build the app and "USE THE APP". Use the software just like an end user would, and explore what functionality it has and what it supports. Once you have played with the app enough, trace your way back from feature to code. Go through the code. "Read" the code and the comments, and try to understand what it is trying to do. Now you are ready to look for bugs. Issues on the project's repository are usually labelled. Pick the ones that suit you, starting with the newbie issues, and try to solve them. If you don't understand something, ask for clarification on the issue thread or catch up with the community folks on IRC.


Q: Should I be in explicit contact with the mentor?
No need. People usually end up contacting mentors a lot, telling them how interested they are in the project. In my opinion an initial email is enough. Your interest should show in how active you are on the codebase and on IRC. Ask questions, solve issues. If you have a good understanding of the codebase and have made enough contributions, the mentor will recognize that.

Q: Are there any prerequisites?
If you are a student while taking part in Outreachy, it is preferable to take up the Outreachy project as your academic project. They have strict rules around the academic credits you can earn alongside Outreachy, so please get in touch with the relevant folks at Outreachy and get your questions clarified beforehand.

Q: What are the outcomes of this program?
Being Indians we are almost always concerned with this. (Pun intended)
Let me put it this way: the outcomes include exposure to the world of open source, a network of amazing folks, mentorship by highly skilled and renowned folks in the industry, and a generous stipend of $5500.


Links:
Outreachy : https://www.outreachy.org/
My Outreachy log blog: https://outreachypb.wordpress.com/

In case you have any questions that I can answer, please feel free to comment or reach out to me. I'd be glad to help.

All the best!
Keep hacking!

Love & Cheers








Thursday 21 June 2018

HIGH FIVE ~ Programming tips [C#]


Hello World!

Over my tenure as an engineer I got a lot of insights into the C# language (which BTW is Microsoft's own language :D )

This blog post is an effort to share *five* of those tips with all you folks, to help you write more efficient and readable programs! :D


If you work with C# as your main programming language, or if you happen to work with any OOP language, I hope these tips and tricks prove beneficial:

1. Use XMLDocs:

These are the structured comments you write above functions and classes to describe what they do. Writing XML docs for the public members of an API is usually advised, and they matter most for functions we expect users to implement.

Read more about this here: XMLDocs
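
Here is a minimal sketch of what such comments can look like. TemperatureConverter and ToFahrenheit are made-up names for illustration only; the tags used (summary, param, returns, exception, paramref) are the standard XML documentation tags.

```csharp
// Top-level statements: needs C# 9 or later.
using System;

Console.WriteLine(TemperatureConverter.ToFahrenheit(100)); // prints 212

/// <summary>Converts temperatures between units (hypothetical example type).</summary>
public static class TemperatureConverter
{
    /// <summary>
    /// Converts a temperature from degrees Celsius to degrees Fahrenheit.
    /// </summary>
    /// <param name="celsius">Temperature in degrees Celsius.</param>
    /// <returns>The equivalent temperature in degrees Fahrenheit.</returns>
    /// <exception cref="ArgumentOutOfRangeException">
    /// Thrown when <paramref name="celsius"/> is below absolute zero.
    /// </exception>
    public static double ToFahrenheit(double celsius)
    {
        if (celsius < -273.15)
            throw new ArgumentOutOfRangeException(
                nameof(celsius), celsius, "Temperature is below absolute zero.");
        return celsius * 9.0 / 5.0 + 32.0;
    }
}
```

Editors such as Visual Studio show these comments in tooltips when someone calls the method, which is where they pay off for the users of your API.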



2. Be Lazy :

The concept of lazy initialization is to initialize an object only when it is used for the first time in the program. This can save a lot of time and memory when the object turns out not to be needed.

Lazy initialization is primarily used to improve performance, avoid wasteful computation, and reduce program memory requirements. These are the most common scenarios:


  • When you have an object that is expensive to create, and the program might not use it. For example, assume that you have in memory a Customer object that has an Orders property that contains a large array of Order objects that, to be initialized, requires a database connection. If the user never asks to display the Orders or use the data in a computation, then there is no reason to use system memory or computing cycles to create it. By using Lazy<Orders> to declare the Orders object for lazy initialization, you can avoid wasting system resources when the object is not used.


  • When you have an object that is expensive to create, and you want to defer its creation until after other expensive operations have been completed. For example, assume that your program loads several object instances when it starts, but only some of them are required immediately. You can improve the startup performance of the program by deferring initialization of the objects that are not required until the required objects have been created.
  • Apart from the performance benefits, Lazy<T> is also thread safe by default, so the object is created only once even if several threads ask for it at the same time.
Read more here: Lazy Initialization
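
To make the Customer and Orders scenario above concrete, here is a rough sketch using Lazy<T>. The Customer class, its LoadOrders method and the LoadCount counter are assumptions made up for this example, and the "database call" is just a stand-in that returns a fixed array.

```csharp
// Top-level statements: needs C# 9 or later.
using System;

var customer = new Customer("Asha");
Console.WriteLine(customer.OrdersLoaded);   // False: nothing has been loaded yet
Console.WriteLine(customer.Orders.Length);  // 3: first access runs the loader
Console.WriteLine(customer.Orders.Length);  // 3: second access reuses the value
Console.WriteLine(customer.LoadCount);      // 1: the loader ran only once

// Hypothetical type for illustration.
public sealed class Customer
{
    private readonly Lazy<string[]> _orders;

    public Customer(string name)
    {
        Name = name;
        // The delegate is stored here and only invoked when Value is first read.
        _orders = new Lazy<string[]>(() => LoadOrders());
    }

    public string Name { get; }
    public int LoadCount { get; private set; }
    public bool OrdersLoaded => _orders.IsValueCreated;
    public string[] Orders => _orders.Value;

    private string[] LoadOrders()
    {
        // Stand-in for an expensive database query.
        LoadCount++;
        return new[] { "order-1", "order-2", "order-3" };
    }
}
```

If the program never touches customer.Orders, LoadOrders never runs at all, which is exactly the saving described above.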

3. Use LINQ and a LOT OF LINQ:


LINQ stands for Language Integrated Query. It is a set of well-tested, composable query operators built into the language that you can use to query and transform data.

Advantages of LINQ:
  • Familiar language: Developers don’t have to learn a new query language for each type of data source or data format.
  • Less coding: It reduces the amount of code to be written as compared with a more traditional approach.
  • Readable code: LINQ makes the code more readable so other developers can easily understand and maintain it.
  • Standardized way of querying multiple data sources: The same LINQ syntax can be used to query multiple data sources.
  • Compile time safety of queries: It provides type checking of objects at compile time.
  • IntelliSense Support: LINQ provides IntelliSense for generic collections.
  • Shaping data: You can retrieve data in different shapes.
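
As a small illustration of the "less coding" and "shaping data" points, here is a sketch that totals orders per customer and keeps only the big spenders. The data and the 100 threshold are invented for the example.

```csharp
// Top-level statements: needs C# 9 or later.
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<(string Customer, decimal Amount)>
{
    ("asha", 120m), ("ben", 40m), ("asha", 80m), ("chen", 300m), ("ben", 15m),
};

// Total spend per customer, keeping only totals above 100, highest first.
var bigSpenders = orders
    .GroupBy(o => o.Customer)
    .Select(g => (Customer: g.Key, Total: g.Sum(o => o.Amount)))
    .Where(x => x.Total > 100m)
    .OrderByDescending(x => x.Total)
    .ToList();

foreach (var (name, total) in bigSpenders)
    Console.WriteLine($"{name}: {total}");
// prints:
// chen: 300
// asha: 200
```

The same chain written with loops would need a dictionary for the totals, a filter pass and a sort, so the query form is shorter and reads top to bottom.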



4. Do justice to your Exceptions: 


Throwing exceptions at the right time, with the right message and the relevant variable values, is immensely more helpful than failing silently. Make sure you handle the exceptions your code can raise, and give each one a message that includes the values of the variables or function outputs that went wrong, so that whoever reads the stack trace can debug and resolve the issue faster.
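
A minimal sketch of the idea, assuming a made-up Reserve function over an in-memory stock dictionary: each failure throws a specific exception type and the message carries the values involved.

```csharp
// Top-level statements: needs C# 9 or later.
using System;
using System.Collections.Generic;

var stock = new Dictionary<string, int> { ["widget"] = 5 };

try
{
    Reserve(stock, "widget", 8);
}
catch (InvalidOperationException ex)
{
    // prints: Cannot reserve 8 of 'widget': only 5 in stock.
    Console.WriteLine(ex.Message);
}

// Hypothetical helper: fails loudly, and says exactly what went wrong.
void Reserve(Dictionary<string, int> inventory, string item, int quantity)
{
    if (quantity <= 0)
        throw new ArgumentOutOfRangeException(
            nameof(quantity), quantity, "Quantity must be positive.");
    if (!inventory.TryGetValue(item, out int available))
        throw new KeyNotFoundException($"Unknown item '{item}'.");
    if (available < quantity)
        throw new InvalidOperationException(
            $"Cannot reserve {quantity} of '{item}': only {available} in stock.");
    inventory[item] = available - quantity;
}
```

Compare that message with a bare "Operation failed": the first one tells you which item, how many were asked for and how many were there, without opening a debugger.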

5. DRY your code:


DRY stands for Don't Repeat Yourself.
In The Pragmatic Programmer, DRY is defined as “every piece of knowledge must have a single, unambiguous, authoritative representation within a system”.
This puts the emphasis on code reusability: it's better to keep a piece of logic in one place and call it from different places than to rewrite that logic everywhere you want to use it. It improves code readability and saves a lot of time when refactoring or modifying the logic.
You can read more about it here: 
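
A tiny sketch of the idea, with invented names: the email rule lives in one helper (IsValidEmail), and both Register and Invite call it instead of each carrying its own copy of the check. The rule itself is deliberately simplistic.

```csharp
// Top-level statements: needs C# 9 or later.
using System;

Console.WriteLine(Register("asha@example.com")); // prints: registered asha@example.com
Console.WriteLine(Invite("not-an-email"));       // prints: rejected not-an-email

// Single home for the rule. Changing it here changes it for every caller.
bool IsValidEmail(string email)
{
    if (string.IsNullOrWhiteSpace(email)) return false;
    int at = email.IndexOf('@');
    return at > 0 && at < email.Length - 1;
}

string Register(string email) =>
    IsValidEmail(email) ? $"registered {email}" : $"rejected {email}";

string Invite(string email) =>
    IsValidEmail(email) ? $"invited {email}" : $"rejected {email}";
```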
So these are the five tips for now. Topics like lazy initialization and LINQ need their own detailed blog posts, which I'll hopefully be covering soon, along with more fun programming tips in upcoming HIGH-FIVES.
Till then enjoy programming, and share in the comments below if you have some really cool tips, tricks or suggestions for efficient programming. Or tell me, what is the latest trick you learnt?
Would love to hear back!

Take care! Keep hacking!
See you in the next post soon!
Cheers!

Sunday 15 April 2018

Duh - Saves you the trouble of correcting your command


Duh.

This is no longer just an expression for me; it's a command now, thanks to the hack I have been working on for the past couple of days.

What's it about? Well, here it goes.

How many times do we screw up commands on the terminal?
A typo, a syntax mistake or jumbled-up arguments. The command doesn't run, and then we spend time retyping it, making sure everything is in place this time.
Quite time-consuming, eh?




My laziness simply wouldn't put up with that. So I coded up a PowerShell cmdlet that does this for me.
Now if I mess up a command, I just have to type 'Duh' and the right command is displayed on the prompt for me to check and execute (press ENTER).

How does Duh operate internally?
Well, guess what. The answer lies in the "tries".
We keep the known-good commands in a trie and find the closest match using Levenshtein distance.
In short, how do we figure out how close two strings are? Count the minimum number of letters you need to remove, insert or replace to turn string 1 into string 2.
This is what's being used to find the closest correct command to the input command.
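
To make the distance idea concrete, here is a small sketch. This is not Duh's actual code: it uses the textbook dynamic-programming version of Levenshtein distance, and ClosestMatch (a name I made up) scans a plain list instead of walking a trie, which is simpler to show but slower for a large command set.

```csharp
// Top-level statements: needs C# 9 or later.
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(Levenshtein("kitten", "sitting"));       // prints 3
Console.WriteLine(Levenshtein("git comit", "git commit")); // prints 1

var known = new[] { "git status", "git stash", "git commit" };
Console.WriteLine(ClosestMatch("git stauts", known));      // prints git status

// Edit distance via dynamic programming, keeping only two rows of the table.
int Levenshtein(string a, string b)
{
    var previous = new int[b.Length + 1];
    var current = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) previous[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            current[j] = Math.Min(
                Math.Min(previous[j] + 1, current[j - 1] + 1), // delete / insert
                previous[j - 1] + cost);                       // replace or keep
        }
        (previous, current) = (current, previous);
    }
    return previous[b.Length];
}

// Linear scan: returns the candidate with the smallest distance to the typed text.
string ClosestMatch(string typed, IEnumerable<string> candidates) =>
    candidates.OrderBy(c => Levenshtein(typed, c)).First();
```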
Hey, but wait, what is the set of right commands that make up the trie? Where do we get these from?
For now I am using two things:
1. A list of standard git commands (with placeholders for params) scraped from a website.
2. The history of the currently open PowerShell session.
Why use the history of commands from a PowerShell session, you may ask.
Here's the catch: within a session there are a lot of commands that we use and reuse. Using the latest history of PowerShell commands helps us easily correct the cases where we mess up a recently used command (along with the right set of param values), since those commands already exist in the trie.

How to get the history? Use "Get-History". Lol.
But this command gives me the history of all the commands typed in PowerShell, including the wrong ones. And we for sure don't want to put those in the trie. After all, the program's motto is not: "We will return another wrong command in exchange for the wrong command you wrote, well just because it's chic, also beauty lies in imperfection".

We need to filter out these commands and pick only the right ones.
How do we figure out which ones are right from the dumped output? I thought about drawing some heuristics from sample data.
Is there a pattern in the execution time of failed commands? Is the time fixed? Or is it the shortest?
Stupid questions, as I think of them now. There were definitely correct commands that executed much faster. Also, the logic that makes a command fail is not the same for every command and takes its own time. Most importantly, I believe the execution time of the same command may also differ based on how busy your CPU is.
So, how to filter them out? Presently I am working on the assumption that wrong commands are more likely to appear infrequently, especially because the chances of getting a command wrong in exactly the same way twice are really low. This is not very definitive, and there are a good number of drawbacks to this approach: we might lose out on many potentially correct commands, and if a wrong command does get repeated, we will add it to the trie and offer it as a suggestion the next time, a totally corrupted experience.
But well, this is what I picked for now.
There's a better approach, where we maintain a map of standard commands with their params and then find the closest match. This would also help us handle flags and new params gracefully, so I will be adding it soon.
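
The frequency assumption can be sketched like this. Again, this is only an illustration and not Duh's source: FrequentCommands and its minCount threshold are names I invented, and the history is a hard-coded array instead of real Get-History output.

```csharp
// Top-level statements: needs C# 9 or later.
using System;
using System.Collections.Generic;
using System.Linq;

var history = new[]
{
    "git status", "git stauts", "git status", "git pull", "git pull", "gti push",
};

foreach (var command in FrequentCommands(history, minCount: 2))
    Console.WriteLine(command);
// prints:
// git status
// git pull

// Keep only commands seen at least minCount times; one-off entries are
// assumed to be typos. minCount is a hypothetical knob for this sketch.
List<string> FrequentCommands(IEnumerable<string> commands, int minCount) =>
    commands
        .GroupBy(c => c)
        .Where(g => g.Count() >= minCount)
        .Select(g => g.Key)
        .ToList();
```

The sample also shows the drawback mentioned above: a perfectly valid command that was typed only once would be dropped along with the typos.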

So that is what runs in the cmdlet code (in C#).
To use this command you need to import the DLL as a module and also set the alias duh for ease of use. I have added a PowerShell profile.ps1 to make that easier.
We need to place that file in the System32 PowerShell directory (C:\Windows\System32\WindowsPowerShell\v1.0) and we should be all set.

Haah, so that's what the hack was all about.
I did it only for PowerShell for now, since that is what I am using these days, but the code can be extended to work with other shells too; essentially only the IO logic will change.
If you are interested, you can check out the source code at:
https://github.com/nextLane/Duh

There are a lot of issues and TODOs in this project; I will open them up on GitHub shortly!
Feel free to drop any comments, queries or suggestions. I would be glad to get the conversation going.

On a parallel note, here is how this differs from thefuck, a popular project on GitHub which seems to do the same:
thefuck is a "magnificent app", which is totally rule based and quite robust, I believe. It handles cases like a failed git push, where it will set an upstream and then do the git push, which is an additional layer of intelligence. But it's overkill for my specific use. First, it needs the rules to be explicitly coded up. Second, it takes a decent amount of time to evaluate the right matches and then generate the command. Duh isn't doing anything remotely resembling this. It's just 300 lines of code, with no dependency on third-party libs, that gives me the closest match from a trie. The distinguishing feature is that it takes the user's behaviour on the terminal into consideration. If I use a set of 10-15 commands in a PowerShell session, I am likely to mess up one of those commands. Since I have a corresponding trie for them, I can get the right one faster without coding any explicit rule into the repo. Thus the number of commands you can check is not constrained.



See yu in the nex\t postt soooon!
Duh

See you in the next post soon! :)

Till then,
Happy hacking
Adieu


A secret love message ~

Hmm, so the trick of putting up a catchy seasonal title worked out, we got you here! Folks, we will be talking about a really cool tech...