I am thankful 2020 is behind us. I am thankful my family and friends are healthy. And I am thankful for being lucky to have a job, to have the time to reflect back on the year that has gone by and to be able to learn from it.
I completed one full year at my new job. Since I joined my current role before the pandemic started, I got the opportunity to meet my team members in person. There are many new team members who joined remotely. I hope to see them in person sometime this year.
The last year has been extraordinary in terms of work. I learned new technologies and new programming languages, mentored new members and built new functionalities. It brought me out of my comfort zone and stretched my abilities. But I am grateful for being given these opportunities. The famous quote by Nelson Mandela sums up my last year’s experience very well – “It always seems impossible until it is done”. I hope to remember it in future.
I also realized the importance of taking care of your own health and the necessity of reaching out and keeping alive the friendships and ties that give you warmth. Hope you have a good 2021. Happy new year!
A lot actually. But before I go into all that, let me tell you why I am writing this post. A friend of my partner started sending year-end emails to their friends and encouraged them to write and share their own as well. After a few years, I too became a recipient of such emails and it was enriching to read the author’s reflections on the year that had passed. For one, I am always genuinely curious about people’s life stories and the lessons therein. And second, it urged me to think and reflect. Somewhere I read that life is a series of wins, draws and losses. I don’t believe that. I believe life is a series of experiences that can potentially make us better people if we choose to learn from them. So, without further ado, here’s what happened in my life this year.
I changed my job. Well, it’s more of a change of company than of job. I am still an engineer building software. But now I am building a high performance file system service on cloud instead of an emulator for hardware chip designs. It’s difficult to judge how the impact of my work has changed, but I think I am now working on a product that is used by more customers than the one I was working on before. The change, like most changes, has been mixed with discomfort and uncertainty. But it has been both exciting and fulfilling at the same time so far. Everyone, I believe, should ask these two questions about their work life – Are they learning something new every day? Are they enjoying their work? And I have been feeling positive about both these answers. I expect I’ll take some more time to become as impactful as I was in my past job. But that’s a journey I am excited to take and perhaps will be the subject of the next year’s post.
The other most exciting opportunity that landed on me was my talk at cppcon. I decided to try speaking at technical conferences almost four years back and I have been sending talk proposals to different conferences each year since then. And every one of them was rejected until the last one. For some reason, I never thought of (or dared to try) sending a proposal to cppcon even though I have been working all my career on the very programming language the conference is all about. The fact that I didn’t stop submitting talks to conferences even after all those rejections speaks of my perseverance and strong desire for public speaking. And I believe if there’s anything at all to learn from here, it is that. I would also like to add “course correction” to that list. But to be honest, many earlier submissions in which I applied everything I had learned about making a submission strong got rejected. And this particular talk which was accepted was in fact written hurriedly and submitted at the last minute. But something extraordinary (at least for me) happened this time. The conference chair wrote back and asked me to improve the talk proposal by adding more details to it. The program committee accepted it after I reworked my talk submission and that’s how I got to speak at cppcon.
Some people are prone to think they don’t deserve what they got and that they are a fraud. I too am guilty of entertaining such thoughts. After the initial euphoria of hearing that my talk had been selected, I started doubting whether they had made a mistake by selecting my talk. But then something wonderful happened at the conference. I was invited to speak at a panel on Diversity and Inclusion in the C++ community and I shared my little inner conflict when somebody asked about imposter syndrome. The cppcon program committee chair was present in the audience and he assured me after the panel discussion that they were confident of each and every talk at the conference. I am boundlessly thankful for my experience at the conference and my interaction with the people there.
Another fulfilling achievement of this year has been at my previous job where I designed and coded a significant performance improvement. There were times when I felt alone and unsure of the outcome. But eventually the tests showed promising results. It’s often difficult to imagine the future and easy to get disheartened at the present. But experiences like these serve as a good reminder to never lose sight of the end.
You already might have noticed a pattern in all the experiences I shared here. And it’s not particularly an earth-shattering one. If you persevere, you might reach where you want to be.
If you are a C++ programmer, I encourage you to attend cppcon, one of the largest gatherings of the C++ community. The sessions range from the cutting edge of C++ to how to do the old C++ better. If you are using C++ to build games, write compilers or develop performance sensitive applications, you will find something useful in the conference. It doesn’t matter if you are using pre-modern C++ or are already in the modern C++ land or are thinking of pushing your team to use modern C++ effectively, you should attend in any case.
I had wanted to attend cppcon for a long time. But it’s an expensive conference and I am yet to work at a company which provides travel grants for conference attendance. cppcon is one of the nicer conferences that provide financial support to their speakers and volunteers. Hence, I had only two options – either speak at cppcon or volunteer at cppcon. Volunteering wasn’t an option for me since I couldn’t commit the required number of days, so I decided to try my luck at speaking.
I had apprehensions about my experience at the conference, but I had a wonderful time. The people I interacted with were genuinely good people. The conference saw a healthy mix of junior and senior programmers which was refreshing. A good part of the community is trying to make the C++ community more inclusive and diverse and they are willing to hear what others have to say.
Overall it was a great experience and I’d certainly like to attend it next year too. And I hope to see more people of colour, more women and more people from the LGBTQ community next year at the conference. I also hope the conference becomes a more welcoming place for all the attendees next year.
Here’s a list of a few helpful links if you wish to attend cppcon
The above are what I did. You can additionally take part on Reddit or join the Slack channel. And, there are many more C++ conferences. Find something near where you live and make a plan to attend the next one.
I recently chased down a crash in software running on a large test case. The fix, surprisingly, is a one-liner in an area of code which has been touched only four times in the last 8 years. It was a fairly complex piece of code and completely unfamiliar territory for me. There was no help available as the person who wrote it and last worked on it left the company a few years back and left no design document behind. I think this is a pretty common scenario in large products having legacy code.
Fortunately, the bug was a deterministic one and was occurring in a monolithic program which ran sequentially. The program is actually a binary compiled and linked out of code written in C++ and runs on Linux.
Where was the bug seen?
The crash happened inside a function sitting somewhere in the middle of some spaghetti code and I had no clue what that function did. It took more than an hour and a half to reproduce the crash, even after I put some effort into shortening the run before the crash. Although it was not a very long running program (I’ve seen people debugging bugs that took days to reach), it was long enough to make it slightly more challenging and difficult to fix than other bugs. I did fix it eventually using a few strategies which I came to adopt over the years. And these strategies can also be successfully applied to solve some of the difficult bugs in unknown code bases.
Strategy No. 1. Stop thinking “I can’t solve this bug”.
This is a slippery slope and should be avoided at all costs. Most bugs can be solved with methodical debugging and perseverance even when they occur in someone else’s code. And sometimes some of the non-deterministic bugs can be solved too. It may take time and it may require you to learn new things, but it’s not impossible.
If you’re working in a team, chances are you often have to fix bugs in somebody else’s code. And, trying to enjoy the process is helpful. I like to discuss it with other people who I know are interested in chatting about technical topics even though they don’t work on it directly. Also, it boosts my self-confidence that I am good at solving difficult bugs and that in turn boosts other team members’ confidence in me as well. And if nothing else, I almost always learned something new once I solved a difficult bug.
Strategy No. 2. Throw away all the assumptions.
Don’t start with an assumption on the cause of the bug. It can help if you know the code in and out. But more often than not the starting assumptions about the cause of the bug turn out to be wrong. For instance, some of the assumptions about this bug were “Oh, it’s a memory error” or “Oh, no other programs using this piece of code are crashing, hence the real bug might be in this particular program”. They were all wrong and they didn’t help. Your debugging gets influenced by such assumptions and you tend to miss other clues that present themselves during debugging.
I’m prone to making such assumptions. I’ve learned not to act upon them. Instead I follow a set of steps which are mentioned in the following sections.
Strategy No. 3. Try to reproduce the crash in a shorter running program.
This is helpful if you work on a program that takes a long time for large input sizes. It’s often worth the effort to reduce the input size and check if the bug is still reproducible. And that saves a lot of debugging time. But this is not always easy or even possible right at the beginning of debugging. I keep trying it throughout my debugging so that I can save as much time as possible.
Strategy No. 4. Use the right tools.
Since the program in this case is written in C++ and runs on Linux, there exists a fairly good debugging ecosystem. gdb from GNU and lldb from the LLVM project are both good. Mastering these tools often pays large benefits. For instance, gdb can be scripted using Python and it can be extremely helpful while debugging long running programs. There are other gdb commands like checkpoint which are very useful too. Last I checked the reverse debugging in gdb wasn’t good. But I’ve heard good reviews about UndoDB which is a reverse debugger. rr from Mozilla is another reverse debugger and I’m yet to try it.
For very long running programs checkpointing the program using dmtcp also helps. That way you don’t have to spend a lot of time just waiting for the program to reach the bug. It often takes some effort to integrate dmtcp with large code bases, but it may be worth the effort.
Sometimes, enabling log messages or putting in new ones to better understand the state of the program can be much more helpful than a debugger, because a debugger executes the program slowly and I can run the program much faster with an optimized build executable with the log messages turned on. Meaningful log messages are good tools for understanding the behaviour of the program in unknown areas of code.
In two of my most memorable debugging experiences, using the existing log messages and adding new ones helped me solve the bugs much faster than if I had used only the debugger. And if some of the new log messages that I add for debugging purposes turn out to be helpful, I try to add them permanently in the code.
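As an illustration of what such a log message could look like in C, here is a minimal sketch. The names debug_enabled, debug_log() and DEBUG_LOG are hypothetical and not taken from the code base described above; the only assumption is that logging can be switched on at run time so that an optimized build can still emit it.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical run-time switch: logging costs one branch when it is off. */
int debug_enabled = 0;

/* Writes "[file:line] message\n" to out.
 * Returns the number of characters written, 0 when logging is disabled,
 * or -1 if the stream reported an error. */
int debug_log(FILE *out, const char *file, int line, const char *fmt, ...)
{
    if (!debug_enabled)
        return 0;

    int prefix = fprintf(out, "[%s:%d] ", file, line);

    va_list ap;
    va_start(ap, fmt);
    int body = vfprintf(out, fmt, ap);
    va_end(ap);

    int newline = (fputc('\n', out) == EOF) ? -1 : 1;

    if (prefix < 0 || body < 0 || newline < 0)
        return -1;
    return prefix + body + newline;
}

/* Convenience wrapper that records where the message came from. */
#define DEBUG_LOG(...) debug_log(stderr, __FILE__, __LINE__, __VA_ARGS__)
```

A call such as DEBUG_LOG("queue depth=%d", depth) then tags every message with its source location, which is what makes the output useful in unfamiliar code.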
Strategy No. 5. Dig deeper to understand the reason for the bug.
In a well-written code base the bugs and their causes should be localized. Meaning, the behaviour due to erroneous code shouldn’t occur somewhere far from the culprit line or lines of code. But it’s not a perfect world and it is very easy to write code in C and C++ where the bug is introduced in one place and manifests somewhere far away. The initial debugging may show what is wrong in the code but it may take some time to figure out the why. For example in one of the bugs I saw a local variable inside a function getting corrupted even though there was nothing suspicious happening inside the function to indicate it. This was the what part of the bug – a corrupt variable. The journey to find why led me to a static buffer overflow in another function that corrupted the value of the local variable. During debugging I had added a new local variable before the corrupt local variable to count something. And that stopped the crash. But this was a hack and not a real solution. The real solution involved finding out why the local variable got corrupted in the first place. Do not try out a bunch of stuff randomly to make the bug go away.
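To make the what-versus-why distinction concrete, here is a deliberately simplified stand-in for that kind of corruption. It is not the code from the bug above: the struct parser_state and both functions are hypothetical, and the buffer and its victim are placed in one struct so that the overwrite is deterministic and well defined (a write past a standalone static buffer is undefined behaviour and may hit anything).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state: a fixed-size buffer with a counter right behind it. */
struct parser_state {
    char name[8];
    int  count;
};

_Static_assert(offsetof(struct parser_state, count) == 8,
               "count must sit directly after name for this demo");

/* The "why": the length is trusted, so a long input spills into count. */
void store_name_unchecked(struct parser_state *s, const char *src, size_t len)
{
    /* name is the first member, so it starts at the same address as *s. */
    memcpy((unsigned char *)s, src, len);
}

/* The fix at the culprit: reject input that does not fit in name. */
int store_name_checked(struct parser_state *s, const char *src, size_t len)
{
    if (len > sizeof s->name)
        return -1;
    memcpy(s->name, src, len);
    return 0;
}
```

With a 12-byte input, the unchecked version silently changes count, and whoever reads count later sees the “what” far away from the “why”. Moving or padding the victim only changes what gets overwritten; the bounds check removes the cause.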
Strategy No. 6. Understand the area of code where the bug is seen.
It may not be necessary always but more often than not it is. Spending some time and effort to understand the area of unfamiliar code where the bug is visible is helpful. This time I had to understand the functionality of the code, the individual functions in the call stack and the assumptions with which they were written. And a few days later there was another bug in the same code area and this time I could solve the bug much faster.
Strategy No. 7. Verify your fix by adding a test case.
I always make it a point to add a test case in the regression suite for the bug I fixed. Sometimes it’s possible to add a simple unit test for the bug and sometimes it may require a big system test case. But a test case in the regression suite will ensure that the same bug does not go unnoticed if it appears again.
Strategy No. 8. Write a post-mortem report on the bug.
Or make a blog post out of it. I do it because I want to find out if I could have debugged it more efficiently. And I also like to write if I learned something new.
Why did I do it?
Many reasons. Most important of them being –
1. Outreachy has been the second most important experience in my life as a programmer(*). It’s a fantastic program. It helps the open source communities become more diverse and it opens up avenues of opportunities for the interns. I wanted to volunteer and engage with the Outreachy community.
2. The last conference related to diversity I attended was womENcourage in 2015 in Uppsala, Sweden and it was very inspiring to be able to meet other women in the field of computing. The Tapia conference has a focus on people of colour and people with disabilities and it was an amazing experience to be able to attend the conference.
3. My conference registration and travel costs were sponsored by Outreachy and I like talking about my Outreachy experience and encouraging others to apply for the internship. Therefore it was a complete win-win situation for me.
What was the good stuff about Tapia?
1. Diversity! It was heartening to see so many people of colour in computing.
2. The keynotes ranged from relevant and thought-provoking talks to exciting research in the area of assistive technology. The keynotes by Shiri Azenkot and Moshe Vardi are definitely worth mentioning.
3. Amazing career fair with a very good mix of presence from both academia and the industry.
4. Networking opportunity with other women in the field.
What was the bad stuff about Tapia?
1. No policy on photography. Some people don’t like to be photographed and a conference which celebrates diversity should be aware of it.
2. No meal option for vegans and vegetarians.
What did I get out of it?
1. I liked helping at the Outreachy booth. I got to talk to people that came to our booth, shared my experience of being an Outreachy intern and hopefully could inspire them to apply for it.
2. I liked interacting with the other Outreachy members present at the booth and getting to know them personally.
3. I learned new stuff from some of the sessions at the conference.
4. I came back recharged and inspired and hopefully that will see me through for the next few months.
Do I recommend it?
Yes! Especially if you are a woman or a person of colour or both. It will help you grow both personally and professionally.
Did I go to Disney World?
Yes! Visited Epcot. Should have planned earlier, should have enabled FastPass first thing on reaching Epcot, and should have stayed longer. But it was a fun day at Disney World.
(*) The most important experience in my life as a programmer is my 12 weeks spent at the Recurse Center. Go apply if you want to get dramatically better at programming.
In the last few years, I met many women who were either on a break from their career in the software industry or were planning to take one. It’s not unusual to take a break from one’s career to take care of family, for child-birth or to follow other dreams and aspirations. I took 2 breaks in the last 12 years. The first one was 3 years long – enough to raise a few eyebrows when I joined the industry again in a programming job. And the second one was only 3 months long and was more of an investment in being a better programmer than a break. But more on that later.
The first break
A gap of 3 years from employment in the software industry is a long one and I wasn’t doing anything remotely related to software engineering during those 3 years. I guess, this gives some amount of hope to others on a break. I can understand the intermittent feelings of insecurity that one goes through when one is away from employment since I too went through them. I hope this post helps them in some way.
After working as a programmer for 5 years in a very niche domain called Electronic Design Automation I decided to leave my job and prepare for the Indian civil services exam. It was one of the most important and difficult experiences of my life. It taught me to be focused and hard-working. And most important of all, it taught me to stand on my feet again even after an unthinkable failure.
By now, you have correctly guessed that I didn’t pass the exam. At the end of 3 years, I wasn’t quite sure if I wanted to be a programmer again. It was only through pure serendipity that I chanced upon an email asking for applications for 3-month long internship opportunities in open source projects. The program was then known as Outreach Program for Women. We used to call it OPW. Now it’s known as Outreachy. Outreachy offers paid internship opportunities to women for contributing to Free and Open Source Software projects.
I had already been a user of open source tools and operating systems for some years both at the workplace and at home. I applied for an internship with the Gnome Foundation and got accepted. You’ll find about my internship work here. Outreachy doesn’t only offer programming internships. It also offers internships in documentation and marketing among other areas.
The OPW internship experience, even though short, was one of the definitive experiences in my programming career. And I discovered that I liked programming. Towards the end of the internship, I started interviewing with a few software companies in India. One of them is a well-known professional networking company. By that time I had already made a few contributions to the geocode-glib library and I shared my code with them. But they refused to call me for an interview. The recruiter flatly told me that since programming is much like doing maths, 3 years of lack of practice must have rusted away my programming capabilities.
In hindsight, I can now probably understand what the hiring team at that company thought. Here was a woman, who didn’t code for 3 years, who didn’t have any reference in that company (I was contacted by a third party recruiter through an online job portal) and who might again go back to appearing for the civil services exam. Why take a risk with such a candidate!
I also asked a few ex-colleagues and friends who were still in the EDA industry to forward my resume to their respective companies. And that turned out to be the best thing I did. I got calls for interviews from 3 companies. I cleared the interviews with 2 of them and cancelled the interview process with the third one since I already got offers from the other 2 and I was less interested in the third company. I finally accepted the offer from Mentor Graphics. Mentor Graphics has a fairly standard interview process. It typically consists of five technical rounds in a single day with the teams that have requirements. If all goes well, the candidate is called again to chat with the manager and the HR manager to discuss work, role, and salary. At least that’s how the second part of the interview process went for me.
I prepared for the technical interviews by watching videos on algorithms on Coursera and YouTube. I practiced interview questions online. The OPW internship helped a lot because I was coding in C and EDA mostly runs on C and C++ on Linux. But the interview performance of a single day is not the only deciding factor for hiring somebody. And this I speak from my experience of being an interviewer later.
Even before a candidate comes for an interview, some impression has already been formed about him or her. It’s always a plus if the resume comes from a referral from within the company. It’s an even bigger plus if an existing employee can vouch for the candidate. The resume is the second most important thing. It becomes even more important if somebody is coming back after a gap.
The second break
During my second break, I did a full batch at Recurse Center. Recurse Center is an educational retreat for programmers in New York. And it was way easier to get a job after I finished my batch at Recurse Center than the last time.
In short, interviewing for jobs is difficult and stressful. Joining the software industry after taking a break is difficult and stressful. But none of these are impossibly difficult. They take time and effort. Hence, if you are on a break and are thinking of joining the software industry –
* Think carefully: do you really want to join a programming job? Is there anything else you would like to do? Assuming you have the privilege to explore other options you are really passionate about, I would urge you to explore them. Because once you join a job, it becomes very difficult to leave it again.
* If you want to come back to the software industry, then invest in building skills. For programmers, it’s actually easy. Start contributing to open source projects, join a coding boot camp, build a small project in an area you want to work on. If you want to improve as a programmer, then do consider joining Recurse Center. It’s an amazing place if you really like programming and want to get better at it.
* Connect with ex-colleagues and friends asking them if they have any requirements in their companies. Contact people with whom you worked. Start networking. Join local meetups. There are many online groups for women in STEM. Join them.
* Prepare for the technical interviews. There are numerous resources available both online and offline. Pick some and start practicing interview questions. Try to arrange mock interviews with friends who are interviewers in their companies.
Monday : Worked on the presentation “Life of a C Program” and presented it in the evening.
Tuesday : Paired with May to discuss the steps in “Life of a C Program” and in the process making the presentation better – both visually and organization-wise. Started reading the paper on Raft.
Wednesday : Continued reading the paper on Raft. Discussed the Safety Argument section (5.4.3) of the Raft paper with others. Continued to write the blog post on “How do the debuggers set breakpoint”.
Thursday : Understanding closure in Go.
Friday : Worked on the job profile and read about goroutines.
Saturday : Worked with May and Juliano on the Sanctuary project at MD5 hackathon. Great experience learning about express, leaflet, and bower. Ended up contributing a few lines of code at the end of the day.
A breakpoint makes your program stop whenever a certain point in the program is reached.
What’s a debugger?
You can consider your debugger to be a program which calls fork() to create a child process and then calls execl() in the child to load the program we want to debug. I used execl() in my code, but any function from the exec family can be used.
And here’s the run_child() function which calls the execl() with the debuggee process’s executable name and path.
We see a call to ptrace() in the run_child() function before calling execl(). Let’s, for the moment, not go into what ptrace() is, even though it’s very important for understanding how a debugger works. We will eventually come to it.
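A minimal sketch of the fork-then-exec arrangement described above could look like the following. Only run_child(), execl() and ptrace() come from the text; the parent-side helper launch_debuggee(), the error handling and the exit codes are assumptions made for illustration.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: volunteer to be traced, then turn into the debuggee. */
void run_child(const char *path)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0) {
        perror("ptrace(PTRACE_TRACEME)");
        _exit(1);
    }
    execl(path, path, (char *)NULL);
    perror("execl");            /* reached only if execl() failed */
    _exit(1);
}

/* Parent side (hypothetical helper): start the debuggee and wait until it
 * stops with SIGTRAP on exec. Returns the child's pid, or -1 on failure. */
pid_t launch_debuggee(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        run_child(path);        /* never returns */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;              /* child exited: exec or ptrace failed */
    if (WSTOPSIG(status) != SIGTRAP) {
        kill(pid, SIGKILL);     /* unexpected stop: clean up the child */
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}
```

When launch_debuggee() returns a pid, the debuggee is loaded but has not executed any of its own instructions yet, so the debugger is free to inspect and modify it.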
Now we have two processes running –
The debugger as the parent process.
And the debuggee as the child process.
Let’s now try to think abstractly, in a sort of hand-wavy manner, about what a debugger needs to do to set a breakpoint in the child process. The debugger needs the child process to stop where the breakpoint has been set. But how?
What does the debugger need to do to set a breakpoint?
Let’s examine the phrase “setting a breakpoint” a little closely. We say a process is running when the instructions of the process are being executed by the processor. Where are those instructions located? In the text/code section of the process’s virtual memory.
By setting a breakpoint we expect the debuggee process to halt at a certain point. Which means that we expect the debuggee process to stop just before executing some instruction. What can that instruction be? Well, if the user has set a breakpoint at the beginning of a function, it’s the first instruction at the start of the function. If the user has set a breakpoint at some line number in some file, the instruction is the first instruction in the series of instructions that correspond to that line.
Therefore the debugger needs to make the process halt right before executing that instruction.
In my project, I chose the instruction underlined in the following screenshot.
How can the debugger make the debuggee process halt right before executing a particular instruction?
The debugger replaces the instruction (at least part of the instruction) at the chosen address in the debuggee process with an instruction that generates a software interrupt. Hence, when this modified instruction is executed by the processor, the SIGTRAP signal is delivered to the process and that is all that is needed to make the process stop. I have skipped a lot of details here, but we will discover them as we go.
But let’s first discover how the debugger modifies an instruction.
The instructions reside in the process’s text section which is mapped to the virtual memory when a process is loaded. Hence to modify the instruction the debugger needs to know the address of that instruction.
How does the debugger find out the address of an instruction?
If you are compiling a C/C++ program, you pass the “-g” option to the compiler to generate some extra information. That extra information contains the mappings between source locations (functions and line numbers) and instruction addresses, and it is stored in a format known as DWARF in the object file. On Linux, the DWARF format is used to store the debug information inside the ELF file. Isn’t that cool! ELF stands for Executable and Linkable Format. It’s the format to represent object files, executable files and shared libraries.
What is the instruction that the debugger puts in place of the original instruction?
The debugger overwrites the first byte of the location where the original instruction was with the instruction “int 3”. “int 3” has the 1 byte opcode “0xcc”, hence the debugger touches only the first byte at that memory address.
What’s an “int 3” instruction? “int n” instructions generate a call to the interrupt or exception handler specified by the operand n. “int 3” raises the breakpoint exception (vector 3). Its handler is a part of the kernel code and it delivers the signal SIGTRAP to the process, in our example to the process we are debugging.
How and when does the debugger change an instruction of the debuggee process?
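Seen at the level of bytes, the replacement amounts to the following sketch. It works on an ordinary in-memory copy of some instruction bytes, not on a live process, and the function name plant_int3() is hypothetical.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCC   /* the one-byte encoding of "int 3" on x86 */

/* Overwrites the first byte of an instruction with "int 3" and returns the
 * byte that used to be there, so that it can be put back later. */
uint8_t plant_int3(uint8_t *code)
{
    uint8_t saved = code[0];
    code[0] = INT3_OPCODE;
    return saved;
}
```

For example, the common x86-64 function prologue bytes 55 48 89 e5 (push %rbp; mov %rsp,%rbp) become cc 48 89 e5: the rest of the bytes are untouched, and the processor traps as soon as it reaches the first one.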
And now the most exciting part of this post has arrived. Using a very powerful system call ptrace().
Let’s understand ptrace. What does ptrace() do? ptrace is a system call by which our debugger controls the execution of the process we are debugging. The debugger uses ptrace() to examine and change the memory and registers of the process that we are debugging.
If you look at the code here, before execl()’ing the debuggee process we called ptrace() with PTRACE_TRACEME which indicates that this process is to be traced by its parent process, i.e. the debugger.
The man page for ptrace mentions this –
If the PTRACE_O_TRACEEXEC option is not in effect, all successful calls to execve(2) by the traced process will cause it to be sent a SIGTRAP signal, giving the parent a chance to gain control before the new program begins execution.
Which simply means that the debugger gets notified via the wait()/waitpid() system call just before the debuggee program begins executing. And there the debugger has its golden chance to modify the text/code section of the process being debugged.
It also means that while being traced the debuggee process will stop each time a signal is being delivered and the tracer (the debugger) will be notified at its next call of waitpid(). Hence, when the signal SIGTRAP is delivered to the debuggee process, it will stop and the debugger will be notified, which is exactly what we want. We want the debuggee process to stop and the debugger to be notified when the debuggee process executes the “int 3” instruction. The debugger gets the information about the cause of the stopping of the debuggee process from the status returned by waitpid().
The default action of SIGTRAP is to dump core and terminate the process but we can not debug a process if it’s killed. Can we? Hence the debugger intercepts the SIGTRAP signal and, instead of passing it on to the debuggee process, causes the debuggee process to continue.
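As a rough sketch of how a debugger could read the status returned by waitpid(), the function below sorts it into a few cases. The enum and the name classify_status() are hypothetical; the W* macros are the standard ones from <sys/wait.h>.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical classification of a waitpid() status for a traced child. */
enum stop_reason {
    STOP_EXITED,        /* debuggee called exit() or returned from main */
    STOP_KILLED,        /* debuggee was terminated by a signal */
    STOP_TRAP,          /* debuggee stopped with SIGTRAP (exec or "int 3") */
    STOP_OTHER_SIGNAL   /* debuggee stopped because of some other signal */
};

enum stop_reason classify_status(int status)
{
    if (WIFEXITED(status))
        return STOP_EXITED;
    if (WIFSIGNALED(status))
        return STOP_KILLED;
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
        return STOP_TRAP;
    return STOP_OTHER_SIGNAL;
}
```

Note that the SIGTRAP sent on a successful exec and the SIGTRAP caused by “int 3” look the same here; a debugger has to keep track of which one it is expecting.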
Here’s the code to set “int 3” instruction at the first byte of the address that contained the original instruction. As you can see, the same old ptrace() function is used to first get the original instruction at the address so that we save the original instruction to restore it later. And then the same old ptrace() function is used with a different flag PTRACE_POKETEXT to set the new instruction which has “int 3” at its first byte.
What does the debugger do now that the “breakpoint has been hit”?
First, the debugger needs to restore the original instruction at the address where the breakpoint has been set.
Second, once the original instruction has been restored, it should be executed when the debugger lets the debuggee process resume its execution.
How does the debugger restore the original instruction? Just the way the debugger sets “int 3” to set a breakpoint. Here’s the code. While setting the breakpoint we saved the original instruction, now all we need to do is to set it back at the given address.
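In sketch form, restoring is a single write of the word that was saved when the breakpoint was set. The name remove_breakpoint() and its parameters are hypothetical; the saved argument is assumed to be the word read from the same address before it was patched.

```c
#include <sys/ptrace.h>
#include <sys/types.h>

/* Writes the saved original word back at addr in the traced, stopped
 * process pid. Returns 0 on success and -1 if ptrace() fails. */
int remove_breakpoint(pid_t pid, unsigned long addr, long saved)
{
    if (ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)saved) < 0)
        return -1;
    return 0;
}
```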
How is the original instruction in the debuggee process then executed?
The program counter of the debuggee now points just past the breakpoint address, because the processor has already executed the “int 3” that we had put there.
To make the processor execute the original instruction in the debuggee process, we need to set the program counter of the debuggee process, %eip (in case of x86 machines) or %rip (in case of x86-64 machines), back to that address again.
And how can we set the instruction pointer of the debuggee process?
Using ptrace()! ptrace() has this super awesome capability of letting us “change the tracee’s memory and registers.” PTRACE_GETREGS makes ptrace copy the general purpose registers of the debuggee process into a struct. And PTRACE_SETREGS modifies the debuggee process’s general purpose registers. Here’s the code that does that
Once the debugger has restored the debuggee process’s program counter it lets the process continue, and the way to do that is the following –