How a leading Microsoft engineer extends culture to service resiliency

Credit to Author: Emma Jones| Date: Wed, 23 Mar 2022 16:00:00 +0000

It’s hard to underestimate the impact that people can have on us in our formative years. Huiwen Ru, who spent several years working in identity and access management and is now a Principal Software Engineering Manager on the Singularity team at Microsoft, is a living example of how important mentorship and allyship are to the future of our industry. Young people who have unique and extraordinary talents don’t always get the inspiration and support they need to develop them, but stories like Huiwen’s give me hope. From an early age, Huiwen loved math. With encouragement from her family, teachers, and friends, Huiwen channeled her love for math into an amazing education and a trailblazing career at Microsoft.

In some ways our stories are parallel. We both emigrated from China to study computer science in the United States and joined Microsoft full-time to work on technology that was just getting started—I worked on Remote Desktop while she worked on real-time communications. We both got to experience what it’s like to build a business over many years and then transition our skills to a very different area. Until recently, Huiwen led a group working on one of the most critical aspects of our service: platform resilience. She shares her expertise and experience with the next generation by mentoring them in math.

Huiwen’s interview with Nadim took place before she moved to her new role. It has been edited for clarity and length. We’ve included some video snippets so you can learn more about her personal journey, the work she did for Microsoft identity and access management, and why she finds being a mentor so fulfilling.

Nadim: Huiwen, I’m very pleased to share your experiences of getting into computer science, getting into the industry, and the work you do at Microsoft. What first got you into computer science?

Huiwen: When I was a little girl, I had always been good at math. In both middle and high school, I really enjoyed participating in math competitions. When I applied for college, since math was my best subject, I thought, “I’m just going to study math.” But then my brother said, “No, math is too boring and it’s too hard. Look at how many girls study math. It’s not a great path for you, and other new fields are booming. You should try computer science.” I listened to him, and I’ve never regretted it.

Nadim: What was the first programming language you learned that showed you how much you liked development and coding?

Huiwen: I learned BASIC in high school. Then I entered Tsinghua University, which was ranked both number one in engineering and in computer science in China. The first programming language we learned was Pascal.

Nadim: Cool. So, you went to the number one school, did computer science, and you liked it. What was your journey from there to Microsoft?

Huiwen: At that time, the top students in China would come to the United States for advanced study after graduating. I worked for Motorola China for a couple of years first. Then I came to the University of North Carolina at Chapel Hill for my PhD degree. The job market was so good that instead of doing my PhD I started working at a company in Newport Beach, California. But then a college classmate from Tsinghua University who had joined Microsoft submitted my resume. That’s how I came to Microsoft.

Nadim: And you worked on a number of products before you got to Azure Active Directory (Azure AD)?

Huiwen: I joined Windows networking 22 years ago in 1999 and soon transferred to Office real-time communications. That team merged into Windows networking, which also had a real-time communications group. I think it was called the Office Communications Server, which evolved into Lync Server. Today, it’s Skype services. I was in this group for 15 or 16 years.

When I joined, the product was almost starting from zero. It was like a startup. Back then people relied heavily on email, but people with insight saw the importance of real-time communication over chat, as well as video and audio for meetings and collaboration integrated with your presence, status availability, and all of that. This was the future of communication. So, from version one to version two, through many different milestones, we quickly evolved into a billion-dollar business. I stayed in this team for a very long time, but though it’s just one team, the experience was pretty rich because we grew from a very small business into a very large one, from an on-premises service that shipped once every two to three years to an online service.

Nadim: It’s an interesting journey and certainly one that speaks to the variety of experiences that are possible even in one space, because the space itself evolves so much. You grew and developed a whole set of skills, including transitioning from on-premises software to cloud. You now work on one of the world’s largest services and certainly the world’s largest commercial identity system, Azure AD. Tell us about your role.

Huiwen:
I came over three years ago. I was working in cloud services as part of Office 365. It was all bare metal machines with 32 cores, but the deployment and everything was super slow. So, I wanted to get a real taste of Azure, where things are fast and there are virtual machines. And that’s why I landed here.

I saw the job posted on the career website looking for the skills I had, so I applied. I was very fortunate to land a job working on a service called evolved security token service (ESTS), which is a token service for authentication security. It’s one of the most critical services for identity, and there are a lot of interesting problems to solve! I own the fundamentals area, which can be pretty broad. It covers performance, cost of goods sold (COGS), and also some key architectural migration. Basically, my team is in charge of how we run the service effectively with high reliability and at a low cost. This includes the tooling, frameworks, and pipelines.

Nadim: You were one of the people who led a fundamental restructuring of this service to improve its reliability. Could you tell us about the work you did on cell-based architecture? First of all, what is cell-based architecture and why is it so important?

Huiwen: Before this architecture–at least for ESTS, which is one of the largest identity services—we had over 10,000 nodes worldwide on any given day. And these nodes were separated into about 12 regions in three major geographies. Some larger regions had 2,000 nodes and some smaller regions had maybe 600 nodes. A customer’s request could hit any of the nodes in a particular region. This is a very coarse-grained isolation of the service. Now, if a misbehaving application or some data corruption on the backend causes a retry storm in a tenant, you’ll suddenly have millions of requests coming at you, which can destroy your entire capacity in that region. Before I joined, some of our largest tenants were hit by this issue.

With the cell-based architecture, we try to divide tenants into smaller cells, so that each tenant is only handled in one cell. If a tenant has a misbehaving app, then at worst it impacts co-tenants in the same cell while the other cells stay intact. So far, we have divided all the tenants into over 100 cells. This is a very significant improvement in our reliability and resilience.

Nadim: No more than 2 percent of users in our system are in any one cell. This is a unique capability, given the scale we run at, and it’s an example of the innovation that we’re continuing to drive. So, thank you for your leadership on that project and many others like it.

Switching topics, I heard you mentor and coach people even outside of work.

Huiwen: I had been a mentor with my previous team, in some cases for female employees and in others for my fellow Chinese employees. They have had quite good career growth—some are now managers or are going into senior or principal levels.

Then I started coaching math students. It started with 10 kids, most of them girls. It grew to 20 to 30 kids from my son’s school and other schools in the same school district. They formed math clubs and went to math competitions. This lasted for four years. I feel very lucky that I’m a Microsoft employee because we did the weekend classes in Microsoft buildings. We used Microsoft conference rooms with very nice large whiteboards. The kids all liked to have classes at Microsoft. It was really fun.

Nadim: That’s wonderful.

Huiwen: And I have more good news to share. This past summer, when I met with some of my students, they told me they started a math workshop for younger kids. One student used the materials I used when I taught her in my math class. I found this really rewarding.

I do feel kind of obligated to help the people who need help, especially back in my home country. I have given time and money for many years to an organization in China that helps kids in rural areas finish their education, sends them from high school through college, and provides guidance to the direction of their career or answers their questions about what to do in college to prepare themselves for their careers.

Nadim: What gets you excited to come to work every day?

Huiwen: I’d say it’s the impact we have on people around the world through the product we deliver. The work is really, really critical. Even my son signs in through our service to do his schoolwork in Microsoft Teams. This sense of impact and its importance is really rewarding. I’m also a first-tier manager. I see how working with junior team members as their mentor or coach influences their early careers. The impact I have on their career growth is very, very important to me.

Nadim: That’s very near and dear to my heart as well, including the criticality of what we work on and the responsibility we have to our customers. Thanks for sharing your story.

Huiwen: Thank you, Nadim. I’m very honored to have the opportunity.

Learn more

Learn more about Microsoft identity and access management.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post How a leading Microsoft engineer extends culture to service resiliency appeared first on Microsoft Security Blog.

https://blogs.technet.microsoft.com/mmpc/feed/