July 24, 2018

Methodology

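This analysis is based on data from a nationally representative survey of 4,573 adults, conducted online Aug. 8-21 and Sept. 14-28, 2017, using Pew Research Center’s American Trends Panel. Respondents were randomly assigned to answer one of four open-ended questions, which asked about the traits or characteristics that people in our society value most in men, value most in women, believe men should not have, or believe women should not have.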

Respondents could list up to three traits or characteristics, which generated a total of 14,143 words. We then used Levenshtein distance and cosine similarity measures to identify and group words that started with the same letters but were not identical, such as “attractive” and “attractiveness” or “honest” and “honesty.” Every word was also filtered through a linguistic database called WordNet to find known variations (called “synsets”) such as “honor” and “honorable.” These tools helped us compile a list of 3,685 pairs of words, which we then reviewed individually to determine whether each pair should be collapsed into a single word.
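As a rough illustration of the pairing step, the sketch below flags candidate pairs for manual review. It is not the code used for this analysis: the function names, the shared-prefix length, both thresholds, and the choice of character bigrams as the basis for cosine similarity are all assumptions made for the example, and the WordNet lookup is left out.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigram_cosine(a, b):
    """Cosine similarity between character-bigram count vectors (assumed representation)."""
    vec_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    vec_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    dot = sum(vec_a[gram] * vec_b[gram] for gram in vec_a)
    norm = (sqrt(sum(v * v for v in vec_a.values()))
            * sqrt(sum(v * v for v in vec_b.values())))
    return dot / norm if norm else 0.0


def candidate_pairs(words, prefix_len=4, max_distance=4, min_cosine=0.6):
    """Return pairs that share a prefix and look similar; thresholds are hypothetical."""
    pairs = []
    for a, b in combinations(sorted(set(words)), 2):
        if a[:prefix_len] != b[:prefix_len]:
            continue  # only compare words that start with the same letters
        if levenshtein(a, b) <= max_distance or bigram_cosine(a, b) >= min_cosine:
            pairs.append((a, b))
    return pairs
```

Pairs produced this way are only candidates. As described above, each pair was reviewed by hand before any words were collapsed.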

Additionally, we identified 40 common multiword phrases and included them alongside the individual words that compose them – but only “multitasking” was used frequently enough to make it into our analysis. We also developed a list of 32 negation indicators – words, phrases or prefixes that invert the meaning of the word they precede, like “not,” “lack of,” “un-” and “dis-.” We used these patterns to identify and remove negated forms of words from our analysis. There were a handful of exceptions to these rules – such as “understanding,” which begins with “un” but is not a negation – that were preserved as-is. Finally, we removed a number of “stopwords”: a standard set of English words that hold little meaning on their own (like “the,” “at” and “is”), as well as a set of additional terms that either repeated words found in the prompt (“men,” “women,” “trait”) or didn’t represent meaningful traits (“overly,” “personally”).
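A minimal sketch of this cleaning step might look like the following. The lists here contain only the examples named above, not the full set of 32 negation indicators or the complete stopword list, and the function name and matching rules are assumptions for illustration.

```python
import re

# Illustrative subsets only; the full lists used in the analysis are not reproduced here.
NEGATION_WORDS = ("not", "lack of")
NEGATION_PREFIXES = ("un", "dis")
PREFIX_EXCEPTIONS = {"understanding"}
STOPWORDS = {"the", "at", "is", "men", "women", "trait", "overly", "personally"}


def clean_response(text):
    """Return the words in a response after removing negated forms and stopwords."""
    text = text.lower()
    # Drop each negation word or phrase together with the single word it precedes.
    for negation in NEGATION_WORDS:
        text = re.sub(rf"\b{re.escape(negation)}\s+\w+", " ", text)
    kept = []
    for token in re.findall(r"[a-z]+", text):
        if token in STOPWORDS:
            continue
        # Drop words that start with a negating prefix, unless listed as an exception.
        if token not in PREFIX_EXCEPTIONS and token.startswith(NEGATION_PREFIXES):
            continue
        kept.append(token)
    return kept
```

A plain prefix match like this one would also discard words such as “unique” or “disciplined,” which is why a reviewed list of exceptions is needed in practice.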

We ended up with 1,586 unique words, which we classified as “positive” when used as an answer to the questions about traits and characteristics that people in our society value most in men and in women, and “negative” when used as an answer to the questions about traits and characteristics people in our society believe men and women should not have. We compared the number of times each word was used in a positive and negative way and plotted them along a continuum, from words that were used in a negative way 100% of the time to words that were used in a positive way 100% of the time, with the midpoint representing words that were used equally in both positive and negative contexts. We then filtered the words down to those that met any of the following criteria:

Finally, we used a series of logistic regressions to confirm that the differences observed were statistically significant for each of these sets of words. These regressions modeled the likelihood that a respondent would use each of these words depending on the type of prompt they received, represented by two independent variables: a flag indicating whether the prompt was about men or women, and another indicating whether the prompt asked for positive or negative traits. For the first set of words, we found that each word was significantly more likely to be used for one gender than for the other (p ≤ 0.05). For the second set of words, we found that no word was significantly more likely to be used in a positive context than in a negative one, or vice versa. And for the third set of words, we included an interaction between the two types of prompts as a third independent variable. Some of the positive/negative gender differences in this set turned out not to be statistically significant, so we excluded those words.
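Written out, the models described above take roughly the following form. The notation is ours and is only a sketch: y_{iw} equals 1 if respondent i used word w, "women" flags a prompt about women rather than men, and "positive" flags a prompt asking for valued traits rather than traits people should not have.

```latex
% First and second sets of words: two independent variables
\operatorname{logit}\,\Pr(y_{iw} = 1)
  = \beta_{0w} + \beta_{1w}\,\mathrm{women}_i + \beta_{2w}\,\mathrm{positive}_i

% Third set of words: adds the interaction as a third independent variable
\operatorname{logit}\,\Pr(y_{iw} = 1)
  = \beta_{0w} + \beta_{1w}\,\mathrm{women}_i + \beta_{2w}\,\mathrm{positive}_i
  + \beta_{3w}\,(\mathrm{women}_i \times \mathrm{positive}_i)
```

In this notation, the gender test for the first set concerns \beta_{1w}, the positive/negative test for the second set concerns \beta_{2w}, and the test for the third set concerns the interaction term \beta_{3w}.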

We were left with 10 words with significant positive/negative gender differences, four words with roughly equal positive/negative usage for one gender, 10 words that were used almost exclusively to describe women, and 14 words that were used almost exclusively to describe men, which we filtered down to the top 10 by frequency for space considerations. In addition to showing plots for each of these sets, we also introduce our analysis with a handpicked selection of words designed to illustrate a variety of patterns, including words from each of these sets and a couple of the most-used words overall (“honest” and “lazy”).
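The positive-to-negative continuum behind these plots can be sketched as a simple share calculation. The function name and the input format (one word and prompt type per mention) are assumptions for the example, not the structure of the actual dataset.

```python
from collections import Counter


def positive_share(responses):
    """Map each word to the share of its uses that came from a positive prompt.

    responses: iterable of (word, polarity) pairs, with polarity either
    "positive" or "negative" (a hypothetical input format).
    """
    positive = Counter()
    total = Counter()
    for word, polarity in responses:
        total[word] += 1
        if polarity == "positive":
            positive[word] += 1
    # 0.0 = always negative, 0.5 = used equally, 1.0 = always positive
    return {word: positive[word] / total[word] for word in total}
```

A word with a share of 0.5 sits at the midpoint of the continuum, meaning it was used equally in positive and negative contexts.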

For more, see the complete data essay.