refactor(codegen): print string literals containing lone surrogates without reference to raw #10044

@overlookmotel overlookmotel commented Mar 25, 2025

#10041 changed how lone surrogates are handled in StringLiterals.

StringLiterals which include lone surrogates now have the lone_surrogates flag set, and value encodes lone surrogates as \u{FFFD}XXXX, where XXXX is the code unit encoded as hex.

Codegen checks the lone_surrogates flag and, if it is set, decodes the lone surrogates when printing. This means that:

  1. A StringLiteral no longer needs to have its raw field populated, so you can (if you choose to for some reason) create a new StringLiteral containing lone surrogates.

  2. In StringLiterals containing lone surrogates, all other characters are now escaped the same way as in StringLiterals without lone surrogates.
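
The encoding described above can be illustrated with a small standalone sketch. This is not the oxc implementation: `encode_value` and `print_string_literal` are hypothetical helpers written only for this example, the lowercase hex digits are a choice made here, and the handling of a genuine U+FFFD (written as `\u{FFFD}fffd` when the flag is set) is an assumption of the sketch, not something the description above states. The escaping rules in the printer are also deliberately minimal.

```rust
/// Hypothetical helper (not oxc API): build a `value`-style string from
/// UTF-16 code units. Returns the string and a `lone_surrogates` flag.
fn encode_value(units: &[u16]) -> (String, bool) {
    let decoded: Vec<_> = char::decode_utf16(units.iter().copied()).collect();
    // The flag is set only if at least one unpaired surrogate was found.
    let lone_surrogates = decoded.iter().any(|r| r.is_err());

    let mut out = String::new();
    for result in decoded {
        match result {
            // Assumption of this sketch: a genuine U+FFFD must be made
            // distinguishable from the marker, so it is written as
            // U+FFFD followed by "fffd" whenever the flag is set.
            Ok('\u{FFFD}') if lone_surrogates => out.push_str("\u{FFFD}fffd"),
            Ok(c) => out.push(c),
            // Lone surrogate: U+FFFD followed by the code unit as 4 hex digits.
            Err(e) => {
                out.push('\u{FFFD}');
                out.push_str(&format!("{:04x}", e.unpaired_surrogate()));
            }
        }
    }
    (out, lone_surrogates)
}

/// Hypothetical printer (not oxc API): emit a double-quoted JS string literal
/// from `value` alone, without consulting any `raw` source text.
fn print_string_literal(value: &str, lone_surrogates: bool) -> String {
    let mut out = String::from("\"");
    let mut chars = value.chars();
    while let Some(c) = chars.next() {
        match c {
            // Decode the marker only when the flag says it may be present.
            '\u{FFFD}' if lone_surrogates => {
                let hex: String = chars.by_ref().take(4).collect();
                if hex.eq_ignore_ascii_case("fffd") {
                    out.push('\u{FFFD}'); // a genuine replacement character
                } else {
                    out.push_str("\\u"); // print the lone surrogate as \uXXXX
                    out.push_str(&hex);
                }
            }
            // Every other character takes the same escaping path whether or
            // not the literal contains lone surrogates.
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}
```

With these helpers, the code units `[0x61, 0xD800, 0x22, 0x62]` encode to a value of `a`, U+FFFD, `d800`, `"`, `b` with the flag set, and the printer emits `"a\ud800\"b"`. The double quote after the lone surrogate is escaped by the ordinary path, which is the behavior point 2 describes.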

codspeed-hq bot commented Mar 25, 2025

CodSpeed Instrumentation Performance Report

Merging #10044 will not alter performance

Comparing 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ (719742b) with main (d8e49a1)

Summary

✅ 33 untouched benchmarks

@overlookmotel overlookmotel force-pushed the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch from 93d7896 to 687f8bf Compare March 27, 2025 07:28
@overlookmotel overlookmotel force-pushed the 03-25-fix_parser_store_lone_surrogates_as_escape_sequence branch from dc171cd to 025be46 Compare March 27, 2025 07:28
@overlookmotel overlookmotel marked this pull request as ready for review March 27, 2025 07:36
@overlookmotel overlookmotel force-pushed the 03-25-fix_parser_store_lone_surrogates_as_escape_sequence branch from 025be46 to 22f9406 Compare March 27, 2025 07:44
@overlookmotel overlookmotel force-pushed the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch from 687f8bf to 61beb43 Compare March 27, 2025 07:44
@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Mar 28, 2025
@graphite-app graphite-app bot force-pushed the 03-25-fix_parser_store_lone_surrogates_as_escape_sequence branch 2 times, most recently from 69f5a24 to 015a0a1 Compare March 28, 2025 05:09
@graphite-app graphite-app bot force-pushed the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch from 61beb43 to e3a2d8a Compare March 28, 2025 05:09
@overlookmotel overlookmotel force-pushed the 03-25-fix_parser_store_lone_surrogates_as_escape_sequence branch from 015a0a1 to 1162843 Compare March 28, 2025 10:30
@overlookmotel overlookmotel force-pushed the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch from e3a2d8a to 5cc0884 Compare March 28, 2025 10:30
@overlookmotel overlookmotel force-pushed the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch from 5cc0884 to ce4b22b Compare March 28, 2025 12:07
@overlookmotel overlookmotel force-pushed the 03-25-fix_parser_store_lone_surrogates_as_escape_sequence branch 2 times, most recently from 0aa89a2 to d822f65 Compare March 28, 2025 12:59
@overlookmotel overlookmotel force-pushed the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch from ce4b22b to 2e54383 Compare March 28, 2025 12:59
graphite-app bot commented Mar 29, 2025

Merge activity

@graphite-app graphite-app bot force-pushed the 03-25-fix_parser_store_lone_surrogates_as_escape_sequence branch from d822f65 to f0e1510 Compare March 29, 2025 12:48
@graphite-app graphite-app bot force-pushed the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch from 2e54383 to 719742b Compare March 29, 2025 12:49
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Mar 29, 2025
Base automatically changed from 03-25-fix_parser_store_lone_surrogates_as_escape_sequence to main March 29, 2025 13:04
@graphite-app graphite-app bot merged commit 719742b into main Mar 29, 2025
26 checks passed
@graphite-app graphite-app bot deleted the 03-26-refactor_codegen_print_string_literals_containing_lone_surrogates_without_reference_to_raw_ branch March 29, 2025 13:06
Labels

  • A-codegen: Area - Code Generation
  • C-cleanup: Category - technical debt or refactoring. Solution not expected to change behavior